For Abilities Involving Visual Grounding:
- Grounding: CLICK Send to generate a grounded image description.
- Refer: Input a referring object and CLICK Send.
- Detection: Write a caption or phrase, and CLICK Send.
- Identify: Draw a bounding box on the uploaded image window and CLICK Send to generate the bounding box. (CLICK the "clear" button before drawing a new box.)
- VQA: Input a visual question and CLICK Send.
- No Tag: Input whatever you want and CLICK Send without any tagging.
You can also simply chat in free form!
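
As a rough illustration of how the options above might turn into a text prompt, here is a minimal sketch. It is not the demo's actual code: the bracketed tag strings, the default Grounding instruction, the box format, and the names `TASK_TAGS` and `build_prompt` are all assumptions made for this example.

```python
from typing import Optional, Tuple

# Hypothetical mapping from each task option to a prompt prefix.
# The tag strings are assumed for illustration only.
TASK_TAGS = {
    "Grounding": "[grounding]",
    "Refer": "[refer]",
    "Detection": "[detection]",
    "Identify": "[identify]",
    "VQA": "[vqa]",
    "No Tag": "",
}

# Grounding needs no typed input, so fall back to an assumed default instruction.
DEFAULT_GROUNDING_TEXT = "describe this image in detail"


def build_prompt(
    task: str,
    user_text: str = "",
    box: Optional[Tuple[int, int, int, int]] = None,
) -> str:
    """Combine the selected task, the typed text, and an optional drawn box."""
    if task not in TASK_TAGS:
        raise ValueError(f"unknown task: {task!r}")

    text = user_text.strip()
    if task == "Identify":
        # Identify uses the drawn bounding box instead of typed text.
        if box is None:
            raise ValueError("Identify needs a bounding box")
        x1, y1, x2, y2 = box
        text = f"{{<{x1}><{y1}><{x2}><{y2}>}}"  # assumed box encoding
    elif task == "Grounding" and not text:
        text = DEFAULT_GROUNDING_TEXT

    if not text:
        raise ValueError(f"task {task!r} needs some input text")

    tag = TASK_TAGS[task]
    # "No Tag" sends the text as-is, with no prefix.
    return f"{tag} {text}" if tag else text


print(build_prompt("VQA", "What color is the car?"))  # [vqa] What color is the car?
print(build_prompt("No Tag", "Tell me a joke."))      # Tell me a joke.
```

Keeping the tags in a single dictionary means adding or renaming a task only touches one place, and the "No Tag" case falls out naturally as an empty prefix.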