
For Abilities Involving Visual Grounding:

Grounding: CLICK Send to generate a grounded image description.
Refer: Input a referring object and CLICK Send.
Detection: Write a caption or phrase, and CLICK Send.
Identify: Draw a bounding box on the uploaded image and CLICK Send to generate the bounding box. (CLICK the "clear" button before drawing again.)
VQA: Input a visual question and CLICK Send.
No Tag: Input whatever you want and CLICK Send without any tagging.

You can also simply chat in free form!
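
The task options above can be pictured as a prefix added to whatever the user types before it reaches the model. The sketch below is only an illustration of that idea, not the demo's actual code: the tag strings in TASK_TAGS, the default Grounding instruction, the box format, and the helper names format_box and build_prompt are all assumptions made for this example.

```python
from typing import Optional, Tuple

# Hypothetical tag strings, guessed from the task names in the list above.
TASK_TAGS = {
    "Grounding": "[grounding]",
    "Refer": "[refer]",
    "Detection": "[detection]",
    "Identify": "[identify]",
    "VQA": "[vqa]",
    "No Tag": "",
}

# Assumed default instruction for Grounding, where the user only clicks Send.
DEFAULT_GROUNDING_TEXT = "describe this image in detail"

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), assumed integer coordinates


def format_box(box: Box) -> str:
    """Render a drawn box as text (the exact format is an assumption)."""
    x1, y1, x2, y2 = box
    if x1 >= x2 or y1 >= y2:
        raise ValueError("box must satisfy x1 < x2 and y1 < y2")
    return f"{{<{x1}><{y1}><{x2}><{y2}>}}"


def build_prompt(task: str, text: str = "", box: Optional[Box] = None) -> str:
    """Combine the selected task, the typed text, and an optional box."""
    if task not in TASK_TAGS:
        raise ValueError(f"unknown task: {task!r}")
    text = text.strip()
    if task == "Identify":
        # Identify takes its input from the drawn box rather than typed text.
        if box is None:
            raise ValueError("Identify needs a bounding box")
        text = format_box(box)
    elif task == "Grounding" and not text:
        text = DEFAULT_GROUNDING_TEXT
    # "No Tag" has an empty tag, so strip() leaves just the free-form text.
    return f"{TASK_TAGS[task]} {text}".strip()


if __name__ == "__main__":
    print(build_prompt("VQA", "What color is the car?"))
    print(build_prompt("Identify", box=(10, 20, 30, 40)))
    print(build_prompt("No Tag", "Tell me a story about this picture."))
```

Keeping the tags in one dictionary means "No Tag" is just the empty-prefix case, which matches the free-form chat described above.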

MiniGPT-v2 Demo