| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Region-level instance understanding | ViP-Bench (synthesized visual prompts) | Recognition Accuracy: 58.9 | 11 |
| Region-level instance understanding | ViP-Bench (visual prompts from human) | Rec: 57.7 | 8 |
| Region Understanding | ViP-Bench | Recognition Score: 0.402 | 4 |