| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Referring Expression Grounding | Ferret-Bench | Accuracy (avg): 77.1 | 18 |
| Multimodal Referring and Grounding | Ferret-Bench v1.0 (test) | Referring Description: 79.9 | 10 |
| Spatial Understanding | Ferret Bench (test) | Referring Description Accuracy: 77.5 | 7 |
| Referring Reasoning | Ferret-Bench (val) | Accuracy: 67.8 | 4 |
| Referring Description | Ferret-Bench (val) | Accuracy: 72.2 | 4 |