| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Referring Expression Comprehension | RefCOCO+ (val) | Accuracy | 90.4 | 345 |
| Referring Expression Comprehension | RefCOCO+ (testA) | Accuracy | 91.81 | 207 |
| Referring Image Segmentation | RefCOCO+ (test B) | mIoU | 78.59 | 200 |
| Referring Expression Segmentation | RefCOCO+ (testA) | cIoU | 83.5 | 190 |
| Referring Expression Segmentation | RefCOCO+ (testB) | cIoU | 76.6 | 188 |
| Referring Expression Comprehension | RefCOCO+ (test-A) | Accuracy | 94.7 | 172 |
| Visual Grounding | RefCOCO+ (val) | Accuracy | 91.4 | 171 |
| Visual Grounding | RefCOCO+ (testA) | Accuracy | 94.7 | 168 |
| Referring Expression Comprehension | RefCOCO+ (test-B) | Accuracy | 85.6 | 167 |
| Referring Image Segmentation | RefCOCO+ (test A) | oIoU | 78.7 | 89 |
| Referring Image Segmentation | RefCOCO+ (testA) | mIoU | 2,982 | 45 |
| Referring Segmentation | refCOCO+ (val) | cIoU | 80.1 | 44 |
| Localization | RefCOCO+ (val) | Accuracy | 85.05 | 32 |
| Referring Segmentation | refCOCO+ (testA) | cIoU | 0.842 | 30 |
| Localization | RefCOCO+ (testB) | Accuracy | 78.77 | 26 |
| Localization | RefCOCO+ (testA) | Accuracy | 91.56 | 26 |
| Referring Expression Grounding | RefCOCO+ (testB) | Accuracy | 83.5 | 23 |
| Referring Expression Grounding | RefCOCO+ (testA) | Accuracy | 92.8 | 23 |
| Referring Expression Segmentation | RefCOCO+ UNC (val) | cIoU | 70.3 | 18 |
| Referring Expression Comprehension | RefCOCO+ 80 (val) | Accuracy | 87.43 | 17 |
| Referring Expression Grounding | RefCOCO+ (val) | Acc@0.5 | 84.7 | 14 |
| Referring Image Segmentation | RefCOCO+ medium (val) | oIoU | 68.1 | 14 |
| Referring Expression Comprehension | RefCOCO+ UNC (test-A) | Prec@0.5 IoU | 84.45 | 14 |
| Referring Expression Comprehension | RefCOCO+ UNC (val) | Precision@0.5 IoU | 79.74 | 14 |
| Localization | RefCOCO+ | Accuracy | 67 | 13 |