| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Referring Image Segmentation | ReferIt (test) | IoU 76.18 | 59 |
| Phrase Grounding | ReferIt (test) | Pointing Accuracy 80.58 | 18 |
| Visual Grounding | ReferIt | Pointing Game Accuracy 73.17 | 16 |
| Weakly Supervised Grounding | ReferIt (test) | Accuracy (Pointing Game) 62.24 | 14 |
| Phrase Grounding | ReferIt | Accuracy 65.15 | 14 |
| Phrase Localization | ReferIt (test) | Pointing Game Accuracy 62.76 | 11 |
| Referring Expression Segmentation | ReferIt (test) | Precision @ 0.5 IoU 34.02 | 11 |
| WWbL | ReferIt (test) | Point Accuracy 65.95 | 10 |
| Natural Language Object Retrieval | ReferIt 100 EdgeBox proposals (test) | R@1 59.38 | 7 |
| Referring Image Segmentation | ReferIt standard (test) | oIoU 73.36 | 6 |
| Natural Language Object Retrieval | ReferIt | P@1 72.74 | 6 |
| Referring Expression Segmentation | ReferIt | Inference Time (s) 0.169 | 5 |
| Referring Expression Comprehension | ReferIt (test) | Accuracy 81.5 | 4 |