| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-label recognition | VG-200 | Avg OF1: 47.9 | 66 |
| Scene Graph Classification | VG150 (test) | mR@50: 28.01 | 66 |
| Scene Graph Detection | VG150 (test) | ng-mR@50: 19.8 | 41 |
| Scene Graph Detection | VG150 | R@50: 33.1 | 31 |
| Predicate Classification | VG 50 (test) | Mean Recall@50: 37 | 29 |
| Scene Graph Detection | VG 50 (test) | mR@50: 22 | 27 |
| Scene Graph Classification | VG 50 (test) | R@50: 39.3 | 25 |
| Multi-label image recognition | VG-200 | Average mAP: 49.4 | 24 |
| Predicate Classification | VG-1800 (test) | Accuracy: 90.07 | 21 |
| Predicate Classification | VG150 (test) | ng-mR@50: 45.8 | 18 |
| Scene Graph Generation | VG (test) | mR@50: 17.07 | 17 |
| Predicate Classification | VG (test) | mR@100: 63.02 | 17 |
| Scene Graph Generation | VG150 | mR@50: 11.7 | 17 |
| Dense Captioning | VG V1.0 | mAP: 0.424 | 16 |
| Multi-label image recognition with partial labels | VG-200 | mAP @ IoU=0.10: 40.6 | 15 |
| Dense Captioning | VG V1.2 | mAP: 42.8 | 13 |
| Relation Prediction | VG8K-LT v1.4 (test) | Accuracy (many): 37.3 | 13 |
| Relation Prediction | VG200 | R@50: 69.7 | 13 |
| Multi-label Recognition | VG-200 (test) | Avg. OF1: 45.1 | 13 |
| Scene Graph Detection | VG | R@50: 30.26 | 12 |
| Predicate Classification | VG | Recall@50: 65.46 | 12 |
| Scene Graph Classification | VG150 | mR@50: 26.8 | 12 |
| Multi-label Image Classification | VG500 | mAP: 40.5 | 11 |
| Scene Graph Classification | VG | R@50: 44.15 | 10 |
| WWbL | VG (test) | Point Accuracy: 62.31 | 10 |