| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Active Speaker Detection | AVA-ActiveSpeaker (val) | mAP 98.8 | 107 |
| Action Detection | AVA v2.2 (val) | mAP 46.2 | 99 |
| Image Aesthetic Assessment | AVA | SRCC 0.899 | 53 |
| Action Detection | AVA v2.1 (val) | mAP 32 | 48 |
| Action Detection | AVA v2.2 | mAP 43.3 | 42 |
| Aesthetic Assessment | AVA (test) | SRCC 0.822 | 39 |
| Photo Aesthetics Classification | AVA (test) | Accuracy 79.08 | 33 |
| Action Localization | AVA 2.2 | mAP (center) 34.4 | 25 |
| Spatiotemporal Action Localization | AVA 2.2 | mAP 41.7 | 21 |
| Action Detection | AVA v2.2 (test) | mAP 39.8 | 20 |
| Aesthetic Quality Assessment | AVA v1 (test) | Kendall's Tau 0.883 | 18 |
| Action Recognition | AVA 2.2 | mAP 45.1 | 16 |
| Dominant parallel line detection | AVA landscape (test) | AUC_A 0.5631 | 15 |
| Action Recognition | AVA v2.1 (val) | mAP 27.8 | 14 |
| 3D head avatar reconstruction | Ava 256 | PSNR 22.5 | 13 |
| Active Speaker Detection | AVA-ActiveSpeaker v1.0 (test) | mAP 94.5 | 13 |
| Spatio-temporal Action Localization | AVA v2.1 (val) | mAP 28.4 | 13 |
| Active Speaker Detection | AVA-ActiveSpeaker | mAP 95.2 | 11 |
| Action Detection | AVA v2.1 (train/val) | mAP 27.4 | 11 |
| Action Detection | AVA | Frame mAP 42.6 | 11 |
| Spatio-temporal Action Detection | AVA v2.2 (val) | mAP 32.3 | 10 |
| Video Compression | AVA Actions | PSNR 30.6 | 9 |
| Spatio-temporal Action Detection | AVA V2.2 | Frame mAP 33.6 | 9 |
| Video Frame Classification | AVA temporal annotation-only | mAP 14.9 | 8 |
| Action Localization | AVA (split 1) | mAP 23.4 | 7 |