| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Grounded Multi-Video Question Answering | WikiVideo | Ref Precision 94 | 11 |
| Article Generation | WikiVideo (test) | InfoP Score 94.5 | 10 |
| Multimodal Retrieval | WikiVideo (test) | Alpha-nDCG 62.8 | 10 |
| UMUI Judgment Calibration | WIKIVIDEO (v) | MSE 0.0784 | 8 |
| Report Generation | WikiVideo | ROUGE-L 30.14 | 5 |
| Scalar probability judgment | WikiVideo Audio-only | MSE (x100) 3.4 | 5 |
| Scalar probability judgment | WikiVideo Vision-only | MSE 0.078 | 5 |
| Binary Judgment | WikiVideo Audio-only | Accuracy 71.5 | 5 |
| Binary Judgment | WikiVideo Vision-only | Accuracy 81.1 | 5 |
| UMUI Judgment Calibration | WIKIVIDEO (A) | MSE 0.0335 | 5 |
| Extrinsic Quality Judgment | WikiVideo | INFOP (Reference) Score 19.7 | 4 |
| Citation Quality Evaluation | WikiVideo 1.0 (test) | CITEP R 33 | 4 |
| Intrinsic Claim Judgment | WikiVideo | INFOP (Reference) 40 | 3 |
| Video-grounded Information Synthesis | WikiVideo | Avg. F1 87.9 | 3 |
| Scalar probability judgment | WikiVideo Audio-Visual | MSE (x100) 7.9 | 3 |
| Binary Judgment | WikiVideo Audio-Visual | Accuracy 70.1 | 3 |