| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Image-to-Text Retrieval | diagnostic (test-unseen) | Accuracy@50 85.21 | 9 |
| Image-to-Text Retrieval | diagnostic seen (test) | Acc@50 84.97 | 9 |
| Text-to-Image Retrieval | diagnostic (test-unseen) | Acc@50 80.23 | 9 |
| Text-to-Image Retrieval | diagnostic (test-seen) | Accuracy@50 82.02 | 9 |
| Consistency Evaluation | Diagnostic (Avg. YouCook2, COIN, CrossTask) (test) | State Accuracy 76.92 | 8 |
| Consistent Video Retrieval | Diagnostic Average of YouCook2, COIN, CrossTask | State Accuracy 53.81 | 5 |