| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Procedural Multimedia Reasoning | PMR (test) | Accuracy 84.7 | 15 | |
| Pairwise classification | PMR-Synth (Total) | Accuracy 73 | 14 | |
| Pairwise classification | PMR-Synth Med | Accuracy 77 | 14 | |
| Pairwise classification | PMR-Reddit (Med) | Accuracy 86 | 14 | |
| Pairwise classification | PMR-Reddit Easy | Accuracy 98 | 14 | |
| Inbox Sorting | PMR-Real (test) | T-NDCG@10 77 | 14 | |
| Pairwise classification | PMR-Real (Total) | Accuracy 77 | 13 | |
| Pairwise classification | PMR-Real (Hard) | Accuracy 0.72 | 13 | |
| Pairwise classification | PMR-Real Med | Accuracy 82 | 13 | |
| Pairwise classification | PMR-Real (Easy) | Accuracy 92 | 13 | |
| Procedural Multimedia Reasoning | PMR (val) | Accuracy 85.8 | 8 | |