| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Machine Reading Comprehension | DREAM (test) | Accuracy | 91.8 | 23 |
| Multiple Choice Question Answering | DREAM | Accuracy | 98.77 | 22 |
| Machine Reading Comprehension | DREAM (dev) | Accuracy | 90 | 21 |
| Dialogue-based Multiple-choice Question Answering | DREAM (test) | Accuracy | 91.8 | 21 |
| Dialogue Comprehension | DREAM | Accuracy | 69.2 | 15 |
| Safety evaluation against dynamic adversarial chains | DREAM (test) | Overall Defense Score | 67.3 | 12 |
| Robot Pose Estimation | DREAM-real Panda ORB | AUC | 87.6 | 12 |
| Robot Pose Estimation | DREAM-real Panda 3CAM-RS | AUC | 91.9 | 12 |
| Robot Pose Estimation | DREAM-real Panda 3CAM-AK | AUC | 90.2 | 12 |
| Fine-grained captioning | Dream1k | F1 Score | 29.5 | 11 |
| Video Description | DREAM-1K Overall 1.0 (test) | F1 Score | 40.1 | 11 |
| Video Description | DREAM-1K Stock 1.0 (test) | F1 Score | 44 | 11 |
| Video Description | DREAM-1K Shorts 1.0 (test) | F1 Score | 40.9 | 11 |
| Video Description | DREAM-1K YouTube 1.0 (test) | F1 Score | 34.5 | 11 |
| Video Description | DREAM-1K Animation 1.0 (test) | F1 | 37.1 | 11 |
| Video Captioning | Dream-1K | Precision | 36 | 10 |
| Dialogue-based Multiple-choice Question Answering | DREAM (dev) | Accuracy | 89.9 | 10 |
| Quality-Penalized Efficiency Evaluation | Dream Base | QPS (gamma=4) | 2.03 | 9 |
| Question Answering | DREAM | Accuracy | 69.51 | 9 |
| Query-based dialogue summarization | DREAM (test) | Accuracy (Multi-Choice) | 65.9 | 8 |
| Robot Pose Estimation | DREAM Panda Photo | AUC | 82 | 5 |
| Robot Pose Estimation | DREAM Panda DR | AUC | 82.9 | 5 |
| Robot Pose Estimation | DREAM-real (All) | AUC | 85.962 | 5 |
| Video Description | DREAM-1K (300 randomly sampled videos) | Tarsier Wins Rate | 0.717 | 4 |
| Panda Arm Pose Estimation | DREAM Mini panda_orb_full_view | Average Error | 0.416 | 3 |