| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reach | Human study | Human-likeness Win Rate: 89 | 6 |
| Steering | Human study | Human-likeness Win Rate: 93 | 6 |
| Direction | Human study | Human-likeness Win Rate: 99 | 6 |
| Long Jump | Human study | Human-likeness Win Rate: 96 | 4 |
| Strike | Human study | Human-likeness Win Rate: 85 | 4 |
| Physical Simulation Script Generation | Human Study 6 diverse text prompts | Text Fidelity: 80.2 | 4 |
| 3D World Generation | Human Study | Zoom-in Accuracy: 83.2 | 4 |
| Video Description | Human Study GEST prompts | Similarity Score: 56.64 | 3 |
| Digital Human Animation | Human Study 100 videos as of October 31 (test) | Overall Naturalness: 4.17 | 3 |
| 3D Object Arrangement | Human Study Evaluation Set | Win % (Instruction Following): 62.1 | 3 |
| Backdoor Stealthiness Detection | Human Study (NLP Group) 100 clean code snippets, 25 injected | Precision: 96 | 3 |
| Backdoor Stealthiness Detection | Human Study CV Group (100 clean code snippets, 25 injected) | Precision: 82 | 3 |
| Text-to-Video Generation | Human Study 4K Video Generation (test) | Video Quality: 71.25 | 2 |
| Image-to-Image Generation | Human Study | Layout Score: 0.98 | 2 |
| Track-Conditioned Video Generation | Human Study Evaluation Set (val) | Metric: - | 0 |