| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reasoning | LiveBench Reasoning | Accuracy | 92 | 80 |
| Reasoning | LiveBench | Accuracy | 22.3 | 25 |
| Code Generation | LiveBench | Avg@8 | 42.9 | 22 |
| General Reasoning | LiveBench | Accuracy | 53.47 | 20 |
| Coding | LiveBench | Accuracy | 40.23 | 15 |
| Single-event Scene Revisit (Different Pose) | LiveBench | DINO Feature Similarity (FG) | 0.691 | 8 |
| Single-event Scene Revisit (Same Pose) | LiveBench | PSNR (Background, dB) | 20.132 | 8 |
| General Tasks | LiveBench 2024-11-25 | Accuracy | 75.9 | 5 |
| Mathematical Reasoning | LiveBench Math (test) | Score | 51.95 | 5 |
| Examination | LiveBench 2024-11-25 | Score | 70.79 | 5 |
| General Tasks | LiveBench 0831 | Accuracy | 0.57 | 5 |