| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Embodied vision-language reasoning | Original benchmark B | Score: 61.26 | 13 |
| Video Customization | 70-example benchmark 1.0 (test) | FaceSim Arc: 0.59 | 9 |
| Class-conditional video generation | Benchmark 17x256x256 resolution (test) | gFVD: 210.9 | 9 |
| Theorem Proving | Small-scale benchmark Overall | VR: 33 | 8 |
| Text-driven Style Transfer | Benchmark of 52 prompts and 20 style images 1.0 (test) | Text Alignment: 0.235 | 8 |
| Intent Classification | Benchmark 03 | In-Scope Accuracy: 84 | 8 |
| Image Classification | 8-task benchmark | ID Score: 94.8 | 6 |
| Robot parameter extraction and forward kinematics calculation | Benchmark 1 (test) | M_C (Completeness/Score): 97 | 6 |
| 3D face reconstruction | benchmark High-Quality (HQ) 1.0 | Median Error (mm): 1.58 | 6 |
| Electric Vehicle Routing Problem (ECVRP) | benchmark Small Instances | Objective Value: 263.33 | 5 |
| Speculative Decoding | Benchmark Second Turn | Block Efficiency: 2.32 | 5 |
| Speculative Decoding | Benchmark First Turn | Block Efficiency: 2.32 | 5 |
| Object Detection | 100-image benchmark Brighten | AFFC: 1 | 4 |
| Object Detection | 100-image benchmark Snow | AFFC: 0.365 | 4 |
| Object Detection | 100-image benchmark Rain | AFFC: 62.1 | 4 |
| Object Detection | 100-image benchmark Fog | AFFC: 0.7 | 4 |
| Large Model Performance Prediction | Benchmark Chinese pattern shift | RMSE: 16.94 | 3 |
| Large Model Performance Prediction | Benchmark OCR pattern shift | RMSE: 25.18 | 3 |
| Visual Forward Kinematics | Benchmark 10 visual problem instances 2 1.0 (test) | Consistency Score: 93 | 2 |
| Entity Linking | Benchmark Skills | Top-1 Accuracy: 39.69 | 2 |
| Protein-Ligand Binding Affinity Prediction | benchmark1k2101 (test) | Correlation (R): 0.883 | 1 |