| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Tabular Data Synthesis | Small Benchmark | Shape | 12.407 | 13 |
| Embodied vision-language reasoning | Original benchmark B | Score | 61.26 | 13 |
| Camera Pose Estimation | Zero-shot cross-domain benchmark (test) | Mean | 5.94 | 12 |
| Video Customization | 70-example benchmark 1.0 (test) | FaceSim Arc | 0.59 | 9 |
| Class-conditional video generation | Benchmark 17x256x256 resolution (test) | gFVD | 210.9 | 9 |
| Survival Prediction | 33-task benchmark Survival prediction | C-index | 58 | 8 |
| Theorem Proving | Small-scale benchmark Overall | VR | 33 | 8 |
| Text-driven Style Transfer | Benchmark of 52 prompts and 20 style images 1.0 (test) | Text Alignment | 0.235 | 8 |
| Intent Classification | Benchmark 03 | In-Scope Accuracy | 84 | 8 |
| Educational Video Generation | 200-task benchmark | Success Rate | 599 | 6 |
| Commonsense Triple Validation | Benchmark ¬ATOMIC | Valid Precision | 83 | 6 |
| Image Classification | 8-task benchmark | ID Score | 94.8 | 6 |
| Robot parameter extraction and forward kinematics calculation | Benchmark 1 (test) | M_C (Completeness/Score) | 97 | 6 |
| 3D face reconstruction | benchmark High-Quality (HQ) 1.0 | Median Error (mm) | 1.58 | 6 |
| ODE Discovery | Benchmark 2 | Model Complexity | 17.2 | 5 |
| Classification | Benchmark (BM) 10 clients, pathological non-IID | AUC-ROC | 91 | 5 |
| Electric Vehicle Routing Problem (ECVRP) | benchmark Small Instances | Objective Value | 263.33 | 5 |
| Speculative Decoding | Benchmark Second Turn | Block Efficiency | 2.32 | 5 |
| Speculative Decoding | Benchmark First Turn | Block Efficiency | 2.32 | 5 |
| Continuous-time policy evaluation | benchmark Medium scale | Mean Integrated RMSE | 0.024 | 4 |
| Continuous-time policy evaluation | Benchmark Small scale | Mean Integrated RMSE | 0.014 | 4 |
| Object Detection | 100-image benchmark Brighten | AFFC | 1 | 4 |
| Object Detection | 100-image benchmark Snow | AFFC | 0.365 | 4 |
| Object Detection | 100-image benchmark Rain | AFFC | 62.1 | 4 |
| Object Detection | 100-image benchmark Fog | AFFC | 0.7 | 4 |