| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Pixel-level manipulation detection | MEAN Across datasets | F1 Score | 72.8 | 20 |
| Code Generation | Mean Across MBPP, CodeAlpacaPy, HumanEval, LiveCodeBench | Speedup | 4.04 | 14 |
| Offline Reinforcement Learning | Mean Medium-Replay | Normalized Return | 76.45 | 7 |
| Offline Reinforcement Learning | Mean Medium | Normalized Return | 71.33 | 7 |
| Offline Reinforcement Learning | Mean Medium-Expert | Normalized Return | 98.5 | 7 |
| Physically-based rendering | Mean All scenes | PSNR | 31.8 | 4 |
| Mathematical Reasoning | Mean across benchmarks | Speedup | 2.12 | 2 |