| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| General speculative decoding performance | Mean (MT-Bench, HumanEval, GSM8K) | Average Acceptance Length (τ) 6.52 | 112 |
| General Reasoning and Coding | Mean GSM8K, HumanEval, MBPP | Speed 4 | 26 |
| Pixel-level manipulation detection | MEAN Across datasets | F1 Score 72.8 | 20 |
| Code Generation | Mean Across MBPP, CodeAlpacaPy, HumanEval, LiveCodeBench | Speedup 4.04 | 14 |
| Medical Image Classification | Mean | Accuracy 73 | 13 |
| Visual Place Recognition | Mean Across Datasets | R@1 80.9 | 12 |
| Pick-and-place | Mean Across T1, T2, T3 | Mean Grasp Success Rate 99 | 10 |
| AI-generated video detection | Mean Across Frontier Commercial Generators | Accuracy 87.25 | 7 |
| Mathematical Reasoning and Code Generation | Mean (GSM8K, MATH, HumanEval, MBPP) | Accuracy 52.06 | 7 |
| Offline Reinforcement Learning | Mean Medium-Replay | Normalized Return 76.45 | 7 |
| Offline Reinforcement Learning | Mean Medium | Normalized Return 71.33 | 7 |
| Offline Reinforcement Learning | Mean Medium-Expert | Normalized Return 98.5 | 7 |
| Physically-based rendering | Mean All scenes | PSNR 31.8 | 4 |
| Mathematical Reasoning | Mean across benchmarks | Speedup 2.12 | 2 |