| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | QA Zero-shot Average | QA Zero-shot Average: 73.45 | 57 |
| Question Answering | QA | Speedup Factor: 3.66 | 47 |
| Question Answering | QA | Average Rank: 1.73 | 40 |
| Question Answering | QA OOD StrQA SciQA | StrQA Accuracy: 98.3 | 28 |
| Question Answering | QA (OBQA, ARC-E, ARC-C, CQA) | OBQA Accuracy: 53.6 | 20 |
| Legal Text Classification | QA | Accuracy: 85.72 | 18 |
| Question Answering | QA ExAnte (test) | 1d Leakage Rate: 1.6 | 15 |
| Question Answering | QA | Performance Score: 63.31 | 12 |
| Question Answering | QA | ASR Score (Before): 70 | 12 |
| Question Answering | QA | Accuracy: 59.5 | 12 |
| Question Answering | QA 8-objective | EM: 37.6 | 11 |
| Steering | QA | Steering Success: 62.5 | 11 |
| Text Generation | QA | Throughput (tokens/s): 117.17 | 10 |
| Question Answering | QA benchmarks | ReCoRD Score: 80.86 | 9 |
| Question Answering | QA domain average | Best Accuracy: 85.2 | 8 |
| Critique Quality Evaluation | QA | Win Rate: 75 | 6 |
| Question Answering | QA 12 languages | Score: 72.9 | 5 |
| Question Answering | QA Qwen2-7B-Instruct v1 (test) | Acceptance Length (τ): 2.57 | 4 |
| Speculative Decoding | Qa | Speedup: 2.23 | 3 |