| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Chain-of-Thought Reasoning | Reasoning Dataset | Accuracy (Acc): 86.9 | 21 |
| Reasoning | 7 reasoning datasets | Reasoning Accuracy: 65.74 | 15 |
| Natural Language Generation | Reasoning | ROUGE-1: 74.23 | 8 |
| System Performance Evaluation | Reasoning | Throughput: 194.21 | 8 |
| Visual Reasoning | Reasoning | Average Score: 72.2 | 5 |
| Clustering | Reasoning | Spearman's Rho: 0.76 | 5 |
| Tokenizer compression | Reasoning | Bits per Token: 3.51 | 5 |
| Zero-shot transfer attack | Reasoning | Attack Success Rate (ASR): 0.6 | 4 |