| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Pairwise Preference Comparison | Qwen2.5-3B responses (test) | Avg Preference Score: 82.7 | 30 | |
| Jailbreak Defense | Qwen2.5-7B Adaptive AutoDAN-T attack | ASR: 30 | 6 | |
| PrefixLM Attention | Qwen2.5 72B (q=64, k=8) (1k) | PrefixLM Attention Throughput (TFLOPS): 103.61 | 4 | |
| Language Modeling Inference | Qwen2.5-7B 8K context length | Decode Latency (ms/token): 7.1 | 4 | |