| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| AlpacaEval | | Winrate | 98 | 25 | 1mo ago |
| VicunaEval | | Winrate | 96.3 | 21 | 1mo ago |
| Overall | UltraMix-190k | Overall Score | 62.05 | 19 | 1mo ago |
| General Evaluation Suite | Qwen3 8B | Accuracy | 73.8 | 17 | 1mo ago |
| Aggregate Across Math, Code, Chat | DFlash | Speedup | 4.91 | 12 | 1mo ago |
| Aggregated LLM Evaluation Suite | BTX | Average Score | 47.9 | 10 | 1mo ago |
| Performance Bench Reasoning & Knowledge | DeepSeek-R1-Distill-Qwen-14B (Reasoning) | Average Score | 78.37 | 9 | 1mo ago |
| Aggregated MMLU, HellaSwag, TruthfulQA, GSM8K, MATH, MBPP, HumanEval | Sens-Merging (DARE) | Average Score | 40.35 | 9 | 1mo ago |