| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| AlpacaEval | | Win rate 98 | 25 | 4d ago | |
| VicunaEval | | Win rate 96.3 | 21 | 4d ago | |
| General Evaluation Suite | Qwen3 8B | Accuracy 73.8 | 17 | 4d ago | |
| Aggregate Across Math, Code, Chat | DFlash | Speedup 4.91 | 12 | 4d ago | |
| Aggregated LLM Evaluation Suite | BTX | Average Score 47.9 | 10 | 4d ago | |
| Performance Bench Reasoning & Knowledge | DeepSeek-R1-Distill-Qwen-14B (Reasoning) | Average Score 78.37 | 9 | 2d ago | |
| Aggregated MMLU, HellaSwag, TruthfulQA, GSM8K, MATH, MBPP, HumanEval | Sens-Merging (DARE) | Average Score 40.35 | 9 | 4d ago | |