| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| IFEval | | IFEval Accuracy | 95 | 625 | 2d ago |
| AlpacaEval 2.0 | Attention-MoA | Win Rate | 95.87 | 507 | 2d ago |
| AlpacaEval | BFPO | Win Rate | 97.2 | 227 | 3d ago |
| InstructBench | SYTTA-8 | Dolly (BLEU) | 75.27 | 224 | 23d ago |
| MT-Bench | GPT-4-1106-preview | MT-Bench Score | 9.32 | 215 | 10d ago |
| UnNI | MINILLM | Rouge-L | 40.2 | 160 | 25d ago |
| S-NI | Adversarial Moment-Matching Distillation | Rouge-L | 38.7 | 119 | 25d ago |
| DollyEval | | Rouge-L | 32.5 | 114 | 1mo ago |
| Alpaca | EAGLE3 + DM | Speedup (x) | 5.27 | 111 | 2d ago |
| Arena Hard | | Win Rate | 98.11 | 103 | 2d ago |
| AdvancedIF | BRAID | Accuracy | 71 | 102 | 12d ago |
| Natural Instructions (test) | CoLoRA | Rouge-L | 97.9 | 90 | 1mo ago |
| Vicuna | | Rouge-L | 20.93 | 83 | 12d ago |
| ALFWorld | M2CL | Accuracy | 89.3 | 82 | 1mo ago |
| AlpacaEval 2.0 (test) | Offline+Humanline (G2-9B Completions) | LC Win Rate (%) | 67.45 | 81 | 19d ago |
| DomainBench | SYTTA | Agriculture Score | 21.85 | 80 | 23d ago |
| VicunaEval | IOA | VicunaEval Score | 40.75 | 80 | 1mo ago |
| SelfInst | Adversarial Moment-Matching Distillation | Rouge-L | 21.7 | 73 | 12d ago |
| IFBench | Nemotron-Cascade-2 30B-A3B | Pass@1 (Strict) | 82.9 | 72 | 29d ago |
| VicunaEval | Goal Prioritization | Rouge-L | 35 | 72 | 1mo ago |
| MT-Bench zh | Qwen2.5-14B-SFT-TaP | Score | 6.83 | 60 | 1mo ago |
| AlignBench | Qwen2.5-14B-SFT-TaP | Reasoning Score | 7.42 | 60 | 12d ago |
| IF-Eval 0-shot | UltraMix-190k | Score | 81.13 | 55 | 9d ago |
| ReasonIF synthesized v1.0 | FLEx | IFS | 96.3 | 55 | 1mo ago |
| IFEval (test) | SPOT | IFEval Score | 84.8 | 55 | 4d ago |