| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| IFEval | | IFEval Accuracy | 95 | 836 | 17h ago |
| AlpacaEval 2.0 | Attention-MoA | Win Rate | 95.87 | 722 | 19h ago |
| AlpacaEval | | Win Rate | 98.4 | 420 | 19h ago |
| MT-Bench | GPT-4-1106-preview | MT-Bench Score | 9.32 | 287 | 4d ago |
| Arena Hard | | Win Rate | 98.11 | 263 | 19h ago |
| InstructBench | SYTTA-8 | Dolly (BLEU) | 75.27 | 224 | 2mo ago |
| UnNI | MINILLM | Rouge-L | 40.2 | 178 | 12d ago |
| Alpaca | EAGLE3 + DM | Speedup (x) | 5.27 | 173 | 19h ago |
| AlpacaEval 2 | ξ-DPO | LC (%) | 75.4 | 137 | 7d ago |
| S-NI | Adversarial Moment-Matching Distillation | Rouge-L | 38.7 | 119 | 2mo ago |
| DollyEval | | Rouge-L | 32.5 | 114 | 2mo ago |
| AdvancedIF | BRAID | Accuracy | 71 | 102 | 1mo ago |
| Vicuna | | Rouge-L | 20.93 | 101 | 12d ago |
| AlpacaEval 2.0 (test) | Offline+Humanline (G2-9B Completions) | LC Win Rate (%) | 67.45 | 95 | 14d ago |
| Natural Instructions (test) | CoLoRA | Rouge-L | 97.9 | 90 | 3mo ago |
| IFEval | Self Consistency (Best on Validation) | Accuracy (IFEval) | 90.39 | 89 | 2d ago |
| IFEval (test) | SPOT | IFEval Score | 84.8 | 88 | 22d ago |
| IFEval | | IFEval Score | 94.64 | 87 | 19h ago |
| FollowBench | ImpRIF-32B | HSR | 79 | 85 | 4d ago |
| ALFWorld | M2CL | Accuracy | 89.3 | 82 | 3mo ago |
| DomainBench | SYTTA | Agriculture Score | 21.85 | 80 | 2mo ago |
| VicunaEval | IOA | VicunaEval Score | 40.75 | 80 | 3mo ago |
| SelfInst | Adversarial Moment-Matching Distillation | Rouge-L | 21.7 | 73 | 1mo ago |
| IFBench | | Accuracy | 77.8 | 72 | 1mo ago |
| IFBench | Nemotron-Cascade-2 30B-A3B | Pass@1 (Strict) | 82.9 | 72 | 2mo ago |