| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MT Bench | Llama-3.1-Nemotron-70B-Instruct | MT-Bench Score (GPT-4) | 9.16 | 129 | 5d ago |
| SysBench | ImpRIF-32B | CSR | 92.31 | 38 | 1mo ago |
| MT-Bench High-Variance (Top 20%) | | Reward Score | 7.54 | 26 | 2mo ago |
| StructFlowBench | +GraphIF | CSR | 89.46 | 20 | 2mo ago |
| MT-Eval | +GraphIF | CSR | 93.62 | 20 | 2mo ago |
| Multi-IF | | Turn 1 Score | 95.02 | 18 | 12d ago |
| MultiIF | | Normalized Score | 68.93 | 5 | 2mo ago |