| Dataset Name | SOTA Method | Metric | Value | | Last Updated |
|---|---|---|---|---|---|
| MT Bench | Llama-3.1-Nemotron-70B-Instruct | MT-Bench Score (GPT-4) | 9.16 | 44 | 9d ago |
| MT-Bench High-Variance (Top 20%) | | Reward Score | 7.54 | 26 | 1mo ago |
| SysBench | AGD_LRPe | CSR | 74.3 | 21 | 1mo ago |
| StructFlowBench | +GraphIF | CSR | 89.46 | 20 | 23d ago |
| MT-Eval | +GraphIF | CSR | 93.62 | 20 | 23d ago |
| MultiIF | | Normalized Score | 68.93 | 5 | 24d ago |