| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| AlpacaEval 2 | SDPO | LC Win Rate | 51.9 | 86 | 4d ago |
| Arena-Hard | AAO | Win Rate | 42.7 | 73 | 4d ago |
| AlpacaEval 2.0 (test) | OTPO | LC Win Rate | 30.35 | 51 | 1mo ago |
| Qwen2.5-14B-Instruct High-Variance (Top 20%) | Base (Best-of-K) | Average Reward (μ) | 5.67 | 6 | 1mo ago |
| Qwen2.5-14B-Instruct Overall | Base (Best-of-K) | Reward (Avg μ) | 6.31 | 6 | 1mo ago |