| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| MT-bench | +RL (Skywork-Reward-V2-Llama-3.1-8B) | MT-Bench Score 81.1 | 68 | 22d ago | |
| MT-bench | Eagle3+LTD | Speedup 3.82 | 25 | 3mo ago | |
| IMDb | COALA | MT-Bench Score 5.6 | 20 | 9d ago | |
| EduFeedback alternate | | MT-Bench Score 8.3 | 20 | 9d ago | |
| UltraFeedback | COALA | MT-Bench Score 6.1 | 20 | 9d ago | |
| Lost-in-Conversation (test) | Full† | Actions Score 95 | 9 | 7d ago | |
| MT-Bench 1.0 (test) | DAR | GPT-4 Score 8.538 | 5 | 3mo ago | |