
Multi-turn Instruction Following on MT-Bench High-Variance (Top 20%)

Reward Score: 7.54

cDPO (Best-of-K)

[Chart: axis ticks 4.5552, 5.3301, 6.105, 6.8799; date label Mar 9, 2026]
Updated 1mo ago

Evaluation Results

Method  Date     Reward Score  (unlabeled)  (unlabeled)  (unlabeled)  Links
-       2026.03  7.54          8.39         -9.24        312          -
-       2026.03  7.51          8.11         -8.71        323          -
-       2026.03  6.58          7.98         -9.38        337          -
-       2026.03  6.41          7.16         -7.91        316          -
-       2026.03  5.96          9.71         -13.46       374          -
-       2026.03  5.79          9.91         -14.03       382          -
-       2026.03  5.79          5.34         -4.89        304          -
-       2026.03  5.75          8.43         -11.11       316          -
-       2026.03  5.56          5.03         -4.5         301          -
-       2026.03  5.52          5.67         -5.82        317          -
-       2026.03  5.49          7            -8.51        382          -
-       2026.03  5.45          7.9          -10.35       382          -
-       2026.03  5.43          7.9          -10.37       382          -
-       2026.03  5.42          5.17         -4.92        316          -
-       2026.03  5.41          7.42         -9.43        382          -
-       2026.03  5.32          7.33         -9.34        382          -
-       2026.03  5.28          7.3          -9.32        372          -
-       2026.03  5.2           7.48         -9.76        384          -
-       2026.03  5.12          6            -7.12        308          -
-       2026.03  5.03          4.77         -4.51        306          -
-       2026.03  4.99          5.05         -5.13        307          -
-       2026.03  4.9           5.45         -6           300          -
-       2026.03  4.8           5.39         -5.98        307          -
-       2026.03  4.78          4.85         -4.92        294          -
-       2026.03  4.75          5.25         -5.25        306          -
-       2026.03  4.67          5.12         -5.57        307          -