
LLM Alignment Evaluation on Arena-Hard v0.1

Win Rate: 50

BASE (GPT-4-0314)

[Chart: Win Rate over time, May 17, 2026 to May 27, 2026]
Updated 5d ago

Evaluation Results

Method / Links
Date      Win Rate
2026.05   50       -      -      -
2026.05   50       -      -      -
2026.05   46.8     -      -      -
2026.05   45.5     -      -      -
2026.05   44.7     -      -      -
2026.05   44.6     -      -      -
2026.05   43.7     -      -      -
2026.05   43.6     -      -      -
2026.05   42.9     44.6   41.1   525
2026.05   42.1     -      -      -
2026.05   42       -      -      -
2026.05   41.5     43.1   39.7   530
2026.05   41.4     -      -      -
2026.05   41.2     -      -      -
2026.05   41.2     -      -      -
2026.05   40.9     -      -      -
2026.05   40.9     -      -      -
2026.05   39.7     -      -      -
2026.05   39.3     -      -      -
2026.05   37.8     -      -      -
2026.05   36.7     -      -      -
2026.05   35.2     -      -      -
2026.05   34.4     -      -      -
2026.05   34.4     -      -      -
2026.05   34.1     -      -      -
2026.05   32.9     -      -      -
2026.05   31.6     -      -      -
2026.05   31.5     -      -      -
2026.05   29.9     -      -      -
2026.05   29.9     -      -      -
2026.05   26.3     28.1   24.7   589