Open-ended Generation on Arena-Hard v2.0
[Chart: Score by date. Top result: Hybrid Reward, 47.8 (May 28, 2026).]
Evaluation Results
Method                       Setting                    Date     Score
Hybrid Reward                Backbone=GLM-4.7-Flash     2026.05  47.8
Hybrid Reward                Backbone=Qwen3-30B-A3B     2026.05  38.5
Qwen3-30B-A3B                RL Training=Baseline       2026.05  30.6
GLM-4.7-Flash                RL Training=Baseline       2026.05  28.3
Hybrid Reward                Backbone=Qwen3-4B          2026.05  22
Qwen3-4B                     RL Training=Baseline       2026.05  14.7
Hybrid Reward                Backbone=DeepSeek-R1-D...  2026.05  4.1
DeepSeek-R1-Distill-Qwen-7B  RL Training=Baseline       2026.05  2.7