General Chat on ArenaHard v1.0
Best Win Rate: 82.75 (Direct-Likert), Mar 4, 2026. Updated 1mo ago.
Evaluation Results

| Method | Model Backbone | Date | Win Rate |
|---|---|---|---|
| Direct-Likert | Qwen-3-8B | 2026.03 | 82.75 |
| Critique-GRPO | Qwen-3-8B | 2026.03 | 81.95 |
| Rubric-as-Reward | Qwen-3-8B | 2026.03 | 81.9 |
| Pairwise-GRPO | Qwen-3-8B | 2026.03 | 81.2 |
| GOLF | Qwen-3-8B | 2026.03 | 80.9 |
| Qwen-3-8B | Qwen-3-8B | 2026.03 | 70.7 |
| GOLF | Llama-3... | 2026.03 | 52.4 |
| Rubric-as-Reward | Llama-3... | 2026.03 | 52.1 |
| Direct-Likert | Llama-3... | 2026.03 | 51.55 |
| Critique-GRPO | Llama-3... | 2026.03 | 50.15 |
| Pairwise-GRPO | Llama-3... | 2026.03 | 49.2 |
| Llama-3.1-8B-Instruct | Llama-3... | 2026.03 | 30.8 |