Open-ended Generation on Arena-Hard
Best score: 84.6 (AR-MAP), Feb 2, 2026
[Chart: Score over time]
Evaluation Results
| Method | Configuration | Date | Score |
| AR-MAP | Backbone=Dream-7B-Inst... | 2026.02 | 84.6 |
| VRPO | Backbone=Dream-7B-Inst... | 2026.02 | 81.8 |
| AR-MAP | Backbone=SDAR-8B-Instr... | 2026.02 | 73.55 |
| SimPO | Backbone=Dream-7B-Inst... | 2026.02 | 72.2 |
| DPO | Backbone=Dream-7B-Inst... | 2026.02 | 71.23 |
| VRPO | Backbone=SDAR-8B-Instr... | 2026.02 | 68.14 |
| DPO | Backbone=Qwen2.5-7B, A... | 2026.02 | 67.8 |
| DPO | Backbone=SDAR-8B-Instr... | 2026.02 | 61.12 |
| SimPO | Backbone=SDAR-8B-Instr... | 2026.02 | 61.08 |
| DPO | Backbone=Qwen3-8B-Base... | 2026.02 | 60.4 |
| Qwen3-8B-Base | Backbone=Qwen3-8B-Base... | 2026.02 | 57.4 |
| Dream-7B-Instruct | Backbone=Dream-7B-Inst... | 2026.02 | 55.04 |
| Qwen2.5-7B | Backbone=Qwen2.5-7B, A... | 2026.02 | 50.2 |
| SDAR-8B-Instruct | Backbone=SDAR-8B-Instr... | 2026.02 | 41.08 |