Alignment on ArenaHard (pass@1)
[Chart: pass@1 over time; top result Gemini-2.5 Flash-Thinking at 95.7 pass@1; x-axis label Dec 15, 2025]
Evaluation Results
Method                          Configuration                Date      pass@1
Gemini-2.5 Flash-Thinking       Thinking Mode=true           2025.12   95.7
DeepSeek-R1 0528 671B           Parameters=671B, Think...    2025.12   95.1
Qwen3 14B                       Parameters=14B               2025.12   91.7
Nemotron-Cascade 14B-Thinking   Parameters=14B, Thinki...    2025.12   89.5
Nemotron Cascade-8B             Parameters=8B, Thinkin...    2025.12   87.9
Qwen3 8B                        Parameters=8B                2025.12   85.8
Nemotron-Nano 9B-v2             Parameters=9B-v2             2025.12   74.6