General Reasoning on GPQA Diamond (pass@1 accuracy)
[Chart: Pass@1 Accuracy over time; top entry R-Zero, 40.5 (Feb 3, 2026)]
Evaluation Results
Method          Setting                     Date     Pass@1 Accuracy
R-Zero          Backbone Model=Qwen3-8...   2026.02  40.5
AERO            Backbone Model=Qwen3-8...   2026.02  38.4
AERO            Backbone Model=Qwen3-8...   2026.02  36.9
Absolute Zero   Backbone Model=Qwen3-8...   2026.02  36.8
R-Zero          Backbone Model=Qwen3-4...   2026.02  36.4
AERO            Backbone Model=Qwen3-8...   2026.02  35.9
Absolute Zero   Backbone Model=Qwen3-4...   2026.02  35.3
AERO            Backbone Model=Qwen3-4...   2026.02  34.3
AERO            Backbone Model=Qwen3-4...   2026.02  34.3
AERO            Backbone Model=Qwen3-8...   2026.02  34.3
AERO            Backbone Model=Qwen3-8...   2026.02  33.8
AERO            Backbone Model=Qwen3-4...   2026.02  33.3
Qwen3-8B-Base   Backbone Model=Qwen3-8...   2026.02  33.3
AERO            Backbone Model=Qwen3-4...   2026.02  32.3
Qwen3-4B-Base   Backbone Model=Qwen3-4...   2026.02  26.3
AERO            Backbone Model=Qwen3-4...   2026.02  26.3