General Reasoning on GPQA diamond
[Chart: Avg@8 Accuracy. Best result: GPS, 30.4 (Feb 2, 2026). Updated 4d ago.]
Evaluation Results
| Method | Backbone | Date | Avg@8 Accuracy |
| --- | --- | --- | --- |
| GPS | DeepSeek-R1-D... | 2026.02 | 30.4 |
| PCL | DeepSeek-R1-D... | 2026.02 | 28.5 |
| Uniform Sampling | DeepSeek-R1-D... | 2026.02 | 27.5 |
| MoPPS | DeepSeek-R1-D... | 2026.02 | 27.5 |
| Dynamic Sampling (Oracle) | DeepSeek-R1-D... | 2026.02 | 26.8 |
| GRESO | DeepSeek-R1-D... | 2026.02 | 26.4 |
| GRESO | DeepSeek-R1-D... | 2026.02 | 25 |
| MoPPS | DeepSeek-R1-D... | 2026.02 | 23.4 |
| DeepSeek-R1-Distill-1.5B | DeepSeek-R1-D... | 2026.02 | 22.8 |
| PCL | DeepSeek-R1-D... | 2026.02 | 22.4 |
| GPS | DeepSeek-R1-D... | 2026.02 | 22.2 |
| Dynamic Sampling (Oracle) | DeepSeek-R1-D... | 2026.02 | 19.6 |
| DeepSeek-R1-Distill-7B | DeepSeek-R1-D... | 2026.02 | 19.2 |
| Uniform Sampling | DeepSeek-R1-D... | 2026.02 | 17.3 |