General Knowledge on GPQA Diamond
[Chart: Pass@1 on GPQA Diamond. Top result: Claude 3.5 Sonnet-1022, Pass@1 65, Mar 2, 2026. Updated 1mo ago.]
Evaluation Results

Method                   Details                     Date      Pass@1
Claude 3.5 Sonnet-1022   Model Backbone=Claude...    2026.03   65
MiMo-RL 7B-R-TAP         Model Backbone=7B, Rea...   2026.03   60.7
OpenAI o1-mini           Model Backbone=o1-mini      2026.03   60
R1-Distill-Qwen-14B      Model Backbone=Qwen-14...   2026.03   59.1
QwQ-32B Preview          Model Backbone=QwQ-32B      2026.03   54.5
MiMo 7B-RL               Model Backbone=7B, Rea...   2026.03   54.4
GPT-4o 0513              Model Backbone=GPT-4o       2026.03   49.9
R1-Distill-Qwen-7B       Model Backbone=Qwen-7B...   2026.03   49.1