General Knowledge on GPQA Diamond
[Chart: best Pass@1 over time, Mar 2, 2026 to Apr 26, 2026. Current best: Claude 3.5 Sonnet-1022, 65 Pass@1.]
Updated 1mo ago
Evaluation Results
| Method                 | Details                   | Date    | Pass@1 |
|------------------------|---------------------------|---------|--------|
| Claude 3.5 Sonnet-1022 | Model Backbone=Claude...  | 2026.03 | 65     |
| MiMo-RL 7B-R-TAP       | Model Backbone=7B, Rea... | 2026.03 | 60.7   |
| OpenAI o1-mini         | Model Backbone=o1-mini    | 2026.03 | 60     |
| R1-Distill-Qwen-14B    | Model Backbone=Qwen-14... | 2026.03 | 59.1   |
| QwQ-32B Preview        | Model Backbone=QwQ-32B    | 2026.03 | 54.5   |
| MiMo 7B-RL             | Model Backbone=7B, Rea... | 2026.03 | 54.4   |
| GPT-4o 0513            | Model Backbone=GPT-4o     | 2026.03 | 49.9   |
| R1-Distill-Qwen-7B     | Model Backbone=Qwen-7B... | 2026.03 | 49.1   |
| SRFT [13]              | Base Model=Qwen2.5-Mat... | 2026.04 | 46.4   |
| ReLIFT [12]            | Base Model=Qwen2.5-Mat... | 2026.04 | 43.1   |
| HPT [15]               | Base Model=Qwen2.5-Mat... | 2026.04 | 42.9   |
| LUFFY [11]             | Base Model=Qwen2.5-Mat... | 2026.04 | 41.8   |
| SFT→RL                 | Base Model=Qwen2.5-Mat... | 2026.04 | 40.6   |
| LUFFY [11]             | Base Model=Qwen2.5-Mat... | 2026.04 | 39.9   |
| Prefix-RFT [14]        | Base Model=Qwen2.5-Mat... | 2026.04 | 39.1   |
| ReLIFT [12]            | Base Model=Qwen2.5-Mat... | 2026.04 | 36.7   |
| SFT                    | Base Model=Qwen2.5-Mat... | 2026.04 | 24.6   |