General Knowledge on SuperGPQA
Top result: 48.2 pass@1 (Claude 3.5 Sonnet-1022)
[Chart: pass@1 over time, Jan 8, 2026 to Mar 2, 2026]
Updated 1mo ago
Evaluation Results
| Method | Method details | Date | pass@1 |
|---|---|---|---|
| Claude 3.5 Sonnet-1022 | Model Backbone=Claude... | 2026.03 | 48.2 |
| MiMo-RL 7B-R-TAP | Model Backbone=7B, Rea... | 2026.03 | 47.3 |
| OpenAI o1-mini | Model Backbone=o1-mini | 2026.03 | 45.2 |
| QwQ-32B Preview | Model Backbone=QwQ-32B | 2026.03 | 43.6 |
| GPT-4o 0513 | Model Backbone=GPT-4o | 2026.03 | 42.4 |
| R1-Distill-Qwen-14B | Model Backbone=Qwen-14... | 2026.03 | 40.6 |
| MiMo 7B-RL | Model Backbone=7B, Rea... | 2026.03 | 40.5 |
| Qwen3-8B | Backbone=Qwen3-8B | 2026.01 | 36.21 |
| RelayLLM (Difficulty-Aware) | Student Model Backbone... | 2026.01 | 29.93 |
| RelayLLM (Simple) | Student Model Backbone... | 2026.01 | 29.85 |
| R1-Distill-Qwen-7B | Model Backbone=Qwen-7B... | 2026.03 | 28.9 |
| CITER | Student Model Backbone... | 2026.01 | 28.25 |
| GRPO | Student Model Backbone... | 2026.01 | 26.01 |
| Base Model | Student Model Backbone... | 2026.01 | 24.46 |
| RelayLLM (Simple) | Student Model Backbone... | 2026.01 | 21.35 |
| RelayLLM (Difficulty-Aware) | Student Model Backbone... | 2026.01 | 20.88 |
| CITER | Student Model Backbone... | 2026.01 | 20.34 |
| GRPO | Student Model Backbone... | 2026.01 | 19.91 |
| Base Model | Student Model Backbone... | 2026.01 | 17.22 |