Expert Scientific Reasoning on GPQA-D
[Chart: Full Length by date. Latest point: Minimal-core extraction, 8.7, May 14, 2026. Selectable metrics: Full Length, Core Length, CR, RM, Top-3 Mass, Retention.]
Updated 19d ago
Evaluation Results
| Method | Method (details) | Links | Full Length | Core Length | CR | RM | Top-3 Mass | Retention |
|---|---|---|---|---|---|---|---|---|
| Minimal-core extraction | Model=GPT-5 | 2026.05 | 8.7 | 4.4 | 51 | 49 | 66 | 84 |
| Minimal-core extraction | Model=DeepSeek-R1-Dist... | 2026.05 | 8.3 | 4.7 | 57 | 43 | 62 | 82 |
| Minimal-core extraction | Model=Qwen3-32B | 2026.05 | 8.1 | 4.8 | 59 | 41 | 60 | 81 |
| Minimal-core extraction | Model=DeepSeek-R1-Dist... | 2026.05 | 7.8 | 5 | 64 | 36 | 55 | 76 |
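The page does not define the CR and RM columns. The listed numbers are consistent with CR being Core Length as a rounded percentage of Full Length, and RM being 100 minus CR. That reading is an inference from the four rows, not a definition given by the benchmark. A minimal sketch that checks it, with hypothetical function names:

```python
# Rows transcribed from the results above: (full_length, core_length, cr, rm).
ROWS = [
    (8.7, 4.4, 51, 49),
    (8.3, 4.7, 57, 43),
    (8.1, 4.8, 59, 41),
    (7.8, 5.0, 64, 36),
]


def inferred_cr(full_length, core_length):
    """Assumed definition: core length as a rounded percentage of full length."""
    return round(100 * core_length / full_length)


def check_rows(rows):
    """Return True if every row matches the assumed CR and RM relationships."""
    return all(
        inferred_cr(full, core) == cr and cr + rm == 100
        for full, core, cr, rm in rows
    )


print(check_rows(ROWS))  # prints True for the four rows listed
```

If later rows break this check, the assumed definitions should be dropped rather than the data adjusted.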