STEM Reasoning on GPQA Diamond (Accuracy, Token Count)
[Chart: Accuracy and Average Tokens by date. Top result: COPT, 61.62 accuracy (May 19, 2026).]
Evaluation Results
Method        Configuration              Date     Accuracy  Average Tokens
COPT          Backbone=Qwen3-8B, Rea...  2026.05  61.62     6,851
CoT           Backbone=Qwen3-8B          2026.05  59.6      8,123
CoT (Greedy)  Backbone=Qwen3-8B          2026.05  56.57     7,909