Science on GPQA Diamond (Pass@1, Avg Token Length)
[Chart: Pass@1 Score and Average Output Token Length by date. Highlighted point: DeepSeek-R1, Pass@1 Score 71.5, Mar 6, 2025.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Pass@1 Score | Average Output Token Length |
|---|---|---|---|---|
| DeepSeek-R1 | | 2025.03 | 71.5 | 5,300 |
| DeepSeek-R1-Distill-Llama-70B | Parameters=70B, Backbo... | 2025.03 | 65.2 | - |
| TinyR1-32B-Preview | Parameters=32B | 2025.03 | 65 | 8,600 |
| DeepSeek-R1-Distill-Qwen-32B | Parameters=32B, Backbo... | 2025.03 | 62.1 | 5,300 |