Expert Knowledge QA on GPQA
Top result: SPINE, 35.55 Pass@1 (Nov 22, 2025)
[Chart: Pass@1 over time]
Evaluation Results
Method            Base Model        Date      Pass@1
SPINE             Qwen3-1.7B        2025.11   35.55
TTRL              Qwen3-1.7B        2025.11   29.94
SPINE             Qwen2.5-Mat...    2025.11   28.93
TTRL              Qwen2.5-Mat...    2025.11   25.38
LMSI              Qwen2.5-Mat...    2025.11   19.19
SEALONG           Qwen2.5-Mat...    2025.11   18.69
LMSI              Qwen3-1.7B        2025.11   18.18
SEALONG           Qwen3-1.7B        2025.11   13.64
No adaptation     Qwen3-1.7B        2025.11   9.09
Self-Consistency  Qwen3-1.7B        2025.11   8.75
Self-Consistency  Qwen2.5-Mat...    2025.11   6.15
No adaptation     Qwen2.5-Mat...    2025.11   4.06