Science Reasoning on GPQA (Score)
[Chart: Score over time, May 2025 to Dec 2025; highlighted result: DeepSeek v3.2, 81.4]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Score |
|---|---|---|---|
| DeepSeek v3.2 | | 2025.12 | 81.4 |
| GLM-4.6 | | 2025.12 | 78.8 |
| DeepSeek R1 0528 | | 2025.12 | 77.5 |
| GPT-OSS | Parameters=120B | 2025.12 | 77.3 |
| GLM-4.5 | | 2025.12 | 77 |
| INTELLECT-3 | Parameters=100B+ | 2025.12 | 74.4 |
| GLM-4.5-Air | | 2025.12 | 73.3 |
| TPO | Model=LLaMA-3.2-3B | 2025.05 | 39 |
| GRPO | Model=LLaMA-3.2-3B | 2025.05 | 38 |
| CPO | Model=LLaMA-3.2-3B | 2025.05 | 35.5 |
| KTO | Model=LLaMA-3.2-3B | 2025.05 | 35 |
| TI-DPO | Model=LLaMA-3.2-3B | 2025.05 | 34.5 |
| DPO | Model=LLaMA-3.2-3B | 2025.05 | 34 |
| TDPO | Model=LLaMA-3.2-3B | 2025.05 | 34 |
| SIMPO | Model=LLaMA-3.2-3B | 2025.05 | 33.5 |
| SFT | Model=LLaMA-3.2-3B | 2025.05 | 33 |
| IPO | Model=LLaMA-3.2-3B | 2025.05 | 31 |
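As a reading aid, the sketch below expresses each LLaMA-3.2-3B method's GPQA score as a gain over the SFT row in the results listed above. The scores are copied by hand from those rows; the dictionary name `SCORES`, the helper `gains_over_baseline`, and the choice of SFT as the baseline are illustrative assumptions, not part of the leaderboard itself.

```python
# Hypothetical reading aid: GPQA scores copied by hand from the
# LLaMA-3.2-3B rows of the results (all dated 2025.05).
SCORES = {
    "TPO": 39.0,
    "GRPO": 38.0,
    "CPO": 35.5,
    "KTO": 35.0,
    "TI-DPO": 34.5,
    "DPO": 34.0,
    "TDPO": 34.0,
    "SIMPO": 33.5,
    "SFT": 33.0,
    "IPO": 31.0,
}


def gains_over_baseline(scores: dict[str, float], baseline: str = "SFT") -> list[tuple[str, float]]:
    """Return (method, score minus baseline score) pairs, largest gain first.

    The baseline itself is excluded. Ties keep their original order
    because Python's sorted() is stable.
    """
    base = scores[baseline]
    deltas = [
        (name, round(score - base, 1))
        for name, score in scores.items()
        if name != baseline
    ]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, delta in gains_over_baseline(SCORES):
        print(f"{name:<7} {delta:+.1f}")
```

Run as a script, this prints one line per method, from TPO (+6.0 over SFT) down to IPO (-2.0), which makes it easier to see that IPO is the only listed LLaMA-3.2-3B method scoring below the SFT row.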