STEM Reasoning on MMLU STEM
Best result: TaH+, 73.7 Accuracy (STEM)
[Chart: Accuracy (STEM) over time, Nov 11, 2025 to May 20, 2026]
Updated 13d ago
Evaluation Results
| Method | Configuration | Date | Accuracy (STEM) |
|---|---|---|---|
| TaH+ | Param=1.7B | 2025.11 | 73.7 |
| Standard | Param=1.7B | 2025.11 | 70.8 |
| SoftThink | Param=1.7B | 2025.11 | 70.6 |
| AlwaysThink | Param=1.7B | 2025.11 | 63.8 |
| AGPO | clipping=adaptive, ATS... | 2026.05 | 58.1 |
| AGPO | clipping=adaptive, ATS... | 2026.05 | 57.9 |
| GRPO | clipping=fixed ε | 2026.05 | 57.6 |
| GRPO | ATS=true | 2026.05 | 57.1 |
| TaH+ | Param=0.6B | 2025.11 | 56.3 |
| Adaptive-KL PPO | | 2026.05 | 56 |
| Standard | Param=0.6B | 2025.11 | 51.6 |
| SoftThink | Param=0.6B | 2025.11 | 51.4 |
| PPO | | 2026.05 | 48.8 |
| DPO | | 2026.05 | 46.9 |
| AlwaysThink | Param=0.6B | 2025.11 | 42.6 |