Reasoning on BigBenchHard
[Chart: Accuracy (BigBenchHard) over time. x-axis: date, Feb 2, 2026 to May 26, 2026; y-axis: accuracy. Labeled entry: Ll4-Sct, 100.]
Updated 7d ago
Evaluation Results

Method                     Details                    Date     Accuracy (BigBenchHard)
Ll4-Sct                    -                          2026.05  100
Gm2.5fl                    -                          2026.05  99.19
Gemma 3 27B                Parameters=27B             2026.02  82.4
General-Reasoner-14B       Model Category=Frontie...  2026.04  78.2
TTRL                       Base Model=Qwen3-8B-Base   2026.04  74.9
TEMPO                      Base Model=Qwen3-8B-Base   2026.04  74.2
DictaLM 3.0 24B-Think      Parameters=24B, Varian...  2026.02  73
Mistral Small 3.1          -                          2026.02  71.91
DictaLM 3.0 12B-Inst       Parameters=12B, Varian...  2026.02  71.38
Gemma 3 12B                Parameters=12B             2026.02  70.1
Zero-RL (PPO)              Base Model=Qwen3-8B-Base   2026.04  69.9
TEMPO                      Base Model=OLMO3-7B-Base   2026.04  68.2
EMPO                       Base Model=Qwen3-8B-Base   2026.04  66.7
General-Reasoner-7B        Model Category=Frontie...  2026.04  65.6
MiMo-Zero-RL-7B            Model Category=Frontie...  2026.04  61.4
Olmo-3-7B-RL-Zero-General  Model Category=Frontie...  2026.04  56.5
EMPO                       Base Model=OLMO3-7B-Base   2026.04  52.9
Zero-RL (PPO)              Base Model=OLMO3-7B-Base   2026.04  46.8
TTRL                       Base Model=OLMO3-7B-Base   2026.04  45.4
MobileMoE-L                Active Parameters=922M...  2026.05  37.8
MobileMoE-M                Active Parameters=528M...  2026.05  37.7
MobileMoE-S                Active Parameters=272M...  2026.05  31.8