Mathematical Reasoning on GSM-8K (Score)
[Chart: Accuracy over time, Jan 27, 2026 to Apr 7, 2026. Top result: 89.11 Accuracy (Qwen3-8B).]
Updated 9d ago
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| Qwen3-8B | Variant=Fine-tune | 2026.03 | 89.11 |
| Qwen3-8B | Variant=SFA (k = 16) | 2026.03 | 87.99 |
| Qwen3-8B | Variant=Base | 2026.03 | 87.62 |
| Qwen3-4B | Variant=Fine-tune | 2026.03 | 76.18 |
| Qwen3-4B | Variant=SFA (k = 16) | 2026.03 | 75.56 |
| Qwen3-4B | Variant=Base | 2026.03 | 75.44 |
| S³ | Backbone=LLaDA-8B-Inst... | 2026.04 | 70.21 |
| Best-of-K | Backbone=LLaDA-8B-Inst... | 2026.04 | 69.56 |
| KEEL | Training=SFT, Evaluati... | 2026.01 | 68.8 |
| Baseline Diffusion | Backbone=LLaDA-8B-Inst... | 2026.04 | 68.16 |
| Qwen3-0.6B | Variant=Fine-tune | 2026.03 | 63.42 |
| Qwen3-0.6B | Variant=SFA (k = 16) | 2026.03 | 61.46 |
| KEEL | Architecture=512 Layer... | 2026.01 | 60.9 |
| Qwen3-0.6B | Variant=Base | 2026.03 | 59.59 |
| Pre-LN | Training=SFT, Evaluati... | 2026.01 | 58.7 |
| Pre-LN | Architecture=512 Layer... | 2026.01 | 51 |