Mathematical Reasoning on GSM8K (test): Accuracy and Inference Time
[Figure: Accuracy and inference time of reported results. Highlighted best: Latent-GRPO, 82.34 accuracy (Jan 13, 2026).]
Evaluation Results

| Method       | Model Scale | Date    | Accuracy | Inference Time |
|--------------|-------------|---------|----------|----------------|
| Latent-GRPO  | Qwen3-4B    | 2026.01 | 82.34    | 658.21         |
| Rule-based   | Qwen3-4B    | 2026.01 | 79.87    | 651.45         |
| Latent-GRPO  | Qwen3-1.7B  | 2026.01 | 73.88    | 492.34         |
| LLM-as-Judge | Qwen3-4B    | 2026.01 | 72.12    | 1,411.72       |
| Rule-based   | Qwen3-1.7B  | 2026.01 | 71.55    | 488.63         |
| LLM-as-Judge | Qwen3-1.7B  | 2026.01 | 64.2     | 1,032.55       |
| Latent-GRPO  | Qwen3-0.6B  | 2026.01 | 61.25    | 431.18         |
| Rule-based   | Qwen3-0.6B  | 2026.01 | 58.41    | 434.61         |
| LLM-as-Judge | Qwen3-0.6B  | 2026.01 | 53.52    | 768.42         |
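To compare methods at the same model scale, the short Python sketch below computes, for each Qwen3 size, Latent-GRPO's accuracy gain over the other two methods and each method's inference time relative to Latent-GRPO. It assumes the values are transcribed correctly from the results table; `RESULTS` and `compare` are hypothetical names chosen for illustration, and the inference-time unit is not stated on the leaderboard, so only ratios are reported.

```python
# Values transcribed from the GSM8K (test) results table:
# (accuracy, inference time). The inference-time unit is not stated on the page.
RESULTS = {
    "Qwen3-4B": {
        "Latent-GRPO": (82.34, 658.21),
        "Rule-based": (79.87, 651.45),
        "LLM-as-Judge": (72.12, 1411.72),
    },
    "Qwen3-1.7B": {
        "Latent-GRPO": (73.88, 492.34),
        "Rule-based": (71.55, 488.63),
        "LLM-as-Judge": (64.2, 1032.55),
    },
    "Qwen3-0.6B": {
        "Latent-GRPO": (61.25, 431.18),
        "Rule-based": (58.41, 434.61),
        "LLM-as-Judge": (53.52, 768.42),
    },
}


def compare(results, method="Latent-GRPO"):
    """For each model scale, return {other_method: (accuracy gain of `method`
    over other_method, other_method's time divided by `method`'s time)}."""
    summary = {}
    for scale, rows in results.items():
        acc, time = rows[method]
        summary[scale] = {
            other: (round(acc - o_acc, 2), round(o_time / time, 2))
            for other, (o_acc, o_time) in rows.items()
            if other != method
        }
    return summary


if __name__ == "__main__":
    for scale, diffs in compare(RESULTS).items():
        for other, (gain, ratio) in diffs.items():
            print(f"{scale}: Latent-GRPO vs {other}: "
                  f"{gain:+.2f} accuracy, {other} uses {ratio:.2f}x the time")
```

On these numbers, Latent-GRPO leads Rule-based by 2.3 to 2.9 accuracy points at roughly the same inference time, while LLM-as-Judge is both less accurate and about 1.8x to 2.1x slower than Latent-GRPO.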