Natural Language Reasoning on Big-GSM
Top result: TCR, 54.4 Accuracy
[Chart: Accuracy by date; Jan 29, 2026]
Updated 4d ago
Evaluation Results
Method        Backbone Model    Date       Accuracy
TCR           Qwen2.5...        2026.01    54.4
TCR           Phi-3-I...        2026.01    53.9
Base Model    Qwen2.5...        2026.01    52.7
Base Model    Phi-3-I...        2026.01    52.5