Math Problem Solving on GSM8k, SAT-Math, and MATH (AGIEval sampled test)
[Chart: GSM8k Accuracy over time. Top result: CRITIQ, 32.22 (Feb 26, 2025). Selectable metrics: GSM8k Accuracy, SAT-Math Accuracy, MATH Accuracy, Overall Accuracy (AGIEval Composite).]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | GSM8k Accuracy | SAT-Math Accuracy | MATH Accuracy | Overall Accuracy (AGIEval Composite) |
|---|---|---|---|---|---|---|
| CRITIQ | Backbone=Llama-3.2-3B,... | 2025.02 | 32.22 | 39.55 | 6.34 | 26.04 |
| OWM | Backbone=Llama-3.2-3B,... | 2025.02 | 28.51 | 32.27 | 5.8 | 22.19 |
| Raw | Backbone=Llama-3.2-3B,... | 2025.02 | 27.6 | 35 | 5.5 | 22.7 |
| QR-Edu | Backbone=Llama-3.2-3B,... | 2025.02 | 23.5 | 36.36 | 6.2 | 22.02 |
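
The page does not state how the Overall Accuracy (AGIEval Composite) column is derived. The reported values are consistent with an unweighted mean of the three per-task accuracies, rounded to two decimals. The sketch below checks that assumption against the numbers listed above. The names `results` and `unweighted_mean` are illustrative only, and the averaging rule is an inference from the data, not a documented definition.

```python
# Accuracies copied from the results above:
# (GSM8k, SAT-Math, MATH, reported Overall)
results = {
    "CRITIQ": (32.22, 39.55, 6.34, 26.04),
    "OWM": (28.51, 32.27, 5.8, 22.19),
    "Raw": (27.6, 35.0, 5.5, 22.7),
    "QR-Edu": (23.5, 36.36, 6.2, 22.02),
}


def unweighted_mean(scores):
    """Arithmetic mean of a sequence of scores (assumed composite rule)."""
    return sum(scores) / len(scores)


# Recompute the composite for each method and compare with the reported value.
recomputed = {}
for method, (gsm8k, sat_math, math_acc, reported) in results.items():
    recomputed[method] = round(unweighted_mean((gsm8k, sat_math, math_acc)), 2)
    print(f"{method}: recomputed={recomputed[method]:.2f} reported={reported:.2f}")
```

If the assumption holds, the composite weights GSM8k, SAT-Math, and MATH equally, so the low MATH accuracies pull every method's overall score well below its GSM8k and SAT-Math results.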