Mathematical Reasoning on GSM8K (Base Metrics Set)
[Chart: Base Accuracy. Llama-3-8B-Instruct: 78.8 (Apr 15, 2026).]
Updated 3d ago
Evaluation Results
| Method | Configuration | Date | Base Accuracy | A-Pos Accuracy | A-Neg Accuracy | C-Pos Accuracy | C-Neg Accuracy | E-Pos Accuracy | E-Neg Accuracy | N-Pos Accuracy | N-Neg Accuracy | O-Pos Accuracy | O-Neg Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama-3-8B-Instruct | Setting=Base | 2026.04 | 78.8 | - | - | - | - | - | - | - | - | - | - |
| IRIS | Backbone=Llama-3-8B-In... | 2026.04 | - | 72.4 | 79.8 | 77.9 | 70.4 | 73.5 | 75.4 | 76.7 | 79.7 | 77.8 | 69.3 |