Mathematical Reasoning on GSM8K (maj@2, maj@4, maj@8)
[Chart: Accuracy (maj@2) over time. Highlighted point: DLE (top-p+top-k), 64.52, Apr 22, 2026. Available metrics: Accuracy (maj@2), Accuracy (maj@4), Accuracy (maj@8).]
Updated 1mo ago
Evaluation Results

| Method | Model | Date | Accuracy (maj@2) | Accuracy (maj@4) | Accuracy (maj@8) |
|---|---|---|---|---|---|
| DLE (top-p+top-k) | Llama3.2-3B-Inst... | 2026.04 | 64.52 | 69.45 | 71.49 |
| DLE (ε-sampling)-RANDBRANCH | Llama3.2-3B-Inst... | 2026.04 | 64.52 | 68.31 | 72.33 |
| DLE (min-p) | Llama3.2-3B-Inst... | 2026.04 | 64.44 | 68.84 | 71.57 |
| DLE (ε-sampling)-PROBFIRST | Llama3.2-3B-Inst... | 2026.04 | 64.44 | 68.92 | 71.65 |
| DLE (ε-sampling)-DIVFIRST | Llama3.2-3B-Inst... | 2026.04 | 62.62 | 67.40 | 69.52 |
| Self-consistency (min-p) | Llama3.2-3B-Inst... | 2026.04 | 59.89 | 68.39 | 73.69 |
| Self-consistency (ε-sampling) | Llama3.2-3B-Inst... | 2026.04 | 58.38 | 62.33 | 69.14 |
| Self-consistency (top-p+top-k) | Llama3.2-3B-Inst... | 2026.04 | 56.25 | 65.81 | 72.48 |
| Self-consistency | Llama3.2-3B-Inst... | 2026.04 | 43.29 | 54.36 | 65.58 |
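
For readers unfamiliar with the maj@k columns, the sketch below shows one common way such a metric is computed: for each problem, take k sampled final answers, pick the most frequent one, and count the problem as solved if that answer matches the reference. This is an illustration only, not the evaluation code behind the numbers on this page. The function names `majority_answer` and `maj_at_k`, the tie-breaking rule (first-seen answer wins), and the toy data are all assumptions made for the example.

```python
from collections import Counter


def majority_answer(samples):
    """Return the most frequent answer in `samples`.

    Assumption: ties are broken in favour of the answer that appears
    first in the list. Real evaluations may break ties differently.
    """
    counts = Counter(samples)
    best = max(counts.values())
    for answer in samples:
        if counts[answer] == best:
            return answer


def maj_at_k(samples_per_problem, references, k):
    """Accuracy (in percent) of the majority vote over the first k samples.

    Assumes every problem has at least k sampled answers and that answers
    are already normalized to comparable strings.
    """
    correct = 0
    for samples, reference in zip(samples_per_problem, references):
        if majority_answer(samples[:k]) == reference:
            correct += 1
    return 100.0 * correct / len(references)


# Hypothetical toy data: two problems, four sampled answers each.
toy_samples = [["18", "18", "20", "18"], ["5", "7", "7", "7"]]
toy_references = ["18", "7"]

for k in (2, 4):
    print(f"maj@{k}: {maj_at_k(toy_samples, toy_references, k):.2f}")
```

With only two samples a tie is likely, so the tie-breaking rule can noticeably affect maj@2. Keep this in mind when comparing maj@2 figures across evaluation setups.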