
Mathematical Reasoning on GSM8K (maj@2, maj@4, maj@8)

64.52 Accuracy (maj@2)

DLE (top-p+top-k)

Updated 3mo ago

Evaluation Results

Method	Links	Accuracy (maj@2)	Accuracy (maj@4)	Accuracy (maj@8)
DLE (top-p+top-k) 2026.04		64.52	69.45	71.49
DLE (ε-sampling)-RANDBRANCH 2026.04		64.52	68.31	72.33
DLE (min-p) 2026.04		64.44	68.84	71.57
DLE (ε-sampling)-PROBFIRST 2026.04		64.44	68.92	71.65
DLE (ε-sampling)-DIVFIRST 2026.04		62.62	67.4	69.52
Self-consistency (min-p) 2026.04		59.89	68.39	73.69
Self-consistency (ε-sampling) 2026.04		58.38	62.33	69.14
Self-consistency (top-p+top-k) 2026.04		56.25	65.81	72.48
Self-consistency 2026.04		43.29	54.36	65.58
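
For readers unfamiliar with the metric, the following is a minimal sketch of how a maj@k accuracy could be computed, assuming the usual reading: take the first k sampled answers per problem, pick the most frequent one, and compare it with the reference. The function names, the toy data, and the tie-breaking rule (first-seen answer wins) are illustrative assumptions, not details taken from the evaluated methods.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first (assumed rule)."""
    counts = Counter(answers)
    best = max(counts.values())
    for answer in answers:
        if counts[answer] == best:
            return answer


def maj_at_k_accuracy(samples, references, k):
    """Percentage of problems whose majority answer over the first k samples is correct."""
    correct = sum(
        majority_vote(per_problem[:k]) == ref
        for per_problem, ref in zip(samples, references)
    )
    return 100.0 * correct / len(references)


# Toy data (hypothetical): three problems, four sampled final answers each.
samples = [
    ["18", "18", "20", "18"],
    ["5", "7", "7", "7"],
    ["3", "4", "4", "3"],
]
references = ["18", "7", "3"]

for k in (2, 4):
    print(f"maj@{k}: {maj_at_k_accuracy(samples, references, k):.2f}")
```

With only two samples a disagreement is always a tie, so the tie-breaking rule matters most at maj@2; a different rule could shift the numbers in that column.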