
Mathematical Reasoning on PolyMath

Accuracy: 20.9

self-cons (ours)

Updated 1mo ago

Evaluation Results

Method	Links	Accuracy
self-cons (ours) 2026.05		20.9
self-cons (ours) 2026.05		20.5
S1 2026.05		20.2
S1 2026.05		19.5
base 2026.05		18
base 2026.05		17.5
LIDR 2026.05		16.2
LIDR 2026.05		15.2
self-cons (ours) 2026.05		8.1
S1 2026.05		8
base 2026.05		7.5
LIDR 2026.05		5.9