
Natural Language Reasoning on Big-GSM

54.4 Accuracy

TCR

Updated 4mo ago

Evaluation Results

Method	Links	Accuracy
TCR 2026.01		54.4
TCR 2026.01		53.9
Base Model 2026.01		52.7
Base Model 2026.01		52.5