
Natural Language Reasoning on Big-GSM

Accuracy: 54.4

TCR

[Chart: Accuracy (y-axis ticks 52.424, 52.937, 53.45, 53.963); Jan 29, 2026]
Updated 1mo ago

Evaluation Results

Date       Accuracy
2026.01    54.4
2026.01    53.9
2026.01    52.7
2026.01    52.5