Complex Reasoning on SCoRE (test)

16.26 Accuracy

SelfBudgeter

Updated 3mo ago

Evaluation Results

Method	Links	Accuracy
SelfBudgeter 2025.05		16.26	4,491.35
L1 2025.05		13.69	5,145.91
E1-Math 2025.05		12.23	3,327.37
DeepSeek-R1-Distill-Qwen 2025.05		10.14	11,695.94
E1-Math 2025.05		6.69	1,272.04