
Scientific Reasoning on TheoremQA

Accuracy: 42.3


[Chart: Accuracy vs. date (Apr 6, 2026)]
Updated 11d ago

Evaluation Results

Date     Accuracy (%)  (unlabeled)
2026.04  42.3          6,566
2026.04  42.1          10,754
2026.04  41.5          4,987
2026.04  40.9          3,141
2026.04  40.8          5,375
2026.04  40.3          5,988
2026.04  40            3,975
2026.04  39.9          2,320
2026.04  39.8          5,916
2026.04  39.5          2,578
2026.04  38.1          1,778