Graduate-level Science Reasoning on GPQA (test)

34.3 Accuracy

E1-Math

Updated 3mo ago

Evaluation Results

Method	Links	Accuracy
E1-Math 2025.05		34.3	2,758.77
DeepSeek-R1-Distill-Qwen 2025.05		33.04	11,780.87
L1 2025.05		31.92	3,892.47
SelfBudgeter 2025.05		30.65	3,326.83
E1-Math 2025.05		26.34	1,278.19