
General Reasoning Evaluation on Average (AVG)

71.25 Accuracy

Vanilla

[Chart: Accuracy, Apr 4, 2026]
Updated 11d ago

Evaluation Results

Method  Links
2026.04  71.2510.423,57110.5
2026.04  66.8510.341,0552.7
2026.04  66.7917.839843.2
2026.04  63.1311.021,8264.8
2026.04  62.913.684,33616.6
2026.04  62.810.171,2893.7
2026.04  61.211.186750.2
2026.04  61.1220.619724.4
2026.04  60.541.617900.4
2026.04  59.0412.866591.9
2026.04  58.8910.621,8387.2
2026.04  54.038.363,00311.3
2026.04  52.122.911,3324.4
2026.04  51.9212.141,5924.6
2026.04  17.1134.981,0248
2026.04  12.3236.321,0248.9