BeyondBench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Math Reasoning (coding tools) | BeyondBench Easy | Execution Time (min) | 3.4 | 25 |
| Math Reasoning | BeyondBench Hard | Accuracy | 58.86 | 25 |
| Math Reasoning | BeyondBench Easy | Accuracy | 96.67 | 25 |