AIME, GPQA, MMLU-Pro, ToolBench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Single-turn Reasoning | AIME, GPQA, MMLU-Pro, ToolBench Aggregate | Average Score: 65 | 28