
GPQA & MMLU

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
General Reasoning | GPQA-Diamond & MMLU-Pro | Accuracy: 53.6 | 35