Out-of-Distribution Benchmarks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| General Reasoning | Out-of-Distribution Benchmarks (MMLU-P, ARC-c, GPQA) | MMLU-P Score: 52.1 | 16 |
| Reasoning and Knowledge | Out-of-Distribution Benchmarks Summary | Average Score: 75.3 | 12 |