
Out-of-Distribution

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| General Reasoning and Coding | Out-of-Distribution (ARC-c, GPQA, MMLU-Pro, LiveCodeBench) | ARC-c (pass@1): 96.6 | 16 |
| Debiasing Effectiveness | Out-of-Distribution (OOD) Split | Mean Ratio: 29.31 | 16 |
| Reasoning Generalization | Out-of-Distribution Avg | Avg Score (OOD): 59.7 | 15 |
| model jamming with fixed melodies | Out of Distribution (test) | Harmony Ratio: 78.4 | 5 |
| Metasurface inverse design | Out-of-Distribution (test) | SG: 92 | 2 |