OOD Suite

Benchmarks

Task Name                      Dataset Name                                SOTA Result       Trend
Out-of-distribution Detection  OOD Suite Mean                              FPR@95: 43.74     27
OOD detection                  OOD Suite Average                           AUROC: 0.9142     15
Out-of-Domain Generalization   OOD Suite BBH, HumanEval, MMLU, TruthfulQA  BBH Score: 59.1   4