
OOD Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Out-of-distribution Detection | OOD Suite Mean | FPR@95: 43.74 | 27 |
| OOD detection | OOD Suite Average | AUROC: 0.9011 | 6 |
| Out-of-Domain Generalization | OOD Suite BBH, HumanEval, MMLU, TruthfulQA | BBH Score: 59.1 | 4 |
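The two detection metrics above are listed without definitions. The following is a minimal sketch of their common textbook forms. It assumes each sample receives a scalar score where higher means "more in-distribution" and that in-distribution (ID) samples are the positive class. The function names are hypothetical, and the OOD Suite's own evaluation code may follow different conventions, such as treating OOD as the positive class or interpolating thresholds. Lower FPR@95 is better and higher AUROC is better. Judging by the values, the table appears to report FPR@95 as a percentage and AUROC as a fraction.

```python
from typing import Sequence


def fpr_at_95_tpr(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Fraction of OOD samples accepted at a threshold that keeps >= 95% of ID samples.

    Hypothetical helper; assumes higher score = more in-distribution, ID = positive.
    """
    ranked = sorted(id_scores, reverse=True)
    # Smallest k with k / n >= 0.95, computed with integer ceil division to avoid float error.
    k = (95 * len(ranked) + 99) // 100
    threshold = ranked[k - 1]  # accepting scores >= threshold gives TPR >= 95%
    # False positives: OOD samples that score at or above the threshold.
    false_pos = sum(1 for s in ood_scores if s >= threshold)
    return false_pos / len(ood_scores)


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Probability that a random ID sample outscores a random OOD sample (ties count 0.5).

    Hypothetical helper; an O(n * m) pairwise (Mann-Whitney) form, fine for a sketch.
    """
    wins = 0.0
    for a in id_scores:
        for b in ood_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


if __name__ == "__main__":
    id_s = [0.9, 0.8, 0.7, 0.6]    # made-up ID scores, illustration only
    ood_s = [0.75, 0.5, 0.4, 0.3]  # made-up OOD scores, illustration only
    print(f"FPR@95: {100 * fpr_at_95_tpr(id_s, ood_s):.2f}%  AUROC: {auroc(id_s, ood_s):.4f}")
    # prints "FPR@95: 25.00%  AUROC: 0.8750"
```

The pairwise AUROC is chosen for readability; for large evaluation sets, a sort-based or library implementation computes the same quantity much faster.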