OOD

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Out-of-Domain Reasoning Aggregation | OOD Average | Accuracy 63.57 | 22 |
| Speech Emotion Recognition | Four OOD (test) | Macro-F1 Delta 1.57 | 21 |
| Speculative decoding evaluation | OOD Mean | Speedup 5.21 | 20 |
| Out-of-Distribution Detection | OOD datasets | pAUROC@20 94.2 | 17 |
| Unsupervised Object Segmentation | OOD 1.0 (test) | FG-ARI 7,824 | 16 |
| LLM Routing | OOD | Accuracy 89 | 11 |
| OOD Detection | OOD | AUC (Confidence) 0.822 | 9 |
| Mathematical and Scientific Reasoning | OOD AIME, HMMT, GPQA, MMLU-Pro, MMLU-Redux 2.0 | Pass@1 89.5 | 8 |
| Language Modeling | OOD | Loss 1.285 | 7 |
| Diffusion-generated time series detection | Avg. OOD Aggregate of TSDiff, Diffusion-TS, WaveStitch (summary) | F1 Score 84.8 | 6 |
| Detoxification | OOD | TP Score 54 | 6 |
| Classification | OOD | Accuracy 65.71 | 6 |
| Speculative Decoding | OOD | Block Efficiency 2.13 | 5 |
| Defective Dialog Detection | OOD Shopping n = 105 (test) | Precision 48 | 5 |
| Unsupervised image annotation | OOD set | NMI 0.54 | 5 |
| Referential Communication | OOD set | Accuracy 92.7 | 5 |
| Safe Robot Navigation | OOD Case II: high obstacle density (30 obstacles) | SR 44.2 | 4 |
| Image Denoising | OOD Average | PSNR 39.94 | 4 |
| MR Image Quality Transfer | OOD | SSIM 82.19 | 4 |
| STL-conditioned Robotic Planning | OOD-3 Layout | Success Rate (OOD-3 All) 23 | 4 |
| STL-conditioned Robotic Planning | OOD-2 Layout | OOD-2 Success Rate (All) 12.88 | 4 |
| Open-ended Dialogue | OOD Average | Win Rate 60.5 | 4 |
| Table Understanding | OOD Table S2 (test) | ROUGE-L 40.38 | 4 |
| Table Understanding | OOD Table S1 (test) | Accuracy 80.2 | 4 |
| Synthetic Face Detection | OOD (Out-of-Distribution) | ECE 0.0516 | 3 |
Showing 25 of 35 rows