
OOD

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Speech Emotion Recognition | Four OOD (test) | Macro-F1 Delta: 1.57 | 21 |
| Speculative decoding evaluation | OOD Mean | Speedup: 5.21 | 20 |
| Unsupervised Object Segmentation | OOD 1.0 (test) | FG-ARI: 7,824 | 16 |
| LLM Routing | OOD | Accuracy: 89 | 11 |
| OOD Detection | OOD | AUC (Confidence): 0.822 | 9 |
| Language Modeling | OOD | Loss: 1.285 | 7 |
| Classification | OOD | Accuracy: 65.71 | 6 |
| Speculative Decoding | OOD | Block Efficiency: 2.13 | 5 |
| Defective Dialog Detection | OOD Shopping n = 105 (test) | Precision: 48 | 5 |
| Unsupervised image annotation | OOD set | NMI: 0.54 | 5 |
| Referential Communication | OOD set | Accuracy: 92.7 | 5 |
| Image Denoising | OOD Average | PSNR: 39.94 | 4 |
| MR Image Quality Transfer | OOD | SSIM: 82.19 | 4 |
| STL-conditioned Robotic Planning | OOD-3 Layout | Success Rate (OOD-3 All): 23 | 4 |
| STL-conditioned Robotic Planning | OOD-2 Layout | OOD-2 Success Rate (All): 12.88 | 4 |
| Open-ended Dialogue | OOD Average | Win Rate: 60.5 | 4 |
| Table Understanding | OOD Table S2 (test) | ROUGE-L: 40.38 | 4 |
| Table Understanding | OOD Table S1 (test) | Accuracy: 80.2 | 4 |
| Mapless Navigation | OOD Physical Track (a) 1/10th scale (test) | Lap Time (s): 9.56 | 3 |
| Binary classification (Human vs Machine speech) | Overall OOD (test) | Accuracy: 97.4 | 1 |