IMBD

Benchmarks

Task Name                Dataset Name   SOTA Result                  Trend
Hallucination Detection  IMBD (test)    AuROC: 0.5458                10
Dataset Addition         IMBD           Accuracy (5%): 62            5
Dataset Removal          IMBD           Accuracy (5% Removal): 77    5
Noisy Label Detection    IMBD           F1 Score (5% Noise): 0.18    5