
Pool

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Detection | Pool (test) | AuROC: 0.8598 | 12 |
| Model Routing | Small Pool | Oracle Accuracy: 92 | 6 |
| Model Routing | Small pool | Mean per-model AUC: 82.6 | 6 |
| Content Localization | Pool HumanEdit and AiEdit average | Accuracy: 98.46 | 5 |
| Speech Editing Detection | Pool HumanEdit and AiEdit average | Acc: 98.46 | 5 |
| Trajectory-controlled video generation | Pool | Interaction Realism: 4.4 | 2 |