
UMWP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Hallucination Detection | UMWP | ROC-AUC: 89.6 | 28 |
| Hallucination Detection (Math Word Problems) | UMWP | F1 Score: 89.1 | 12 |
| Math Reasoning | UMWP | False Positive Rate: 0.62 | 5 |