Mistake

Benchmarks

| Task | Dataset | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Dishonesty Evaluation | Mistake medical (test) | Dishonesty Accuracy | 67.52 | 32 |
| Data Ranking | Mistake medical | AUROC | 81 | 28 |