
Liars' Bench

Benchmarks

Task Name            Dataset Name                                 SOTA Result  Trend
Deception Detection  Liars' Bench Harm-Pressure Knowledge (test)  AUROC 0.91   3
Deception Detection  Liars' Bench Insider Trading (test)          AUROC 0.953  3
Deception Detection  Liars' Bench Convincing Game (test)          AUROC 1      3
Deception Detection  Liars' Bench Harm-Pressure Choice (test)     AUROC 0.949  3
Deception Detection  Liars' Bench Instructed Deception (test)     AUROC 0.939  3