Deceptive Alignment Benchmark (DAB)

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Latent Knowledge Elicitation | Deceptive Alignment Benchmark (DAB), 400 scenarios | Elicitation Accuracy: 81.2 | 12