
Backdoor Jailbreaking

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Defense against Harmful Fine-tuning | Backdoor Jailbreaking With Trigger | HS Score: 67.2 | 6 |
| Defense against Harmful Fine-tuning | Backdoor Jailbreaking No Trigger | Harm Score: 1.6 | 6 |