Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Harmful Query Evaluation Set

Benchmarks

Task NameDataset NameSOTA ResultTrend
Jailbreak AttackHarmful Query Evaluation Set Claude Haiku 4.5 N=750
Toxicity1.01
10
Jailbreak AttackN=750 Harmful Query Evaluation Set Gemini-3.1-Flash-Lite
Toxicity2.75
10
Jailbreak AttackHarmful Query Evaluation Set Gemini-2.5-Flash N=750
Toxicity2.18
10
Jailbreak AttackHarmful Query Evaluation Set GPT-5.4-mini N=750
Toxicity Score1.01
10
Jailbreak AttackHarmful Query Evaluation Set N=750 GPT-5.4-nano
Toxicity Score1.05
10
Showing 5 of 5 rows