
MaliciousAgentSkillsBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Malicious skill detection | MaliciousAgentSkillsBench (404 malicious, 502 benign) | Recall: 100 | 9 |
| Malicious Instruction Detection | MaliciousAgentSkillsBench traditional IPI baselines | Precision: 63.93 | 4 |