Fine-grained Safety Control on CoSApien Disallowed instructions
[Chart: Response Accuracy (GD) by method. Highlighted point: Base, 97.9, May 22, 2026. Selectable metrics: Response Accuracy (GD), Response Accuracy (AB), Response Accuracy (PP), Average Utility.]
Updated 8d ago
Evaluation Results
Method    Base Model       Date      Response Accuracy (GD)   Response Accuracy (AB)   Response Accuracy (PP)   Average Utility
Base      LLaMA2-7B-Chat   2026.05   97.9                     34.5                     100                      0.365
AutoDAN   LLaMA2-7B-Chat   2026.05   95.8                     31                       100                      0.168
PALETTE   LLaMA2-7B-Chat   2026.05   95.8                     86.2                     100                      0.353
CAST      LLaMA2-7B-Chat   2026.05   22.9                     70.7                     47.9                     0.326