
FORTRESS

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack Defense | FORTRESS | ASR: 9.8 | 24 |
| Overrefusal Evaluation | Fortress OR | Helpfulness Score: 97.6 | 12 |
| Jailbreaking Safety Evaluation | Fortress | Safety Score: 86.84 | 12 |
| Non-Agentic Performance Evaluation | Fortress (test) | Mean Score: 78.75 | 4 |
| Safety Evaluation | Fortress | Cost per Accuracy Point ($): 0.0016 | 4 |