S-Eval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Safety Risk Evaluation | S-Eval (Risk) | ASR: 0 | 72 |
| Jailbreak Attack Evaluation | S-Eval Attack | Attack Success Rate (ASR): 92 | 72 |
| Safety Evaluation | S-Eval attack | Safety Score: 93.2 | 9 |
| Safety Evaluation | S-Eval (base) | Safety Score: 98.3 | 9 |