HEX-PHI

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Safety Evaluation | HEx-PHI | HEx-PHI Score: 97.2 | 148 |
| Backdoor Poisoning Attack | HEx-PHI (150 questions) | ASR (No Trigger): 18.94 | 20 |
| Identity Shifting Attack | HEx-PHI Identity Shifting Attack (300 questions) | ASR: 22.83 | 20 |
| Jailbreak Attack | HEX-PHI | ASR: 54.9 | 16 |
| Jailbreak Defense | HEX-PHI | Harmful Score: 1.74 | 16 |
| Prosocial Alignment | HEx-PHI (test) | MIP: 76.3 | 14 |
| Jailbreak Attack Success Rate | HEx-PHI (test) | ASR Category 1: 86.67 | 12 |
| Safety Evaluation | HEX-PHI (test) | ASR: 7.333 | 12 |
| Safety Evaluation | HEx-PHI Direct | Safety Score (1-ASR): 99.67 | 8 |