
HEX-PHI

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Safety Evaluation | HEx-PHI | HEx-PHI Score: 97.2 | 162 |
| Backdoor Poisoning Attack | HEx-PHI (150 questions) | ASR (No Trigger): 18.94 | 20 |
| Identity Shifting Attack | HEx-PHI Identity Shifting Attack (300 questions) | ASR: 22.83 | 20 |
| Safety Evaluation | HEx-PHI | Harmfulness Score: 2.06 | 16 |
| Jailbreak Attack | HEX-PHI | ASR: 54.9 | 16 |
| Jailbreak Defense | HEX-PHI | Harmful Score: 1.74 | 16 |
| Prosocial Alignment | HEx-PHI (test) | MIP: 76.3 | 14 |
| Safety Alignment | HEx-PHI | HEx-PHI Score: 98.8 | 12 |
| Jailbreak Attack Success Rate | HEx-PHI (test) | ASR Category 1: 86.67 | 12 |
| Safety Evaluation | HEX-PHI (test) | ASR: 7.333 | 12 |
| Safety Evaluation | HEx-PHI | Safety Score (HEx-PHI): 69.87 | 10 |
| Safety Evaluation | HEx-PHI Direct | Safety Score (1-ASR): 99.67 | 8 |
| Win rate evaluation | HEx-PHI | Win Rate: 87.54 | 2 |