
HEX-PHI

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Safety Evaluation | HEx-PHI | HEx-PHI Score: 97.2 | 162 |
| Safety Evaluation | HEx-PHI | Attack Success Rate (ASR): 5.17 | 87 |
| Safety Evaluation | HEX-PHI (test) | Harmfulness Score (Llama-Guard-3B): 2 | 56 |
| Attack Success Rate | HEx-PHI | Attack Success Rate: 0 | 48 |
| Safety Evaluation | HEx-PHI Alpaca risk-ranked subsets | S1 ASR (%): 13.1 | 21 |
| Safety Evaluation | HEx-PHI Dolly risk-ranked | S1 ASR: 8.97 | 21 |
| Backdoor Poisoning Attack | HEx-PHI (150 questions) | ASR (No Trigger): 18.94 | 20 |
| Identity Shifting Attack | HEx-PHI Identity Shifting Attack (300 questions) | ASR: 22.83 | 20 |
| Safety Alignment Evaluation | HEx-PHI | Harmful Response Rate: 0.7 | 18 |
| Safety Alignment | HEx-PHI | HEx-PHI Score: 98.8 | 18 |
| Safety Evaluation | HEx-PHI | Harmfulness Score: 2.06 | 16 |
| Jailbreak Attack | HEX-PHI | ASR: 54.9 | 16 |
| Jailbreak Defense | HEX-PHI | Harmful Score: 1.74 | 16 |
| Prosocial Alignment | HEx-PHI (test) | MIP: 76.3 | 14 |
| Jailbreak Attack Success Rate | HEx-PHI (test) | ASR Category 1: 86.67 | 12 |
| Safety Evaluation | HEx-PHI | Safety Score (HEx-PHI): 69.87 | 10 |
| Safety Evaluation | HEx-PHI | Accuracy: 99.06 | 9 |
| Safety Evaluation | HEx-PHI Direct | Safety Score (1-ASR): 99.67 | 8 |
| Safety Evaluation | HEx-PHI | HEx-PHI Score: 98.49 | 6 |
| Safety Alignment | HEx-PHI | Accuracy: 99.06 | 6 |
| Win rate evaluation | HEx-PHI | Win Rate: 87.54 | 2 |
Showing 21 of 21 rows