| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Safety Evaluation | HEx-PHI | HEx-PHI Score | 97.2 | 162 |
| Safety Evaluation | HEx-PHI | Attack Success Rate (ASR) | 5.17 | 87 |
| Safety Evaluation | HEX-PHI (test) | Harmfulness Score (Llama-Guard-3B) | 2 | 56 |
| Attack Success Rate | HEx-PHI | Attack Success Rate | 0 | 48 |
| Safety Evaluation | HEx-PHI Alpaca risk-ranked subsets | S1 ASR (%) | 13.1 | 21 |
| Safety Evaluation | HEx-PHI Dolly risk-ranked | S1 ASR | 8.97 | 21 |
| Backdoor Poisoning Attack | HEx-PHI (150 questions) | ASR (No Trigger) | 18.94 | 20 |
| Identity Shifting Attack | HEx-PHI Identity Shifting Attack (300 questions) | ASR | 22.83 | 20 |
| Safety Alignment Evaluation | HEx-PHI | Harmful Response Rate | 0.7 | 18 |
| Safety Alignment | HEx-PHI | HEx-PHI Score | 98.8 | 18 |
| Safety Evaluation | HEx-PHI | Harmfulness Score | 2.06 | 16 |
| Jailbreak Attack | HEX-PHI | ASR | 54.9 | 16 |
| Jailbreak Defense | HEX-PHI | Harmful Score | 1.74 | 16 |
| Prosocial Alignment | HEx-PHI (test) | MIP | 76.3 | 14 |
| Jailbreak Attack Success Rate | HEx-PHI (test) | ASR Category 1 | 86.67 | 12 |
| Safety Evaluation | HEx-PHI | Safety Score (HEx-PHI) | 69.87 | 10 |
| Safety Evaluation | HEx-PHI | Accuracy | 99.06 | 9 |
| Safety Evaluation | HEx-PHI Direct | Safety Score (1-ASR) | 99.67 | 8 |
| Safety Evaluation | HEx-PHI | HEx-PHI Score | 98.49 | 6 |
| Safety Alignment | HEx-PHI | Accuracy | 99.06 | 6 |
| Win rate evaluation | HEx-PHI | Win Rate | 87.54 | 2 |