| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Safety Evaluation | HEx-PHI | HEx-PHI Score | 97.2 | 162 |
| Backdoor Poisoning Attack | HEx-PHI (150 questions) | ASR (No Trigger) | 18.94 | 20 |
| Identity Shifting Attack | HEx-PHI Identity Shifting Attack (300 questions) | ASR | 22.83 | 20 |
| Safety Evaluation | HEx-PHI | Harmfulness Score | 2.06 | 16 |
| Jailbreak Attack | HEX-PHI | ASR | 54.9 | 16 |
| Jailbreak Defense | HEX-PHI | Harmful Score | 1.74 | 16 |
| Prosocial Alignment | HEx-PHI (test) | MIP | 76.3 | 14 |
| Safety Alignment | HEx-PHI | HEx-PHI Score | 98.8 | 12 |
| Jailbreak Attack Success Rate | HEx-PHI (test) | ASR Category 1 | 86.67 | 12 |
| Safety Evaluation | HEX-PHI (test) | ASR | 7.333 | 12 |
| Safety Evaluation | HEx-PHI | Safety Score (HEx-PHI) | 69.87 | 10 |
| Safety Evaluation | HEx-PHI Direct | Safety Score (1-ASR) | 99.67 | 8 |
| Win rate evaluation | HEx-PHI | Win Rate | 87.54 | 2 |