| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| Principle-based evaluation dataset | | G Score | 8.68 | 12 | 3mo ago |
| Toxicity | | Steering Success | 64 | 11 | 2mo ago |
| QA | | Steering Success | 62.5 | 11 | 2mo ago |
| Jailbreak | | Steering Success | 82.5 | 11 | 2mo ago |
| Emotion | SpotLight | Steering Success | 92.7 | 11 | 2mo ago |
| AI Persona | InstABoost | Steering Success | 92.5 | 11 | 2mo ago |
| Corrigibility | DISCO-Q | LLM Judge Score | 3.22 | 10 | 26d ago |
| Wealth | Comm Steer | LLM Judge Score | 2.25 | 10 | 26d ago |
| Power | Comm Steer | LLM Judge Score | 2.91 | 10 | 26d ago |
| Human study | Task Tokens | Human-likeness Win Rate | 93 | 6 | 2mo ago |