| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| Open-Ended Dialogue (out-of-distribution) | CausalRM | MT-Bench 68.3 | 4 | 4d ago | |
| Open-Ended Dialogue (in-distribution) | CausalRM | Helpful Score 69.4 | 4 | 4d ago | |
| OOD Average | CausalRM | Win Rate 60.5 | 4 | 4d ago | |
| TruthfulQA OOD | CausalRM | Wins 52.2 | 4 | 4d ago | |
| SHP OOD | CausalRM | Win Rate 77.5 | 4 | 4d ago | |
| PKU-SafeRLHF OOD | CausalRM | Win Rate 67.8 | 4 | 4d ago | |
| MT-Bench OOD | CausalRM | Win Rate 44.3 | 4 | 4d ago | |
| ID Average | CausalRM | Win Rate 72.2 | 4 | 4d ago | |
| Anthropic-Harmless ID | CausalRM | Win Rate 68.1 | 4 | 4d ago | |
| Anthropic-Helpful ID | CausalRM | Win Rate 0.762 | 4 | 4d ago | |