| Dataset Name | SOTA Method | Metric | Value (%) | Trend | Updated |
|---|---|---|---|---|---|
| Overall OOD (test) | interpretable AI judge | Accuracy | 97.4 | 1 | 1mo ago |
| MultiDialog (Human-Human) OOD (test) | interpretable AI judge | Accuracy | 95.31 | 1 | 1mo ago |
| Fisher (Human-Human) OOD (test) | interpretable AI judge | Accuracy | 98.44 | 1 | 1mo ago |
| CosyVoice2 Pseudo Human OOD (test) | interpretable AI judge | Accuracy | 98.44 | 1 | 1mo ago |
| Inner In-Domain (test) | interpretable AI judge | Accuracy | 96.05 | 1 | 1mo ago |
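The Overall OOD accuracy (97.4) agrees, to one decimal place, with the unweighted mean of the three OOD test sets listed in the table. The sketch below reproduces that arithmetic. Treating "Overall OOD" as an equal-weight average of the subsets is an assumption; the table does not state how the overall figure is computed, and a sample-size-weighted average would give the same result only if the subsets are the same size.

```python
# Accuracy (%) on the three out-of-distribution (OOD) test sets, copied from the table.
ood_subsets = {
    "MultiDialog (Human-Human)": 95.31,
    "Fisher (Human-Human)": 98.44,
    "CosyVoice2 Pseudo Human": 98.44,
}

# Unweighted (macro) mean over subsets. ASSUMPTION: each subset counts equally;
# the source table does not say how "Overall OOD" is aggregated.
macro_mean = sum(ood_subsets.values()) / len(ood_subsets)

print(f"{macro_mean:.1f}")  # prints 97.4
```

The unrounded mean is about 97.397, which rounds to the reported 97.4.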