| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| SimpleToM | GPT-5 | Accuracy | 99.24 | 29 | 1mo ago |
| TactfulToM | DeepSeek-R1 | Accuracy | 69.69 | 26 | 1mo ago |
| Hi-ToM | SocialR1-8B | Accuracy | 70.83 | 26 | 1mo ago |
| MotiveBench | | Accuracy | 94 | 26 | 1mo ago |
| EmoBench | | Accuracy | 80.39 | 26 | 1mo ago |
| ToMBench Hard (val) | SocialR1-8B | Accuracy | 62.79 | 26 | 1mo ago |
| ToMBench | | Accuracy | 78.34 | 26 | 1mo ago |
| Sotopia hard | | Rel Score | 2.4 | 17 | 5d ago |
| MotiveBench OOD (test) | GPT-4o | Amazon Score | 0.9011 | 17 | 1mo ago |
| Sotopia (all) | | Rel Score | 2.73 | 15 | 5d ago |
| SIQA | Autoregressive | Performance (%) | 15.2 | 6 | 1mo ago |
| When2Call | AutoAdapt | Accuracy | 54.5 | 5 | 1mo ago |