| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MMToM-QA | UserHarness | Overall Accuracy | 98.5 | 44 | 6d ago |
| MuMA-ToM | UserHarness | Accuracy | 95.89 | 40 | 6d ago |
| BigTOM (All) | gpt-4 | Accuracy | 95.5 | 24 | 3mo ago |
| ToMI False Belief | MeTHanol | Accuracy | 98.2 | 18 | 1mo ago |
| BigTOM False Belief | MeTHanol | Accuracy | 99.4 | 18 | 1mo ago |
| MMToM-QA Text-only | SymbolicToM | Belief Inference 1.1 | 1 | 17 | 22d ago |
| MMToM-QA Multimodal | | Belief Inference 1.1 | 95.8 | 14 | 22d ago |
| MMToM-QA Video-only | | Belief Inference 1.1 | 69.1 | 13 | 22d ago |
| ToMI (All) | gpt-4 | Accuracy | 87.8 | 12 | 3mo ago |
| Theory-of-Mind (ToM) classic tasks battery | OSL | Confidence Score | 100 | 7 | 3mo ago |
| Theory-of-Mind Scenarios (test) | OSL | Confidence Score | 1 | 7 | 3mo ago |
| COMMON-TOM 1.0 (test) | | Total Accuracy | 80 | 7 | 3mo ago |
| medieval castle | 8B ↬ 405B | Belief Inference Accuracy | 85.6 | 6 | 22d ago |
| wild west | 8B ↬ 405B | Belief Inference Acc | 85.3 | 6 | 22d ago |
| outer space | 8B ↬ 405B | Belief Inference Accuracy | 87.2 | 6 | 22d ago |
| ancient Egyptian | 8B ↬ 405B | Belief Inference Accuracy | 86 | 6 | 22d ago |
| Andersen tales | 8B ↬ 405B | Belief Inference Accuracy | 85.8 | 6 | 22d ago |
| apartment seen | 8B ↬ 405B | Belief Inference Accuracy | 87 | 6 | 22d ago |