Overcoming Multi-step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner
About
Theory-of-Mind (ToM) enables humans to infer mental states (such as beliefs, desires, and intentions), forming the foundation of social cognition. However, existing computational ToM methods rely on structured workflows with ToM-specific priors or deep model fine-tuning, which struggle with scalability in multimodal environments and fail to generalize as task complexity increases. To address these limitations, we propose a scalable Bayesian ToM planner that decomposes ToM reasoning into stepwise Bayesian updates. Our framework introduces weak-to-strong control, allowing smaller language models (LMs) to specialize in ToM-specific likelihood estimation and transfer their reasoning behaviors to larger LMs (7B to 405B) for integration with social and world knowledge. This synergistic approach aligns large-model inference of human mental states with Bayesian principles. Extensive experiments show that our method achieves a 4.6% accuracy improvement over state-of-the-art techniques on multimodal ToM benchmarks, including challenging unseen scenarios, thereby establishing a new standard for modeling human mental states in complex environments.
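To make the phrase "stepwise Bayesian updates" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small discrete set of candidate mental states and a per-step likelihood table. In the described framework such likelihoods would come from a language model; here they are hard-coded, hypothetical numbers, and the names `bayes_step` and `stepwise_posterior` are illustrative only.

```python
def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {h: w / total for h, w in weights.items()}


def bayes_step(prior, likelihood):
    """One Bayesian update: posterior(h) is proportional to prior(h) * P(obs | h)."""
    return normalize({h: prior[h] * likelihood[h] for h in prior})


def stepwise_posterior(prior, likelihoods_per_step):
    """Fold a sequence of per-step likelihoods into the belief, one step at a time."""
    belief = dict(prior)
    for likelihood in likelihoods_per_step:
        belief = bayes_step(belief, likelihood)
    return belief


# Hypothetical example: where does the agent believe the apple is?
prior = {"in_fridge": 0.5, "in_cabinet": 0.5}

# Assumed values of P(observed action | hypothesis) for two observed steps.
steps = [
    {"in_fridge": 0.8, "in_cabinet": 0.2},  # agent walks toward the kitchen
    {"in_fridge": 0.6, "in_cabinet": 0.4},  # agent reaches toward the fridge
]

posterior = stepwise_posterior(prior, steps)
print(posterior)
```

Because each step only multiplies the current belief by one likelihood and renormalizes, the cost grows linearly with the number of observed steps, which is the property that makes a stepwise decomposition attractive for long multi-step scenarios.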
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Theory of Mind reasoning | MMToM-QA Text-only | Belief Inference 1.1 | 0.901 | 17 |
| Theory of Mind reasoning | MMToM-QA Multimodal | Belief Inference 1.1 | 92.1 | 14 |
| Theory of Mind reasoning | MMToM-QA Video-only | -- | -- | 13 |
| Social interaction reasoning | MuMa-ToM | Belief Score | 94 | 11 |
| Theory of Mind reasoning | apartment seen | Belief Inference Accuracy | 87 | 6 |
| Theory of Mind reasoning | Andersen tales | Belief Inference Accuracy | 85.8 | 6 |
| Theory of Mind reasoning | ancient Egyptian | Belief Inference Accuracy | 86 | 6 |
| Theory of Mind reasoning | outer space | Belief Inference Accuracy | 87.2 | 6 |
| Theory of Mind reasoning | wild west | Belief Inference Accuracy | 85.3 | 6 |
| Theory of Mind reasoning | medieval castle | Belief Inference Accuracy | 85.6 | 6 |