AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling

About

Theory of Mind (ToM), the ability to understand people's minds based on their behavior, is key to developing socially intelligent agents. Current approaches to ToM reasoning either rely on prompting Large Language Models (LLMs), which are prone to systematic errors, or use handcrafted, rigid agent models for model-based inference, which are more robust but fail to generalize across domains. In this work, we introduce AutoToM, an automated agent modeling method for scalable, robust, and interpretable mental inference. Given a ToM problem, AutoToM first proposes an initial agent model and then performs automated Bayesian inverse planning based on this model, leveraging an LLM backend. Guided by inference uncertainty, it iteratively refines the model by introducing additional mental variables and/or incorporating more timesteps in the context. Across five diverse benchmarks, AutoToM outperforms existing ToM methods and even large reasoning models. Additionally, we show that AutoToM can produce human-like confidence estimates and enable online mental inference for embodied decision-making.
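The loop described above (infer a posterior over mental states, measure uncertainty, widen the context if uncertainty is still high) can be illustrated with a small toy sketch. This is not the AutoToM implementation: the hypothesis names, the hard-coded likelihood table (which AutoToM would instead estimate with an LLM backend), the entropy threshold, and every function name here are assumptions made for illustration. Only the "incorporate more timesteps" refinement is shown; adding mental variables is omitted.

```python
import math

def posterior(prior, likelihood, actions):
    """Bayes' rule over discrete hypotheses: P(h | actions) is proportional to
    P(h) times the product of P(a_t | h) over the observed actions."""
    scores = {
        h: p * math.prod(likelihood[h][a] for a in actions)
        for h, p in prior.items()
    }
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

def entropy(dist):
    """Shannon entropy in bits, used here as the uncertainty measure."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def infer_with_refinement(prior, likelihood, actions, max_entropy=0.5):
    """Start from the most recent timestep only and add earlier timesteps
    while the posterior is still too uncertain (hypothetical threshold)."""
    post, window = dict(prior), 0
    for window in range(1, len(actions) + 1):
        post = posterior(prior, likelihood, actions[-window:])
        if entropy(post) <= max_entropy:
            break
    return post, window

# Hypothetical toy problem: which goal explains the observed actions?
prior = {"wants_apple": 0.5, "wants_book": 0.5}
likelihood = {  # assumed P(action | goal); stands in for an LLM estimate
    "wants_apple": {"walk_to_kitchen": 0.8, "walk_to_study": 0.2, "open_door": 0.5},
    "wants_book": {"walk_to_kitchen": 0.2, "walk_to_study": 0.8, "open_door": 0.5},
}
actions = ["walk_to_kitchen", "walk_to_kitchen", "open_door"]

post, window = infer_with_refinement(prior, likelihood, actions)
print(window, post)
```

In this toy run the last action alone is uninformative, so the sketch keeps extending the window until all three timesteps are used and the posterior favors "wants_apple".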

Zhining Zhang, Chuanyang Jin, Mung Yao Jia, Shunchi Zhang, Tianmin Shu • 2025

Related benchmarks

Task                               Dataset           Metric            Result  Rank
---------------------------------  ----------------  ----------------  ------  ----
Theory of Mind                     HiToM             Accuracy          72.5    64
Theory of Mind                     ToMi              Accuracy          88.3    55
Theory of Mind                     BigToM            Accuracy          86.92   48
Theory of Mind reasoning           MMToM-QA          Overall Accuracy  83      44
Theory of Mind reasoning           MuMa-ToM          Accuracy          81.44   40
Question Answering                 Household (full)  Accuracy          80.2    25
Question Answering                 GridWorld (full)  Accuracy          57.3    22
Theory of Mind Question Answering  MMToM-QA          Accuracy          83      6
Theory of Mind Question Answering  MuMa-ToM          Accuracy          81.4    5