
Theory of Mind reasoning on MMToM-QA

Overall Accuracy: 98.5


[Chart: accuracy over time; x-axis from Mar 25, 2026 to May 26, 2026]
Updated 6d ago

Evaluation Results

Method    Date       Overall Accuracy    (unlabeled)    (unlabeled)    Links
          2026.05    98.5                -              -
          2026.05    98.33               -              -
          2026.05    98.17               -              -
          2026.05    98.17               -              -
          2026.05    98                  -              -
          2026.05    97.83               -              -
          2026.05    97.83               -              -
          2026.05    97.67               -              -
          2026.05    97.5                -              -
          2026.05    97.33               -              -
          2026.05    97                  -              -
          2026.05    96.67               -              -
          2026.05    83                  -              -
          2026.05    69.83               -              -
          2026.05    67.5                -              -
          2026.03    66.3                70.7           62
          2026.05    64.17               -              -
          2026.05    63.33               -              -
          2026.05    62.83               -              -
          2026.03    61.2                64             58.3
          2026.03    60.5                64.3           56.7
          2026.05    56.17               -              -
          2026.05    55.33               -              -
          2026.05    51                  -              -
          2026.05    48.83               -              -
          2026.05    48.67               -              -
          2026.05    48                  -              -
          2026.05    47.33               -              -
          2026.05    47.33               -              -
          2026.05    47                  -              -
          2026.05    47                  -              -
          2026.05    46.67               -              -
          2026.05    45.67               -              -
          2026.05    45.17               -              -
          2026.05    44.33               -              -
          2026.05    43.5                -              -
          2026.05    43.33               -              -
          2026.05    39.67               -              -
          2026.03    38.2                47             29.3
          2026.05    37.5                -              -
          2026.05    37.17               -              -
          2026.05    37.17               -              -
          2026.05    35                  -              -
          2026.05    34                  -              -