Theory-of-Mind Reasoning on Theory-of-Mind (ToM) classic tasks battery
[Chart: Confidence Score by date, y-axis ticks from 92.2 to 100; series: OSL (OSL Accuracy); data point at Mar 2, 2026]
Updated 3mo ago
Evaluation Results
Method  Scenario             Date     Confidence Score  OSL Accuracy
OSL     Sally–Anne (b...     2026.03  100               -
OSL     Sally–Anne wi...     2026.03  100               -
OSL     Multiple objects     2026.03  100               -
OSL     False photograph     2026.03  100               -
OSL     Temporal beli...     2026.03  97.5              -
OSL     Nested belief...     2026.03  95                -
OSL     Appearance–re...     2026.03  92.5              -