Theory-of-Mind Reasoning on Theory-of-Mind Scenarios (test)
Chart: Confidence Score for OSL (Mar 2, 2026). Metrics shown: Confidence Score, Accuracy.
Evaluation Results
Method | Setting | Date | Confidence Score | Accuracy
--- | --- | --- | --- | ---
OSL | Test Scenario=Sally-An... | 2026.03 | 1 | -
OSL | Test Scenario=Sally-An... | 2026.03 | 1 | -
OSL | Test Scenario=Multiple... | 2026.03 | 1 | -
OSL | Test Scenario=False Ph... | 2026.03 | 1 | -
OSL | Test Scenario=Temporal... | 2026.03 | 0.975 | -
OSL | Test Scenario=Nested B... | 2026.03 | 0.95 | -
OSL | Test Scenario=Appearan... | 2026.03 | 0.925 | -
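The leaderboard lists one Confidence Score per test scenario but reports no aggregate. The sketch below shows one way to summarize the rows. It makes several assumptions that do not come from the page: it uses an unweighted mean, it treats the two "Sally-An..." rows as separate entries, and it ignores the Accuracy column because every row shows "-" there. Scenario labels are copied exactly as the page truncates them.

```python
from statistics import mean

# Rows transcribed from the results table: (scenario label, confidence score).
# Labels stay truncated as displayed. ASSUMPTION: the two "Sally-An..." rows
# are distinct entries, and an unweighted mean is a meaningful summary.
ROWS = [
    ("Test Scenario=Sally-An...", 1.0),
    ("Test Scenario=Sally-An...", 1.0),
    ("Test Scenario=Multiple...", 1.0),
    ("Test Scenario=False Ph...", 1.0),
    ("Test Scenario=Temporal...", 0.975),
    ("Test Scenario=Nested B...", 0.95),
    ("Test Scenario=Appearan...", 0.925),
]


def summarize(rows):
    """Return (unweighted mean, lowest-scoring row, count of perfect scores)."""
    scores = [score for _, score in rows]
    lowest = min(rows, key=lambda row: row[1])
    perfect = sum(1 for score in scores if score == 1.0)
    return mean(scores), lowest, perfect


if __name__ == "__main__":
    avg, lowest, perfect = summarize(ROWS)
    print(f"mean confidence score: {avg:.3f}")
    print(f"lowest: {lowest[0]} ({lowest[1]})")
    print(f"perfect scores: {perfect}/{len(ROWS)}")
```

Under these assumptions, the mean is 6.85 / 7, which rounds to 0.979. The lowest-scoring row is "Test Scenario=Appearan..." at 0.925.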