Theory of Mind reasoning on BigTOM (All)

Top accuracy: 95.5 (gpt-4)

[Chart: accuracy over time, Nov 2023 to Jun 2025]
Updated 3mo ago

Evaluation Results

Date      Accuracy
2023.11   95.5
2023.11   95
2023.11   92.5
2025.06   85.2
2025.06   84.8
2025.06   84.4
2025.06   82.4
          81.8
2023.11   81.62
2025.06   81.6
2025.06   80.8
2025.06   80.6
2025.06   76.6
2025.06   76.2
2023.11   75.88
2025.06   75.4
2025.06   73.8
2023.11   66.38
2023.11   58
2023.11   57.25
2023.11   56
2023.11   53.62
2023.11   51.38
2023.11   48.62
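The results table can be summarized per submission date with a few lines of code. The sketch below is illustrative only: it assumes the rows are transcribed by hand as (date, accuracy) pairs, with None standing in for the one entry that has no date. Method names are not available in the table, so they are not included, and the helper `summarize_by_date` is a hypothetical name, not part of any leaderboard API.

```python
from collections import defaultdict

# Rows transcribed from the results table as (date, accuracy) pairs.
# Method names were not available; None marks the entry without a date.
ROWS = [
    ("2023.11", 95.5), ("2023.11", 95.0), ("2023.11", 92.5),
    ("2025.06", 85.2), ("2025.06", 84.8), ("2025.06", 84.4),
    ("2025.06", 82.4), (None, 81.8), ("2023.11", 81.62),
    ("2025.06", 81.6), ("2025.06", 80.8), ("2025.06", 80.6),
    ("2025.06", 76.6), ("2025.06", 76.2), ("2023.11", 75.88),
    ("2025.06", 75.4), ("2025.06", 73.8), ("2023.11", 66.38),
    ("2023.11", 58.0), ("2023.11", 57.25), ("2023.11", 56.0),
    ("2023.11", 53.62), ("2023.11", 51.38), ("2023.11", 48.62),
]


def summarize_by_date(rows):
    """Hypothetical helper: return {date: (count, best, worst)} for dated rows."""
    groups = defaultdict(list)
    for date, acc in rows:
        if date is not None:  # skip the undated entry
            groups[date].append(acc)
    # Sort by date string so the output is in chronological order.
    return {d: (len(v), max(v), min(v)) for d, v in sorted(groups.items())}


if __name__ == "__main__":
    for date, (n, best, worst) in summarize_by_date(ROWS).items():
        print(f"{date}: {n} entries, best {best}, worst {worst}")
```

Grouping by the date string is a simple choice that works here because both dates share the same YYYY.MM format, so lexical order matches chronological order.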