BigTOM

Benchmarks

Task Name                  Dataset Name          SOTA Result      Trend
Theory of Mind             BigToM                Accuracy 98.67   48
Theory of Mind reasoning   BigTOM (All)          Accuracy 95.5    24
Theory of Mind reasoning   BigTOM False Belief   Accuracy 99.4    18