
ToMBench

Benchmarks

Task Name         Dataset Name         SOTA Result     Trend
Social Reasoning  ToMBench Hard (val)  Accuracy 62.79  26
Social Reasoning  ToMBench             Accuracy 78.34  26
Theory of Mind    TomBench OOD         Emotion 75.24   17
Theory of Mind    ToMBench             Accuracy 81.8   9