Theory of Mind Reasoning on COMMON-TOM 1.0 (test)
[Chart: Total Accuracy plotted by date, with Human Performance as a reference series; data point dated Mar 4, 2024. Selectable metrics: Total Accuracy, First Order Accuracy, Second Order Accuracy, Third Order Accuracy.]
Updated 3mo ago
Evaluation Results
| Method | Notes | Date | Total Accuracy (%) | First Order Accuracy (%) | Second Order Accuracy (%) | Third Order Accuracy (%) |
|---|---|---|---|---|---|---|
| Human Performance | | 2024.03 | 80 | 85 | 80 | 75 |
| ReCoG | Base model=FLAN-T5 | 2024.03 | 71 | 70.4 | 71.3 | 71.2 |
| Mistral-7B | Evaluation Protocol=Fi... | 2024.03 | 64 | 64.8 | 63.9 | 63.2 |
| gpt-4-0613 | Evaluation Protocol=Ze... | 2024.03 | 63.4 | 65.5 | 62.5 | 62.1 |
| Mistral-7B-Instruct | Evaluation Protocol=Ze... | 2024.03 | 60.6 | 63.3 | 60.5 | 58 |
| gpt-3.5-turbo-0613 | Evaluation Protocol=Ze... | 2024.03 | 57 | 60.7 | 57.7 | 53 |
| Random Baseline | | 2024.03 | 50.4 | 50.3 | 50.5 | 50.4 |