Social Reasoning on MotiveBench OOD (test)

Amazon Score: 0.9011

GPT-4o

Jan 22, 2026
Updated 4d ago

Evaluation Results

Method  Links

2026.01  0.9011  0.7744  0.7744  0.8166  -
2026.01  0.8222  0.7033  0.7027  0.7427  3.3
2026.01  0.8188  0.6655  0.6727  0.719   -
2026.01  0.7988  0.6911  0.6866  0.7254  0.89
2026.01  0.7966  0.6955  0.7138  0.7353  2.26
2026.01  0.7677  0.6766  0.6866  0.7103  4.3
2026.01  0.7544  0.6911  0.6688  0.7047  3.48
2026.01  0.7488  0.6711  0.6661  0.6953  2.1
2026.01  0.7433  0.6366  0.6633  0.681   -
2026.01  0.7422  0.6644  0.6311  0.6792  5.77
2026.01  0.7355  0.7388  0.6311  0.7018  19.76
2026.01  0.73    0.7244  0.6061  0.6868  -
2026.01  0.7244  0.6088  0.5933  0.6421  -
2026.01  0.6744  0.6588  0.5833  0.6388  6.64
2026.01  0.6511  0.5711  0.58    0.599   -
2026.01  0.5866  0.6499  0.5216  0.586   -
2026.01  0.0366  0.0488  0.0266  0.0373  -