
SOTOPIA

Benchmarks

Task Name | Dataset Name | Metric | SOTA Result | Trend
Social Dialogue | SOTOPIA Interaction with GPT-4o | Goal Score | 8.21 | 28
Social Dialogue | SOTOPIA Self-Chat | Goal Score | 8.56 | 28
Social Interaction Evaluation | SOTOPIA-Hard GPT-4o-as-Partner | Goal Score | 7.68 | 24
Social Interaction Evaluation | SOTOPIA GPT-4o-as-Partner | Goal Score | 8.75 | 24
Social Interaction Evaluation | SOTOPIA-Hard (Self-Play) | Goal Score | 8.06 | 24
Social Interaction Evaluation | SOTOPIA (Self-Play) | Goal Score | 9.08 | 24
Social Reasoning | Sotopia hard | Rel Score | 2.4 | 17
Social Interaction | SOTOPIA all social scenarios | Goal Score | 8.95 | 17
Social Intelligence Assessment | SOTOPIA hard episodes (test) | Goal Score | 7.21 | 16
Social Reasoning | Sotopia (all) | Rel Score | 2.73 | 15
Social Dialogue | SOTOPIA Overall (AVG) | AVG Score | 5.63 | 11
Social Dialogue | SOTOPIA Interaction with GPT-4o-mini | Goal Score | 7.53 | 11
Social Simulation | Sotopia hard (test) | Goal Score | 7.59 | 8
Social Simulation | Sotopia standard (test) | Goal Score | 8.92 | 8