
SOTOPIA

Benchmarks

Task Name                       Dataset Name                          SOTA Result      Trend
Social Dialogue                 SOTOPIA Interaction with GPT-4o       Goal Score 8.21  28
Social Dialogue                 SOTOPIA Self-Chat                     GOAL 8.56        28
Social Dialogue                 SOTOPIA Overall (AVG)                 AVG Score 5.63   11
Social Dialogue                 SOTOPIA Interaction with GPT-4o-mini  GOAL Score 7.53  11
Social Interaction              SOTOPIA all social scenarios          Goal Score 7.62  5
Social Intelligence Assessment  SOTOPIA hard episodes (test)          SOC Score -0.02  4