| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Social Dialogue | SOTOPIA Interaction with GPT-4o | Goal Score 8.21 | 28 |
| Social Dialogue | SOTOPIA Self-Chat | GOAL 8.56 | 28 |
| Social Interaction Evaluation | SOTOPIA-Hard GPT-4o-as-Partner | Goal Score 7.68 | 24 |
| Social Interaction Evaluation | SOTOPIA GPT-4o-as-Partner | Goal Score 8.75 | 24 |
| Social Interaction Evaluation | SOTOPIA-Hard (Self-Play) | GOAL Score 8.06 | 24 |
| Social Interaction Evaluation | SOTOPIA (Self-Play) | Goal Score 9.08 | 24 |
| Social Reasoning | Sotopia hard | Rel Score 2.4 | 17 |
| Social Interaction | SOTOPIA all social scenarios | Goal Score 8.95 | 17 |
| Social Intelligence Assessment | SOTOPIA hard episodes (test) | GOAL Score 7.21 | 16 |
| Social Reasoning | Sotopia (all) | Rel Score 2.73 | 15 |
| Social Dialogue | SOTOPIA Overall (AVG) | AVG Score 5.63 | 11 |
| Social Dialogue | SOTOPIA Interaction with GPT-4o-mini | GOAL Score 7.53 | 11 |
| Social Simulation | Sotopia hard (test) | Goal Score 7.59 | 8 |
| Social Simulation | Sotopia standard (test) | Goal Score 8.92 | 8 |