| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | SocialIQA | Accuracy 88.1 | 158 |
| Social Commonsense Reasoning | SocialIQA | Accuracy 87.11 | 143 |
| Question Answering | SocialIQA | Accuracy 83.9 | 30 |
| Social Interaction Question Answering | SocialIQA (test) | Accuracy 75.49 | 28 |
| Commonsense Question Answering | SocialIQA (SIQA) (val) | Accuracy 70.7 | 24 |
| Commonsense Reasoning | SOCIALIQA (dev) | Accuracy 73.8 | 11 |
| Multi-agent Question Answering | SocialIQA (first 300 questions) | Average Accuracy 82.56 | 10 |
| Question Answering | SocialIQA (test) | Accuracy 78.1 | 10 |
| Ranking correlation with full dataset evaluation | SocialIQA | Kendall Correlation 0.81 | 10 |
| Scaling Law Prediction | SocialIQA | MAE 0.0088 | 7 |
| Inference correction review (discard) | SocialIQA | MHA 100 | 6 |
| Preference alignment | SocialIQA | Preference Alignment 87.3 | 5 |
| Adaptivity | SocialIQA | Adaptivity 75 | 4 |
| Downstream accuracy extrapolation | SocialIQA | RMSE 0.011 | 3 |
| Inference correction review (reason) | SocialIQA | MHA 100 | 2 |
| Timing comparison | SocialIQA | MHA 60 | 2 |
| Event/state classification | SocialIQA | MHA 94.7 | 2 |
| Triplet classification | SocialIQA | MHA 66.2 | 2 |