
SocialIQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
----------|--------------|-------------|------
Commonsense Reasoning | SocialIQA | Accuracy 88.1 | 158
Social Commonsense Reasoning | SocialIQA | Accuracy 87.11 | 143
Question Answering | SocialIQA | Accuracy 83.9 | 30
Social Interaction Question Answering | SocialIQA (test) | Accuracy 75.49 | 28
Commonsense Question Answering | SocialIQA (SIQA) (val) | Accuracy 70.7 | 24
Commonsense Reasoning | SOCIALIQA (dev) | Accuracy 73.8 | 11
Multi-agent Question Answering | SocialIQA (first 300 questions) | Average Accuracy 82.56 | 10
Question Answering | SocialIQA (test) | Accuracy 78.1 | 10
Ranking correlation with full dataset evaluation | SocialIQA | Kendall Correlation 0.81 | 10
Scaling Law Prediction | SocialIQA | MAE 0.0088 | 7
Inference correction review (discard) | SocialIQA | MHA 100 | 6
Preference alignment | SocialIQA | Preference Alignment 87.3 | 5
Adaptivity | SocialIQA | Adaptivity 75 | 4
Downstream accuracy extrapolation | SocialIQA | RMSE 0.011 | 3
Inference correction review (reason) | SocialIQA | MHA 100 | 2
Timing comparison | SocialIQA | MHA 60 | 2
Event/state classification | SocialIQA | MHA 94.7 | 2
Triplet classification | SocialIQA | MHA 66.2 | 2
Showing 18 of 18 rows