Social Commonsense Reasoning on SocialIQA

87.11 Accuracy

Hard-routing MoE

[Chart: accuracy over time, Jan 30, 2026 to Mar 31, 2026]
Updated 18d ago

Evaluation Results

Method Links
2026.02
87.11--
2026.02
85.52--
2026.02
83.96--
2026.02
83.74--
2026.02
83.4--
2026.02
83.04--
2026.03
82.91--
2026.03
82.69--
2026.03
81.63--
2026.02
80.99--
80.6--
80.55--
2026.01
79.9--
2026.01
79.8--
2026.03
79.63--
2026.03
79.53--
2026.02
79.49--
2026.01
79.220-
2026.01
79.220-
2026.02
79.2--
2026.02
79.16--
2026.01
78.820-
2026.01
78.820-
2026.01
78.8--
2026.01
78.8--
2026.02
78.64--
2026.03
78.4--
2026.02
78.38--
2026.03
77.74--
2026.03
77.74--
2026.03
77.53--
2026.03
77.28--
2026.03
76.15--
2026.03
76--
2026.01
75.9--
2026.01
75.8--
2026.01
75.8--
2026.01
75.8--
2026.03
75.74--
2026.03
75.74--
2026.03
75.08--
2026.03
74.51--
2026.03
74.51--
2026.01
74--
2026.01
74--
2026.01
73.4-10-
2026.01
73.4-10-
2026.03
73.23--
2026.03
73.18--
2026.03
73.13--
2026.03
73.08--
2026.03
72.82--
2026.01
72.2--
2026.01
72.2--
2026.03
72.01--
2026.01
70.70-
2026.01
70.6-20-
2026.01
70.220-
2026.01
70.220-
2026.01
69.4--
2026.01
69.4--
2026.01
68.930-
2026.01
68.930-
2026.01
68.140-
2026.01
6830-
2026.01
64.910-
2026.01
64.910-
2026.01
64.7-20-
2026.01
64.7-20-
2026.01
62.70-
2026.01
62.70-
2026.01
62.6-10-
2026.01
62.6-10-
2026.01
62.5--
2026.01
62.5--
2026.01
61.1-10-
2026.01
61.1-10-
2026.01
60.8-30-
2026.01
60.8-30-
2026.01
59.910-
2026.01
59.920-
2026.01
58--
2026.01
57.9--
2026.01
55.440-
2026.01
55.110-
2026.01
52-30-
2026.01
52-30-
2026.01
52--
2026.01
52--
2026.01
48.550-
2026.01
48.550-
2026.01
48.110-
2026.01
48.110-
2026.03
47.51--
46.21--
2026.03
45.02--
2026.03
44.36--
2026.03
43.58--
2026.03
43.57--
2026.03
43.43--
Showing 100 of 110 rows