
Commonsense Reasoning on SocialIQA

Best result: 88.1 accuracy (HUMAN)

[Chart: accuracy over time, Mar 2021 to Mar 2026]

Evaluation Results

Date      Accuracy  Links
2021.03   88.1      --
2024.07   83.5      --
2021.03   83.2      --
2024.07   80.8      --
2024.07   80.5      --
2026.01   80.19     --
2021.03   80        --
          79.8      --
2024.07   79.2      --
2026.01   79.02     --
2024.07   78.8      --
2024.05   77.2      --
          76.7      --
2024.05   76.2      --
2024.05   76        --
2024.05   75.8      --
2024.05   75.8      --
2024.07   75.4      --
2024.05   75.2      --
2024.07   75.1      --
2026.01   74.87     --
2024.05   74.8      --
2024.07   74.6      --
2024.05   74        --
2024.05   74        --
2024.07   74        --
2024.07   73.5      --
2024.07   73.4      --
2024.05   73.2      --
2024.05   73        --
2024.07   71.2      --
2024.07   69.6      --
2024.07   69.3      --
2023.06   67.3      --
2023.06   66.2      --
2023.06   66        --
2023.06   65.7      --
2023.06   65.5      --
2023.06   65.4      --
2023.06   65.3      --
2023.06   65.1      --
2023.06   65.1      --
2023.06   64.8      --
2023.06   64.3      --
2024.05   63.7      --
2024.05   63.5      --
2023.06   62.7      --
2024.05   60.8      --
2023.06   60.2      --
2024.05   56.8      --
2024.05   56.5      --
2024.05   55.9      --
2024.05   55.8      --
2024.05   55.7      --
2024.05   55.3      --
2026.02   55.1      --
2026.02   54.7      --
2026.02   54.6      --
2026.02   53.8      --
2024.05   53.5      --
2024.05   53.1      --
2026.02   52.9      --
2026.02   52.9      --
2025.12   50.6      --
2026.02   50.1      --
2024.05   49.1      --
2025.12   48.9      --
2024.05   48.5      --
2025.12   48.2      --
2025.12   45.2      --
2025.12   44.8      --
2025.12   44.8      --
2025.08   44.37     --
2024.05   44.3      --
2025.08   44.11     --
2025.08   43.91     --
2025.08   43.76     --
2026.03   43.4      --
2025.08   42.84     --
2025.08   42.53     --
2026.03   42.48     --
2026.03   42.17     --
2025.08   42.12     --
2026.03   42.12     --
2026.03   42.02     --
2025.12   41.8      --
2025.12   41.7      --
2026.02   41.61     --
2026.03   41.61     --
2025.08   41.56     --
2026.03   41.25     --
2024.05   41.1      --
2026.03   41.04     --
2026.03   40.94     --
2026.03   40.63     --
2024.05   40.3      --
2026.03   40.28     --
2024.05   40.1      --
2026.03   39.92     --
2025.08   39.82     --
Showing 100 of 131 rows