Commonsense Reasoning on SocialIQA

88.1 Accuracy

HUMAN

[Chart: accuracy over time, Mar 24, 2021 to May 20, 2026]
Updated 7d ago

Evaluation Results

Method    Links
2021.03
88.1--
2024.07
83.5--
2021.03
83.2--
80.86--
2024.07
80.8--
2024.07
80.5--
2026.01
80.19--
2021.03
80--
79.8--
2024.07
79.2--
2026.01
79.02--
2024.07
78.8--
2026.04
77.43--
2024.05
77.2--
76.7--
2026.04
76.56--
2024.05
76.2--
2024.05
76--
2024.05
75.8--
2024.05
75.8--
2024.07
75.4--
2024.05
75.2--
2024.07
75.1--
2026.01
74.87--
2024.05
74.8--
2024.07
74.6--
2026.04
74.51--
2024.05
74--
2024.05
74--
2024.07
74--
2024.07
73.5--
2024.07
73.4--
2026.04
73.39--
2024.05
73.2--
2026.04
73.08--
2024.05
73--
2024.07
71.2--
2024.07
69.6--
2024.07
69.3--
2023.06
67.3--
2023.06
66.2--
2023.06
66--
2023.06
65.7--
2023.06
65.5--
2023.06
65.4--
2023.06
65.3--
2023.06
65.1--
2023.06
65.1--
2023.06
64.8--
2023.06
64.3--
2024.05
63.7--
2024.05
63.5--
2023.06
62.7--
2024.05
60.8--
2023.06
60.2--
2024.05
56.8--
2024.05
56.5--
2024.05
55.9--
2024.05
55.8--
2024.05
55.7--
2024.05
55.3--
2026.02
55.1--
2026.02
54.7--
2026.02
54.6--
2026.02
53.8--
2024.05
53.5--
2024.05
53.1--
2026.02
52.9--
2026.02
52.9--
2025.12
50.6--
2026.02
50.1--
2024.05
49.1--
2025.12
48.9--
2024.05
48.5--
2025.12
48.2--
2025.12
45.2--
2025.12
44.8--
2025.12
44.8--
2025.08
44.37--
2024.05
44.3--
2025.08
44.11--
2025.08
43.91--
2025.08
43.76--
2026.03
43.4--
2026.05
42.94--
2025.08
42.84--
2025.08
42.53--
2026.03
42.48--
2026.03
42.17--
2025.08
42.12--
2026.03
42.12--
2026.03
42.02--
2025.12
41.8--
2026.05
41.76--
2025.12
41.7--
2026.02
41.61--
2026.03
41.61--
2026.05
41.61--
2025.08
41.56--
2026.03
41.25--
Showing 100 of 173 rows