Commonsense Reasoning on SIQA

89.85 Accuracy

In-Squeeze

[Plot: accuracy over time, Sep 20, 2023 to May 26, 2026]
Updated 23h ago

Evaluation Results

Method (date)  Accuracy  (unlabeled)  Links
2026.02        89.85     -            -
2026.02        89.77     -            -
2026.02        89.35     -            -
2026.02        89.35     -            -
2026.02        89.33     -            -
2026.02        88.78     -            -
2026.02        88.13     -            -
2026.02        87.87     -            -
2026.04        81.78     -            -
2025.09        81.49     -            -
2025.09        81.35     -            -
2026.04        81.15     -            -
2025.09        81.03     -            -
2025.09        81        -            -
2025.09        80.81     -            -
2026.04        80.71     -            -
2026.04        80.45     -            -
2026.04        79.9      -            -
2026.04        79.9      -            -
2025.09        79.84     -            -
2025.09        79.84     -            -
2026.04        79.53     -            -
2026.04        79.53     -            -
2026.04        79.5      -            -
2025.09        79.43     -            -
2025.09        78.92     -            -
2026.04        76        -            -
2025.09        75.54     -            -
2025.09        75.37     -            -
2025.09        74.75     -            -
2025.09        73.95     -            -
2025.09        73.83     -            -
2025.09        73.47     -            -
2025.09        73.39     -            -
2025.09        73.39     -            -
2025.09        73.34     -            -
2024.02        72.1      29           -
2024.02        72.1      29           -
2024.02        70.6      24.5         -
2024.02        70.6      30.3         -
2024.02        70.6      24.5         -
2024.02        70.6      30.3         -
2024.02        70.2      28.2         -
2024.02        70.2      28.2         -
2024.02        69.2      27.8         -
2024.02        69        20.5         -
2024.02        68.9      25           -
2024.02        68.8      -            -
2026.04        68.5      -            -
2024.02        67.8      33.5         -
2024.02        66.5      37.5         -
2024.02        66.5      36.8         -
2024.02        65        37.3         -
2025.05        64.8      -            -
2024.02        62.2      43.1         -
2025.05        61.5      -            -
2025.05        61.4      -            -
2025.05        59.5      -            -
2025.05        56.5      -            -
2026.05        54.3      -            -
2024.07        53.7      -            -
2026.02        53.5      -            -
2026.05        52.9      -            -
2024.07        52.2      -            -
2026.02        52.2      -            -
2024.04        51.8      -            -
2024.07        51.8      -            -
2024.07        51.6      -            -
2024.04        50.3      -            -
2026.05        50.2      -            -
2025.05        49.9      -            -
2024.04        49.6      -            -
2024.07        49.5      -            -
2026.05        49        -            -
2023.09        48.9      -            -
2023.09        48.8      -            -
2024.04        48.5      -            -
2024.04        48.3      -            -
2024.07        48.2      -            -
2026.02        47.8      -            -
2023.09        47.5      -            -
2026.02        47.5      -            -
2026.05        47.4      -            -
2026.02        47.1      -            -
2024.04        47        -            -
2026.02        46.9      -            -
2026.05        46.8      -            -
2026.05        46.78     -            -
2026.02        46.7      -            -
2026.01        46.7      -            -
2026.02        45.7      -            -
2026.05        45.5      -            -
2026.01        45.2      -            -
2026.02        45        -            -
2026.02        44.8      -            -
2026.02        44.7      -            -
2026.02        44.6      -            -
2026.02        44.5      -            -
2026.02        44.1      -            -
2026.02        43.7      -            -
Showing 100 of 192 rows