Commonsense Reasoning on PIQA

94.9 Accuracy

HUMAN

[Chart: accuracy over time. Y-axis ticks: 79.612, 83.581, 87.55, 91.519. X-axis ticks: May 28, 2020; May 18, 2021; May 9, 2022; Apr 30, 2023; Apr 19, 2024; Apr 10, 2025; Apr 1, 2026.]
Updated 8d ago

Evaluation Results

Date     Accuracy  Method / Links
2021.03  94.9      -
2026.02  93.22     -
2026.02  93.11     -
2026.02  93.05     -
2026.02  93.02     -
2026.02  92.97     -
2026.02  92.89     -
2026.02  92.8      -
2026.02  92.49     -
2021.03  90.1      -
2022.03  90.1      -
2024.08  87.54     -
2024.08  86.89     -
2024.08  86.72     -
2024.08  86.02     -
2024.08  85.91     -
2024.08  85.91     -
2024.07  85.6      -
2024.08  85.58     -
2024.07  85.5      -
2021.03  85.3      -
2024.08  85.26     -
2023.05  85        -
2023.11  84.9      -
2024.08  84.82     -
2026.01  84.5      -
2024.08  84.33     -
         84.2      -
2023.05  83.9      -
2024.07  83.8      -
2024.01  83.6      -
2023.05  83.2      -
2024.08  83.19     -
2023.11  83        -
2024.07  83        -
2026.02  83        -
2020.05  82.8      -
2023.02  82.8      -
2023.11  82.8      -
2026.01  82.7      -
2024.01  82.6      -
2025.03  82.5      -
2026.01  82.5      -
2026.01  82.4      -
2023.02  82.3      -
2023.02  82.3      -
2023.11  82.3      -
2023.09  82.3      -
2023.05  82.2      -
2024.01  82.2      -
2024.01  82.2      -
2024.04  82.2      -
2024.08  82.15     -
2026.01  82.1      -
2022.03  82        -
2023.11  82        -
2026.02  82        -
2023.11  81.9      -
2022.03  81.8      -
2022.03  81.8      -
2023.02  81.8      -
2023.02  81.8      -
2023.11  81.8      -
2023.11  81.8      -
2024.08  81.72     -
2024.08  81.54     -
2024.07  81.5      -
2023.02  81.4      -
2026.01  81.4      -
2024.04  81.2      -
2026.02  81.12     -
2022.10  81.1      -
2026.01  81.1      -
2022.03  81        -
2023.02  81        -
2023.11  81        -
2023.09  81        -
2024.07  81        -
2024.02  80.85     -
2024.02  80.85     -
2024.01  80.8      -
2026.02  80.74     -
2024.01  80.7      -
2026.04  80.7      -
2024.04  80.6      -
2026.02  80.6      -
2024.06  80.52     -
2020.05  80.5      -
2020.05  80.5      -
2023.02  80.5      -
2023.11  80.5      -
2024.04  80.5      -
2025.03  80.5      -
2025.12  80.5      -
2026.02  80.47     -
2026.02  80.36     -
2023.11  80.3      -
2026.01  80.3      -
2024.08  80.2      -
         80.2      -
Showing 100 of 751 rows
...