Commonsense Reasoning on PIQA (test)

90.1 Accuracy

UNICORN

[Chart: accuracy over time, Oct 15, 2021 to Feb 3, 2026]
Updated 1mo ago

Evaluation Results

Date      Accuracy
2022.01   90.1
2022.01   83.19
2022.01   82.3
2022.01   81.99
2022.01   81.8
2023.01   81.07
2022.01   81
2022.01   80.96
2023.01   80.63
2022.01   80.5
2023.01   79.54
2023.01   79.54
2021.10   79.27
-         78.94
2023.04   76
2023.04   75.2
2023.04   73.9
2026.02   73.4
2026.02   73.4
2026.02   72.5
-         71.49
2023.04   71.1
2025.10   71
2021.10   70.84
2023.04   70.7
2025.10   69
2025.10   68
2025.10   67
2023.04   66.8
2021.10   66.32
2026.02   66.1
2026.02   63.7
-         63.44
2026.02   63.1
2021.10   62.89
2023.04   62.7
2021.10   60.45
2025.05   59.85
2023.04   59.5
2025.05   59.19
2025.05   59.03
2025.05   58.92
2025.05   58.27
2025.05   57.73
2021.10   57.45
2023.01   54.73