
Physical Commonsense Reasoning on PIQA (test)

Top result: UL20B, 90.7 accuracy

[Chart: accuracy over time, May 2022 to May 2026]

Evaluation Results

Date       Accuracy (%)
2022.05    90.7
2022.10    90.1
2022.05    90.1
2022.10    89.5
2022.10    87.41
2022.10    85.9
2024.03    84.66
2026.05    84.6
2026.05    84.44
2026.05    84.22
2026.05    84.22
2026.05    84.11
2026.05    84.06
2023.07    82.8
2023.07    82.8
2023.07    82.4
2023.07    82.3
2023.07    81.9
2023.07    81.9
2024.03    80.74
2023.07    80.6
2023.07    80.5
2023.07    80.1
2023.07    79.8
2022.10    79.4
2026.05    79.39
2023.07    78.8
2024.03    78.07
2026.04    77.8
2026.04    77.67
2026.05    77.5
2026.05    77.4
2026.05    77.38
2023.07    76.7
2026.04    76.5
2026.04    76.17
2026.04    75.6
2024.03    75.14
2026.05    75.14
2026.05    75.14
2026.05    75.13
2022.10    74.55
2026.04    67.68
2026.04    66.92
2026.04    65.79
2026.04    65.77
2026.04    64.87
2026.05    63.11
2026.04    53.47
2026.04    52.88
2026.04    52.88
2026.04    52.1
2026.04    51.99
2026.04    51.47
2026.04    51.32
2026.04    50.84
2026.04    50.58
2026.04    50.36
2026.04    50.25