Physical Commonsense Reasoning on PIQA

92.93 Accuracy

BF16

[Chart: accuracy over time, Oct 6, 2022 to Feb 24, 2026]
Updated 2d ago

Evaluation Results

Date     Accuracy
2026.02  92.93     -      -
2026.02  92.44     -0.49  -
2026.02  91.46     -      -
2026.02  91.29     -1.64  -
2026.02  91.29     -1.64  -
2026.02  91.19     -0.27  -
2026.02  90.46     -      -
2026.02  89.83     -1.63  -
2026.01  89.7      -      -
2026.01  89.6      -      -
2026.02  89.39     -2.07  -
2026.01  89.3      -      -
2026.01  88.8      -      -
2026.01  88.7      -      -
2024.09  88.4      -      -
2024.09  88.3      -      -
2026.01  88.2      -      -
2026.02  88.17     -      -
2026.01  88.1      -      -
2026.01  87.8      -      -
2024.09  87.7      -      -
2026.02  87.53     -      -
2026.02  87.28     -      -
2026.02  87.17     -      -
2026.02  86.94     -      -
2026.02  86.86     -      -
2024.09  86.6      -      -
2026.02  86.47     -      -
2026.02  86.44     -      -
2026.02  85.74     -      -
2026.02  85.65     -      -
2026.01  85.4      -      -
2025.12  85.31     -      -
2024.09  85.3      -      -
2024.02  85.3      -      -
2026.01  85.3      -      -
2026.02  85.3      -      -
2025.12  84.98     -      -
2025.12  84.98     -      -
2026.02  84.87     -      -
2026.02  84.7      -      -
2026.02  84.6      -      -
2024.02  84.3      -      -
2026.02  83.88     -      -
2026.01  83.8      -      -
2026.01  83.6      -      -
2024.09  83.5      -      -
2026.01  83.5      -      -
2026.02  83.36     -      -
2026.02  83.22     -      -
2023.05  83.2      -      -
2023.05  83.19     -      -
2026.01  82.6      -      -
2026.01  82.5      -      -
2026.02  82.39     -      -
2026.01  82.2      -      -
2026.01  82        -      -
2026.01  82        -      -
2024.09  81.9      -      -
2026.02  81.69     -      -
2026.02  81.48     -      -
2025.12  81.45     -      -
2025.12  81.3      -      -
2025.12  81.22     -      -
2025.05  81.12     -      -
2025.05  81.07     -      -
2022.10  81        -      -
2024.09  81        -      -
2025.05  80.96     -      -
2025.05  80.89     -      -
2025.12  80.85     -      -
2025.05  80.79     -      -
2025.05  80.74     -      -
2025.05  80.74     -      -
2025.12  80.74     -      -
2026.02  80.72     -      -
2025.11  80.69     -      -
2025.12  80.63     -      -
2025.12  80.63     -      -
2026.01  80.6      -      -
2024.09  80.6      -      -
2025.12  80.52     -      -
2024.12  80.47     -      -
2025.10  80.3      -      0.4
2025.10  80.3      -      0.4
2023.05  80.2      -      -
2025.12  80.1      -      -
2024.09  80.1      -      -
2025.12  80.09     -      -
2024.09  80        -      -
2025.10  79.9      -      0
2024.09  79.9      -      -
2025.12  79.86     -      -
2025.12  79.8      -      -
2024.09  79.8      -      -
2025.12  79.7      -      -
2023.05  79.54     -      -
2023.05  79.5      -      -
2023.05  79.5      -      -
2025.12  79.5      -      -
Showing 100 of 329 rows