
Physical Commonsense Reasoning on PIQA (Accuracy and Performance Gain)

[Chart: Accuracy over time, Jul 24, 2025 to Jan 30, 2026. Highlighted entry: HPTQ, Accuracy 7,497.]
Updated 1mo ago

Evaluation Results

Date     Accuracy  Performance Gain
2025.07  7,497     -
2026.01  91        9.7
2026.01  90.9      9.7
2026.01  89.5      7.3
2026.01  89.2      7
2026.01  88.8      7.6
2026.01  88.6      7.4
2026.01  88.4      11.7
2026.01  88.4      11.7
2026.01  88.2      8.8
2026.01  87.8      8.3
2026.01  82.6      15.9
2026.01  82.5      15.8
2025.07  80.52     -
2025.07  80.14     -
2025.07  79.92     -
2025.07  79.65     -
2025.07  79.54     -
2025.07  79.49     -
2025.07  79.49     -
2025.07  79.27     -
2025.07  79.16     -
2025.07  79.11     -
2025.07  79.05     -
2025.07  78.78     -
2025.07  78.73     -
2025.07  77.97     -
2025.07  77.37     -
2025.07  76.82     -
2025.07  76.55     -
2025.07  76.33     -
2025.07  76.28     -
2025.07  76.17     -
2025.07  76.12     -
2025.07  75.73     -
2025.07  75.57     -
2025.07  75.52     -
2025.07  75.03     -
2025.07  75.03     -
2025.07  74.92     -
2025.07  74.43     -
2025.07  72.58     -
2025.07  72.58     -
2025.07  69.15     -
2025.07  69.15     -
2025.07  66.49     -
2025.07  65.61     -
2025.07  65.56     -
2025.07  61.04     -
2025.07  60.01     -
2025.07  58.81     -
2025.07  58.49     -
2025.07  58.22     -
2025.07  54.95     -
2025.07  54.46     -
2025.07  53.65     -