
Physical Commonsense Reasoning on PIQA (test)

Accuracy: 90.7

UL20B

[Chart: accuracy over time, May 10, 2022 to Mar 30, 2024]
Updated 1mo ago

Evaluation Results

Date      Accuracy
2022.05   90.7
2022.10   90.1
2022.05   90.1
2022.10   89.5
2022.10   87.41
2022.10   85.9
2024.03   84.66
2023.07   82.8
2023.07   82.8
2023.07   82.4
2023.07   82.3
2023.07   81.9
2023.07   81.9
2024.03   80.74
2023.07   80.6
2023.07   80.5
2023.07   80.1
2023.07   79.8
2022.10   79.4
2023.07   78.8
2024.03   78.07
2023.07   76.7
2024.03   75.14
2022.10   74.55