Physical Commonsense Reasoning on PIQA (Accuracy, AVG, and Delta)
[Chart: Accuracy over time. Highlighted point: Phi-4 14B (w/ LoopUS), 81.8 Accuracy, May 10, 2026. Selectable metrics: Accuracy, Average Score, Performance Delta.]
Updated 21d ago
Evaluation Results

| Method | Settings | Date | Accuracy | Average Score | Performance Delta |
|---|---|---|---|---|---|
| Phi-4 14B (w/ LoopUS) | Model=Phi-4 14B, Setti... | 2026.05 | 81.8 | 68.6 | 1.7 |
| Phi-4 14B (w/o LoopUS) | Model=Phi-4 14B, Setti... | 2026.05 | 80.7 | 67 | - |
| Qwen 8B (w/ LoopUS) | Model=Qwen 8B, Setting... | 2026.05 | 78.9 | 65.4 | 2.2 |
| Qwen 4B (w/ LoopUS) | Model=Qwen 4B, Setting... | 2026.05 | 76.8 | 62.1 | 1.8 |
| Qwen 8B (w/o LoopUS) | Model=Qwen 8B, Setting... | 2026.05 | 76.3 | 63.2 | - |
| Qwen 4B (w/o LoopUS) | Model=Qwen 4B, Setting... | 2026.05 | 75 | 60.3 | - |
| Qwen 1.7B (w/ LoopUS) | Model=Qwen 1.7B, Setti... | 2026.05 | 73.3 | 55.3 | 1.6 |
| Qwen 1.7B (w/o LoopUS) | Model=Qwen 1.7B, Setti... | 2026.05 | 72.2 | 53.7 | - |
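The page does not say how Performance Delta is derived. A minimal sketch, assuming (this is an inference from the numbers, not something the page states) that it is the Average Score with LoopUS minus the Average Score without LoopUS for the same base model. The `results` dictionary and the `recomputed_deltas` helper are hypothetical names introduced only for this check; the values are copied from the results listed above.

```python
# Values copied from the Evaluation Results listing:
# model -> (average score w/ LoopUS, average score w/o LoopUS, listed delta)
results = {
    "Phi-4 14B": (68.6, 67.0, 1.7),
    "Qwen 8B": (65.4, 63.2, 2.2),
    "Qwen 4B": (62.1, 60.3, 1.8),
    "Qwen 1.7B": (55.3, 53.7, 1.6),
}


def recomputed_deltas(rows):
    """Return {model: (recomputed delta, listed delta)}, rounded to one decimal.

    Assumption: delta = average score w/ LoopUS - average score w/o LoopUS.
    """
    return {
        model: (round(with_loopus - without_loopus, 1), listed)
        for model, (with_loopus, without_loopus, listed) in rows.items()
    }


deltas = recomputed_deltas(results)
for model, (recomputed, listed) in deltas.items():
    print(f"{model}: recomputed {recomputed:+.1f}, listed {listed:+.1f}")
```

Under this assumption the three Qwen rows reproduce the listed deltas. The Phi-4 14B row recomputes to 1.6 against a listed 1.7; one possible explanation is that the displayed scores are rounded (the baseline is shown as 67 with no decimal), but the page does not confirm this.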