Physical Commonsense Reasoning on PIQA (Accuracy and Performance Gain)
Chart: PIQA accuracy by date (best result: Llama 3.1 8B, 91, Jan 30, 2026)
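Here is a minimal sketch of how a PIQA accuracy figure is typically computed. It assumes the standard two-choice format, where each item pairs a goal with two candidate solutions and a gold label. An item counts as correct when the model scores the labeled solution higher. `PiqaItem`, `piqa_accuracy`, and the `score` callable are hypothetical names used only for illustration. In practice, `score` would usually be a language model's (often length-normalized) log-likelihood of the solution given the goal. This leaderboard does not state its exact scoring procedure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PiqaItem:
    """One two-choice item (hypothetical container, modeled on PIQA's goal/sol1/sol2/label fields)."""
    goal: str
    sol1: str
    sol2: str
    label: int  # 0 if sol1 is the correct solution, 1 if sol2 is


def piqa_accuracy(items: Sequence[PiqaItem], score: Callable[[str, str], float]) -> float:
    """Return the percentage of items where the higher-scoring solution matches the gold label."""
    if not items:
        raise ValueError("no items to evaluate")
    correct = 0
    for item in items:
        s1 = score(item.goal, item.sol1)
        s2 = score(item.goal, item.sol2)
        pred = 0 if s1 >= s2 else 1  # ties are resolved toward sol1
        correct += int(pred == item.label)
    return 100.0 * correct / len(items)


if __name__ == "__main__":
    # Toy scorer that simply prefers shorter solutions; a real setup would query a model.
    toy_score = lambda goal, sol: -len(sol)
    demo = [
        PiqaItem("Dry wet hands", "Use a towel.", "Use a towel soaked in water.", 0),
        PiqaItem("Open a jar", "Hit it.", "Twist the lid counterclockwise.", 1),
    ]
    print(piqa_accuracy(demo, toy_score))
```

Because PIQA has only two candidates per item, a model that guesses at random lands near 50%. Every entry below is well above that level.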
Evaluation Results
| Model | Method | Date | Accuracy (%) | Performance Gain |
|---|---|---|---|---|
| Llama 3.1 8B | Classifier=Self-labeled | 2026.01 | 91.0 | 9.7 |
| Llama 3.1 8B | Classifier=Majority-la... | 2026.01 | 90.9 | 9.7 |
| Mistral Nemo Base 2407 | Classifier=Self-labeled | 2026.01 | 89.5 | 7.3 |
| Mistral Nemo Base 2407 | Classifier=Majority-la... | 2026.01 | 89.2 | 7.0 |
| Mistral 7B v0.3 | Classifier=Self-labeled | 2026.01 | 88.8 | 7.6 |
| Mistral 7B v0.3 | Classifier=Majority-la... | 2026.01 | 88.6 | 7.4 |
| Llama 3.2 1B | Classifier=Self-labeled | 2026.01 | 88.4 | 11.7 |
| Llama 3.2 1B | Classifier=Majority-la... | 2026.01 | 88.4 | 11.7 |
| Qwen3 8B Base | Classifier=Self-labeled | 2026.01 | 88.2 | 8.8 |
| Qwen3 8B Base | Classifier=Majority-la... | 2026.01 | 87.8 | 8.3 |
| Qwen3 0.6B | Classifier=Self-labeled | 2026.01 | 82.6 | 15.9 |
| Qwen3 0.6B | Classifier=Majority-la... | 2026.01 | 82.5 | 15.8 |
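The page does not define "Performance Gain". The sketch below assumes it is an absolute difference in accuracy points over each model's baseline, so that baseline = accuracy − gain. Under that assumption, it back-computes an implied baseline for every row. It also computes the per-model difference between the two classifier settings. The rows are transcribed from the table, and the truncated method label is kept exactly as the page shows it. `implied_baseline` and `self_minus_majority` are hypothetical helper names.

```python
from collections import defaultdict

# (model, method, accuracy %, performance gain), transcribed from the table above.
ROWS = [
    ("Llama 3.1 8B", "Self-labeled", 91.0, 9.7),
    ("Llama 3.1 8B", "Majority-la...", 90.9, 9.7),
    ("Mistral Nemo Base 2407", "Self-labeled", 89.5, 7.3),
    ("Mistral Nemo Base 2407", "Majority-la...", 89.2, 7.0),
    ("Mistral 7B v0.3", "Self-labeled", 88.8, 7.6),
    ("Mistral 7B v0.3", "Majority-la...", 88.6, 7.4),
    ("Llama 3.2 1B", "Self-labeled", 88.4, 11.7),
    ("Llama 3.2 1B", "Majority-la...", 88.4, 11.7),
    ("Qwen3 8B Base", "Self-labeled", 88.2, 8.8),
    ("Qwen3 8B Base", "Majority-la...", 87.8, 8.3),
    ("Qwen3 0.6B", "Self-labeled", 82.6, 15.9),
    ("Qwen3 0.6B", "Majority-la...", 82.5, 15.8),
]


def implied_baseline(accuracy: float, gain: float) -> float:
    """Baseline accuracy, ASSUMING gain = accuracy - baseline in absolute points."""
    return round(accuracy - gain, 1)


def self_minus_majority(rows) -> dict:
    """Per-model accuracy of the Self-labeled setting minus the Majority-la... setting."""
    by_model = defaultdict(dict)
    for model, method, acc, _gain in rows:
        by_model[model][method] = acc
    return {
        model: round(accs["Self-labeled"] - accs["Majority-la..."], 1)
        for model, accs in by_model.items()
    }


if __name__ == "__main__":
    for model, method, acc, gain in ROWS:
        print(f"{model:<24} {method:<16} implied baseline {implied_baseline(acc, gain)}")
    print(self_minus_majority(ROWS))
```

Under this reading, the two classifier settings of each model imply baselines within 0.1 points of each other. That is consistent with the assumption, though a relative-gain reading cannot be ruled out. The Self-labeled setting matches or exceeds the other setting for every model listed, by at most 0.4 points.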