| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Commonsense Reasoning | PIQA | Accuracy | 94.9 | 751 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 94.9 | 572 |
| Question Answering | PIQA | Accuracy | 86.5 | 374 |
| Physical Interaction Question Answering | PIQA | Accuracy | 94.9 | 333 |
| Reasoning | PIQA | Accuracy | 96.5 | 145 |
| Physical Commonsense Reasoning | PIQA (val) | Accuracy | 83 | 116 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 85.91 | 78 |
| Physical Reasoning | PIQA | Accuracy | 81.34 | 74 |
| Common Sense Reasoning | PIQA | Accuracy | 83 | 71 |
| Zero-shot Reasoning | PIQA | PIQA Zero-shot Accuracy | 80.9 | 62 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 7,497 | 56 |
| Commonsense reasoning | PIQA 1.0 (test) | Accuracy | 82.21 | 48 |
| Commonsense Reasoning | PIQA (test) | Accuracy | 90.1 | 46 |
| Physical Commonsense Reasoning | PiQA | Accuracy | 76.56 | 45 |
| Question Answering | PiQA | Accuracy | 81.77 | 36 |
| Physical Reasoning | PIQA | Accuracy | 91.3 | 34 |
| Zero-shot Accuracy | PIQA | Zero-shot PIQA Accuracy | 81.5 | 30 |
| Inactive Attention Head Identification | PIQA | Percentage of Heads Zeroed | 31.3 | 28 |
| Commonsense reasoning | PIQA (out-of-domain) | Accuracy | 70.84 | 25 |
| Physical Commonsense Reasoning | PIQA | Delta Accuracy | 0 | 24 |
| Physical Commonsense Reasoning | PIQA (test) | Accuracy | 90.7 | 24 |
| Physical Reasoning | PIQA | Accuracy | 82.21 | 20 |
| Common Sense Reasoning | PIQA (dev) | Accuracy | 83.2 | 19 |
| Correctness Prediction | PIQA | Accuracy | 79.64 | 18 |
| Physical Commonsense Reasoning | PIQA | Mean Per-Step Regret | 0.152 | 15 |