| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | PIQA | Accuracy 94.9 | 647 |
| Physical Commonsense Reasoning | PIQA | Accuracy 92.93 | 329 |
| Physical Interaction Question Answering | PIQA | Accuracy 94.9 | 323 |
| Reasoning | PIQA | Accuracy 96.5 | 133 |
| Physical Commonsense Reasoning | PIQA (val) | Accuracy 83 | 113 |
| Question Answering | PIQA | Accuracy 81.8 | 83 |
| Commonsense reasoning | PIQA 1.0 (test) | Accuracy 82.21 | 48 |
| Commonsense Reasoning | PIQA (test) | Accuracy 90.1 | 46 |
| Physical Reasoning | PIQA | Accuracy 81.34 | 44 |
| Physical Commonsense Reasoning | PIQA | Accuracy 81.23 | 41 |
| Physical Reasoning | PIQA | Accuracy 91.3 | 34 |
| Zero-shot Reasoning | PIQA | PIQA Zero-shot Accuracy 80.9 | 31 |
| Zero-shot Accuracy | PIQA | Zero-shot PIQA Accuracy 81.5 | 30 |
| Commonsense reasoning | PIQA (out-of-domain) | Accuracy 70.84 | 25 |
| Physical Commonsense Reasoning | PIQA | Delta Accuracy 0 | 24 |
| Physical Commonsense Reasoning | PIQA (test) | Accuracy 90.7 | 24 |
| Physical Reasoning | PIQA | Accuracy 82.21 | 20 |
| Correctness Prediction | PIQA | Accuracy 79.64 | 18 |
| Physical Commonsense Reasoning | PIQA | Mean Per-Step Regret 0.152 | 15 |
| Question Answering | PiQA | Accuracy 81.77 | 15 |
| Question Answering | PIQA out-of-domain | ROUGE-L 19.1 | 14 |
| Physical Commonsense Reasoning | PIQA | Accuracy 91 | 12 |
| Reasoning | PIQA | Accuracy Improvement 2.05 | 12 |
| Question Answering | PIQA | Accuracy (Baseline) 77.31 | 11 |
| Common Sense Reasoning | PIQA (dev) | Accuracy 83.2 | 11 |