| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | ECQA | MV Score 77.4 | 27 |
| Reasoning | ECQA | CACC 83.96 | 25 |
| Question Answering | ECQA | Accuracy 70.34 | 12 |
| Natural Language Explanation Generation | ECQA | Human Evaluation Score 73.33 | 7 |
| Commonsense Question Answering | ECQA (test) | Accuracy 79.7 | 7 |
| Explanation Generation | ECQA (out-domain) | Grammar Score 2.99 | 7 |
| Natural Language Explanation Generation | ECQA (test) | Accuracy 59.4 | 6 |
| Explanation Generation | ECQA complete (test) | BERTScore 87.67 | 6 |
| Explanation self-consistency | ECQA (test) | Accuracy 71.11 | 4 |
| Open-Label QA | ECQA | COS-E 0.398 | 4 |
| CoT Soundness Evaluation | ECQA | CSR 87 | 3 |
| CoT Naturalness | ECQA | Perplexity (PPL) 20.15 | 3 |
| Commonsense Reasoning | ECQA | Pass@1 0.7612 | 3 |
| Natural Language Explanation Generation | ECQA few-shot 60-shot | Accuracy 24.53 | 3 |
| Commonsense Question Answering | ECQA | Performance Score (Finetune Baseline vs Predict Baseline) 57.2 | 2 |