| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Natural Language Inference | E-SNLI | Accuracy 91.31 | 46 |
| Multiple Choice Classification | e-SNLI | Accuracy 89.6 | 16 |
| Natural Language Inference | e-SNLI (test) | Accuracy 94 | 9 |
| Logical Refinement of Natural Language Explanations | e-SNLI | Initial Performance 41 | 8 |
| Natural Language Explanation Generation | e-SNLI | Human Evaluation Score 50 | 7 |
| Explanation Generation | e-SNLI (out-domain) | Grammar Score 2.98 | 7 |
| Natural Language Inference | e-SNLI abundant | Accuracy 88.8 | 6 |
| Natural Language Inference | e-SNLI (medium) | Accuracy 87.5 | 6 |
| Natural Language Inference | e-SNLI scarce | Accuracy 86.3 | 6 |
| Natural Language Explanation Generation | e-SNLI (test) | Accuracy 86.66 | 6 |
| Chain-of-Thought Generation | e-SNLI (test) | GPT-4 Score 3.49 | 6 |
| Natural Language Inference | e-SNLI | ECE 4.35 | 4 |
| Natural Language Explanation Generation | e-SNLI 60-shot | Accuracy 40.1 | 3 |