| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Selective Prediction | CommonsenseQA | Power | 0.9999 | 207 |
| Question Answering | CommonsenseQA | Accuracy | 89.3 | 143 |
| Commonsense Reasoning | CommonSenseQA | Accuracy | 91.2 | 132 |
| Question Answering | CommonsenseQA (CSQA) | Accuracy | 91.2 | 124 |
| Commonsense Question Answering | CommonSenseQA | Accuracy | 88.9 | 81 |
| Question Answering | CommonsenseQA IH (test) | Accuracy | 88.9 | 57 |
| Commonsense Reasoning | CommonSenseQA | BS | 0.1054 | 54 |
| Question Answering | CommonsenseQA IH (dev) | Accuracy | 82.7 | 53 |
| Commonsense Reasoning | CommonsenseQA (val) | Accuracy | 82.06 | 52 |
| Hallucination Detection | CommonsenseQA | Mean AUROC | 0.7563 | 48 |
| Commonsense Reasoning | CommonsenseQA (CSQA) v1.0 (test) | Accuracy | 64.11 | 46 |
| Question Answering | CommonsenseQA (test) | Accuracy | 83.3 | 42 |
| Commonsense Reasoning | CommonsenseQA (test) | Accuracy | 90 | 41 |
| Commonsense Reasoning | CommonsenseQA (CSQA) | Accuracy | 79 | 38 |
| Commonsense Reasoning | CommonsenseQA Non-Math | Accuracy | 87.31 | 32 |
| Retrieval | CommonsenseQA | Accuracy | 86.81 | 25 |
| Commonsense Question Answering | CommonsenseQA (CSQA) (val) | Accuracy | 75.7 | 23 |
| Commonsense Question Answering | CommonsenseQA v1.0 (dev) | Accuracy | 79.3 | 22 |
| Multiple-choice Question Answering | CommonsenseQA (CSQA) | Accuracy | 66.4 | 21 |
| Veracity Inference | COMMONSENSEQA 1,000 examples | Mean Hamming Similarity | 0.935 | 20 |
| Knowledge | CommonSenseQA CoQA | Score | 66.91 | 20 |
| Commonsense Question Answering | CommonsenseQA blind v1.0 (test) | Accuracy | 75.3 | 20 |
| Multiple-choice Question Answering | CommonsenseQA (dev) | Accuracy | 76.2 | 18 |
| Question Answering | CommonsenseQA | PR-AUC | 0.595 | 16 |
| Common sense | CommonsenseQA | Accuracy | 74 | 12 |