| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | CQA | Accuracy | 83.1 | 25 |
| Reasoning | CQA | Accuracy | 67.32 | 12 |
| Information Retrieval | cQA out-of-domain (test) | MAP@100 | 52.4 | 8 |
| Commonsense Reasoning | CQA (evaluation) | Accuracy | 79.2 | 8 |
| Information Retrieval | cQA Scifi | MAP@100 | 54.1 | 7 |
| Information Retrieval | cQA Gaming | MAP@100 | 51.5 | 7 |
| Information Retrieval | cQA English | MAP@100 | 54.3 | 7 |
| Information Retrieval | cQA Apple | MAP@100 | 30.7 | 7 |
| Community Question Answering | cQA Scifi domain StackExchange (test) | MAP@100 | 64.1 | 7 |
| Community Question Answering | cQA Gaming domain StackExchange (test) | MAP@100 | 0.592 | 7 |
| Community Question Answering | cQA English domain StackExchange (test) | MAP@100 | 0.606 | 7 |
| Community Question Answering | cQA Apple domain StackExchange (test) | MAP@100 | 37.8 | 7 |
| Chain-of-Thought Generation | CQA (test) | GPT-4 Score | 4.11 | 6 |
| Question Answering | CQA | Accuracy (GPT-2-Small) | 43 | 4 |
| Commonsense Question Answering | CQA (test) | Accuracy | 43.9 | 3 |
| Commonsense Question Answering | CQA | ECE | 11.75 | 2 |