| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | bAbI (test) | Mean Error | 0.032 | 54 |
| sys-bAbI task | sys-bAbI original (test) | Gap | 7.95 | 22 |
| Spatial Reasoning | bAbI (test) | Accuracy | 24 | 20 |
| Question Answering | bAbI | Accuracy | 26.8 | 16 |
| Question Answering | bAbI 10k (test) | Task 1: 1 Supporting Fact Error | 0 | 15 |
| Question Answering | bAbI 1k (train test) | Task: 1 Supporting Fact Acc | 50 | 12 |
| Question Answering | bAbI 10k 1.0 (test) | Mean Error Rate | 21 | 10 |
| Question Answering | bAbI 1.0 (test) | Task 1 Accuracy | 0.4 | 10 |
| Dialogue Response Generation | bAbI Dialogue Task 4 OOV | Per-response Accuracy | 100 | 9 |
| Dialogue Response Generation | bAbI Dialogue Task 5 | Per-response Accuracy | 99.6 | 9 |
| Dialogue Response Generation | bAbI Dialogue Task 3 | Accuracy (Per-response) | 96.3 | 9 |
| Question Answering | bAbI 1K examples 1.0 (test) | Average Error Rate | 4.55 | 8 |
| Dialog Reasoning | bAbI dialog tasks (test) | Issuing API calls Error Rate | 0 | 8 |
| Reading Comprehension | bAbi 1K (test) | Maximum Accuracy | 90.1 | 7 |
| Dialog | bAbI dialog | Average Error Rate | 1.5 | 7 |
| Textual Question Answering | bAbI English 10k (test) | Failed Tasks Count (Error > 5%) | 0 | 7 |
| Reasoning | BABI 2-Choice | Delta Accuracy | 28.9 | 6 |
| Spatial Reasoning | bAbI original (test) | Task 17 Accuracy | 99.88 | 6 |
| Question Answering | bAbI QA 1k | Failed Tasks Count | 5 | 6 |
| Spatial reasoning | bAbI Task 1 qa1 50 samples | Accuracy | 19 | 5 |
| Reasoning | bAbI (test) | Acc (Single Supporting Fact) | 96.27 | 5 |
| Question Answering | bAbI 10K synthesized samples | Accuracy | 26.8 | 3 |
| Question Answering | bAbi-Mix | Average Error Rate | 11.8 | 3 |
| Question Answering | bAbI v1.2 (test) | Task 1: Single Supporting Fact | 1 | 2 |