Reasoning Like Program Executors
About
Reasoning over natural language is a long-standing goal for the research community. However, studies have shown that existing language models are inadequate at reasoning. To address this issue, we present POET, a novel reasoning pre-training paradigm. By pre-training language models on programs and their execution results, POET enables language models to acquire the reasoning knowledge possessed by program executors through a data-driven approach. POET is conceptually simple and can be instantiated with different kinds of program executors. In this paper, we showcase two simple instances, POET-Math and POET-Logic, and one complex instance, POET-SQL. Experimental results on six benchmarks demonstrate that POET can significantly boost model performance on natural language reasoning, including numerical reasoning, logical reasoning, and multi-hop reasoning. POET opens a new door for reasoning-enhancement pre-training, and we hope our analysis will shed light on future research on reasoning like program executors.
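To make the "programs plus execution results" idea concrete, here is a minimal sketch of how POET-Math-style pre-training pairs might be generated. It is not the authors' pipeline. The function names `execute` and `make_example`, the operator set, the integer value range, and the `program ; context` input format are all hypothetical choices for illustration. Python's own `eval` stands in for the program executor.

```python
import random

# Hypothetical operator set for this sketch; the paper's actual POET-Math grammar may differ.
OPERATORS = ["+", "-", "*"]


def execute(program, values):
    """Act as the 'program executor': evaluate an arithmetic program.

    eval() runs with builtins disabled. That is acceptable here only because
    the program strings are generated by make_example, never taken from users.
    """
    return eval(program, {"__builtins__": {}}, dict(values))


def make_example(rng, num_vars=3, num_ops=2):
    """Return one hypothetical (source, target) pre-training pair."""
    names = [f"x{i}" for i in range(num_vars)]
    # Program context: random integer values bound to variable names.
    values = {name: rng.randint(1, 100) for name in names}
    context = ", ".join(f"{name} = {value}" for name, value in values.items())

    # Program: a flat arithmetic expression over the variables.
    # Python's usual precedence applies, so "*" binds tighter than "+" and "-".
    program = rng.choice(names)
    for _ in range(num_ops):
        program += f" {rng.choice(OPERATORS)} {rng.choice(names)}"

    # Source = program followed by its context; target = the executor's result.
    source = f"{program} ; {context}"
    target = str(execute(program, values))
    return source, target


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        src, tgt = make_example(rng)
        print(src, "->", tgt)
```

In this setup, a sequence-to-sequence model would be trained to map each `source` string to its `target` string. To do that, the model has to compute the result internally rather than copy it from the input. Because the executor labels every synthetic program automatically, such data can in principle be generated at any scale without human annotation.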
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | SVAMP (test) | Accuracy | 57.4 | 233 |
| Arithmetic Reasoning | SVAMP | Accuracy | 57.4 | 48 |
| Question Answering | HotpotQA (test) | EM | 0.687 | 12 |
| Question Answering | DROP (dev) | EM | 78 | 10 |
| Natural Language Inference | EQUATE (test) | EM | 67.5 | 5 |
| Numerical Reasoning | DROP span-subset (dev) | EM | 79.8 | 4 |
| Numerical Reasoning | HotpotQA (test) | EM | 68.7 | 4 |
| Numerical Reasoning | TAT-QA (dev) | EM | 59.1 | 4 |
| Numerical Reasoning | EQUATE (test) | EM | 67.5 | 4 |
| Numerical Reasoning | SVAMP (test) | EM | 57.4 | 4 |