Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
About
Large Language Models (LLMs) have shown human-like reasoning abilities but still struggle with complex logical problems. This paper introduces a novel framework, Logic-LM, which integrates LLMs with symbolic solvers to improve logical problem-solving. Our method first utilizes LLMs to translate a natural language problem into a symbolic formulation. Afterward, a deterministic symbolic solver performs inference on the formulated problem. We also introduce a self-refinement module, which utilizes the symbolic solver's error messages to revise symbolic formalizations. We demonstrate Logic-LM's effectiveness on five logical reasoning datasets: ProofWriter, PrOntoQA, FOLIO, LogicalDeduction, and AR-LSAT. On average, Logic-LM achieves a significant performance boost of 39.2% over LLMs with standard prompting and 18.4% over LLMs with chain-of-thought prompting. Our findings suggest that Logic-LM, by combining LLMs with symbolic logic, offers a promising avenue for faithful logical reasoning. Code and data are publicly available at https://github.com/teacherpeterpan/Logic-LLM.
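The pipeline described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the LLM translation stage is stubbed with a canned parse, and naive forward chaining over Horn-style rules stands in for the deterministic symbolic solvers the paper plugs in. All function names (`translate_to_symbolic`, `solve`, `logic_lm`) are hypothetical.

```python
# Sketch of the Logic-LM pipeline: translate -> solve -> self-refine.
# The LLM call is replaced by a fixed lookup; a toy forward-chaining
# solver stands in for the paper's deterministic symbolic solvers.

def translate_to_symbolic(problem: str) -> dict:
    """Stub for the LLM translation stage (natural language -> symbolic form)."""
    # A real system would prompt an LLM here; we return a canned formulation.
    return {
        "facts": {"cat(tom)"},
        "rules": [({"cat(tom)"}, "animal(tom)"),
                  ({"animal(tom)"}, "mortal(tom)")],
        "query": "mortal(tom)",
    }

def solve(formulation: dict) -> bool:
    """Deterministic inference: naive forward chaining to a fixpoint."""
    facts = set(formulation["facts"])
    changed = True
    while changed:
        changed = False
        for premises, conclusion in formulation["rules"]:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return formulation["query"] in facts

def logic_lm(problem: str, max_refinements: int = 3) -> bool:
    """Translate, solve, and retry on solver errors (self-refinement)."""
    formulation = translate_to_symbolic(problem)
    for _ in range(1 + max_refinements):
        try:
            return solve(formulation)
        except (KeyError, TypeError):
            # In the paper, the solver's error message is fed back to the
            # LLM to revise the formulation; stubbed here as a re-translate.
            formulation = translate_to_symbolic(problem)
    return False

print(logic_lm("Tom is a cat. Cats are animals. Animals are mortal. Is Tom mortal?"))
# prints True
```

The key design point is that once the formulation is symbolic, inference is fully deterministic, so any remaining errors are attributable to the translation step, which is exactly what the self-refinement loop targets.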
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Logical reasoning | FOLIO | Accuracy | 71.6 | 119 |
| Logical reasoning | FOLIO (test) | Accuracy | 73.83 | 58 |
| Logical reasoning | ProofWriter (test) | Accuracy | 84.35 | 36 |
| Logical reasoning | PrOntoQA (test) | Accuracy | 90.5 | 36 |
| Logical reasoning | ProofWriter | Accuracy | 64.7 | 32 |
| Logical reasoning | AR-LSAT (test) | Accuracy | 62.1 | 24 |
| Logical reasoning | AR-LSAT | Accuracy | 31.2 | 24 |
| Logical reasoning | LogicalDeduction (test) | Accuracy | 99.5 | 20 |
| Long-Context Stability & Retrieval | LongBench | Stability Rate (SR) | 42.8 | 6 |
| Long-Context Stability & Retrieval | InfBench | Stability Rate (SR) | 36.7 | 6 |