
Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning

About

Geometry problem solving has recently attracted much attention in the NLP community. The task is challenging because it requires abstract problem understanding and symbolic reasoning with axiomatic knowledge. However, current datasets are either small in scale or not publicly available. Thus, we construct a new large-scale benchmark, Geometry3K, consisting of 3,002 geometry problems with dense annotation in formal language. We further propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS). Inter-GPS first parses the problem text and diagram into formal language automatically, via rule-based text parsing and neural object detection, respectively. Unlike the implicit learning in existing methods, Inter-GPS incorporates theorem knowledge as conditional rules and performs symbolic reasoning step by step. In addition, a theorem predictor is designed to infer the theorem application sequence, which is fed to the symbolic solver to give a more efficient and reasonable search path. Extensive experiments on the Geometry3K and GEOS datasets demonstrate that Inter-GPS achieves significant improvements over existing methods. The project with code and data is available at https://lupantech.github.io/inter-gps.

Pan Lu, Ran Gong, Shibiao Jiang, Liang Qiu, Siyuan Huang, Xiaodan Liang, Song-Chun Zhu • 2021
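
The idea of treating theorems as conditional rules and applying them in a predicted order can be illustrated with a toy sketch. This is not the Inter-GPS implementation: the `State` class, the two rule functions, the relation tuples, and the `solve` loop are all hypothetical names invented for this example, and the "predicted" theorem order is simply passed in by hand rather than produced by a learned predictor.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class State:
    """Hypothetical fact store: known angle measures plus relation tuples."""

    def __init__(self, values: Dict[str, float], relations: Iterable[tuple]):
        self.values = dict(values)
        self.relations = set(relations)


def triangle_angle_sum(state: State) -> bool:
    """Rule: if exactly one angle of a triangle is unknown, derive it from 180."""
    changed = False
    for rel in state.relations:
        if rel[0] != "Triangle":
            continue
        angles = [f"angle_{v}" for v in rel[1:]]
        unknown = [a for a in angles if a not in state.values]
        if len(unknown) == 1:
            known_sum = sum(state.values[a] for a in angles if a in state.values)
            state.values[unknown[0]] = 180.0 - known_sum
            changed = True
    return changed


def isosceles_base_angles(state: State) -> bool:
    """Rule: in ("Isosceles", apex, b1, b2), the angles at b1 and b2 are equal."""
    changed = False
    for rel in state.relations:
        if rel[0] != "Isosceles":
            continue
        _, _apex, b1, b2 = rel
        a1, a2 = f"angle_{b1}", f"angle_{b2}"
        if a1 in state.values and a2 not in state.values:
            state.values[a2] = state.values[a1]
            changed = True
        elif a2 in state.values and a1 not in state.values:
            state.values[a1] = state.values[a2]
            changed = True
    return changed


Theorem = Callable[[State], bool]


def solve(
    state: State,
    theorem_order: List[Theorem],
    target: str,
    max_rounds: int = 10,
) -> Tuple[Optional[float], List[str]]:
    """Apply theorems in the given order until the target is derived."""
    trace: List[str] = []
    for _ in range(max_rounds):
        progressed = False
        for theorem in theorem_order:
            if theorem(state):
                trace.append(theorem.__name__)  # record each productive step
                progressed = True
            if target in state.values:
                return state.values[target], trace
        if not progressed:
            break  # no rule fired, so the target is unreachable
    return None, trace


if __name__ == "__main__":
    # Isosceles triangle ABC with apex A and base angle B = 70 degrees.
    facts = State(
        {"angle_B": 70.0},
        {("Triangle", "A", "B", "C"), ("Isosceles", "A", "B", "C")},
    )
    answer, steps = solve(
        facts, [isosceles_base_angles, triangle_angle_sum], "angle_A"
    )
    print(answer, steps)
```

Because every derived value is tied to a named rule in `steps`, the solution path stays readable, which is the sense in which a symbolic solver of this kind is interpretable. A poorly chosen rule order would still reach the answer in this toy case, but only after rule applications that derive nothing.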

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Geometry Problem Solving | Geometry3K (test) | Choice Accuracy | 90.9 | 32 |
| Geometry Problem Solving | PGPS9K (test) | Completion | 59.8 | 18 |
| Geometry Problem Solving | Geometry3K 1.0 (test) | Overall Score | 78.3 | 12 |
| Geometry Problem Solving | IMP-Geometry3K | Accuracy | 58 | 10 |
| Geometry Problem Solving | GEOS | Accuracy | 67 | 5 |
| Specification Generation in Geometry Formal Language | IMP-Geometry3K | All: Likely Same | 73.71 | 3 |
| Specification Generation in Geometry Formal Language | PGDP5K | Likely Same (All) | 65.7 | 3 |
