GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning

About

Automatic math problem solving has recently attracted increasing attention as a long-standing AI benchmark. In this paper, we focus on solving geometric problems, which requires a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge. However, existing methods depend heavily on handcrafted rules and have been evaluated only on small-scale datasets. Therefore, we propose a Geometric Question Answering dataset, GeoQA, containing 4,998 geometric problems with corresponding annotated programs that illustrate the solving process of the given problems. Compared with another publicly available dataset, GeoS, GeoQA is 25 times larger, and its program annotations provide a practical testbed for future research on explicit and explainable numerical reasoning. Moreover, we introduce a Neural Geometric Solver (NGS) to address geometric problems by comprehensively parsing multimodal information and generating interpretable programs. We further add multiple self-supervised auxiliary tasks to NGS to enhance cross-modal semantic representation. Extensive experiments on GeoQA validate the effectiveness of our proposed NGS and auxiliary tasks. However, the results are still significantly lower than human performance, which leaves considerable room for future research. Our benchmark and code are released at https://github.com/chen-judge/GeoQA.

Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric P. Xing, Liang Lin • 2021

Related benchmarks

Task                      Dataset                   Result                 Rank
Multimodal Reasoning      WeMath                    Accuracy 58.02         43
Geometry Problem Solving  MathVista GPS             Accuracy 48.4          38
Geometry Problem Solving  Geometry3K (test)         Choice Accuracy 58.8   32
Multimodal Reasoning      MathVista                 Pass@1 62.1            30
Multimodal Reasoning      MathVision                Pass@1 22.68           23
Geometry Calculation      UniGeo 1.0 (Calculation)  Overall Accuracy 56.9  22
Multimodal Reasoning      MathVerse                 --                     20
Geometry Problem Solving  PGPS9K (test)             Completion 34.1        18
Geometry Proving          UniGeo 1.0 (Proving)      Overall Score 53.2     15
Geometry Problem Solving  GeoQA (test)              Choice Accuracy 92.3   13
Showing 10 of 14 rows
