G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

About

Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation, which has encouraged extensive research on their application to mathematical problem solving. However, current work has largely focused on text-based mathematical problems, with limited investigation of problems involving geometric information. Addressing this gap, we aim to enable LLMs to solve geometric problems by understanding image input. We first analyze the limitations of current Multimodal Large Language Models (MLLMs) in this area: they struggle to accurately comprehend basic geometric elements and their relationships. To overcome these challenges, we take advantage of the unique characteristics of geometric problems (such as their unique geometric logical form and geometric scalability) and the capacity of textual LLMs to build an enriched multimodal geometry dataset based on existing data. The augmented dataset, Geo170K, contains more than 170K geometric image-caption and question-answer pairs. Using the Geo170K dataset, we develop G-LLaVA, which demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.

Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, Yufei Wang, Lanqing Hong, Jianhua Han, Hang Xu, Zhenguo Li, Lingpeng Kong • 2023

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled)
Visual Mathematical Reasoning	MathVista	Accuracy	25.1	448
Mathematical Reasoning	MathVerse	--	--	266
Multimodal Reasoning	WeMath	Accuracy	57.44	199
Multimodal Reasoning	MathVision	--	--	162
Multimodal Reasoning	MathVerse	--	--	138
Multimodal Mathematical Reasoning	MathVista mini (test)	--	--	114
Geometry Problem Solving	Geometry3K (test)	Choice Accuracy	29	76
Mathematical Reasoning	MathVista (test)	Accuracy	25.1	66
Geometry Problem Solving	PGPS9K (test)	Choice Accuracy	27	57
Mathematical Reasoning	MathVista	Accuracy (All)	25.1	43

Showing 10 of 32 rows
