TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

About

Geometric problem solving (GPS) requires precise multimodal understanding and rigorous, step-by-step logical reasoning. However, developing capable Multimodal Large Language Models (MLLMs) for GPS is heavily bottlenecked by the scarcity of high-quality, verifiable data. Existing data acquisition paradigms either suffer from modality incompleteness and unverified logical gaps ("leaps-of-faith"), or rely on formal engines that generate rigid, structurally homogeneous data, failing to produce high-difficulty problems or foster genuine natural-language reasoning. To overcome these limitations, we introduce TrustGeoGen, an autonomous and formalized geometric data generation engine. TrustGeoGen strictly guarantees reasoning trustworthiness through formal verification while generating multimodally integrated data, including premises, visual diagrams, and solutions. To systematically scale problem difficulty, we incorporate difficulty-aware filtering and an iterative bootstrapping mechanism. Furthermore, we propose "connection thinking" to bridge the semantic gap between rigid formal logic and fluent human-like reasoning, ensuring coherent logical transitions. We also introduce the GeoExplore family of sampling algorithms to extract diverse problem-solving trajectories based on various thinking templates. Extensive experiments demonstrate that training models on our synthesized dataset, GeoTrust, substantially enhances deep geometric reasoning capabilities and yields significant performance gains across out-of-distribution (OOD) benchmarks, including GeoQA, Geometry3K, and OlympiadBench. Our code and data can be found at https://github.com/InternScience/TrustGeoGen

Daocheng Fu, Jianlong Chen, Renqiu Xia, Zijun Chen, Qi Liu, Yuan Feng, Hongbin Zhou, Renrui Zhang, Shiyang Feng, Peng Gao, Hongyuan Zha, Junchi Yan, Botian Shi, Yu Qiao, Bo Zhang • 2025

Related benchmarks

Task	Dataset	Result
Multimodal Reasoning	WeMath	Accuracy 58.61	199
Multimodal Reasoning	MathVision	--	162
Multimodal Reasoning	MathVerse	--	138
Multimodal Reasoning	MathVista	Pass@1 61.8	36
Multimodal Mathematical Reasoning	MathVista 14 (1000)	Macro Score 63.2	22
Geometric problem solving	GeoTrust (test)	--	15
Multimodal Reasoning	GeomVerse	Mean@1 4.44	11
Multimodal Reasoning	GeoQA	Mean@1 46.35	11
Geometric problem solving	GeoTrust Tier1 (test)	--	9
Geometric problem solving	GeoTrust Tier2 (test)	--	9

Showing 10 of 15 rows
