TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

About

Geometric problem solving (GPS) requires precise multimodal understanding and rigorous, step-by-step logical reasoning. However, developing capable Multimodal Large Language Models (MLLMs) for GPS is heavily bottlenecked by the scarcity of high-quality, verifiable data. Existing data acquisition paradigms either suffer from modality incompleteness and unverified logical gaps ("leaps-of-faith"), or rely on formal engines that generate rigid, structurally homogeneous data and therefore fail to produce high-difficulty problems or to foster genuine natural-language reasoning. To overcome these limitations, we introduce TrustGeoGen, an autonomous and formalized geometric data generation engine. TrustGeoGen strictly guarantees reasoning trustworthiness through formal verification while generating multimodally integrated data, including premises, visual diagrams, and solutions. To systematically scale problem difficulty, we incorporate difficulty-aware filtering and an iterative bootstrapping mechanism. Furthermore, we propose "connection thinking" to bridge the semantic gap between rigid formal logic and fluent human-like reasoning, ensuring coherent logical transitions. We also introduce the GeoExplore family of sampling algorithms to extract diverse problem-solving trajectories based on various thinking templates. Extensive experiments demonstrate that training models on our synthesized dataset, GeoTrust, substantially enhances deep geometric reasoning capabilities and yields significant performance gains across out-of-distribution (OOD) benchmarks, including GeoQA, Geometry3K, and OlympiadBench. Our code and data are available at https://github.com/InternScience/TrustGeoGen

Daocheng Fu, Jianlong Chen, Renqiu Xia, Zijun Chen, Qi Liu, Yuan Feng, Hongbin Zhou, Renrui Zhang, Shiyang Feng, Peng Gao, Hongyuan Zha, Junchi Yan, Botian Shi, Yu Qiao, Bo Zhang• 2025
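The abstract names difficulty-aware filtering and iterative bootstrapping but does not say how difficulty is scored or how seeds are chosen for the next round. The Python sketch below is a minimal illustration under loudly stated assumptions, not the TrustGeoGen implementation: difficulty is assumed to be the number of steps in a formally verified solution, each round extends every problem in the pool with a hypothetical `extend` function, discards candidates below a threshold that rises each round, and reseeds from the hardest survivors. `Problem`, `difficulty_filter`, `bootstrap`, and `toy_extend` are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Problem:
    """Hypothetical container for one synthesized geometry problem."""
    premises: List[str]
    goal: str
    # Steps of a solution assumed to have already passed formal verification.
    proof_steps: List[str] = field(default_factory=list)

    @property
    def difficulty(self) -> int:
        # Assumption: difficulty = length of the verified solution trace.
        return len(self.proof_steps)


def difficulty_filter(problems: List[Problem], min_steps: int) -> List[Problem]:
    """Keep only problems whose verified solution has at least min_steps steps."""
    return [p for p in problems if p.difficulty >= min_steps]


def bootstrap(
    seeds: List[Problem],
    extend: Callable[[Problem], Problem],
    rounds: int,
    min_steps: int,
    keep_top: int,
) -> List[Problem]:
    """Extend the pool, filter by difficulty, and reseed with the hardest survivors."""
    pool = list(seeds)
    for _ in range(rounds):
        candidates = [extend(p) for p in pool]
        hard = difficulty_filter(candidates, min_steps)
        if not hard:
            break  # nothing cleared the bar; keep the previous pool
        hard.sort(key=lambda p: p.difficulty, reverse=True)
        pool = hard[:keep_top]
        min_steps += 1  # assumption: raise the threshold every round
    return pool


def toy_extend(p: Problem) -> Problem:
    # Hypothetical extender: appends one pretend-verified step without mutating p.
    n = len(p.proof_steps)
    return Problem(p.premises, p.goal, p.proof_steps + [f"step{n + 1}"])


seeds = [
    Problem(["AB = AC"], "angle B = angle C", ["step1"]),
    Problem(["AB parallel to CD"], "angle 1 = angle 2"),
]
result = bootstrap(seeds, toy_extend, rounds=3, min_steps=2, keep_top=2)
print(result[0].goal, result[0].difficulty)  # angle B = angle C 4
```

With the toy extender, each round adds one step while the threshold also rises by one, so the seed that starts without any verified step is dropped in the first round and the surviving problem ends with four steps. A real engine would replace `toy_extend` with genuine construction and verification, and the difficulty score could combine other signals.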

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multimodal Reasoning | WeMath | Accuracy 58.61 | 129 |
| Multimodal Reasoning | MathVision | -- | 102 |
| Multimodal Reasoning | MathVerse | -- | 84 |
| Multimodal Reasoning | MathVista | Pass@1 61.8 | 36 |
| Geometric problem solving | GeoTrust (test) | -- | 15 |
| Multimodal Reasoning | GeomVerse | Mean@1 4.44 | 11 |
| Multimodal Reasoning | GeoQA | Mean@1 46.35 | 11 |
| Geometric problem solving | GeoTrust Tier1 (test) | -- | 9 |
| Geometric problem solving | GeoTrust Tier2 (test) | -- | 9 |
| Geometric problem solving | GeoTrust Tier3 (test) | -- | 9 |
