GenesisGeo: Technical Report
About
Recent neuro-symbolic geometry theorem provers have made significant progress on Euclidean problems by coupling neural guidance with symbolic verification. However, most existing systems operate almost exclusively in a symbolic space, leaving diagram-based intuition largely unused during reasoning. For humans, geometric diagrams provide essential heuristics for identifying non-trivial auxiliary constructions. Meanwhile, visual language models (VLMs) still struggle with geometry due to the lack of high-quality data with geometric diagrams and reasoning supervision. In this paper, we introduce GenesisGeo-1M, a large-scale synthetic dataset for visual geometric reasoning that contains 1M multimodal geometry problems paired with machine-checkable proof traces. Building on this dataset, we formulate geometric learning as a multi-task training paradigm that jointly optimizes text-based proof generation and diagram-grounded proof generation, encouraging models to learn visual grounding and symbolic deduction. Extensive experiments show that our GenesisGeo-2B model achieves gold-medal-level performance on Olympiad geometry benchmarks, solving 29/30 problems on IMO-30, 63/95 on IMO-95, and 278/409 on HAGeo-409.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Euclidean geometry problem solving | IMO-30 | Solved Problems29 | 19 | |
| Euclidean geometry problem solving | HAGeo-409 | Solved Problems278 | 16 | |
| Euclidean geometry problem solving | IMO-95 | Solved Problems63 | 15 |