GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning

About

Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable perceptual and reasoning abilities. However, they struggle to perceive fine-grained geometric structures, which limits their geometric understanding and visual reasoning. To address this, we propose GeoTikzBridge, a framework that enhances local geometric perception and visual reasoning through TikZ-based code generation. Within this framework, we build two models supported by two complementary datasets. The GeoTikzBridge-Base model is trained on the GeoTikz-Base dataset, the largest image-to-TikZ dataset to date with 2.5M pairs (16$\times$ larger than existing open-source datasets). The dataset is constructed via iterative data expansion and a localized geometric transformation strategy. Subsequently, GeoTikzBridge-Instruct is fine-tuned on the GeoTikz-Instruct dataset, the first instruction-augmented TikZ dataset supporting visual reasoning. Extensive experimental results demonstrate that our models achieve state-of-the-art performance among open-source MLLMs. Furthermore, GeoTikzBridge models can serve as plug-and-play reasoning modules for any MLLM or LLM, enhancing reasoning performance in geometric problem-solving. Datasets and code are publicly available at: https://github.com/sjy-1995/GeoTikzBridge.

Jiayin Sun, Caixia Sun, Boyu Yang, Hailin Li, Xiao Chen, Yi Zhang, Errui Ding, Liang Li, Chao Deng, Junlan Feng • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | MathVista | Accuracy (All): 88.9 | 43 |
| Math Reasoning | MathBench EN | Score: 49.8 | 32 |
| Mathematical Reasoning | GAOKAO-MM Math | Accuracy: 68.8 | 17 |
| Image-to-TikZ | DaTikZ | CLIP Score: 81.3 | 16 |
| Image-to-TikZ | MathVista GPS | CLIP Score: 91.5 | 16 |
| Mathematical Reasoning | RBench-V Math | Accuracy: 30.1 | 16 |
| Mathematical Reasoning | RBench-V (Overall) | Accuracy: 16.4 | 16 |
| Image-to-TikZ | GAOKAO-MM Math | CLIP Score: 90 | 9 |
| Image-to-TikZ | EDUBenchmark | CLIP Score: 82.1 | 9 |
| Instructed Code Generation | Instructed Code Generation | CLIP Score: 99.2 | 3 |
Showing 10 of 11 rows
