GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning

About

Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable perceptual and reasoning abilities. However, they struggle to perceive fine-grained geometric structures, which limits their geometric understanding and visual reasoning. To address this, we propose GeoTikzBridge, a framework that enhances local geometric perception and visual reasoning through TikZ-based code generation. Within this framework, we build two models supported by two complementary datasets. The GeoTikzBridge-Base model is trained on the GeoTikz-Base dataset, the largest image-to-TikZ dataset to date with 2.5M pairs (16× larger than existing open-source datasets). This scale is achieved via iterative data expansion and a localized geometric transformation strategy. Subsequently, GeoTikzBridge-Instruct is fine-tuned on the GeoTikz-Instruct dataset, the first instruction-augmented TikZ dataset supporting visual reasoning. Extensive experimental results demonstrate that our models achieve state-of-the-art performance among open-source MLLMs. Furthermore, GeoTikzBridge models can serve as plug-and-play reasoning modules for any MLLM (LLM), enhancing reasoning performance in geometric problem-solving. Datasets and code are publicly available at: https://github.com/sjy-1995/GeoTikzBridge.
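
For readers unfamiliar with TikZ, the following is a minimal, hand-written sketch of the kind of code an image-to-TikZ model would be expected to emit for a simple geometric figure. It is illustrative only: the figure, coordinates, and labels are assumptions chosen for this example and are not taken from GeoTikz-Base or from the GeoTikzBridge models' output.

```latex
\documentclass[tikz,border=2pt]{standalone}
\begin{document}
\begin{tikzpicture}
  % Vertices of a right triangle (illustrative coordinates)
  \coordinate (A) at (0,0);
  \coordinate (B) at (4,0);
  \coordinate (C) at (0,3);
  % Triangle outline
  \draw (A) -- (B) -- (C) -- cycle;
  % Right-angle marker at vertex A
  \draw (0.3,0) -- (0.3,0.3) -- (0,0.3);
  % Vertex labels
  \node[below left]  at (A) {$A$};
  \node[below right] at (B) {$B$};
  \node[above left]  at (C) {$C$};
\end{tikzpicture}
\end{document}
```

Because every vertex and segment appears as an explicit coordinate or path, code of this form exposes the fine-grained geometric structure of a figure in text.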

Jiayin Sun, Caixia Sun, Boyu Yang, Hailin Li, Xiao Chen, Yi Zhang, Errui Ding, Liang Li, Chao Deng, Junlan Feng • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MathVista	Accuracy (All) 88.9	43
Math Reasoning	MathBench EN	Score 49.8	32
Mathematical Reasoning	RBench-V Math	Accuracy 30.1	23
Mathematical Reasoning	GAOKAO-MM Math	Accuracy 68.8	17
Image-to-TikZ	DaTikZ	CLIP Score 81.3	16
Image-to-TikZ	MathVista GPS	CLIP Score 91.5	16
Mathematical Reasoning	RBench-V (Overall)	Accuracy 16.4	16
Image-to-TikZ	GAOKAO-MM Math	CLIP Score 90	9
Image-to-TikZ	EDUBenchmark	CLIP Score 82.1	9
Instructed Code Generation	Instructed Code Generation	CLIP Score 99.2	3

Showing 10 of 11 rows
