
GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving

About

Geometry problem-solving remains a significant challenge for Large Multimodal Models (LMMs), requiring not only global shape recognition but also attention to intricate local relationships grounded in geometric theory. To address this, we propose GeoFocus, a novel framework comprising two core modules. (1) The Critical Local Perceptor automatically identifies and emphasizes critical local structures (e.g., angles, parallel lines, comparative distances) through thirteen theory-based perception templates, boosting critical local feature coverage by 61% compared to previous methods. (2) VertexLang, a compact topology formal language, encodes global figures through vertex coordinates and connectivity relations. By replacing bulky code-based encodings, VertexLang reduces global perception training time by 20% while improving topology recognition accuracy. When evaluated on Geo3K, GeoQA, and FormalGeo7K, GeoFocus achieves a 4.7% accuracy improvement over leading specialized models and demonstrates superior robustness on MathVerse under diverse visual conditions. Project page: https://github.com/dle666/GeoFocus
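The abstract describes VertexLang only as an encoding of vertex coordinates and connectivity relations and does not give its syntax. As a rough illustration of that general idea, the sketch below stores a figure as labeled vertices plus undirected segments and serializes it to a short string. The `Figure` class, its methods, and the `V: ... | E: ...` output format are all hypothetical and are not taken from the GeoFocus project.

```python
from dataclasses import dataclass, field


@dataclass
class Figure:
    """Hypothetical container; NOT the actual VertexLang format."""

    # Vertex label -> (x, y) coordinates.
    vertices: dict = field(default_factory=dict)
    # Undirected segments, stored as alphabetically sorted label pairs.
    edges: set = field(default_factory=set)

    def add_vertex(self, label: str, x: float, y: float) -> None:
        self.vertices[label] = (x, y)

    def connect(self, a: str, b: str) -> None:
        # Refuse segments that reference undefined vertices.
        if a not in self.vertices or b not in self.vertices:
            raise KeyError(f"unknown vertex in segment {a}-{b}")
        self.edges.add(tuple(sorted((a, b))))

    def encode(self) -> str:
        # Illustrative compact serialization: coordinates first, then segments.
        pts = "; ".join(
            f"{k}({x:g},{y:g})" for k, (x, y) in sorted(self.vertices.items())
        )
        segs = ", ".join(f"{a}-{b}" for a, b in sorted(self.edges))
        return f"V: {pts} | E: {segs}"


# A right triangle with legs 4 and 3.
tri = Figure()
tri.add_vertex("A", 0, 0)
tri.add_vertex("B", 4, 0)
tri.add_vertex("C", 0, 3)
for a, b in [("A", "B"), ("B", "C"), ("C", "A")]:
    tri.connect(a, b)

print(tri.encode())  # V: A(0,0); B(4,0); C(0,3) | E: A-B, A-C, B-C
```

A flat string like this is much shorter than plotting code that would draw the same figure, which is the kind of compactness the abstract attributes to VertexLang; the real language may differ substantially.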

Linger Deng, Yuliang Liu, Wenwen Yu, Zujia Zhang, Jianzhong Ju, Zhenbo Luo, Xiang Bai • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | MathVista | Accuracy | 74.3 | 257 |
| Mathematical Reasoning | WeMath | Accuracy | 69.4 | 161 |
| Mathematical Reasoning | MathVision | Accuracy | 28 | 144 |
| Chart Understanding | ChartQA | Accuracy | 81.5 | 127 |
| Mathematical Reasoning | MathVerse | Accuracy | 45.7 | 109 |
| Geometry Problem Solving | Geo3K | Top-1 Accuracy | 55.3 | 28 |
| Geometry Problem Solving | GeoQA | Top-1 Accuracy | 71.9 | 26 |
| Geometry Problem Solving | FormalGeo7K | Top-1 Accuracy | 63.5 | 17 |
| Hallucination Evaluation | HalluBench | Accuracy | 71.1 | 8 |
