GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving

About

Geometry problem-solving remains a significant challenge for Large Multimodal Models (LMMs): it requires not only recognizing global shapes but also attending to intricate local relationships grounded in geometric theory. To address this, we propose GeoFocus, a novel framework comprising two core modules. 1) Critical Local Perceptor automatically identifies and emphasizes critical local structures (e.g., angles, parallel lines, comparative distances) through thirteen theory-based perception templates, boosting critical local feature coverage by 61% compared with previous methods. 2) VertexLang, a compact formal language for topology, encodes global figures through vertex coordinates and connectivity relations. By replacing bulky code-based encodings, VertexLang reduces global perception training time by 20% while improving topology recognition accuracy. When evaluated on Geo3K, GeoQA, and FormalGeo7K, GeoFocus achieves a 4.7% accuracy improvement over leading specialized models and demonstrates superior robustness on MathVerse under diverse visual conditions. Project page: https://github.com/dle666/GeoFocus

Linger Deng, Yuliang Liu, Wenwen Yu, Zujia Zhang, Jianzhong Ju, Zhenbo Luo, Xiang Bai · 2026
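The page does not specify VertexLang's syntax or how the perception templates are implemented. As an illustration only, the Python sketch below assumes a hypothetical line-based encoding in which `A 0 0` declares a vertex with its coordinates and `A-B` declares a segment, and shows how such a compact vertex-plus-connectivity description could be parsed and then queried for the kinds of local relations the Critical Local Perceptor targets (parallel lines, angles, comparative distances). The format and every name here (`parse_figure`, `connected`, `is_parallel`, `angle_at`, `length`, the `tol` tolerance) are assumptions, not the authors' implementation.

```python
import math
from dataclasses import dataclass, field

# Hypothetical VertexLang-style encoding: the real syntax is not given on this page.
# Assumed format: "NAME X Y" declares a vertex; "P-Q" declares a segment
# between two previously declared vertices.


@dataclass
class Figure:
    vertices: dict[str, tuple[float, float]] = field(default_factory=dict)
    edges: set[frozenset[str]] = field(default_factory=set)


def parse_figure(text: str) -> Figure:
    """Parse the assumed line-based format into vertices and undirected edges."""
    fig = Figure()
    for raw in text.strip().splitlines():
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        if len(tokens) == 3:
            name, x, y = tokens
            fig.vertices[name] = (float(x), float(y))
        elif len(tokens) == 1 and "-" in tokens[0]:
            p, q = tokens[0].split("-", 1)
            if p not in fig.vertices or q not in fig.vertices:
                raise ValueError(f"segment {raw.strip()!r} uses an undeclared vertex")
            fig.edges.add(frozenset((p, q)))
        else:
            raise ValueError(f"cannot parse line: {raw.strip()!r}")
    return fig


def connected(fig: Figure, p: str, q: str) -> bool:
    """True if the figure explicitly contains segment PQ."""
    return frozenset((p, q)) in fig.edges


def _vec(fig: Figure, p: str, q: str) -> tuple[float, float]:
    """Vector from vertex p to vertex q."""
    (x1, y1), (x2, y2) = fig.vertices[p], fig.vertices[q]
    return (x2 - x1, y2 - y1)


# Hypothetical "local perception" checks in the spirit of theory-based templates.
def is_parallel(fig: Figure, s1: tuple[str, str], s2: tuple[str, str],
                tol: float = 1e-9) -> bool:
    """Segments are parallel when the 2-D cross product of their directions is ~0."""
    u, v = _vec(fig, *s1), _vec(fig, *s2)
    return abs(u[0] * v[1] - u[1] * v[0]) <= tol


def angle_at(fig: Figure, a: str, b: str, c: str) -> float:
    """Angle ABC in degrees, measured at vertex b."""
    u, v = _vec(fig, b, a), _vec(fig, b, c)
    cos_t = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def length(fig: Figure, p: str, q: str) -> float:
    """Euclidean length of PQ, usable for comparative-distance checks."""
    return math.dist(fig.vertices[p], fig.vertices[q])


# Usage: a 4 x 3 rectangle ABCD with diagonal AC.
rect = parse_figure("""
A 0 0
B 4 0
C 4 3
D 0 3
A-B
B-C
C-D
D-A
A-C
""")
print(is_parallel(rect, ("A", "B"), ("D", "C")))  # True
print(round(angle_at(rect, "D", "A", "B"), 1))    # 90.0
print(length(rect, "A", "C"))                     # 5.0
print(connected(rect, "B", "D"))                  # False
```

In this sketch every local relation is computed directly from coordinates and edges, so the whole figure fits in a few short lines; how GeoFocus actually links its global encoding to its local templates is not described on this page.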

Related benchmarks

Task                      Dataset      Metric          Result  Rank
Mathematical Reasoning    MathVista    Accuracy        74.3    97
Chart Understanding       ChartQA      Accuracy        81.5    83
Mathematical Reasoning    WeMath       Accuracy        69.4    75
Mathematical Reasoning    MathVerse    Accuracy        45.7    39
Mathematical Reasoning    MathVision   Accuracy        28      38
Geometry Problem Solving  GeoQA        Top-1 Accuracy  71.9    26
Geometry Problem Solving  Geo3K        Top-1 Accuracy  55.3    19
Geometry Problem Solving  FormalGeo7K  Top-1 Accuracy  63.5    17
Hallucination Evaluation  HalluBench   Accuracy        71.1    8