GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving
About
Geometry problem-solving remains a significant challenge for Large Multimodal Models (LMMs), requiring not only global shape recognition but also attention to intricate local relationships grounded in geometric theory. To address this, we propose GeoFocus, a novel framework comprising two core modules. 1) Critical Local Perceptor automatically identifies and emphasizes critical local structures (e.g., angles, parallel lines, comparative distances) through thirteen theory-based perception templates, boosting critical local feature coverage by 61% compared to previous methods. 2) VertexLang, a compact topology formal language, encodes global figures through vertex coordinates and connectivity relations. By replacing bulky code-based encodings, VertexLang reduces global perception training time by 20% while improving topology recognition accuracy. Evaluated on Geo3K, GeoQA, and FormalGeo7K, GeoFocus achieves a 4.7% accuracy improvement over leading specialized models and demonstrates superior robustness on MathVerse under diverse visual conditions. Project page: https://github.com/dle666/GeoFocus
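To make the idea of encoding a figure through vertex coordinates and connectivity relations more concrete, here is a minimal Python sketch. It is an illustration only: the `Figure` class, the `encode` output format, and the `angle` helper are hypothetical names invented for this example and do not reproduce the actual VertexLang syntax or the Critical Local Perceptor templates, which the abstract does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class Figure:
    """Hypothetical container: labeled vertices plus the segments joining them."""
    vertices: dict[str, tuple[float, float]]  # label -> (x, y)
    edges: list[tuple[str, str]]              # connectivity between vertex labels

    def encode(self) -> str:
        """Serialize to a compact text form (an assumed format, not real VertexLang)."""
        pts = "; ".join(
            f"{label}({x:g},{y:g})" for label, (x, y) in sorted(self.vertices.items())
        )
        segs = ", ".join(a + b for a, b in self.edges)
        return f"V: {pts} | E: {segs}"

    def angle(self, a: str, b: str, c: str) -> float:
        """Return angle ABC in degrees, measured at vertex b (a local relation)."""
        ax, ay = self.vertices[a]
        bx, by = self.vertices[b]
        cx, cy = self.vertices[c]
        v1 = (ax - bx, ay - by)
        v2 = (cx - bx, cy - by)
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cos = dot / (math.hypot(*v1) * math.hypot(*v2))
        # Clamp to guard against floating-point drift outside [-1, 1].
        return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


# A 3-4-5 right triangle with the right angle at A.
triangle = Figure(
    vertices={"A": (0, 0), "B": (4, 0), "C": (0, 3)},
    edges=[("A", "B"), ("B", "C"), ("C", "A")],
)
encoded = triangle.encode()
right_angle = triangle.angle("B", "A", "C")
```

The point of the sketch is the contrast the abstract draws: the global figure is captured by a short list of coordinates and edges rather than by drawing code, while local relations such as an angle can be derived from that same compact description.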
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MathVista | Accuracy | 74.3 | 97 |
| Chart Understanding | ChartQA | Accuracy | 81.5 | 83 |
| Mathematical Reasoning | WeMath | Accuracy | 69.4 | 75 |
| Mathematical Reasoning | MathVerse | Accuracy | 45.7 | 39 |
| Mathematical Reasoning | MathVision | Accuracy | 28 | 38 |
| Geometry Problem Solving | GeoQA | Top-1 Acc | 71.9 | 26 |
| Geometry Problem Solving | Geo3K | Top-1 Accuracy | 55.3 | 19 |
| Geometry Problem Solving | FormalGeo7K | Top-1 Accuracy | 63.5 | 17 |
| Hallucination Evaluation | HalluBench | Accuracy | 71.1 | 8 |