GeometryZero: Advancing Geometry Solving via Group Contrastive Policy Optimization

About

Recent progress in large language models (LLMs) has boosted mathematical reasoning, yet geometry remains challenging because auxiliary construction is often essential. Prior methods either underperform or depend on very large models (e.g., GPT-4o), which makes them costly. We argue that reinforcement learning with verifiable rewards (e.g., GRPO) can train smaller models to couple auxiliary construction with solid geometric reasoning. However, naively applying GRPO rewards construction unconditionally, which encourages indiscriminate and sometimes harmful constructions. We propose Group Contrastive Policy Optimization (GCPO), an RL framework with two components: (1) Group Contrastive Masking, which assigns positive or negative construction rewards based on contextual utility, and (2) a Length Reward that encourages longer reasoning chains. On top of GCPO, we build GeometryZero, an affordable family of geometry reasoning models that use auxiliary construction selectively. Experiments on Geometry3K and MathVista show that GeometryZero consistently outperforms RL baselines (e.g., GRPO, ToRL). The code is available at https://github.com/ekonwang/GeometryZero.

Yikun Wang, Yibin Wang, Dianyi Wang, Zimian Peng, Qipeng Guo, Dacheng Tao, Jiaqi Wang • 2025
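
To make the two GCPO components in the abstract more concrete, here is a minimal sketch in Python. Only the group-standardized advantage follows the usual GRPO formulation. Everything else is an assumption made for illustration and is not taken from the paper or its repository: the function names, the rule that compares accuracy with and without construction inside a sampled group, the length cap, and the weights `w_construct` and `w_length`.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def contrastive_construction_sign(correct, uses_construction):
    """Hypothetical contrastive rule (an assumption, not the paper's definition).

    Returns +1 if responses that use auxiliary construction are more accurate
    on average than those that do not, -1 if they are less accurate, and 0 if
    the two subgroups tie or either subgroup is empty.
    """
    with_c = [c for c, u in zip(correct, uses_construction) if u]
    without_c = [c for c, u in zip(correct, uses_construction) if not u]
    if not with_c or not without_c:
        return 0
    diff = mean(with_c) - mean(without_c)
    return (diff > 0) - (diff < 0)


def shaped_rewards(correct, uses_construction, lengths,
                   w_construct=0.5, w_length=0.1, max_len=1024):
    """Combine correctness, a signed construction bonus, and a capped length bonus.

    The weights and the length cap are illustrative placeholders.
    """
    sign = contrastive_construction_sign(correct, uses_construction)
    rewards = []
    for c, u, n in zip(correct, uses_construction, lengths):
        r = float(c)                    # verifiable correctness reward
        if u:
            r += w_construct * sign     # bonus or penalty only for constructions
        r += w_length * min(n, max_len) / max_len  # longer chains earn more, up to the cap
        rewards.append(r)
    return rewards


# One sampled group of four responses to the same problem.
correct = [1, 1, 0, 0]
uses_construction = [True, True, False, False]
lengths = [512, 1024, 256, 2048]

rewards = shaped_rewards(correct, uses_construction, lengths)
advantages = group_advantages(rewards)
```

In this sketch the construction bonus flips sign when constructions hurt accuracy within the group, which is one simple way to avoid rewarding construction unconditionally.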

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Geometry Problem Solving | Geometry3K | Accuracy | 74.68 | 41 |
| Geometry Problem Solving | MathVista | BoN@3 Accuracy | 87.15 | 17 |
| Geometry Problem Solving | GeomVerse | BoN@3 Accuracy | 18.23 | 17 |
| Geometry Problem Solving | OlympiadBench | BoN@3 Accuracy | 45.69 | 17 |
| Geometric problem solving | GeomVerse | Accuracy | 68.3 | 15 |
| Geometry Problem Solving | OlympiadBench | Accuracy | 37.19 | 12 |
| Geometric problem solving | GeoAux | Accuracy | 40.18 | 10 |
| Geometric Proving | UniGeo proof part | Accuracy | 72.2 | 4 |