GeometryZero: Advancing Geometry Solving via Group Contrastive Policy Optimization

About

Recent progress in large language models (LLMs) has boosted mathematical reasoning, yet geometry remains challenging, in part because auxiliary construction is often essential. Prior methods either underperform or depend on very large models (e.g., GPT-4o), making them costly. We argue that reinforcement learning with verifiable rewards (e.g., GRPO) can train smaller models to couple auxiliary construction with solid geometric reasoning. However, naively applying GRPO yields unconditional rewards, encouraging indiscriminate and sometimes harmful constructions. We propose Group Contrastive Policy Optimization (GCPO), an RL framework with two components: (1) Group Contrastive Masking, which assigns positive or negative construction rewards based on contextual utility, and (2) a Length Reward that encourages longer reasoning chains. On top of GCPO, we build GeometryZero, an affordable family of geometry reasoning models that selectively use auxiliary construction. Experiments on Geometry3K and MathVista show that GeometryZero consistently outperforms RL baselines (e.g., GRPO, ToRL). The code is available at https://github.com/ekonwang/GeometryZero.
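To make the idea of a conditional construction reward concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `bonus` value, and the exact contrast rule (comparing accuracy of rollouts with and without auxiliary construction on the same problem) are hypothetical, and the Length Reward is omitted. Only the group-relative normalization follows the standard GRPO formulation.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward within its rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def contrastive_sign(correct, used_construction):
    """Hypothetical contrast rule (an assumption, not taken from the paper).

    Returns +1 if rollouts that used auxiliary construction were more accurate
    on this problem than those that did not, -1 if less accurate, and 0 on a
    tie or when one of the two subgroups is empty.
    """
    with_c = [c for c, u in zip(correct, used_construction) if u]
    without_c = [c for c, u in zip(correct, used_construction) if not u]
    if not with_c or not without_c:
        return 0
    diff = mean(with_c) - mean(without_c)
    return (diff > 0) - (diff < 0)


def shaped_rewards(correct, used_construction, bonus=0.5):
    """Accuracy reward plus a signed construction term for rollouts that construct."""
    sign = contrastive_sign(correct, used_construction)
    return [
        float(c) + (sign * bonus if u else 0.0)
        for c, u in zip(correct, used_construction)
    ]


# Four rollouts for one problem: 1 = correct answer, True = used construction.
helpful = shaped_rewards([1, 1, 0, 0], [True, True, False, False])
harmful = shaped_rewards([0, 0, 1, 1], [True, True, False, False])
advantages = group_advantages(helpful)
```

In this sketch the construction term is positive only when construction helped within the group, whereas an unconditional bonus would reward construction in both cases.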

Yikun Wang, Yibin Wang, Dianyi Wang, Zimian Peng, Qipeng Guo, Dacheng Tao, Jiaqi Wang • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Geometry Problem Solving	Geometry3K	Accuracy	74.68	41
Geometry Problem Solving	MathVista	BoN@3 Accuracy	87.15	17
Geometry Problem Solving	GeomVerse	BoN@3 Accuracy	18.23	17
Geometry Problem Solving	OlympiadBench	BoN@3 Accuracy	45.69	17
Geometric problem solving	GeomVerse	Accuracy	68.3	15
Geometry Problem Solving	OlympiadBench	Accuracy	37.19	12
Geometric problem solving	GeoAux	Accuracy	40.18	10
Geometric Proving	UniGeo proof part	Accuracy	72.2	4
