REC-RL: Referring expression counting via Gaussian and range-based reward optimization

About

Referring expression counting (REC) is an intention-driven task that requires context-aware visual reasoning. While recent vision-language models incorporate language for visual understanding, most existing REC methods rely on rule-based reinforcement learning with rewards focused primarily on final accuracy, overlooking the quality of intermediate reasoning. We propose REC-RL, a reinforcement learning framework that introduces a think-range-answer paradigm to explicitly optimize the visual reasoning process. REC-RL employs Group Relative Policy Optimization and two lightweight rewards: an accuracy reward that combines range-based interval supervision with Gaussian-based precision guidance, and a format reward that enforces structured outputs. By modeling intermediate focus prediction as internal decision-making, REC-RL avoids additional annotations and better aligns with human perception. Extensive experiments demonstrate consistent improvements over strong baselines and robust generalization across benchmarks.
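The reward design described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tag layout (`<think>`, `<range>`, `<answer>`), the Gaussian width `sigma`, the mixing weight `w_range`, and all function names are assumptions made for illustration, since the abstract does not specify them. The group-relative normalization follows the usual Group Relative Policy Optimization recipe of standardizing rewards within a group of sampled responses.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical think-range-answer layout; the exact tags are an assumption.
PATTERN = re.compile(
    r"^<think>(.+?)</think>\s*"
    r"<range>\s*(\d+)\s*-\s*(\d+)\s*</range>\s*"
    r"<answer>\s*(\d+)\s*</answer>$",
    re.DOTALL,
)


def parse(output):
    """Return (low, high, answer) if the output follows the layout, else None."""
    m = PATTERN.match(output.strip())
    if m is None:
        return None
    return int(m.group(2)), int(m.group(3)), int(m.group(4))


def format_reward(output):
    """1.0 when the structured layout is respected, 0.0 otherwise."""
    return 1.0 if parse(output) is not None else 0.0


def range_reward(low, high, target):
    """Interval supervision: 1.0 when the true count lies in [low, high]."""
    return 1.0 if low <= target <= high else 0.0


def gaussian_reward(pred, target, sigma=2.0):
    """Precision guidance: decays smoothly as the predicted count drifts from the target."""
    return math.exp(-((pred - target) ** 2) / (2 * sigma ** 2))


def accuracy_reward(output, target, sigma=2.0, w_range=0.5):
    """Weighted mix of the range and Gaussian terms (the weighting is an assumption)."""
    parsed = parse(output)
    if parsed is None:
        return 0.0
    low, high, answer = parsed
    return (w_range * range_reward(low, high, target)
            + (1.0 - w_range) * gaussian_reward(answer, target, sigma))


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of sampled responses."""
    mu = mean(rewards)
    sd = pstdev(rewards)  # population std; a sample std would also be a valid choice
    return [(r - mu) / (sd + eps) for r in rewards]
```

Under these assumptions, a response whose range brackets the true count and whose answer matches it exactly receives the maximum accuracy reward, while a near miss still earns partial credit from the Gaussian term instead of a hard zero.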

Hui Liu, Yunlai Teng, Kunlong Bai, Pengfei Qi, Haotian Yan, Liang Li, Junlan Feng • 2026

Related benchmarks

Task	Dataset	Result
Referring Expression Counting	REC 8K (test)	MAE 5.06	40
Referring Expression Counting	REC 8K (val)	MAE 5.05	33
Object Counting	Cross-domain General Object Counting v1 (test)	MAE (Sheep) 2.08	11
Referring Expression Counting	RefCOCO OOD (test)	MAE 0.23	2

