
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

About

We propose YOLO-Count, a differentiable open-vocabulary object counting model that both addresses general counting challenges and enables precise quantity control for text-to-image (T2I) generation. A core contribution is the 'cardinality' map, a novel regression target that accounts for variations in object size and spatial distribution. Leveraging representation alignment and a hybrid strong-weak supervision scheme, YOLO-Count bridges the gap between open-vocabulary counting and T2I generation control. Its fully differentiable architecture facilitates gradient-based optimization, enabling accurate object count estimation and fine-grained guidance for generative models. Extensive experiments demonstrate that YOLO-Count achieves state-of-the-art counting accuracy while providing robust and effective quantity control for T2I systems.
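Two properties named in the abstract, a count target that does not depend on object size and a count that is differentiable end to end, can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the paper's implementation: the hypothetical `size_normalized_target` spreads one unit of mass uniformly over each object's mask (one plausible size-invariant target, not necessarily YOLO-Count's exact cardinality definition), `map_count` reads out a count by summing the map, and `steer_count` stands in for gradient-based guidance by adjusting hypothetical per-cell logits so the summed count moves toward a requested number. In a real T2I pipeline the gradient would instead flow through the counting network and the generated image back to whatever the generator exposes for optimization.

```python
import math

# Hypothetical size-invariant target (illustrative assumption, not the paper's
# definition): each pixel of an object's mask receives 1 / (mask area), so every
# object contributes exactly 1 to the map's sum, however large or small it is.
def size_normalized_target(height, width, masks):
    target = [[0.0] * width for _ in range(height)]
    for mask in masks:  # each mask: list of (row, col) pixels belonging to one object
        share = 1.0 / len(mask)
        for r, c in mask:
            target[r][c] += share
    return target


def map_count(density):
    # Summing the map gives a count that is differentiable w.r.t. every entry.
    return sum(sum(row) for row in density)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Toy stand-in for "generator + counter" (hypothetical): per-cell logits z are
# squashed into a map m = sigmoid(z), and the predicted count is sum(m).
# Plain gradient descent on loss = (count - target)^2 steers the count.
def steer_count(logits, target_count, lr=0.5, steps=200):
    z = list(logits)
    for _ in range(steps):
        m = [sigmoid(v) for v in z]
        err = sum(m) - target_count
        # d(loss)/d(z_i) = 2 * err * m_i * (1 - m_i)
        z = [v - lr * 2.0 * err * mi * (1.0 - mi) for v, mi in zip(z, m)]
    return sum(sigmoid(v) for v in z)


masks = [[(0, 0)], [(1, 1), (1, 2), (2, 1), (2, 2)]]  # a 1-pixel and a 4-pixel object
print(map_count(size_normalized_target(3, 3, masks)))  # 2.0
print(round(steer_count([0.0] * 10, target_count=3.0), 3))  # 3.0
```

Reading the count out as a sum, rather than thresholding and counting detections, is what keeps it smooth in its inputs; a hard count has zero gradient almost everywhere, so it could not drive this kind of optimization.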

Guanning Zeng, Xiang Zhang, Zirui Wang, Haiyang Xu, Zeyuan Chen, Bingnan Li, Zhuowen Tu • 2025

Related benchmarks

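Task | Dataset | Metric | Result | Rank
Object Counting | FSC-147 (test) | MAE | 15.43 | 322
Object Counting | FSC-147 (val) | MAE | 14.8 | 240
Object Counting | FSC-147 (Average) | MAE | 15.12 | 19

MAE: mean absolute error between predicted and ground-truth object counts per image; lower is better.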
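The MAE figures above can be reproduced for any set of predictions with a few lines of code. This is a minimal sketch: the counts in the usage line are made-up illustrative values, not FSC-147 data, and the last line only checks the observation that the listed "Average" row appears consistent with the unweighted mean of the test and val MAEs (the listing does not say how it is computed).

```python
def mean_absolute_error(predicted, ground_truth):
    """MAE over images: the average of |predicted count - true count|."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two equal-length, non-empty count lists")
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


# Illustrative counts only (not FSC-147 data): errors are 2, 8 and 0.
print(mean_absolute_error([10, 48, 7], [12, 40, 7]))  # about 3.33

# Unweighted mean of the test and val MAEs listed in the table.
print((15.43 + 14.8) / 2)  # about 15.115, which rounds to the listed 15.12
```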
