FiGO: Fine-Grained Object Counting without Annotations

About

Class-agnostic counting (CAC) methods reduce annotation costs by letting users define what to count at test time through text or visual exemplars. Current open-vocabulary approaches work well for broad categories but fail when fine-grained distinctions are needed, such as telling apart waterfowl species or pepper cultivars. We present FiGO, a new annotation-free method that adapts existing counting models to fine-grained categories using only the category name. Our approach uses a text-to-image diffusion model to create synthetic examples and a joint positive/hard-negative loss to learn a compact concept embedding. This embedding conditions a specialization module that converts outputs from any frozen counter into accurate, fine-grained estimates. To evaluate fine-grained counting, we introduce LOOKALIKES, a dataset of 37 subcategories across 14 parent categories with many visually similar objects per image. Our method substantially outperforms strong open-vocabulary baselines, moving counting systems from "count all the peppers" to "count only the habaneros."
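The abstract does not spell out the form of the joint positive/hard-negative loss, so the following is only an illustrative sketch and not the authors' formulation. It assumes, hypothetically, that the concept embedding and the features of synthetic images are plain vectors compared by cosine similarity, that positives are pulled toward the embedding, and that hard negatives are pushed below a margin. The names `cosine`, `joint_loss`, and `margin` are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def joint_loss(concept, positives, hard_negatives, margin=0.2):
    """Hypothetical joint loss over positive and hard-negative features.

    concept:        candidate concept embedding (list of floats)
    positives:      features of synthetic images of the target subcategory
    hard_negatives: features of visually similar, off-target subcategories
    margin:         assumed similarity level tolerated for hard negatives
    """
    # Positive term: penalize low similarity to the target subcategory.
    pos_term = sum(1.0 - cosine(concept, p) for p in positives) / len(positives)
    # Negative term: penalize similarity to hard negatives above the margin.
    neg_term = sum(
        max(0.0, cosine(concept, n) - margin) for n in hard_negatives
    ) / len(hard_negatives)
    return pos_term + neg_term


# Toy usage: an embedding aligned with the positives and orthogonal to the
# hard negatives incurs zero loss under these assumptions.
loss = joint_loss([1.0, 0.0], positives=[[1.0, 0.0]], hard_negatives=[[0.0, 1.0]])
```

In a real system the embedding would be optimized by gradient descent inside a deep-learning framework; the sketch only shows how the two terms could be combined into one objective.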

Adriano D'Alessandro, Ali Mahdavi-Amiri, Ghassan Hamarneh • 2025

Related benchmarks

Task	Dataset	Result	Rank
Object Counting	LOOKALIKES (test)	MAE 10	11
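
The Result column reports MAE (mean absolute error), the standard counting metric: the average absolute difference between predicted and ground-truth counts over the test images. A minimal sketch of the computation follows; the function name and the toy counts are illustrative and are not taken from the benchmark.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Toy example with three images: the errors are 2, 2, and 0, so MAE = 4/3.
mae = mean_absolute_error([12, 7, 30], [10, 9, 30])
```

Lower is better: an MAE of 0 means every image was counted exactly.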

