CountZES: Counting via Zero-Shot Exemplar Selection

About

Object counting in complex scenes is particularly challenging in the zero-shot (ZS) setting, where instances of unseen categories are counted using only a class name. Existing ZS counting methods that infer exemplars from text often rely on off-the-shelf open-vocabulary detectors (OVDs), which in dense scenes suffer from semantic noise, appearance variability, and frequent multi-instance proposals. Alternatively, random image-patch sampling is employed, which fails to accurately delineate object instances. To address these issues, we propose CountZES, an inference-only approach for object counting via ZS exemplar selection. CountZES discovers diverse exemplars through three synergistic stages: Detection-Anchored Exemplar (DAE), Density-Guided Exemplar (DGE), and Feature-Consensus Exemplar (FCE). DAE refines OVD detections to isolate precise single-instance exemplars. DGE introduces a density-driven, self-supervised paradigm to identify statistically consistent and semantically compact exemplars, while FCE reinforces visual coherence through feature-space clustering. Together, these stages yield a complementary exemplar set that balances textual grounding, count consistency, and feature representativeness. Experiments on diverse datasets demonstrate CountZES's superior performance among zero-shot object counting (ZOC) methods and its effective generalization across domains.
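
As a rough illustration of what "feature consensus" could mean in practice, the sketch below picks, from a set of candidate exemplar feature vectors, the one that agrees most with the others on average. This is not the CountZES implementation: the abstract only says that FCE uses feature-space clustering, so the medoid-style selection rule, the cosine similarity measure, and the names `cosine_similarity` and `select_consensus_exemplar` are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def select_consensus_exemplar(features):
    """Return the index of the candidate whose mean cosine similarity
    to all other candidates is highest (a medoid-style choice).

    Hypothetical helper: a stand-in for feature-space consensus,
    not the procedure described in the paper.
    """
    if not features:
        raise ValueError("at least one candidate feature vector is required")
    best_index, best_score = 0, float("-inf")
    for i, feat in enumerate(features):
        sims = [cosine_similarity(feat, other)
                for j, other in enumerate(features) if j != i]
        score = sum(sims) / len(sims) if sims else 0.0
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Toy candidates: two similar vectors and one outlier.
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
chosen = select_consensus_exemplar(candidates)
```

In this toy input the second candidate is chosen because it is close to the first and still has some overlap with the outlier, so its average similarity is the highest of the three.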

Muhammad Ibraheem Siddiqui, Muhammad Haris Khan • 2025

Related benchmarks

Task	Dataset	Result
Object Counting	FSC-147 (test)	MAE 15.77	322
Counting	CARPK	MAE 7.24	52
Cell Counting	MBM (test)	MAE 22.16	14
Cell Counting	VGG (test)	MAE 45.55	14
Object Counting	PerSense-D Overall (test)	MAE 12.29	4
Object Counting	PerSense-D Low density (test)	MAE 6.86	4
Object Counting	PerSense-D Med density (test)	MAE 10.26	4
Object Counting	PerSense-D High density (test)	MAE 20.36	4
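
The results above are reported as MAE (mean absolute error), the average absolute difference between predicted and ground-truth counts over the evaluation images, so lower is better. The minimal sketch below shows that computation. The function name `mean_absolute_error` and the sample counts are illustrative only and are not taken from the paper or its benchmarks.

```python
def mean_absolute_error(predicted_counts, true_counts):
    """Average absolute difference between predicted and true counts."""
    if len(predicted_counts) != len(true_counts):
        raise ValueError("prediction and ground-truth lists must have equal length")
    if not predicted_counts:
        raise ValueError("at least one image is required")
    total_error = sum(abs(p - t) for p, t in zip(predicted_counts, true_counts))
    return total_error / len(predicted_counts)


# Made-up counts for three images (not benchmark data).
predicted = [10, 20, 30]
ground_truth = [12, 18, 33]
mae = mean_absolute_error(predicted, ground_truth)  # (2 + 2 + 3) / 3
```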
