CountZES: Counting via Zero-Shot Exemplar Selection
About
Object counting in complex scenes is particularly challenging in the zero-shot (ZS) setting, where instances of unseen categories are counted using only a class name. Existing ZS counting methods that infer exemplars from text often rely on off-the-shelf open-vocabulary detectors (OVDs), which in dense scenes suffer from semantic noise, appearance variability, and frequent multi-instance proposals. Alternatively, random image-patch sampling is employed, which fails to accurately delineate object instances. To address these issues, we propose CountZES, an inference-only approach for object counting via ZS exemplar selection. CountZES discovers diverse exemplars through three synergistic stages: Detection-Anchored Exemplar (DAE), Density-Guided Exemplar (DGE), and Feature-Consensus Exemplar (FCE). DAE refines OVD detections to isolate precise single-instance exemplars. DGE introduces a density-driven, self-supervised paradigm to identify statistically consistent and semantically compact exemplars, while FCE reinforces visual coherence through feature-space clustering. Together, these stages yield a complementary exemplar set that balances textual grounding, count consistency, and feature representativeness. Experiments on diverse datasets demonstrate CountZES's superior performance among zero-shot object counting (ZOC) methods, along with effective generalization across domains.
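To make the idea of feature-space consensus more concrete, here is a minimal sketch in Python. It is not the CountZES implementation: the abstract only says that FCE relies on feature-space clustering, so the function name `select_consensus_exemplars`, the single-centroid simplification, and the cosine-similarity ranking are all assumptions made for illustration. The sketch ranks candidate patches by how closely their feature vectors agree with the average candidate, so that outliers fall to the bottom.

```python
import numpy as np


def select_consensus_exemplars(features, k=1):
    """Return indices of the k candidates closest to the mean feature.

    Hypothetical helper: a one-cluster stand-in for feature-space
    clustering, not the procedure used in the paper.
    """
    feats = np.asarray(features, dtype=float)
    # L2-normalise each row so the dot product below is cosine similarity.
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    feats = feats / np.clip(norms, 1e-12, None)
    # The "consensus" direction is the normalised mean of all candidates.
    centroid = feats.mean(axis=0)
    centroid = centroid / max(np.linalg.norm(centroid), 1e-12)
    sims = feats @ centroid
    # Sort by descending similarity; a stable sort keeps ties in input order.
    order = np.argsort(-sims, kind="stable")
    return order[:k].tolist()


# Toy candidates: three similar feature vectors and one outlier (index 3).
candidates = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0]]
top3 = select_consensus_exemplars(candidates, k=3)
```

In this toy input the outlier at index 3 is ranked last and is therefore excluded from `top3`. A real pipeline would use features from a vision backbone and possibly several clusters, neither of which is specified in the abstract.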
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Counting | FSC-147 (test) | MAE 15.77 | 297 |
| Counting | CARPK | MAE 7.24 | 41 |
| Cell Counting | MBM (test) | MAE 22.16 | 14 |
| Cell Counting | VGG (test) | MAE 45.55 | 14 |
| Object Counting | PerSense-D Overall (test) | MAE 12.29 | 4 |
| Object Counting | PerSense-D Low density (test) | MAE 6.86 | 4 |
| Object Counting | PerSense-D Med density (test) | MAE 10.26 | 4 |
| Object Counting | PerSense-D High density (test) | MAE 20.36 | 4 |
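
The results above are reported as MAE, which in counting benchmarks conventionally denotes the mean absolute error between predicted and ground-truth counts, averaged over images (lower is better). The short Python sketch below shows that computation; the function name and the toy counts are illustrative assumptions and are not taken from any of the listed datasets.

```python
def mean_absolute_error(predicted, ground_truth):
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have equal length")
    if not predicted:
        raise ValueError("at least one image is required")
    total = sum(abs(p - g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Toy example with three images (made-up counts, for illustration only).
mae = mean_absolute_error([10, 52, 7], [12, 50, 7])
```

Here the per-image errors are 2, 2, and 0, so `mae` is 4/3.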