
CounTR: Transformer-based Generalised Visual Counting

About

In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of "exemplars", i.e. zero-shot or few-shot counting. To this end, we make the following four contributions: (1) We introduce a novel transformer-based architecture for generalised visual object counting, termed Counting Transformer (CounTR), which explicitly captures the similarity between image patches, or between image patches and the given "exemplars", with the attention mechanism; (2) We adopt a two-stage training regime that first pre-trains the model with self-supervised learning, followed by supervised fine-tuning; (3) We propose a simple, scalable pipeline for synthesizing training images with a large number of instances, or with instances from different semantic categories, explicitly forcing the model to make use of the given "exemplars"; (4) We conduct thorough ablation studies on the large-scale counting benchmark, e.g. FSC-147, and demonstrate state-of-the-art performance in both zero-shot and few-shot settings.

Chang Liu, Yujie Zhong, Andrew Zisserman, Weidi Xie • 2022
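
The first contribution above describes using attention to capture the similarity between image patches and the given "exemplars". As a rough illustration of that idea only, and not the authors' implementation, the sketch below runs scaled dot-product cross-attention with image-patch features as queries and exemplar features as keys and values. The function names (`softmax`, `cross_attention`) and the toy 2-D feature vectors are hypothetical, chosen purely for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(patches, exemplars):
    """Scaled dot-product cross-attention (illustrative, single head, no learned projections).

    patches:   list of query vectors, one per image patch
    exemplars: list of key/value vectors, one per exemplar
    Returns (weights, outputs): per-patch attention weights over exemplars,
    and the attention-weighted mix of exemplar vectors for each patch.
    """
    dim = len(exemplars[0])
    scale = math.sqrt(dim)
    all_weights, outputs = [], []
    for query in patches:
        # Similarity of this patch to every exemplar.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in exemplars]
        weights = softmax(scores)
        # Weighted sum of exemplar vectors (keys double as values here).
        mixed = [sum(w * ex[d] for w, ex in zip(weights, exemplars)) for d in range(dim)]
        all_weights.append(weights)
        outputs.append(mixed)
    return all_weights, outputs

# Toy features (assumed values): patches 0 and 2 resemble exemplar 0, patch 1 resembles exemplar 1.
patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
exemplars = [[1.0, 0.0], [0.0, 1.0]]
weights, outputs = cross_attention(patches, exemplars)
```

In this toy setup each patch places more attention weight on the exemplar it resembles, which is the kind of similarity signal a counting model could build on. A real model would add learned projections, multiple heads, and a decoder, none of which is shown here.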

Related benchmarks

Task                      Dataset              Result     Rank
Object Counting           FSC-147 (test)       MAE 11.95  322
Object Counting           FSC-147 (val)        MAE 13.13  240
Car Object Counting       CARPK (test)         MAE 5.75   116
Counting                  CARPK                MAE 5.75   52
Object Counting           FSC-147 1.0 (test)   MAE 11.95  50
Object Counting           FSC-147 1.0 (val)    MAE 13.13  50
Object Counting           FSCD-LVIS (test)     MAE 34.76  21
Object Counting           FSC-147 (Average)    MAE 12.54  19
Few-shot Object Counting  FSC147 1.0 (val)     MAE 13.13  19
Few-shot Object Counting  FSC147 1.0 (test)    MAE 11.95  19
Showing 10 of 27 rows
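
The MAE figures in the benchmark listing above are mean absolute errors: the average of |predicted count - true count| over the evaluated images, so lower is better. The sketch below computes MAE on made-up counts (not FSC-147 data) and also checks that the FSC-147 (Average) value of 12.54 is consistent with the plain mean of the val and test figures listed. The function name `mean_absolute_error` and the toy counts are illustrative assumptions.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true object counts."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Made-up per-image counts, for illustration only.
predicted_counts = [12, 48, 7, 103]
true_counts = [10, 50, 7, 100]
toy_mae = mean_absolute_error(predicted_counts, true_counts)  # (2 + 2 + 0 + 3) / 4

# Figures copied from the listing: FSC-147 val and test MAE.
val_mae, test_mae = 13.13, 11.95
average_mae = round((val_mae + test_mae) / 2, 2)
```

Here `toy_mae` comes out to 1.75 and `average_mae` to 12.54, matching the FSC-147 (Average) row.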
