Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
About
We present an approach for estimating the period with which an action is repeated in a video. The crux of the approach lies in constraining the period prediction module to use temporal self-similarity as an intermediate representation bottleneck, which allows generalization to unseen repetitions in videos in the wild. We train this model, called RepNet, on a synthetic dataset generated from a large unlabeled video collection by sampling short clips of varying lengths and repeating them with different periods and counts. This combination of synthetic data and a powerful yet constrained model allows us to predict periods in a class-agnostic fashion. Our model substantially exceeds state-of-the-art performance on existing periodicity (PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new, challenging dataset called Countix (~90 times larger than existing datasets), which captures the challenges of repetition counting in real-world videos. Project webpage: https://sites.google.com/view/repnet
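The two ideas in the abstract, synthetic repetitions and a temporal self-similarity bottleneck, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names `make_synthetic_repetition` and `self_similarity` are hypothetical, frames are stood in for by small embedding vectors, the period is simply the clip length (no speed resampling), and similarity is taken to be the negative squared Euclidean distance followed by a row-wise softmax.

```python
import math
import random


def make_synthetic_repetition(frames, start, period, count):
    """Cut `period` frames starting at `start` and repeat them `count` times.

    Hypothetical helper: the resulting sequence has a known period and count,
    so it can serve as a labeled training example without manual annotation.
    """
    clip = frames[start:start + period]
    if len(clip) < period:
        raise ValueError("clip runs past the end of the video")
    return clip * count


def self_similarity(embeddings):
    """Return an N x N temporal self-similarity matrix.

    Assumed similarity: negative squared Euclidean distance between every
    pair of frame embeddings, normalized with a softmax over each row.
    """
    n = len(embeddings)
    matrix = []
    for i in range(n):
        logits = [
            -sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            for j in range(n)
        ]
        peak = max(logits)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        matrix.append([e / total for e in exps])
    return matrix


# Toy "video": 20 frames, each represented by a 2-D embedding.
random.seed(0)
video = [[random.random(), random.random()] for _ in range(20)]

# Repeat a 4-frame clip 3 times, giving 12 frames with period 4.
repeated = make_synthetic_repetition(video, start=3, period=4, count=3)
tsm = self_similarity(repeated)
```

In this toy case, frames that are one period apart are identical, so row `i` of `tsm` takes its largest value at columns `i`, `i ± 4`, `i ± 8`. A period predictor that sees only this matrix has to rely on that repeating structure rather than on what the frames depict, which is the intuition behind using it as a bottleneck.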
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Repetition Counting | UCFRep (test) | MAE 99.8 | 32 |
| Repetitive Action Counting | RepCount (test) | MAE 0.995 | 9 |
| Repetitive Action Counting | RepCount-A Regular Setting (test) | MAE 0.995 | 9 |
| Repetitive Action Counting | UCFRep-pose (test) | MAE 98.1 | 8 |
| Repetitive Action Counting | Countix (test) | MAE 0.36 | 8 |
| Repetitive Action Counting | RepCount-pose (test) | MAE 0.995 | 8 |
| Visual Repetition Counting | RepCount benchmark split | MAE 0.013 | 7 |
| Action Counting | RepCount part-A (test) | MAE 0.995 | 7 |
| Video Repetition Counting | Countix (test) | MAE 0.729 | 5 |
| Repetition Counting | QUVA | MAE 0.104 | 4 |