
Beyond Transfer Learning: Co-finetuning for Action Localisation

About

Transfer learning is the predominant paradigm for training deep networks on small target datasets. Models are typically pretrained on large "upstream" datasets for classification, as such labels are easy to collect, and then finetuned on "downstream" tasks such as action localisation, which are smaller due to their finer-grained annotations. In this paper, we question this approach and propose co-finetuning: simultaneously training a single model on multiple "upstream" and "downstream" tasks. We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data, and also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance. In particular, co-finetuning significantly improves performance on rare classes in our downstream task, as it has a regularising effect and enables the network to learn feature representations that transfer between different datasets. Finally, we observe that by co-finetuning with public video classification datasets, we are able to achieve state-of-the-art results for spatio-temporal action localisation on the challenging AVA and AVA-Kinetics datasets, outperforming recent works which develop intricate models.
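The difference between the two training orders can be illustrated by which dataset each optimisation step draws its batch from. The following is a minimal sketch, not the paper's implementation: it assumes a single shared model (with dataset-specific heads) is updated once per step, and that co-finetuning picks each step's dataset with probability proportional to dataset size. That weighting rule, the function names, and the dataset names are hypothetical choices for illustration.

```python
import random
from collections import Counter
from typing import Dict, List


def transfer_learning_schedule(upstream: str, downstream: str,
                               pretrain_steps: int, finetune_steps: int) -> List[str]:
    """Conventional two-stage baseline: all upstream steps first, then downstream."""
    return [upstream] * pretrain_steps + [downstream] * finetune_steps


def co_finetuning_schedule(dataset_sizes: Dict[str, int], num_steps: int,
                           seed: int = 0) -> List[str]:
    """Interleave upstream and downstream batches in one training run.

    Each step samples one dataset with probability proportional to its size
    (an assumed weighting, not necessarily the paper's). Every sampled batch
    would update the same shared backbone, plus that dataset's own head.
    """
    rng = random.Random(seed)
    names = list(dataset_sizes)
    weights = [dataset_sizes[name] for name in names]
    return rng.choices(names, weights=weights, k=num_steps)


if __name__ == "__main__":
    # Hypothetical dataset names and sizes; adding a second upstream dataset
    # is just another dictionary entry.
    sizes = {"upstream_a": 600, "upstream_b": 300, "downstream": 100}
    schedule = co_finetuning_schedule(sizes, num_steps=1000)
    print(Counter(schedule))
    # Same total number of steps, but strictly sequential stages.
    print(Counter(transfer_learning_schedule("upstream_a", "downstream", 900, 100)))
```

Keeping the total step count equal in both schedules mirrors the abstract's comparison "when using the same total amount of data"; only the ordering and mixing of datasets changes.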

Anurag Arnab, Xuehan Xiong, Alexey Gritsenko, Rob Romijnders, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lučić, Cordelia Schmid • 2022

Related benchmarks

Task | Dataset | Result | Rank
Spatiotemporal Action Localization | AVA 2.2 | mAP 36.1 | 21
Spatio-temporal Action Localization | AVA-Kinetics v1.0 | mAP 36.2 | 10
