
Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers

About

In this work, we consider the problem of sequence-to-sequence alignment for signals containing outliers. Assuming the absence of outliers, the standard Dynamic Time Warping (DTW) algorithm efficiently computes the optimal alignment between two (generally) variable-length sequences. While DTW is robust to temporal shifts and dilations of the signal, it fails to align sequences in a meaningful way in the presence of outliers that can be arbitrarily interspersed in the sequences. To address this problem, we introduce Drop-DTW, a novel algorithm that aligns the common signal between the sequences while automatically dropping the outlier elements from the matching. The entire procedure is implemented as a single dynamic program that is efficient and fully differentiable. In our experiments, we show that Drop-DTW is a robust similarity measure for sequence retrieval and demonstrate its effectiveness as a training loss on diverse applications. With Drop-DTW, we address temporal step localization on instructional videos, representation learning from noisy videos, and cross-modal representation learning for audio-visual retrieval and localization. In all applications, we take a weakly- or unsupervised approach and demonstrate state-of-the-art results under these settings.
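To make the "single dynamic program" idea concrete, here is a minimal sketch of a DTW-style recursion with an extra "drop" option. It is an illustration under stated assumptions, not the authors' implementation: the function name `drop_dtw_cost`, the choice to drop elements from only one of the two sequences, and the per-element `drop_cost` values are all hypothetical. It also uses a hard `min`, so it is not differentiable; a differentiable variant would need a smooth approximation of the minimum.

```python
import math

def drop_dtw_cost(cost, drop_cost):
    """Illustrative sketch (assumed formulation, not the paper's code).

    cost[i][j]   : cost of matching element i of sequence z with element j of sequence x
    drop_cost[j] : cost of dropping element j of x instead of matching it
    Returns the minimum total cost of aligning z to x when x elements may be dropped.
    """
    K, N = len(cost), len(cost[0])
    INF = math.inf
    # d_match[i][j]: best cost over prefixes z[:i], x[:j] when x[j-1] is matched to z[i-1]
    # d_drop[i][j] : best cost over the same prefixes when x[j-1] is dropped
    # d_best[i][j] : the smaller of the two
    d_match = [[INF] * (N + 1) for _ in range(K + 1)]
    d_drop = [[INF] * (N + 1) for _ in range(K + 1)]
    d_best = [[INF] * (N + 1) for _ in range(K + 1)]
    d_match[0][0] = d_drop[0][0] = d_best[0][0] = 0.0

    # Before any element of z is consumed, leading elements of x can only be dropped.
    for j in range(1, N + 1):
        d_drop[0][j] = d_drop[0][j - 1] + drop_cost[j - 1]
        d_best[0][j] = d_drop[0][j]

    for i in range(1, K + 1):
        for j in range(1, N + 1):
            # Match: the usual DTW predecessors. Staying on the same x element
            # (moving only in i) is allowed only if that element was matched.
            d_match[i][j] = cost[i - 1][j - 1] + min(
                d_best[i - 1][j - 1], d_best[i][j - 1], d_match[i - 1][j]
            )
            # Drop: skip x[j-1] and pay its drop cost.
            d_drop[i][j] = drop_cost[j - 1] + d_best[i][j - 1]
            d_best[i][j] = min(d_match[i][j], d_drop[i][j])

    return d_best[K][N]

# Toy example: z = [0, 1], x = [0, 9, 1], where 9 is an outlier in x.
z, x = [0, 1], [0, 9, 1]
cost = [[abs(a - b) for b in x] for a in z]
cheap_drop = drop_dtw_cost(cost, [2, 2, 2])        # dropping the outlier is affordable
no_drop = drop_dtw_cost(cost, [100, 100, 100])     # dropping is effectively disabled
```

In this toy case, a low drop cost lets the recursion skip the outlier (total cost 2.0), while a prohibitively high drop cost reduces it to ordinary DTW, which must match the outlier to something (total cost 8.0).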

Nikita Dvornik, Isma Hadji, Konstantinos G. Derpanis, Animesh Garg, Allan D. Jepson • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Audio-Visual Event Localization | AVE | -- | 35 |
| Action Segmentation | COIN | Frame Accuracy 59.6 | 29 |
| Keystep recognition | CrossTask | Accuracy 22.3 | 17 |
| Step localization | CrossTask | Recall 49.7 | 8 |
| Step localization | COIN | Accuracy 59.6 | 8 |
| Step localization | YouCook2 | Recall 77.4 | 7 |
| Audio localization from visual segment query | AVE | V2A 35.8 | 4 |
