
Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

About

We propose a novel approach to the action segmentation task for long, untrimmed videos, based on solving an optimal transport problem. By encoding a temporal consistency prior into a Gromov-Wasserstein problem, we are able to decode a temporally consistent segmentation from a noisy affinity/matching cost matrix between video frames and action classes. Unlike previous approaches, our method does not require knowing the action order for a video to attain temporal consistency. Furthermore, our resulting (fused) Gromov-Wasserstein problem can be efficiently solved on GPUs using a few iterations of projected mirror descent. We demonstrate the effectiveness of our method in an unsupervised learning setting, where our method is used to generate pseudo-labels for self-training. We evaluate our segmentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desktop Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.
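To make the decoding step concrete, here is a minimal NumPy sketch of projected mirror descent on a fused Gromov-Wasserstein-style objective. It is an illustration under stated assumptions, not the authors' implementation: the function name `decode_segmentation`, the parameters `alpha`, `radius`, `step` and `n_iters`, and the two structure matrices are all hypothetical choices. It also fixes only the per-frame marginal and leaves the per-action marginal completely free, which is a simplification of the unbalanced formulation named in the title.

```python
import numpy as np

def decode_segmentation(cost, alpha=0.5, radius=2, step=2.0, n_iters=10):
    """Sketch: decode a temporally smooth coupling from an (N, K) cost matrix.

    Objective (assumed form): (1 - alpha) * <cost, T> + alpha * <Cf T Ca, T>,
    minimized over T >= 0 whose rows each sum to 1 / N.
    """
    n, k = cost.shape
    idx = np.arange(n)
    gap = np.abs(idx[:, None] - idx[None, :])
    # Assumed temporal structure: frames at most `radius` apart count as close.
    c_frames = ((gap >= 1) & (gap <= radius)) * (n / radius)
    # Assumed action structure: any two distinct actions count as different.
    c_actions = 1.0 - np.eye(k)

    t = np.full((n, k), 1.0 / (n * k))  # uniform starting coupling
    for _ in range(n_iters):
        # Gradient of the objective (both structure matrices are symmetric).
        grad = (1 - alpha) * cost + 2 * alpha * (c_frames @ t @ c_actions)
        # Mirror descent step under the KL geometry: multiplicative update.
        t = t * np.exp(-step * grad)
        # KL projection onto the frame-marginal constraint: rescale each row.
        t = t / (n * t.sum(axis=1, keepdims=True))
    return t

# Toy example: two actions of six frames each, with one noisy frame.
labels = np.array([0] * 6 + [1] * 6)
cost = 1.0 - np.eye(2)[labels]
cost[2] = [0.6, 0.4]  # frame 2 weakly prefers the wrong action
naive = cost.argmin(axis=1)
coupling = decode_segmentation(cost)
smoothed = coupling.argmax(axis=1)
```

In this toy case the per-frame argmin mislabels frame 2, while the coupling returned by the sketch assigns it to the same action as its neighbours. No action order is supplied anywhere; the only prior is that nearby frames should share an action.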

Ming Xu, Stephen Gould • 2024

Related benchmarks

Task                                        Dataset                            Result          Rank
Action Segmentation                         Breakfast                          MoF 63.3        66
Action Segmentation                         50 Salads Mid                      --              17
Action Segmentation                         YouTube Instructions               F1 63.3         16
Action Segmentation                         50 Salads (eval)                   MoF 64.5        13
Temporal action segmentation                YouTube Instructional YTI (test)   F1 Score 35.1   11
Unsupervised Temporal Action Segmentation   Breakfast                          MOF 56.1        10
Action Segmentation                         Desktop Assembly                   MoF 73.4        7
Temporal action segmentation                IKEA ASM (test)                    MOF 34          5
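For readers unfamiliar with the MoF (mean over frames) metric reported above, the sketch below shows one common way to compute it. MoF is the fraction of frames whose predicted label matches the ground truth. In the unsupervised setting, predicted cluster ids are usually matched one-to-one to ground-truth classes before scoring, typically with Hungarian matching. The brute-force search over permutations used here is a stand-in that is only practical for a handful of classes, and the function names are hypothetical. The exact evaluation protocol behind each benchmark row is not stated on this page.

```python
from itertools import permutations

def mof(pred, truth):
    """Fraction of frames whose predicted label equals the ground truth."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def best_matched_mof(pred_clusters, truth, n_classes):
    """Best MoF over all one-to-one mappings from cluster ids to class ids."""
    best = 0.0
    for perm in permutations(range(n_classes)):
        # perm[c] is the class id assigned to cluster c under this mapping.
        mapped = [perm[c] for c in pred_clusters]
        best = max(best, mof(mapped, truth))
    return best

# Toy example: cluster ids are arbitrary, so raw agreement is low
# until clusters are matched to classes.
pred = [1, 1, 1, 0, 0, 2, 2, 2]
truth = [0, 0, 0, 1, 1, 1, 2, 2]
raw_score = mof(pred, truth)
matched_score = best_matched_mof(pred, truth, n_classes=3)
```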
