Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

About

We propose a novel approach to the action segmentation task for long, untrimmed videos, based on solving an optimal transport problem. By encoding a temporal consistency prior into a Gromov-Wasserstein problem, we are able to decode a temporally consistent segmentation from a noisy affinity/matching cost matrix between video frames and action classes. Unlike previous approaches, our method does not require knowing the action order for a video to attain temporal consistency. Furthermore, our resulting (fused) Gromov-Wasserstein problem can be efficiently solved on GPUs using a few iterations of projected mirror descent. We demonstrate the effectiveness of our method in an unsupervised learning setting, where our method is used to generate pseudo-labels for self-training. We evaluate our segmentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desktop Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.
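The decoding idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It is a simplified illustration under stated assumptions: the function name `decode_segmentation`, the band-shaped temporal neighbourhood of width `radius`, the fully relaxed class marginal (no penalty on how much mass each action receives), and the values of `alpha`, `step`, and `n_iters` are all hypothetical choices made for this example. It runs on the CPU, whereas the paper targets GPUs. The sketch applies multiplicative (mirror descent) updates to a fused linear plus Gromov-Wasserstein-style objective and renormalises each frame's row after every step.

```python
import numpy as np


def decode_segmentation(cost, radius=2, alpha=0.5, step=1.0, n_iters=30):
    """Decode per-frame labels from a (n_frames, n_classes) cost matrix.

    Simplified illustration only: parameter names and defaults are
    assumptions made for this sketch, not values taken from the paper.
    """
    n_frames, n_classes = cost.shape

    # Temporal structure: 1 where two distinct frames are within `radius`.
    idx = np.arange(n_frames)
    gap = np.abs(idx[:, None] - idx[None, :])
    temporal = ((gap > 0) & (gap <= radius)).astype(float)

    # Class structure: 1 where two classes differ.
    class_diff = 1.0 - np.eye(n_classes)

    # Soft assignment with every row summing to 1 (a transport plan
    # rescaled by the number of frames); the column sums are left free.
    plan = np.full((n_frames, n_classes), 1.0 / n_classes)

    for _ in range(n_iters):
        # Gradient of
        #   (1 - alpha) * <cost, plan> + alpha * <temporal @ plan @ class_diff, plan>.
        # The quadratic term penalises nearby frames that pick different classes.
        grad = (1.0 - alpha) * cost + 2.0 * alpha * (temporal @ plan @ class_diff)
        # Mirror descent step (multiplicative update) ...
        plan = plan * np.exp(-step * grad)
        # ... followed by the KL projection onto the row-sum constraint.
        plan /= plan.sum(axis=1, keepdims=True)

    return plan.argmax(axis=1), plan


# Toy video: three actions of ten frames each, with three corrupted frames.
truth = np.repeat([0, 1, 2], 10)
noisy_cost = 1.0 - np.eye(3)[truth]
for frame, wrong in [(4, 2), (15, 0), (25, 1)]:
    noisy_cost[frame] = 1.0 - np.eye(3)[wrong]

labels, plan = decode_segmentation(noisy_cost)
print("frame-wise argmin:", noisy_cost.argmin(axis=1))
print("decoded labels:   ", labels)
```

Setting `alpha=0` switches the temporal term off, in which case the decoded labels reduce to the frame-wise argmin of the cost matrix. The sketch never takes the action order as input, in line with the abstract's claim that the order need not be known.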

Ming Xu, Stephen Gould • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Action Segmentation | Breakfast | MoF 63.3 | 66 |
| Temporal Segmentation | Weizmann | ACC 71.4 | 18 |
| Temporal Segmentation | Keck | Accuracy 67 | 18 |
| Action Segmentation | 50 Salads Mid | -- | 17 |
| Phase Recognition | Cholec80 | -- | 17 |
| Action Segmentation | YouTube Instructions | F1 63.3 | 16 |
| Unsupervised Temporal Action Segmentation | Breakfast | MOF 63.3 | 16 |
| Action Segmentation | 50 Salads (eval) | MoF 64.5 | 13 |
| Temporal action segmentation | YouTube Instructional YTI (test) | F1 Score 35.1 | 11 |
| Action Segmentation | ProceL | F1 Score 33.6 | 9 |

Showing 10 of 23 rows
