Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

About

We propose a novel approach to the action segmentation task for long, untrimmed videos, based on solving an optimal transport problem. By encoding a temporal consistency prior into a Gromov-Wasserstein problem, we are able to decode a temporally consistent segmentation from a noisy affinity/matching cost matrix between video frames and action classes. Unlike previous approaches, our method does not require knowing the action order for a video to attain temporal consistency. Furthermore, the resulting (fused) Gromov-Wasserstein problem can be solved efficiently on GPUs using a few iterations of projected mirror descent. We demonstrate the effectiveness of our method in an unsupervised learning setting, where it is used to generate pseudo-labels for self-training. We evaluate our segmentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desktop Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.
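To make the decoding step more concrete, here is a minimal NumPy sketch of fused Gromov-Wasserstein solved by projected (KL) mirror descent: each step multiplies the current transport plan by the exponentiated negative gradient, then projects back onto the coupling polytope with Sinkhorn scaling. Everything here is an illustrative assumption, not the authors' implementation. The function names (`temporal_structure`, `action_structure`, `sinkhorn_project`, `fgw_mirror_descent`) are hypothetical. The temporal prior is encoded as a product-form structure cost `A[i, k] * B[j, l]` (penalise sending frames within a hypothetical `radius` of each other to different actions), which may differ from the paper's exact formulation. The marginals are balanced and uniform (the unbalanced relaxation named in the title is omitted), hyperparameters are placeholders, and it runs on CPU rather than GPU.

```python
import numpy as np


def temporal_structure(n_frames: int, radius: float) -> np.ndarray:
    """Hypothetical frame structure: 1 if two frames lie within `radius`
    (as a fraction of video length) of each other, else 0."""
    t = np.arange(n_frames) / n_frames
    return (np.abs(t[:, None] - t[None, :]) <= radius).astype(float)


def action_structure(n_actions: int) -> np.ndarray:
    """Hypothetical action structure: 1 for pairs of *different* actions."""
    return 1.0 - np.eye(n_actions)


def sinkhorn_project(K: np.ndarray, p: np.ndarray, q: np.ndarray,
                     n_iter: int = 50) -> np.ndarray:
    """KL projection of a positive matrix K onto couplings with marginals p, q
    via alternating row/column scaling (Sinkhorn)."""
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(n_iter):
        u = p / (K @ v)
        v = q / (K.T @ u)
    return u[:, None] * K * v[None, :]


def fgw_mirror_descent(C: np.ndarray, alpha: float = 0.3, radius: float = 0.05,
                       step: float = 5.0, n_outer: int = 25) -> np.ndarray:
    """Sketch of fused GW with an assumed temporal-consistency structure cost.

    Objective (assumed): alpha * <C, T> + (1 - alpha) * sum_ijkl A_ik B_jl T_ij T_kl,
    whose gradient for symmetric A, B is alpha * C + (1 - alpha) * 2 * A @ T @ B.
    C is an (n_frames, n_actions) frame-to-action cost matrix.
    """
    n, m = C.shape
    p = np.full(n, 1.0 / n)  # assumed uniform frame marginal
    q = np.full(m, 1.0 / m)  # assumed uniform (balanced) action marginal
    A = temporal_structure(n, radius)
    B = action_structure(m)
    T = np.outer(p, q)  # start from the independent coupling
    for _ in range(n_outer):
        grad = alpha * C + (1.0 - alpha) * 2.0 * (A @ T @ B)
        # Mirror step under the KL geometry, then KL projection onto U(p, q).
        T = sinkhorn_project(T * np.exp(-step * grad), p, q)
    return T
```

A per-frame segmentation can then be read off with `fgw_mirror_descent(C).argmax(axis=1)`. Working in the log domain would be more robust for long videos or large step sizes; the plain multiplicative form is kept here for readability.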

Ming Xu, Stephen Gould• 2024

Related benchmarks

Task	Dataset	Result
Action Segmentation	Breakfast	MoF 63.3	78
Action Segmentation	YouTube Instructions	F1 63.3	28
Action Segmentation	50 Salads (eval)	MoF 64.5	24
Phase Recognition	Cholec80	--	24
Temporal Segmentation	Weizmann	Accuracy 71.4	18
Temporal Segmentation	Keck	Accuracy 67	18
Action Segmentation	50 Salads Mid	--	17
Unsupervised Temporal Action Segmentation	Breakfast	MoF 63.3	16
Action Segmentation	Desktop Assembly	MoF 73.4	15
Temporal Action Segmentation	YouTube Instructional YTI (test)	F1 35.1	11
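Most results above are MoF (mean over frames), the fraction of frames labelled correctly. In unsupervised segmentation the predicted segment ids have no class names, so they are conventionally mapped one-to-one onto ground-truth classes (Hungarian matching) before counting. The sketch below assumes per-video matching and integer labels; the name `mof_matched` is hypothetical, and it brute-forces the assignment over permutations, which is only practical for a handful of classes (a Hungarian solver such as SciPy's `linear_sum_assignment` would be used at scale).

```python
from collections import Counter
from itertools import permutations


def mof_matched(pred, gt):
    """Hypothetical helper: MoF after the best one-to-one mapping of
    predicted cluster ids to ground-truth classes (brute force)."""
    assert len(pred) == len(gt) and len(gt) > 0
    clusters = sorted(set(pred))
    classes = sorted(set(gt))
    # overlap[(c, k)] = number of frames predicted as c whose true class is k
    overlap = Counter(zip(pred, gt))
    # Pad with None so surplus clusters can stay unmatched.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(padded, len(clusters)):
        hits = sum(overlap[(c, k)] for c, k in zip(clusters, perm) if k is not None)
        best = max(best, hits)
    return best / len(gt)
```

For example, predictions `[2, 2, 2, 0, 0, 1, 1, 1]` against ground truth `[0, 0, 0, 1, 1, 2, 2, 2]` score 1.0, because the matching recovers the relabelling.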

