Action Segmentation with Mixed Temporal Domain Adaptation

About

Progress in action segmentation has come mainly from densely annotated data for fully supervised learning. Since manual annotation of frame-level actions is time-consuming and challenging, we propose to exploit auxiliary unlabeled videos, which are much easier to obtain, by framing the task as a domain adaptation (DA) problem. Although various DA techniques have been proposed in recent years, most of them have been developed only for the spatial direction. Therefore, we propose Mixed Temporal Domain Adaptation (MTDA) to jointly align frame- and video-level embedded feature spaces across domains, and further integrate it with a domain attention mechanism that focuses on aligning the frame-level features with higher domain discrepancy, leading to more effective domain adaptation. Finally, we evaluate our proposed methods on three challenging datasets (GTEA, 50Salads, and Breakfast), and validate that MTDA outperforms the current state-of-the-art methods on all three datasets by large margins (e.g., 6.4% gain on F1@50 and 6.8% gain on the edit score for GTEA).
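To make the domain attention idea more concrete, here is a minimal sketch of one way frame-level features could be weighted by domain discrepancy before being pooled into a video-level feature. The abstract does not specify the weighting rule, so everything here is an assumption: the entropy-based weight, the residual `+ 1` term, and the names `binary_entropy` and `domain_attentive_pooling` are illustrative rather than the authors' implementation.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a two-way domain prediction with P(source) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def domain_attentive_pooling(frame_features, p_source):
    """Pool T frame features into one video-level vector.

    frame_features: list of T feature vectors (lists of floats).
    p_source: list of T domain-classifier outputs, P(frame comes from source).

    Assumption: a frame whose domain is easy to tell (low entropy) has high
    domain discrepancy, so it gets a larger weight. The residual 1 keeps
    every frame in the pooled result.
    """
    if not frame_features or len(frame_features) != len(p_source):
        raise ValueError("need one domain prediction per frame")
    dim = len(frame_features[0])
    pooled = [0.0] * dim
    for feat, p in zip(frame_features, p_source):
        weight = 1.0 - binary_entropy(p)  # in [0, 1]
        for i, value in enumerate(feat):
            pooled[i] += (weight + 1.0) * value
    return [v / len(frame_features) for v in pooled]


# Hypothetical toy input: frame 0 is domain-ambiguous, frame 1 is clearly source.
video_feature = domain_attentive_pooling([[1.0, 0.0], [0.0, 1.0]], [0.5, 1.0])
```

In this toy input the second frame contributes twice as much as the first, because the hypothetical domain classifier is fully confident about it. In an adversarial setup, the pooled vector would then be passed to a video-level domain classifier.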

Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib • 2021

Related benchmarks

Task	Dataset	Metric	Result
Action Segmentation	Breakfast	Acc	71	127
Temporal action segmentation	50Salads	Accuracy	83.2	117
Action Segmentation	50Salads	Edit Distance	75.2	114
Temporal action segmentation	GTEA	F1 Score @ 10% Threshold	90.5	105
Action Segmentation	GTEA (test)	F1@10%	90.5	25
Action Segmentation	GTEA	F1@10	90.5	23
Temporal action segmentation	50 Salads 65	F1@10	82	22
Temporal action segmentation	GTEA 23	F1@10%	90.5	19
Temporal action segmentation	Breakfast 40	F1@10	74.2	19
Action Segmentation	50Salads (test)	F1@10	82	16
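
For readers unfamiliar with the edit metric reported above, the following is a minimal sketch of a segmental edit score as it is commonly defined for action segmentation: frame labels are collapsed into an ordered list of segments, and a normalized Levenshtein distance between the predicted and ground-truth segment lists is turned into a score out of 100. This page does not define its metrics, so the definition is an assumption, and the function names and action labels are hypothetical.

```python
def to_segments(frame_labels):
    """Collapse consecutive identical frame labels into one segment label."""
    segments = []
    for label in frame_labels:
        if not segments or segments[-1] != label:
            segments.append(label)
    return segments


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_score(predicted, ground_truth):
    """Segmental edit score in [0, 100]; higher is better."""
    p, t = to_segments(predicted), to_segments(ground_truth)
    if not p and not t:
        return 100.0
    return (1.0 - levenshtein(p, t) / max(len(p), len(t))) * 100.0


# Hypothetical frame labels: the prediction contains one spurious "pour" segment.
truth = ["take", "take", "stir", "stir"]
pred = ["take", "pour", "take", "stir"]
score = edit_score(pred, truth)
```

Because the score compares segment order rather than frame counts, it penalizes over-segmentation (extra short segments) even when most frames are labeled correctly.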
