Guided Slot Attention for Unsupervised Video Object Segmentation

About

Unsupervised video object segmentation aims to segment the most prominent object in a video sequence. However, complex backgrounds and multiple foreground objects make this task challenging. To address this issue, we propose a guided slot attention network that reinforces spatial structural information and achieves better foreground-background separation. The foreground and background slots, which are initialized with query guidance, are iteratively refined through interactions with template information. Furthermore, to improve slot-template interaction and to effectively fuse global and local features in the target and reference frames, K-nearest neighbors filtering and a feature aggregation transformer are introduced. The proposed model achieves state-of-the-art performance on two popular datasets. Additionally, we demonstrate the robustness of the proposed model in challenging scenes through various comparative experiments.
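The iterative slot refinement described above can be illustrated with a toy sketch. This is not the authors' network: it is a stripped-down version of generic slot attention (no learned projections, recurrent update, or transformer), and a simple top-k selection stands in for the K-nearest neighbors filtering, whose exact role in the paper is not specified here. The function name `refine_slots`, the parameter `k`, and the hand-picked initial slots (a stand-in for query-guided initialization) are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def refine_slots(slots, features, k, iters=3):
    """Toy slot refinement (hypothetical helper, not the paper's model).

    Slots compete for each feature via a softmax over slots; each slot is
    then replaced by the weighted mean of its k highest-weighted features.
    """
    scale = 1.0 / math.sqrt(len(features[0]))
    attn = []
    for _ in range(iters):
        # attn[j][i]: share of feature j claimed by slot i (rows sum to 1).
        attn = [softmax([scale * dot(s, f) for s in slots]) for f in features]
        new_slots = []
        for i in range(len(slots)):
            # Assumed top-k filter: keep only the k features this slot
            # attends to most strongly.
            ranked = sorted(range(len(features)),
                            key=lambda j: attn[j][i], reverse=True)[:k]
            total = sum(attn[j][i] for j in ranked)
            new_slots.append([
                sum(attn[j][i] * features[j][d] for j in ranked) / total
                for d in range(len(slots[i]))
            ])
        slots = new_slots
    return slots, attn


# Two "foreground-like" and two "background-like" feature vectors.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Hand-picked initial slots: index 0 = foreground, index 1 = background.
slots, attn = refine_slots([[1.0, 0.0], [0.0, 1.0]], features, k=2)
```

In this sketch the softmax runs over slots rather than over features, so the foreground and background slots compete for each feature; that competition is what pushes the two slots toward different regions.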

Minhyeok Lee, Suhwan Cho, Dogyoon Lee, Chaewon Park, Jungho Lee, Sangyoun Lee • 2023

Related benchmarks

Task	Dataset	Result	Rank
Unsupervised Video Object Segmentation	DAVIS 2016 (val)	F Mean 89.6	108
Unsupervised Video Object Segmentation	FBMS (test)	J Mean 83.1	66
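
For readers unfamiliar with the metrics in the table, J is the region similarity (Jaccard index, or intersection over union) between a predicted mask and the ground-truth mask, and J Mean averages it over frames; F is a boundary accuracy measure and is not sketched here. The snippet below is a minimal illustration, not the official evaluation code: the function names are hypothetical, masks are plain nested lists of 0/1 values, and scoring two empty masks as 1.0 is an assumed convention.

```python
def jaccard(pred, gt):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_mean(pred_frames, gt_frames):
    """Average per-frame Jaccard over a sequence of masks."""
    scores = [jaccard(p, g) for p, g in zip(pred_frames, gt_frames)]
    return sum(scores) / len(scores)


pred = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
gt = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
score = j_mean(pred, gt)  # frame scores are 0.5 and 1.0
```

Benchmark tables such as the one above report these scores as percentages.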
