Actor-Action Semantic Segmentation with Grouping Process Models

About

Actor-action semantic segmentation made an important step toward advanced video understanding problems: what action is happening; who is performing the action; and where is the action in space-time. Current models for this problem are local, based on layered CRFs, and are unable to capture long-ranging interaction of video parts. We propose a new model that combines these local labeling CRFs with a hierarchical supervoxel decomposition. The supervoxels provide cues for possible groupings of nodes, at various scales, in the CRFs to encourage adaptive, high-order groups for more effective labeling. Our model is dynamic and continuously exchanges information during inference: the local CRFs influence what supervoxels in the hierarchy are active, and these active nodes influence the connectivity in the CRF; we hence call it a grouping process model. The experimental results on a recent large-scale video dataset show a large margin of 60% relative improvement over the state of the art, which demonstrates the effectiveness of the dynamic, bidirectional flow between labeling and grouping.

Chenliang Xu, Jason J. Corso• 2015

Related benchmarks

Task	Dataset	Result	Rank
Actor Semantic Segmentation	A2D (test)	Class-Avg Pixel Acc58.3		8
Action Semantic Segmentation	A2D (test)	mIoU32		7

Showing 2 of 2 rows

Other info

Follow for update

@wizwand_team Discord