Target Adaptive Context Aggregation for Video Scene Graph Generation

About

This paper deals with the challenging task of video scene graph generation (VidSGG), whose output could serve as a structured video representation for high-level understanding tasks. We present a new detect-to-track paradigm for this task by decoupling the context modeling for relation prediction from the complicated low-level entity tracking. Specifically, we design an efficient method for frame-level VidSGG, termed Target Adaptive Context Aggregation Network (TRACE), with a focus on capturing spatio-temporal context information for relation recognition. Our TRACE framework streamlines the VidSGG pipeline with a modular design, and presents two unique blocks: Hierarchical Relation Tree (HRTree) construction and Target-adaptive Context Aggregation. More specifically, our HRTree first provides an adaptive structure for organizing possible relation candidates efficiently, and guides the context aggregation module to effectively capture spatio-temporal structure information. Then, we obtain a contextualized feature representation for each relation candidate and build a classification head to recognize its relation category. Finally, we provide a simple temporal association strategy to track the results detected by TRACE and yield video-level VidSGG results. We perform experiments on two VidSGG benchmarks, ImageNet-VidVRD and Action Genome, and the results demonstrate that TRACE achieves state-of-the-art performance. The code and models are made available at https://github.com/MCG-NJU/TRACE.
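The abstract does not spell out the temporal association strategy, so the following is only a minimal sketch of one common way to link frame-level relation detections into video-level results: greedy matching by triplet label and box overlap between consecutive frames. It is not taken from the TRACE code base. The names `Relation`, `Tube`, and `associate`, and the threshold `iou_thresh=0.5`, are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Relation:
    """One frame-level detection (hypothetical container, not the authors' type)."""
    frame: int
    triplet: Tuple[str, str, str]  # (subject, predicate, object)
    subj_box: Box
    obj_box: Box
    score: float


@dataclass
class Tube:
    """A video-level relation: the same triplet linked across consecutive frames."""
    triplet: Tuple[str, str, str]
    relations: List[Relation]


def associate(frames: List[List[Relation]], iou_thresh: float = 0.5) -> List[Tube]:
    """Greedily link detections frame by frame (assumed strategy, for illustration)."""
    tubes: List[Tube] = []
    for detections in frames:
        extended = set()  # indices of tubes already extended in this frame
        # Higher-scoring detections get first pick of the candidate tubes.
        for det in sorted(detections, key=lambda d: d.score, reverse=True):
            best, best_overlap = None, iou_thresh
            for i, tube in enumerate(tubes):
                last = tube.relations[-1]
                if i in extended or tube.triplet != det.triplet:
                    continue
                if last.frame != det.frame - 1:
                    continue  # only link to tubes that ended in the previous frame
                # Both the subject and the object box must overlap well enough.
                overlap = min(iou(last.subj_box, det.subj_box),
                              iou(last.obj_box, det.obj_box))
                if overlap >= best_overlap:
                    best, best_overlap = i, overlap
            if best is None:
                tubes.append(Tube(det.triplet, [det]))  # start a new tube
                extended.add(len(tubes) - 1)
            else:
                tubes[best].relations.append(det)
                extended.add(best)
    return tubes
```

Calling `associate` on a list of per-frame detection lists returns one `Tube` per linked relation instance. Keeping this step separate from relation recognition mirrors the detect-to-track idea described above, where context modeling does not depend on entity tracking.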

Yao Teng, Limin Wang, Zhifeng Li, Gangshan Wu • 2021

Related benchmarks

Task	Dataset	Result
PredCLS	Action Genome (test)	Recall@10 73.3	76
Relation Detection	VRD (test)	R@50 9.08	75
Scene Graph Classification	Action Genome (test)	Recall@10 37.1	55
Scene Graph Detection	Action Genome	Recall@10 26.5	41
Scene Graph Detection (SGDet)	Action Genome v1.0 (test)	R@10 26.5	32
Predicate Classification	Action Genome	Recall@10 72.6	26
Scene Graph Detection (SGDet)	Action Genome (test)	R@10 27.5	22
SGCLS	Action Genome (test)	Recall@10 14.8	21
Relation Tagging	VidVRD v1.0 (test)	P@5 45.3	18
Relation Detection	VidVRD v1.0 (test)	R@50 9.08	18

Showing 10 of 13 rows
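Most rows above report Recall@K (written Recall@10 or R@50). As a rough illustration of what such a number measures, the sketch below computes the fraction of ground-truth relation triplets that appear among the K highest-scoring predictions. It is a simplification under stated assumptions: triplets are matched by label only, whereas the actual benchmark protocols also impose box or trajectory overlap requirements and differ in how scores are averaged. The function name `recall_at_k` and the sample triplets are hypothetical.

```python
from typing import Iterable, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


def recall_at_k(predictions: List[Tuple[Triplet, float]],
                ground_truth: Iterable[Triplet],
                k: int) -> float:
    """Fraction of ground-truth triplets found in the top-k scored predictions.

    Simplified: label-only matching, no box overlap check (an assumption).
    """
    gt = set(ground_truth)
    if not gt:
        return 0.0
    # Keep the k predictions with the highest confidence scores.
    top_k = sorted(predictions, key=lambda p: p[1], reverse=True)[:k]
    hits = {triplet for triplet, _ in top_k} & gt
    return len(hits) / len(gt)


# Illustrative data only; these are not results from the paper.
preds = [
    (("person", "hold", "cup"), 0.9),
    (("person", "sit_on", "sofa"), 0.6),
    (("person", "look_at", "tv"), 0.2),
]
truth = [("person", "hold", "cup"), ("person", "look_at", "tv")]
print(recall_at_k(preds, truth, k=2))
```

A larger K can only keep or raise the value, which is why results at different cutoffs such as R@10 and R@50 are not directly comparable.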
