Unbiased Scene Graph Generation in Videos

About

Dynamic scene graph generation (SGG) from videos is challenging because of the inherent dynamics of a scene, the temporal fluctuation of model predictions, and the long-tailed distribution of visual relationships, on top of the existing challenges of image-based SGG. Existing methods for dynamic SGG have primarily focused on capturing spatio-temporal context using complex architectures without addressing these challenges, especially the long-tailed distribution of relationships. This often leads to biased scene graphs. To address these challenges, we introduce a new framework called TEMPURA: TEmporal consistency and Memory Prototype guided UnceRtainty Attenuation for unbiased dynamic SGG. TEMPURA employs object-level temporal consistency via transformer-based sequence modeling, learns to synthesize unbiased relationship representations using memory-guided training, and attenuates the predictive uncertainty of visual relations using a Gaussian Mixture Model (GMM). Extensive experiments demonstrate that our method achieves significant performance gains (up to 10% in some cases) over existing methods, highlighting its superiority in generating more unbiased scene graphs.

Sayak Nag, Kyle Min, Subarna Tripathi, Amit K. Roy Chowdhury • 2023
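
The abstract says predictive uncertainty is modeled with a Gaussian Mixture Model but does not give the formulation. As a rough illustration only, the sketch below computes the mean and variance of a one-dimensional Gaussian mixture and splits the variance with the law of total variance. Reading the within-component term as "aleatoric" and the between-component term as "epistemic" is a common convention assumed here, not something this page states, and the function name `mixture_moments` is hypothetical. How TEMPURA actually uses such quantities during training is not described on this page.

```python
def mixture_moments(weights, means, variances):
    """Mean and variance decomposition of a 1-D Gaussian mixture.

    Assumed interpretation (not taken from the paper):
      within  = sum_k pi_k * sigma_k^2          (often read as aleatoric)
      between = sum_k pi_k * (mu_k - mean)^2    (often read as epistemic)
    The total variance is within + between (law of total variance).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("mixture weights must sum to a positive value")
    # Normalize so the mixing coefficients sum to 1.
    pi = [w / total for w in weights]

    mean = sum(p * m for p, m in zip(pi, means))
    within = sum(p * v for p, v in zip(pi, variances))
    between = sum(p * (m - mean) ** 2 for p, m in zip(pi, means))
    return mean, within, between


# Two equally weighted components with unit variance, centered at 0 and 2.
mean, within, between = mixture_moments([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
total_variance = within + between
```

A prediction whose components disagree strongly has a large between-component term, which is one plausible signal for down-weighting noisy relationship labels.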

Related benchmarks

Task	Dataset	Result
PredCLS	Action Genome (test)	Recall@10: 80.4	76
Scene Graph Classification	Action Genome (test)	Recall@10: 56.3	55
Scene Graph Detection	Action Genome	Recall@10: 29.8	41
Scene Graph Detection (SGDet)	Action Genome v1.0 (test)	R@10: 29.8	32
Predicate Classification	Action Genome	Recall@10: 80.4	26
Scene Graph Detection (SGDet)	Action Genome (test)	R@10: 29.8	22
SGCLS	Action Genome (test)	Recall@10: 47.2	21
SGDET	Action Genome (test)	R@10: 28.1	14
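
The results above are reported as Recall@10. As a minimal sketch of how such a metric is typically computed for a single frame, the function below takes scored (subject, predicate, object) triplets and measures the fraction of ground-truth triplets found among the top K predictions. The function name `recall_at_k` and the example labels are hypothetical, and the sketch matches triplets by label only; real scene graph detection evaluation also requires matching bounding boxes and averages over frames, which is omitted here.

```python
def recall_at_k(predictions, ground_truth, k):
    """Fraction of ground-truth triplets recovered in the top-k predictions.

    predictions: list of (score, (subject, predicate, object)) tuples.
    ground_truth: iterable of (subject, predicate, object) tuples.
    Simplification: triplets are matched by label only, with no box overlap check.
    """
    gt = set(ground_truth)
    if not gt:
        return 0.0
    # Rank predictions from most to least confident and keep the first k.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    top_k = {triplet for _, triplet in ranked[:k]}
    return len(top_k & gt) / len(gt)


# Hypothetical single-frame example.
preds = [
    (0.9, ("person", "holding", "cup")),
    (0.8, ("person", "sitting_on", "chair")),
    (0.3, ("person", "looking_at", "cup")),
]
gt = {("person", "holding", "cup"), ("person", "looking_at", "cup")}
r_at_2 = recall_at_k(preds, gt, 2)
r_at_3 = recall_at_k(preds, gt, 3)
```

Because rare predicates seldom reach the top K under a long-tailed distribution, plain recall can hide bias toward frequent relationships, which is the motivation the abstract gives for unbiased generation.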
