
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation

About

Dynamic Scene Graph Generation (DSGG) focuses on identifying visual relationships within the spatio-temporal domain of videos. Conventional approaches often employ multi-stage pipelines, which typically consist of object detection, temporal association, and multi-relation classification. However, these methods have inherent limitations because the stages are separated, and optimizing each sub-problem independently may yield sub-optimal solutions. To remedy these limitations, we propose a one-stage end-to-end framework, termed OED, which streamlines the DSGG pipeline. This framework reformulates the task as a set prediction problem and leverages pair-wise features to represent each subject-object pair within the scene graph. Moreover, since capturing temporal dependencies is another challenge of DSGG, we introduce a Progressively Refined Module (PRM) that aggregates temporal context without relying on additional trackers or handcrafted trajectories, enabling end-to-end optimization of the network. Extensive experiments on the Action Genome benchmark demonstrate the effectiveness of our design. The code and models are available at https://github.com/guanw-pku/OED.
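Set-prediction formulations are commonly trained by first matching predictions one-to-one to ground-truth entries, as in DETR-style detectors. The sketch below illustrates that matching step under stated assumptions and is not OED's actual implementation: each predicted subject-object pair is a hypothetical (subject box, object box, predicate probabilities) tuple, each ground truth carries a single predicate index (real DSGG pairs can have several predicates), and the cost function `pair_cost` (L1 box distance minus the probability assigned to the true predicate) is invented for illustration. It finds the exact minimum-cost assignment by exhaustive search, which only suits toy sizes; a practical version would use the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations

# Hypothetical pair format: (subject_box, object_box, predicate_info).
# Boxes are (x1, y1, x2, y2). For predictions, predicate_info is a list of
# per-class probabilities; for ground truth it is a single class index.


def l1_box(a, b):
    """L1 distance between two boxes."""
    return sum(abs(p - q) for p, q in zip(a, b))


def pair_cost(pred, gt):
    """Assumed matching cost: box distances minus true-predicate probability."""
    p_sub, p_obj, p_probs = pred
    g_sub, g_obj, g_pred = gt
    return l1_box(p_sub, g_sub) + l1_box(p_obj, g_obj) - p_probs[g_pred]


def match(preds, gts):
    """Exact min-cost one-to-one assignment of ground truths to predictions.

    Returns a list of (pred_index, gt_index) pairs and the total cost.
    Exhaustive search over permutations: only feasible for tiny inputs.
    """
    if len(preds) < len(gts):
        raise ValueError("need at least as many predictions as ground truths")
    best_cost, best_perm = float("inf"), None
    # perm[j] is the prediction index assigned to ground truth j.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[i], gts[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(best_perm[j], j) for j in range(len(gts))], best_cost


if __name__ == "__main__":
    preds = [
        ((0, 0, 10, 10), (20, 20, 30, 30), [0.1, 0.9]),
        ((0, 0, 11, 10), (50, 50, 60, 60), [0.8, 0.2]),
        ((100, 100, 110, 110), (0, 0, 5, 5), [0.5, 0.5]),
    ]
    gts = [
        ((0, 0, 10, 10), (50, 50, 60, 60), 0),
        ((0, 0, 10, 10), (20, 20, 30, 30), 1),
    ]
    print(match(preds, gts))
```

Unmatched predictions (here, index 2) would typically be supervised toward a "no pair" target; that part of training is omitted from the sketch.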

Guan Wang, Zhimin Li, Qingchao Chen, Yang Liu (2024)

Related benchmarks

Task                          | Dataset                   | Metric    | Result | Rank
PredCLS                       | Action Genome (test)      | Recall@10 | 83.3   | 54
Scene Graph Classification    | Action Genome (test)      | Recall@10 | 46.7   | 40
Scene Graph Detection (SGDet) | Action Genome v1.0 (test) | Recall@10 | 35.3   | 32
Scene Graph Detection         | Action Genome             | Recall@10 | 35.3   | 30
Predicate Classification      | Action Genome             | Recall@10 | 83.3   | 26
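The Recall@10 figures above measure how many ground-truth relationship triplets appear among a model's 10 highest-scoring predictions. The sketch below is a simplified illustration, not the benchmark's official evaluation code: it assumes exact (subject, predicate, object) label matching, as in predicate classification where boxes are given, and assumes per-frame recall is averaged over frames. Detection settings additionally require box overlap with the ground truth, and the official protocol has further details not modeled here. The function names and triplet labels are hypothetical.

```python
def recall_at_k(predictions, gt_triplets, k):
    """Fraction of ground-truth triplets found among the top-k predictions.

    predictions: list of (score, (subject, predicate, object)) tuples.
    gt_triplets: iterable of (subject, predicate, object) tuples.
    """
    gt = set(gt_triplets)
    if not gt:
        raise ValueError("recall is undefined without ground-truth triplets")
    # Keep the k highest-scoring predicted triplets.
    top = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    hits = {triplet for _, triplet in top} & gt
    return len(hits) / len(gt)


def mean_recall_at_k(frames, k):
    """Average per-frame Recall@k over frames that have ground truth.

    frames: list of (predictions, gt_triplets) pairs, one per video frame.
    """
    recalls = [recall_at_k(p, g, k) for p, g in frames if g]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    preds = [
        (0.9, ("person", "holding", "cup")),
        (0.8, ("person", "sitting_on", "chair")),
        (0.3, ("person", "looking_at", "cup")),
    ]
    gt = [("person", "holding", "cup"), ("person", "looking_at", "cup")]
    print(recall_at_k(preds, gt, 2))  # one of two triplets is in the top 2
```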
