
Location-Free Scene Graph Generation

About

Scene Graph Generation (SGG) is a visual understanding task that aims to describe a scene as a graph of entities and their relationships with each other. Existing works rely on location labels in the form of bounding boxes or segmentation masks, which increases annotation costs and limits dataset expansion. Recognizing that many applications do not require location data, we break this dependency and introduce location-free scene graph generation (LF-SGG). This new task aims at predicting instances of entities, as well as their relationships, without explicitly computing their spatial localization. To objectively evaluate the task, the predicted and ground truth scene graphs need to be compared. We solve this NP-hard problem through an efficient branching algorithm. Additionally, we design the first LF-SGG method, Pix2SG, using autoregressive sequence modeling. We demonstrate the effectiveness of our method on three scene graph generation datasets as well as two downstream tasks, image retrieval and visual question answering, and show that our approach is competitive with existing methods while not relying on location cues.

Ege Özsoy, Felix Holm, Mahdi Saleh, Tobias Czempiel, Chantal Pellegrini, Nassir Navab, Benjamin Busam • 2023
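
The abstract notes that, without locations, a predicted scene graph can only be scored after its entity instances are matched to ground-truth instances. The following is a minimal sketch of that idea, not the paper's algorithm: it uses plain exhaustive backtracking, which is exponential in the worst case. The graph representation (a list of class labels plus `(subject, predicate, object)` index triplets) and the function names `count_matched` and `best_matching` are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# A relation is (subject index, predicate, object index); indices refer to
# positions in the corresponding label list. This representation is an
# assumption for illustration, not the format used by the paper.
Relation = Tuple[int, str, int]


def count_matched(pred_rels: List[Relation], gt_rels: List[Relation],
                  mapping: Dict[int, int]) -> int:
    """Count distinct ground-truth triplets hit by predicted triplets under `mapping`."""
    gt_set = set(gt_rels)
    hits: Set[Relation] = set()
    for s, p, o in pred_rels:
        if s in mapping and o in mapping:
            candidate = (mapping[s], p, mapping[o])
            if candidate in gt_set:
                hits.add(candidate)
    return len(hits)


def best_matching(pred_labels: List[str], pred_rels: List[Relation],
                  gt_labels: List[str], gt_rels: List[Relation]
                  ) -> Tuple[int, Dict[int, int]]:
    """Exhaustively search node assignments; return (matched triplets, mapping)."""
    best_score = -1
    best_map: Dict[int, int] = {}

    def branch(i: int, mapping: Dict[int, int], used: Set[int]) -> None:
        nonlocal best_score, best_map
        if i == len(pred_labels):
            score = count_matched(pred_rels, gt_rels, mapping)
            if score > best_score:
                best_score, best_map = score, dict(mapping)
            return
        # Branch 1: leave predicted node i unmatched.
        branch(i + 1, mapping, used)
        # Branch 2: match node i to any unused ground-truth node of the same class.
        for j, label in enumerate(gt_labels):
            if j not in used and label == pred_labels[i]:
                mapping[i] = j
                used.add(j)
                branch(i + 1, mapping, used)
                used.discard(j)
                del mapping[i]

    branch(0, {}, set())
    return best_score, best_map


# Two "person" instances are interchangeable by label alone, so only the
# relations can tell which predicted person corresponds to which ground-truth one.
score, mapping = best_matching(
    ["person", "person", "horse"], [(0, "riding", 2), (1, "next to", 2)],
    ["horse", "person", "person"], [(2, "riding", 0), (1, "next to", 0)],
)
print(score, mapping)
```

In this toy case the two same-class instances are disambiguated purely by their relations. A practical evaluator would need pruning to stay tractable on larger graphs, which is the role the abstract assigns to its branching algorithm.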

Related benchmarks

Task                                  Dataset                 Result            Rank
Scene Graph Generation                Visual Genome (test)    R@50 0.2692       86
Scene Graph Generation                PSG dataset             Recall@20 35.54   10
Image Retrieval                       Visual Genome           R@20 38.3         6
Location-Free Scene Graph Generation  4D-OR                   Precision 89      4
Visual Question Answering             COCOVQA zero-shot 2015  Acc (Open) 28.27  3
