Semantic Compositional Learning for Low-shot Scene Graph Generation

About

Scene graphs provide valuable information to many downstream tasks. Many scene graph generation (SGG) models solely use the limited annotated relation triples for training, leading to their underperformance on low-shot (few and zero) scenarios, especially on the rare predicates. To address this problem, we propose a novel semantic compositional learning strategy that makes it possible to construct additional, realistic relation triples with objects from different images. Specifically, our strategy decomposes a relation triple by identifying and removing the unessential component and composes a new relation triple by fusing with a semantically or visually similar object from a visual components dictionary, whilst ensuring the realisticity of the newly composed triple. Notably, our strategy is generic and can be combined with existing SGG models to significantly improve their performance. We performed a comprehensive evaluation on the benchmark dataset Visual Genome. For three recent SGG models, adding our strategy improves their performance by close to 50\%, and all of them substantially exceed the current state-of-the-art.

Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li• 2021

Related benchmarks

Task	Dataset	Result	Rank
Predicate Classification	VG 50 (test)	Mean Recall@5035.9		29
Scene Graph Detection	VG 50 (test)	mR@5011.2		27

Showing 2 of 2 rows

Other info

Follow for update

@wizwand_team Discord