Visual Graphs from Motion (VGfM): Scene understanding with object geometry reasoning

About

Recent approaches to visual scene understanding attempt to build a scene graph: a computational representation of objects and their pairwise relationships. Such a rich semantic representation is very appealing, yet difficult to obtain from a single image, especially when the scene contains complex spatial arrangements. By contrast, an image sequence conveys useful information through the multi-view geometric relations that arise from camera motion; in that setting, object relationships are naturally tied to the 3D structure of the scene. To exploit this, this paper proposes a system that first computes the geometric location of objects in a generic scene and then efficiently constructs scene graphs from video by embedding this geometric reasoning. The representation is obtained with a new model in which geometric and visual features are merged within an RNN framework. We report results on a dataset we created for the task of 3D scene graph generation from multiple views.
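The abstract does not detail how 3D object locations feed into the graph. As a rough illustration only, the sketch below triangulates an object's 3D centroid from two viewing rays (the classic closest-point midpoint method, a swapped-in textbook technique and not necessarily VGfM's own localisation step) and then attaches simple geometric predicates between objects. The `SceneGraph` class, the predicate names `above` and `near`, the z-up axis convention and the distance thresholds are all hypothetical, and the RNN that fuses geometric and visual features is not modelled here.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2.

    c1, c2 are camera centres; d1, d2 are viewing directions through the
    object's 2D detection in each view (assumed already back-projected).
    """
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays: no camera baseline, so depth cannot be recovered.
        raise ValueError("viewing rays are (nearly) parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = _add(c1, _scale(d1, s))
    p2 = _add(c2, _scale(d2, t))
    return _scale(_add(p1, p2), 0.5)


@dataclass
class SceneGraph:
    # Hypothetical structure: object id -> 3D centroid,
    # plus (subject, predicate, object) edges.
    nodes: Dict[str, Vec3] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, obj_id: str, position: Vec3) -> None:
        self.nodes[obj_id] = position

    def add_geometric_relations(self, near_thresh: float = 1.0,
                                up_thresh: float = 0.3) -> None:
        """Add purely geometric predicates for every ordered pair (z axis assumed up)."""
        ids = sorted(self.nodes)
        for i in ids:
            for j in ids:
                if i == j:
                    continue
                pi, pj = self.nodes[i], self.nodes[j]
                if pi[2] - pj[2] > up_thresh:
                    self.edges.append((i, "above", j))
                if math.dist(pi, pj) < near_thresh:
                    self.edges.append((i, "near", j))


# Usage: two cameras observe a cup; a table position is given directly.
graph = SceneGraph()
cup = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 0.0, 1.0),
                           (2.0, 0.0, 0.0), (-1.0, 0.0, 1.0))
graph.add_object("cup", cup)  # cup == (1.0, 0.0, 1.0)
graph.add_object("table", (1.0, 0.0, 0.5))
graph.add_geometric_relations()
print(graph.edges)
# [('cup', 'above', 'table'), ('cup', 'near', 'table'), ('table', 'near', 'cup')]
```

Keeping geometry in explicit edges like this makes the spatial part of each relationship inspectable; a learned model would instead feed such geometric cues, together with visual features, into its relationship classifier.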

Paul Gay, Stuart James, Alessio Del Bue• 2018

Related benchmarks

Task	Dataset	Result
3D scene graph generation	3DSSG (test)	Recall (Rel)19.6	28
Relationship Detection	3RScan	Old Recall@163	10
Object Detection	3RScan	R@1077	10
Predicate Detection	3RScan	R@336	10
Scene graph prediction	3RScan 20 object and 8 predicate classes (test)	Recall (Relationship)52	6
3D Scene Graph Prediction	3RScan 160 object and 26 predicate classes (test)	Recall (Rel.)63.3	6
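The leaderboard rows above use recall-style metrics (Recall, R@3, R@10). As a generic sketch only, assuming the common definition in which Recall@k is the fraction of ground-truth (subject, predicate, object) triplets found among the k highest-scoring predicted triplets, such a score can be computed as follows. The exact matching protocol behind each benchmark (for instance, whether top-k is taken per scene or per object pair) may differ, and the function and variable names are hypothetical.

```python
from typing import Iterable, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


def recall_at_k(predictions: List[Tuple[float, Triplet]],
                ground_truth: Iterable[Triplet], k: int) -> float:
    """Fraction of ground-truth triplets that appear among the top-k scored predictions."""
    if k <= 0:
        raise ValueError("k must be positive")
    gt = set(ground_truth)
    if not gt:
        raise ValueError("ground truth is empty; recall is undefined")
    # Rank predictions by confidence, highest first, and keep the k best triplets.
    ranked = sorted(predictions, key=lambda item: item[0], reverse=True)
    top_k = {triplet for _, triplet in ranked[:k]}
    return len(gt & top_k) / len(gt)


ground_truth = [("cup", "on", "table"), ("chair", "near", "table")]
predictions = [
    (0.9, ("cup", "on", "table")),
    (0.7, ("cup", "near", "table")),
    (0.4, ("chair", "near", "table")),
    (0.2, ("chair", "on", "table")),
]
for k in (1, 2, 3):
    print(k, recall_at_k(predictions, ground_truth, k))
# 1 0.5
# 2 0.5
# 3 1.0
```

Because only membership in the top k matters, a higher k can never lower the score, which is why R@10 values tend to exceed R@3 values on the same task.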
