g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks

About

We introduce Generalizable 3D-Language Feature Fields (g3D-LF), a 3D representation model pre-trained on a large-scale 3D-language dataset for embodied tasks. g3D-LF processes posed RGB-D images from an agent to encode feature fields that support: 1) predicting novel-view representations from any position in the 3D scene; 2) generating bird's-eye-view (BEV) maps centered on the agent; 3) querying targets with multi-granularity language within these representations. The representation generalizes to unseen environments and supports real-time construction and dynamic updates. By volume rendering latent features along sampled rays and integrating semantic and spatial relationships through multiscale encoders, g3D-LF produces representations at different scales and perspectives, which are aligned with multi-granularity language via multi-level contrastive learning. We also prepare a large-scale 3D-language dataset to align the feature-field representations with language. Extensive experiments on Vision-and-Language Navigation under both Panorama and Monocular settings, Zero-shot Object Navigation, and Situated Question Answering demonstrate the effectiveness of g3D-LF for embodied tasks.
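
The abstract mentions volume rendering latent features along sampled rays. As a rough illustration only, the sketch below applies the standard NeRF-style alpha-compositing rule to per-sample feature vectors on a single ray. It is not the authors' implementation: the function name `render_ray_feature`, its arguments, and the toy inputs are all hypothetical, and the actual model may predict densities and features quite differently.

```python
import math


def render_ray_feature(densities, features, deltas):
    """Alpha-composite per-sample feature vectors along one ray.

    densities: non-negative density per sample, ordered near to far
    features:  one feature vector (list of floats) per sample
    deltas:    length of the ray segment covered by each sample
    Returns the rendered feature vector and the per-sample weights.
    """
    dim = len(features[0])
    rendered = [0.0] * dim
    transmittance = 1.0  # fraction of the ray not yet absorbed
    weights = []
    for sigma, feat, delta in zip(densities, features, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        weights.append(weight)
        for k in range(dim):
            rendered[k] += weight * feat[k]
        transmittance *= 1.0 - alpha            # light left for later samples
    return rendered, weights


# Toy ray (hypothetical values): empty space, then a dense surface, then a
# faint sample behind it.
densities = [0.0, 50.0, 1.0]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
deltas = [0.1, 0.1, 0.1]
rendered, weights = render_ray_feature(densities, features, deltas)
```

In this toy case the dense middle sample receives almost all of the weight, so the rendered feature is close to that sample's feature vector, and samples behind it contribute very little.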

Zihan Wang, Gim Hee Lee • 2024

Related benchmarks

Task	Dataset	Result
Vision-Language Navigation	R2R-CE (val-unseen)	Success Rate (SR) 61	677
Vision-Language Navigation	RxR-CE (val-unseen)	SR 57.1	426
Vision-and-Language Navigation	REVERIE (val unseen)	SPL 23.8	225
Vision-and-Language Navigation	R2R-CE (test-unseen)	SR 58	63
Vision-and-Language Navigation	R2R-CE v1.0 (val unseen)	SR (Success Rate) 61	44
Vision-Language Navigation	HA-VLN Unseen (val)	SR 27	32
Embodied Navigation	R2R-CE	Navigation Error (NE) 5.7	19
Vision-Language Navigation	HA-VLN Seen (val)	NE 5.12	16
Vision-and-Language Navigation	REVERIE CE (val unseen)	NE 6.5	8
Embodied Navigation	NavRAG-CE	Navigation Error (NE) 8.85	5

Showing 10 of 11 rows
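
The table reports Success Rate (SR), Navigation Error (NE), and SPL. The sketch below shows how these metrics are commonly computed in the navigation literature, to make the columns easier to read. It is an illustration under stated assumptions, not the evaluation code behind these numbers: the function name `navigation_metrics`, the episode field names, and the 3 m default success radius are assumptions, and individual benchmarks may use different thresholds. The function returns SR and SPL as fractions, whereas the table appears to list them as percentages.

```python
def navigation_metrics(episodes, success_radius=3.0):
    """Compute NE, SR, and SPL over a list of episodes.

    Each episode is a dict with (hypothetical) keys:
      final_dist: distance from the agent's stop position to the goal (m)
      path_len:   length of the path the agent actually took (m)
      shortest:   shortest-path length from start to goal (m)
    success_radius is an assumed threshold, not taken from the source.
    """
    n = len(episodes)
    # Navigation Error: mean final distance to the goal (lower is better).
    ne = sum(e["final_dist"] for e in episodes) / n
    # Success: the agent stopped within the success radius of the goal.
    successes = [e["final_dist"] <= success_radius for e in episodes]
    sr = sum(successes) / n
    # SPL: success weighted by shortest-path length over actual path length.
    spl = sum(
        s * e["shortest"] / max(e["path_len"], e["shortest"])
        for s, e in zip(successes, episodes)
    ) / n
    return {"NE": ne, "SR": sr, "SPL": spl}


# Two made-up episodes: one success with a detour, one failure.
episodes = [
    {"final_dist": 1.0, "path_len": 12.0, "shortest": 10.0},
    {"final_dist": 5.0, "path_len": 8.0, "shortest": 10.0},
]
metrics = navigation_metrics(episodes)
```

SPL penalizes successful episodes that take a longer route than necessary, so it is never higher than SR.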
