Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning

About

This paper introduces Scene-LLM, a 3D-visual-language model that enhances embodied agents' abilities in interactive 3D indoor environments by integrating the reasoning strengths of Large Language Models (LLMs). Scene-LLM adopts a hybrid 3D visual feature representation that incorporates dense spatial information and supports scene state updates. The model employs a projection layer to efficiently project these features into the pre-trained textual embedding space, enabling effective interpretation of 3D visual information. Unique to our approach is the integration of both scene-level and ego-centric 3D information. This combination is pivotal for interactive planning, where scene-level data supports global planning and ego-centric data is important for localization. Notably, we use ego-centric 3D frame features for feature alignment, an efficient technique that enhances the model's ability to align features of small objects within the scene. Our experiments with Scene-LLM demonstrate its strong capabilities in dense captioning, question answering, and interactive planning. We believe Scene-LLM advances the field of 3D visual understanding and reasoning, offering new possibilities for sophisticated agent interactions in indoor settings.

Rao Fu, Jingyu Liu, Xilun Chen, Yixin Nie, Wenhan Xiong • 2024

Related benchmarks

Task                            Dataset               Result                    Rank
3D Question Answering           ScanQA (val)          CIDEr 80                  133
3D Question Answering           SQA3D (test)          EM@1 54.2                 55
3D Situated Question Answering  SQA3D (test)          Average Accuracy 54.2     40
3D Question Answering           ScanQA v1.0 (test)    ROUGE 40                  26
Instruction Following           ALFRED (test-unseen)  GC 33.75                  23
3D Dense Captioning             Scan2Cap              --                        23
3D Question Answering           ScanQA                C Score 80                16
Embodied Task Completion        ALFRED seen (test)    Success Rate (SR) 26.52   14
Situated 3D Question Answering  SQA3D (test)          EM@1 54.2                 12
3D Question Answering           SQA3D                 EM 53.6                   11
Showing 10 of 17 rows
