Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention

About

In this paper, we aim to learn a semantic radiance field from multiple scenes that is accurate, efficient and generalizable. While most existing NeRFs target the tasks of neural scene rendering, image synthesis and multi-view reconstruction, a few attempts such as Semantic-NeRF explore learning high-level semantic understanding with the NeRF structure. However, Semantic-NeRF simultaneously learns color and semantic labels from a single ray with multiple heads, and a single ray fails to provide rich semantic information. As a result, Semantic-NeRF relies on positional encoding and needs to train one specific model for each scene. To address this, we propose Semantic Ray (S-Ray) to fully exploit semantic information along the ray direction from its multi-view reprojections. Because directly performing dense attention over multi-view reprojected rays would incur a heavy computational cost, we design a Cross-Reprojection Attention module with consecutive intra-view radial and cross-view sparse attentions, which decomposes contextual information along reprojected rays and across multiple views and then collects dense connections by stacking the modules. Experiments show that S-Ray is able to learn from multiple scenes and shows strong generalization ability when adapting to unseen scenes.
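The decomposition described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the feature layout `(V, P, C)` (views, points along the ray, channels), the function names, and the use of unparameterized dot-product attention with a residual connection are all assumptions made for illustration. Learned query/key/value projections, multiple heads and normalization are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # Scaled dot-product self-attention without learned weights (assumption).
    # tokens: (..., N, C); attention runs over the N axis.
    scale = 1.0 / np.sqrt(tokens.shape[-1])
    scores = (tokens @ np.swapaxes(tokens, -1, -2)) * scale  # (..., N, N)
    return softmax(scores, axis=-1) @ tokens                 # (..., N, C)


def cross_reprojection_attention(feats):
    # feats: (V, P, C) features of P ray points reprojected into V views
    # (hypothetical layout).
    # Step 1, intra-view radial attention: within each view, every point
    # attends to the other points along the same reprojected ray.
    radial = feats + self_attention(feats)             # (V, P, C)
    # Step 2, cross-view sparse attention: for each point, attend across
    # the V views at that same point index only.
    per_point = np.swapaxes(radial, 0, 1)              # (P, V, C)
    cross = per_point + self_attention(per_point)      # (P, V, C)
    return np.swapaxes(cross, 0, 1)                    # (V, P, C)


def attention_pair_counts(V, P):
    # Number of query-key pairs: dense attention over all V*P tokens
    # versus the two decomposed steps.
    dense = (V * P) ** 2
    decomposed = V * P ** 2 + P * V ** 2
    return dense, decomposed
```

With 8 views and 64 points per ray, `attention_pair_counts(8, 64)` gives 262144 pairs for dense attention against 36864 for the two decomposed steps. Because the cross-view step runs on the output of the radial step, a token in one view still receives information from every point in every other view, which is the sense in which stacking recovers dense connections in this sketch.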

Fangfu Liu, Chubin Zhang, Yu Zheng, Yueqi Duan • 2023

Related benchmarks

Task	Dataset	Result
Novel View Synthesis	ScanNet	PSNR 29.27	130
Semantic View Synthesis (Novel View)	ScanNet V2 (val)	mIoU 56	12
Semantic segmentation	Replica synthetic (test)	Total Acc 96.38	9
Semantic segmentation	ScanNet real (test)	Total Accuracy 98.2	9
Semantic segmentation	ScanNet (10 held-out scenes)	mIoU 60.4	9
Semantic novel view synthesis	ScanNet++ 10 scenes 1.0 (test)	mIoU 46.7	9
