Segment Any 4D Gaussians
About
Modeling, understanding, and reconstructing the real world are crucial in XR/VR. Recently, 3D Gaussian Splatting (3D-GS) methods have shown remarkable success in modeling and understanding 3D scenes. Similarly, various 4D representations have demonstrated the ability to capture the dynamics of the 4D world. However, there is a dearth of research focusing on segmentation within 4D representations. In this paper, we propose Segment Any 4D Gaussians (SA4D), one of the first frameworks to segment anything in the 4D digital world based on 4D Gaussians. In SA4D, an efficient temporal identity feature field is introduced to handle Gaussian drifting, with the potential to learn precise identity features from noisy and sparse input. Additionally, a 4D segmentation refinement process is proposed to remove artifacts. Our SA4D achieves precise, high-quality segmentation within seconds in 4D Gaussians and shows the ability to remove, recolor, compose, and render high-quality anything masks. More demos are available at: https://jsxzs.github.io/sa4d/.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Open-vocabulary 4D querying | HyperNeRF espresso | mAcc | 99.56 | 6 |
| Open-vocabulary 4D querying | HyperNeRF americano scene | Mean Accuracy | 88.91 | 6 |
| 4D Gaussian Instance Segmentation | HyperNeRF | Time (min) | 19.71 | 6 |
| 4D Gaussian Instance Segmentation | Neu3D | Time (min) | 19.71 | 6 |
| Panoptic Segmentation | HyperNeRF espresso | Pixel Acc | 0.9197 | 5 |
| Panoptic Segmentation | HyperNeRF torchocolate | mAcc (Pixel) | 92.87 | 5 |
| 4D Scene Segmentation | Neural3DV | mIoU | 66.8 | 5 |
| 4D Scene Segmentation | Multi-Human | mIoU | 0.592 | 5 |
| Novel-view Panoptic Segmentation | Neu3D cook spinach | mAcc (Pixel) | 84.7 | 5 |
| Novel-view Panoptic Segmentation | Neu3D cut roasted beef | Pixel Accuracy (mAcc-pix) | 74.26 | 5 |
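
The accuracy and mIoU entries above are reported on mixed scales: some as fractions (0.9197, 0.592) and others as percentages (92.87, 66.8). The sketch below shows the standard definitions of pixel accuracy and mean IoU on integer label masks, which may help when comparing rows. It is an illustration only: the function names `pixel_accuracy` and `mean_iou` are hypothetical, and the exact evaluation protocol behind each benchmark entry is not specified here and may differ (for example in how classes or frames are averaged).

```python
import numpy as np


def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    return float((pred == gt).mean())


def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes, as a fraction in [0, 1].

    Classes absent from both prediction and ground truth are skipped
    (an assumption; some protocols count them differently).
    """
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue  # class c appears in neither mask
        intersection = np.logical_and(pred_c, gt_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))


# Toy 2x2 masks with two classes (0 and 1).
pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 1], [0, 1]])

acc = pixel_accuracy(pred, gt)       # 3 of 4 pixels match
miou = mean_iou(pred, gt, num_classes=2)
print(f"pixel accuracy: {acc:.4f} ({100 * acc:.2f}%)")
print(f"mIoU: {miou:.4f} ({100 * miou:.2f}%)")
```

Multiplying either fraction by 100 gives the percentage form, so a value such as 0.592 corresponds to 59.2 on the percentage scale.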