Consistent Instance Field for Dynamic Scene Understanding
About
We introduce Consistent Instance Field, a continuous, probabilistic spatio-temporal representation for dynamic scene understanding. Unlike prior methods that rely on discrete tracking or view-dependent features, our approach disentangles visibility from persistent object identity by modeling each space-time point with an occupancy probability and a conditional instance distribution. To realize this, we propose an instance-embedded representation based on deformable 3D Gaussians, which jointly encode radiance and semantic information and are learned directly from input RGB images and instance masks through differentiable rasterization. We further add mechanisms that calibrate per-Gaussian identities and resample Gaussians toward semantically active regions, ensuring consistent instance representations across space and time. Experiments on the HyperNeRF and Neu3D datasets show that our method significantly outperforms state-of-the-art methods on novel-view panoptic segmentation and open-vocabulary 4D querying.
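To make the split between occupancy and conditional instance identity concrete, here is a minimal sketch of how the two quantities could be read out along a single camera ray. It is an illustration under stated assumptions, not the authors' implementation: it assumes standard front-to-back alpha compositing over depth-sorted Gaussians, assumes each Gaussian carries a vector of instance logits, and uses a hypothetical function name `composite_instances`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def composite_instances(alphas, instance_logits):
    """Blend per-Gaussian instance distributions along one ray.

    alphas: opacities of the Gaussians hit by the ray, sorted front to back
            (assumed already evaluated at the query time).
    instance_logits: one list of K logits per Gaussian (hypothetical layout).

    Returns (occupancy, conditional), where occupancy is the probability that
    the ray hits anything and conditional is the instance distribution given
    that it does.
    """
    num_instances = len(instance_logits[0])
    transmittance = 1.0
    accumulated = [0.0] * num_instances

    for alpha, logits in zip(alphas, instance_logits):
        weight = transmittance * alpha  # contribution of this Gaussian
        probs = softmax(logits)
        for k in range(num_instances):
            accumulated[k] += weight * probs[k]
        transmittance *= 1.0 - alpha  # light left for Gaussians behind

    occupancy = 1.0 - transmittance
    if occupancy <= 1e-8:
        # Nothing on the ray: identity is undefined, fall back to uniform.
        return 0.0, [1.0 / num_instances] * num_instances

    # Dividing by occupancy separates "is something here?" from "which one?".
    conditional = [a / occupancy for a in accumulated]
    return occupancy, conditional


occupancy, conditional = composite_instances(
    alphas=[0.5, 0.5],
    instance_logits=[[0.0, 0.0], [0.0, 0.0]],
)
```

Because the blending weights sum to the occupancy, the normalized distribution always sums to one, so a low-visibility pixel can still carry a confident instance label.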
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Open-vocabulary 4D querying | HyperNeRF americano scene | Mean Accuracy | 99.02 | 6 |
| Open-vocabulary 4D querying | HyperNeRF espresso | mAcc | 99.73 | 6 |
| Novel-view Panoptic Segmentation | Neu3D coffee martini | mAcc (Pixel) | 96.07 | 5 |
| Novel-view Panoptic Segmentation | Neu3D cook spinach | mAcc (Pixel) | 96.63 | 5 |
| Novel-view Panoptic Segmentation | Neu3D cut roasted beef | Pixel Accuracy (mAcc-pix) | 95.12 | 5 |
| Novel-view Panoptic Segmentation | Neu3D flame salmon | mAcc (Pixel) | 91.31 | 5 |
| Novel-view Panoptic Segmentation | Neu3D flame steak | Pixel Acc | 95.31 | 5 |
| Novel-view Panoptic Segmentation | Neu3D sear steak | mAcc (Pixel) | 95.36 | 5 |
| Panoptic Segmentation | HyperNeRF americano | Pixel Accuracy | 98.4 | 5 |
| Panoptic Segmentation | HyperNeRF split-cookie | mAcc (pix) | 97.93 | 5 |