Inst4DGS: Instance-Decomposed 4D Gaussian Splatting with Multi-Video Label Permutation Learning
About
We present Inst4DGS, an instance-decomposed 4D Gaussian Splatting (4DGS) approach with long-horizon per-Gaussian trajectories. Dynamic 4DGS has advanced rapidly, but instance-decomposed 4DGS remains underexplored. A major obstacle is that instance labels from independently segmented multi-view videos are inconsistent and must be associated across videos. We address this challenge with per-video label-permutation latents that learn cross-video instance matches through a differentiable Sinkhorn layer. This enables direct multi-view supervision with consistent identities. The explicit label alignment yields sharp decision boundaries and temporally stable identities without identity drift. To further improve efficiency, we propose instance-decomposed motion scaffolds that provide low-dimensional motion bases per object for long-horizon trajectory optimization. Experiments on Panoptic Studio and Neural3DV show that Inst4DGS jointly supports tracking and instance decomposition while achieving state-of-the-art rendering and segmentation quality. On Panoptic Studio, Inst4DGS improves PSNR from 26.10 dB to 28.36 dB and instance mIoU from 0.6310 to 0.9129 over the strongest baseline.
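The abstract does not spell out the permutation layer, so the following is only a minimal sketch under stated assumptions. It uses the standard log-domain Sinkhorn normalization to turn a K x K matrix of unconstrained per-video latents into a soft permutation. That permutation maps each video's own segmentation labels into a shared instance-ID space, where a cross-entropy loss can compare them against rendered instance probabilities. The function names (`sinkhorn`, `aligned_label_loss`), the temperature `tau`, the iteration count `n_iters`, the mapping direction (video labels to global IDs), and the choice of cross-entropy are all hypothetical. The NumPy version computes only the forward pass. Actually learning the latents would require an autodiff framework.

```python
import numpy as np


def logsumexp(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-sum-exp that keeps the reduced axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def sinkhorn(latents: np.ndarray, tau: float = 0.1, n_iters: int = 50) -> np.ndarray:
    """Hypothetical helper: map K x K unconstrained latents to a soft permutation.

    Alternating row/column normalization in the log domain converges to a
    doubly stochastic matrix. A small temperature tau pushes the result
    toward a hard permutation. Every step is differentiable.
    """
    log_p = latents / tau
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # make each row sum to 1
        log_p = log_p - logsumexp(log_p, axis=0)  # make each column sum to 1
    return np.exp(log_p)


def aligned_label_loss(rendered_probs: np.ndarray, video_labels: np.ndarray,
                       perm: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical loss: cross-entropy between rendered global-instance
    probabilities and one video's labels mapped into the shared label space.

    rendered_probs: (N, K) per-pixel probabilities over global instance IDs.
    video_labels:   (N,)   integer labels from this video's own segmentation.
    perm:           (K, K) soft permutation, perm[i, j] ~ P(video label i is global j).
    """
    k = perm.shape[0]
    target = np.eye(k)[video_labels] @ perm  # (N, K) soft targets in global space
    return float(-np.mean(np.sum(target * np.log(rendered_probs + eps), axis=1)))


# Toy example: this video's label 0 is global instance 2, 1 -> 0, and 2 -> 1.
true_map = np.array([2, 0, 1])
latents = 5.0 * np.eye(3)[true_map]  # stand-in for learned latents
perm = sinkhorn(latents)
print(perm.argmax(axis=1))  # [2 0 1]

video_labels = np.array([0, 1, 2, 0])
# A rendering that respects the cross-video match vs. one that ignores it.
good_render = np.eye(3)[true_map[video_labels]] * 0.9 + 0.1 / 3
bad_render = np.eye(3)[video_labels] * 0.9 + 0.1 / 3
print(aligned_label_loss(good_render, video_labels, perm)
      < aligned_label_loss(bad_render, video_labels, perm))  # True
```

Under these assumptions, the loss is low only when the rendered instance IDs agree with the video's labels *after* permutation. This is how a learned match could let every video supervise the same global identities despite arbitrary per-video label orderings. Lowering `tau` sharpens the permutation but makes gradients less smooth.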
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 4D Scene Segmentation | Neural3DV | mIoU (%), coffee_martini | 96.67 | 8 |
| Photometric Rendering | Panoptic Studio | PSNR (dB) | 28.36 | 5 |
| Photometric Rendering | Neural3DV | PSNR (dB) | 30.88 | 5 |
| Instance Segmentation | Panoptic Studio | mIoU (%), Basketball | 93.14 | 3 |
| Instance Segmentation | Neural3DV | mIoU (%), Coffee-Martini | 98.51 | 3 |
| Instance-decomposed Reconstruction | Panoptic Studio | mIoU (%) | 91.29 | 3 |
| Instance-decomposed Reconstruction | Neural3DV | mIoU (%) | 94.2 | 3 |