EndoGSim: Physics-Aware 4D Dynamic Endoscopic Scene Simulations via MLLM-Guided Gaussian Splatting
About
In robot-assisted minimally invasive surgery, high-fidelity reconstruction and simulation of dynamic endoscopic scenes are crucial to enhancing downstream tasks and advancing surgical outcomes. However, existing methods primarily focus on visual reconstruction and lack the physics-based scene descriptions required for realistic simulation. We propose a unified framework that achieves physics-aware reconstruction and physical simulation of endoscopic scenes through Gaussian Splatting guided by Multi-modal Large Language Models (MLLMs). Our approach uses 4D Gaussian Splatting (4DGS), integrated with pre-trained segmentation and depth estimation models, to represent deformable tissues and tools. To infer physical properties automatically, we introduce an object-wise material field that initializes material parameters via an MLLM and refines them through a differentiable Material Point Method (MPM) under joint supervision from rendered images and optical flow. Validated on both open-source and in-house datasets, our framework achieves superior simulation fidelity and physical accuracy compared to state-of-the-art methods, underscoring its potential to advance robot-assisted surgical applications.
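The "initialize from a prior, then refine through a simulator" idea in the abstract can be illustrated with a deliberately tiny toy. The sketch below is not the paper's method: a one-dimensional unit-mass spring stands in for the MPM simulator, central finite differences stand in for automatic differentiation, simulated positions stand in for rendered images, and simulated velocities stand in for optical flow. The starting stiffness `k_init` plays the role of an MLLM-provided guess. All names and values (`simulate`, `joint_loss`, `refine`, `w_flow`, the stiffness numbers) are hypothetical and chosen only for illustration.

```python
import math

def simulate(k, steps=100, dt=0.01):
    """Roll out a unit-mass spring (x'' = -k x) with symplectic Euler.
    Toy stand-in for a physics simulator; not MPM."""
    x, v = 1.0, 0.0
    xs, vs = [], []
    for _ in range(steps):
        v -= k * x * dt   # update velocity from the spring force
        x += v * dt       # update position with the new velocity
        xs.append(x)
        vs.append(v)
    return xs, vs

def joint_loss(k, obs_x, obs_v, w_flow=0.5):
    """Position term (proxy for an image loss) plus a weighted
    velocity term (proxy for an optical-flow loss)."""
    xs, vs = simulate(k, steps=len(obs_x))
    n = len(xs)
    pos_term = sum((a - b) ** 2 for a, b in zip(xs, obs_x)) / n
    vel_term = sum((a - b) ** 2 for a, b in zip(vs, obs_v)) / n
    return pos_term + w_flow * vel_term

def refine(k_init, obs_x, obs_v, iters=200, lr=5.0, eps=1e-4):
    """Gradient descent on k with a finite-difference gradient and
    simple backtracking so that every accepted step lowers the loss."""
    k = k_init
    cur = joint_loss(k, obs_x, obs_v)
    for _ in range(iters):
        grad = (joint_loss(k + eps, obs_x, obs_v)
                - joint_loss(k - eps, obs_x, obs_v)) / (2 * eps)
        step, improved = lr, False
        while step > 1e-8:
            cand = k - step * grad
            cand_loss = joint_loss(cand, obs_x, obs_v) if cand > 0 else math.inf
            if cand_loss < cur:
                k, cur, improved = cand, cand_loss, True
                break
            step *= 0.5   # halve the step until the loss decreases
        if not improved:
            break         # no improving step found; stop early
    return k, cur

# Synthetic "observations" generated with a hypothetical true stiffness.
k_true = 4.0
obs_x, obs_v = simulate(k_true)

k_init = 2.5  # hypothetical prior guess, standing in for an MLLM estimate
initial_loss = joint_loss(k_init, obs_x, obs_v)
k_fit, final_loss = refine(k_init, obs_x, obs_v)
print(f"initial k={k_init}, refined k={k_fit:.4f}, loss={final_loss:.2e}")
```

The rollout horizon is kept short on purpose: over long horizons the loss of an oscillating system becomes strongly non-convex in the stiffness, which is one reason a reasonable initial guess matters for this kind of refinement.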
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Physical Realism Assessment | Surgical Video Dataset (EndoNeRF, CholecSeg8K, and PorcineEndo) (test) | Physical Realism Score | 4.02 | 5 |
| System Identification | EndoNeRF v01_080 | RE | 0.081 | 5 |
| System Identification | EndoNeRF v01_240 | RE | 0.035 | 5 |
| System Identification | CholecSeg8K pulling | RE | 0.26 | 5 |
| System Identification | CholecSeg8K cutting | RE | 0.146 | 5 |
| System Identification | PorcineEndo gallbladder | RE | 0.165 | 5 |
| System Identification | PorcineEndo stomach | RE | 0.147 | 5 |
| System Identification | Full Combined Dataset Average | RE | 0.139 | 5 |