DefVINS: Visual-Inertial Odometry for Deformable Scenes
About
Deformable scenes violate the rigidity assumptions underpinning classical visual-inertial odometry (VIO), often leading to over-fitting to local non-rigid motion or to severe camera pose drift when deformation dominates visual parallax. In this paper, we introduce DefVINS, the first visual-inertial odometry pipeline designed to operate in deformable environments. Our approach models the odometry state by decomposing it into a rigid, IMU-anchored component and a non-rigid scene warp represented by an embedded deformation graph. As a second contribution, we present VIMandala, the first benchmark containing real images and ground-truth camera poses for visual-inertial odometry in deformable scenes. In addition, we augment the synthetic Drunkard's benchmark with simulated inertial measurements to further evaluate our pipeline under controlled conditions. We also provide an observability analysis of the visual-inertial deformable odometry problem, characterizing how inertial measurements constrain camera motion and render otherwise unobservable modes identifiable in the presence of deformation. This analysis motivates the use of IMU anchoring and leads to a conditioning-based activation strategy that avoids ill-posed updates under poor excitation. Experimental results on both the synthetic Drunkard's and our real VIMandala benchmarks show that DefVINS outperforms rigid visual-inertial and non-rigid visual odometry baselines. Our source code and data will be released upon acceptance.
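The abstract does not say which quantity the conditioning-based activation strategy monitors, so the following is only a minimal sketch of one plausible reading: an update is skipped when a symmetric information matrix for the block being estimated is too ill-conditioned. The function name `should_activate`, the matrix `H`, and the threshold `max_cond` are all hypothetical and are not taken from DefVINS.

```python
import numpy as np

def should_activate(H, max_cond=1e6):
    """Gate an update on the conditioning of H.

    H        : square information (approximate Hessian) matrix; hypothetical input.
    max_cond : largest acceptable condition number; an assumed threshold.
    Returns True when the ratio of the largest to the smallest singular
    value is at most max_cond, and False for (near-)singular matrices.
    """
    # np.linalg.svd returns singular values sorted in descending order.
    s = np.linalg.svd(np.asarray(H, dtype=float), compute_uv=False)
    if s[-1] <= 0.0:
        return False  # exactly singular: the update would be ill-posed
    return bool(s[0] / s[-1] <= max_cond)
```

Under this assumption, a well-excited block such as `np.eye(3)` passes the gate, while a block with a nearly unobservable direction such as `np.diag([1.0, 1e-9])` is held back until more informative measurements arrive.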
Related benchmarks
| Task | Dataset | ATE RMSE (mm) | Rank |
|---|---|---|---|
| Visual-Inertial Odometry | Real Deformable Sequences Low Deformation R0 | 8.1 | 5 |
| Visual-Inertial Odometry | Real Deformable Sequences Low Deformation R1 | 9 | 5 |
| Visual-Inertial Odometry | Real Deformable Sequences Medium Deformation R2 | 10.2 | 5 |
| Visual-Inertial Odometry | Real Deformable Sequences Medium Deformation R3 | 10.8 | 5 |
| Visual-Inertial Odometry | Real Deformable Sequences Medium Deformation R4 | 11.4 | 5 |
| Visual-Inertial Odometry | Real Deformable Sequences High Deformation R5 | 15.6 | 5 |
| Visual-Inertial Odometry | Real Deformable Sequences High Deformation R6 | 19.8 | 5 |
| Visual-Inertial SLAM | Drunkard's Dataset Medium deformation - L1 | 9.4 | 5 |
| Visual-Inertial SLAM | Drunkard's Dataset Hard deformation - L2 | 14.3 | 5 |
| Visual-Inertial SLAM | Drunkard's Dataset Extreme deformation - L3 | 19.6 | 5 |
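For readers unfamiliar with the metric, ATE RMSE (absolute trajectory error, root mean square) is commonly computed by aligning the estimated camera positions to the ground truth and taking the RMS of the remaining position errors. The sketch below assumes a rigid (rotation plus translation) least-squares alignment and time-synchronised `N x 3` position arrays; the exact alignment protocol used for the numbers in the table is not stated here, and the function names are illustrative.

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares rotation R and translation t mapping est onto gt (Kabsch)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """RMS of position errors after rigid alignment, in the units of the inputs."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    R, t = align_rigid(est, gt)
    aligned = est @ R.T + t                    # apply R and t to every row
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

If the trajectories are stored in metres, multiplying the result by 1000 gives a value in millimetres, the unit used in the table.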