Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation
About
Visual navigation is a core challenge in Embodied AI, requiring autonomous agents to translate high-dimensional sensory observations into continuous, long-horizon action trajectories. While generative policies based on diffusion models and Schrödinger Bridges (SB) effectively capture multimodal action distributions, they require dozens of integration steps due to high-variance stochastic transport, posing a critical barrier for real-time robotic control. We propose Rectified Schrödinger Bridge Matching (RSBM), a framework that exploits a shared velocity-field structure between standard Schrödinger Bridges ($\varepsilon=1$, maximum-entropy transport) and deterministic Optimal Transport ($\varepsilon\to 0$, as in Conditional Flow Matching), controlled by a single entropic regularization parameter $\varepsilon$. We prove two key results: (1) the conditional velocity field's functional form is invariant across the entire $\varepsilon$-spectrum (Velocity Structure Invariance), enabling a single network to serve all regularization strengths; and (2) reducing $\varepsilon$ linearly decreases the conditional velocity variance, enabling more stable coarse-step ODE integration. Anchored to a learned conditional prior that shortens transport distance, RSBM operates at an intermediate $\varepsilon$ that balances multimodal coverage and path straightness. Empirically, while standard bridges require $\geq 10$ steps to converge, RSBM achieves over 94% cosine similarity and 92% success rate in merely 3 integration steps, without distillation or multi-stage training, substantially narrowing the gap between high-fidelity generative policies and the low-latency demands of Embodied AI.
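The two claims in the abstract can be illustrated with a small numerical sketch. It assumes the Brownian-bridge conditional path commonly used in the Schrödinger bridge matching literature, $x_t \sim \mathcal{N}\big((1-t)x_0 + t x_1,\ \varepsilon\, t(1-t)\big)$, with conditional ODE velocity $u_t(x \mid x_0, x_1) = \frac{1-2t}{2t(1-t)}\,(x - \mu_t) + (x_1 - x_0)$. This parameterization is an assumption and may differ from the one the paper uses; the function names below are hypothetical. Under this assumption the velocity formula contains no $\varepsilon$, and its variance over bridge samples is $\varepsilon\,(1-2t)^2 / (4t(1-t))$, which is linear in $\varepsilon$.

```python
import math
import random


def bridge_sample(x0, x1, t, eps, rng):
    """Draw x_t from the assumed Brownian bridge with variance eps * t * (1 - t)."""
    mean = (1.0 - t) * x0 + t * x1
    std = math.sqrt(eps * t * (1.0 - t))
    return mean + std * rng.gauss(0.0, 1.0)


def conditional_velocity(x, x0, x1, t):
    """Assumed conditional ODE velocity for 0 < t < 1.

    eps is not an argument: the functional form is the same for every eps,
    and eps only changes how far x is spread around the bridge mean.
    """
    mean = (1.0 - t) * x0 + t * x1
    return (1.0 - 2.0 * t) / (2.0 * t * (1.0 - t)) * (x - mean) + (x1 - x0)


def analytic_variance(eps, t):
    """Closed-form variance of the conditional velocity under the assumed path."""
    return eps * (1.0 - 2.0 * t) ** 2 / (4.0 * t * (1.0 - t))


def velocity_variance(eps, t=0.25, x0=0.0, x1=1.0, n=20000, seed=0):
    """Monte Carlo estimate of the conditional velocity variance at time t."""
    rng = random.Random(seed)
    vs = [
        conditional_velocity(bridge_sample(x0, x1, t, eps, rng), x0, x1, t)
        for _ in range(n)
    ]
    mean_v = sum(vs) / n
    return sum((v - mean_v) ** 2 for v in vs) / n


if __name__ == "__main__":
    for eps in (1.0, 0.5, 0.25, 0.0):
        est = velocity_variance(eps)
        exact = analytic_variance(eps, 0.25)
        print(f"eps={eps:4.2f}  estimated={est:.4f}  analytic={exact:.4f}")
```

Halving `eps` halves the estimated variance, and at `eps = 0` every sample returns the straight-line velocity `x1 - x0`, which is the deterministic transport limit described above. The sketch is one-dimensional and uses a fixed endpoint pair; it says nothing about the learned prior or the network used in RSBM.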
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Navigation | Custom Indoor | MSE | 1.72 | 11 |
| Visual Navigation | Citysim Outdoor | MSE | 2.4 | 11 |
| Action Prediction | HuRoN (test) | Action MSE | 0.24 | 8 |
| Action Prediction | Recon (test) | Action MSE | 0.8 | 8 |
| Action Prediction | SACSoN (test) | Action MSE | 1.32 | 8 |
| Action Prediction | SCAND (test) | Action MSE | 0.47 | 8 |
| Action Prediction | GoStanford (test) | Action MSE | 2.95 | 8 |
| Visual Navigation | HuRON | MSE | 0.25 | 5 |
| Visual Navigation | RECON | MSE | 0.82 | 5 |
| Visual Navigation | SACSoN | MSE | 1.35 | 5 |