CoRL-MPPI: Enhancing MPPI With Learnable Behaviours For Efficient And Provably-Safe Multi-Robot Collision Avoidance

About

Decentralized collision avoidance is a core challenge for scalable multi-robot systems. One promising approach to this problem is Model Predictive Path Integral (MPPI) control, a framework that naturally handles arbitrary motion models and provides strong theoretical guarantees. In practice, however, an MPPI-based controller may produce suboptimal trajectories because its performance relies heavily on uninformed random sampling. In this work, we introduce CoRL-MPPI, a novel fusion of Cooperative Reinforcement Learning and MPPI that addresses this limitation. We train an action policy (approximated by a deep neural network) in simulation to learn local cooperative collision avoidance behaviors. This learned policy is then embedded into the MPPI framework to guide its sampling distribution, biasing it towards more intelligent and cooperative actions. Notably, CoRL-MPPI preserves all the theoretical guarantees of regular MPPI. We evaluate our approach in dense, dynamic simulation environments against state-of-the-art baselines such as ORCA, BVC, RL-RVO-NAV, and classical MPPI. Our results demonstrate that CoRL-MPPI significantly improves navigation efficiency (measured by success rate and makespan) and safety, enabling agile and robust multi-robot navigation.

Stepan Dergachev, Artem Pshenitsyn, Aleksandr Panov, Alexey Skrynnik, Konstantin Yakovlev • 2025
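
The idea of letting a policy guide the MPPI sampling distribution can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-integrator motion model, the cost terms, and all parameter values are assumptions made for illustration, and `policy_action` is a hand-written go-to-goal rule standing in for the trained neural network. The sketch also does not attempt to reproduce the safety guarantees mentioned in the abstract. It only shows the general pattern: roll out the policy to obtain a nominal control sequence, sample perturbations around it, and average the samples with exponential cost weights.

```python
import numpy as np

def policy_action(state, goal, v_max=1.0):
    """Hypothetical stand-in for a learned policy: move toward the goal."""
    d = goal - state
    dist = np.linalg.norm(d)
    if dist < 1e-9:
        return np.zeros(2)
    return d / dist * min(v_max, dist)

def mppi_step(state, goal, obstacles, rng, horizon=10, samples=256,
              dt=0.1, sigma=0.5, lam=1.0, radius=0.5):
    """One policy-guided MPPI update for a 2D single integrator (assumed model)."""
    # Roll out the policy to get the mean of the sampling distribution.
    nominal = np.zeros((horizon, 2))
    s = state.copy()
    for t in range(horizon):
        nominal[t] = policy_action(s, goal)
        s = s + nominal[t] * dt

    # Sample control sequences around the policy-guided nominal sequence.
    noise = rng.normal(0.0, sigma, size=(samples, horizon, 2))
    controls = nominal[None] + noise

    # Simulate every sampled sequence: positions has shape (samples, horizon, 2).
    positions = state + np.cumsum(controls * dt, axis=1)

    # Illustrative cost: terminal distance to goal plus a collision penalty.
    goal_cost = np.linalg.norm(positions[:, -1] - goal, axis=1)
    dists = np.linalg.norm(positions[:, :, None, :] - obstacles[None, None], axis=-1)
    collision_cost = 100.0 * (dists < radius).sum(axis=(1, 2))
    cost = goal_cost + collision_cost

    # Exponential (softmin) weighting of samples, as in standard MPPI.
    weights = np.exp(-(cost - cost.min()) / lam)
    weights /= weights.sum()

    # Return the weighted average of the first control of each sample.
    return np.tensordot(weights, controls[:, 0], axes=1)

rng = np.random.default_rng(0)
u = mppi_step(np.array([0.0, 0.0]), np.array([5.0, 0.0]),
              np.array([[2.0, 3.0]]), rng)
```

Replacing the policy rollout with an all-zero nominal sequence recovers uninformed sampling, which makes the contrast described in the abstract easy to try in this toy setting.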

Related benchmarks

Task	Dataset	Result
Multi-robot navigation	Random scenario	Success Rate (SR): 100	6
Multi-robot navigation	Circle scenario	Success Rate (SR): 100	6
Multi-robot navigation	Mesh Dense	Success Rate (SR): 100	6
Multi-robot navigation	Corridor scenario	Success Rate: 100	6
