
Time-Constrained Robust MDPs

About

Robust reinforcement learning is essential for deploying reinforcement learning algorithms in real-world scenarios where environmental uncertainty predominates. Traditional robust reinforcement learning often depends on rectangularity assumptions, where adverse probability measures of outcome states are assumed to be independent across different states and actions. This assumption, rarely fulfilled in practice, leads to overly conservative policies. To address this problem, we introduce a new time-constrained robust MDP (TC-RMDP) formulation that considers multifactorial, correlated, and time-dependent disturbances, thus more accurately reflecting real-world dynamics. This formulation goes beyond the conventional rectangularity paradigm, offering new perspectives and expanding the analytical framework for robust RL. We propose three distinct algorithms, each using varying levels of environmental information, and evaluate them extensively on continuous control benchmarks. Our results demonstrate that these algorithms yield an efficient tradeoff between performance and robustness, outperforming traditional deep robust RL methods in time-constrained environments while preserving robustness in classical benchmarks. This study revisits the prevailing assumptions in robust RL and opens new avenues for developing more practical and realistic RL applications.
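The abstract does not say how the time constraint on disturbances is enforced. As a minimal sketch, assume an uncertainty parameter vector (for example friction and mass scales) that an adversary may move by at most a fixed Euclidean distance per time step while staying inside a box of admissible values. The names `constrained_step`, `psi`, `target`, and `max_step` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def constrained_step(psi, target, max_step, low, high):
    """Move psi toward target under a per-step budget (illustrative assumption).

    psi, target, low, high are equal-length lists of floats.
    The Euclidean step length is capped at max_step, then each
    coordinate is clipped to its [low, high] interval.
    """
    delta = [t - p for p, t in zip(psi, target)]
    norm = math.sqrt(sum(d * d for d in delta))
    # Shrink the step only when it exceeds the budget.
    scale = 1.0 if norm <= max_step else max_step / norm
    moved = [p + scale * d for p, d in zip(psi, delta)]
    # Clipping to the box cannot lengthen the step if psi was already inside it.
    return [min(max(m, lo), hi) for m, lo, hi in zip(moved, low, high)]


# Example: the adversary wants to jump to (3, 4) but may move at most 1 unit.
psi_next = constrained_step([0.0, 0.0], [3.0, 4.0], 1.0, [-1.0, -1.0], [1.0, 1.0])
```

Under this assumed constraint, consecutive parameter values stay close to each other, so the disturbance can be correlated across states, actions, and time instead of being chosen independently at every step, as a rectangular uncertainty set would allow.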

Adil Zouitine, David Bertoin, Pierre Clavier, Matthieu Geist, Emmanuel Rachelson • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reinforcement Learning | MuJoCo HumanoidStandup | Average Performance | 1.31e+5 | 24 |
| Reinforcement Learning | MuJoCo Half-Cheetah | Average Return | 9.54e+3 | 18 |
| Reinforcement Learning | MuJoCo Ant | Average Return | 7.89e+3 | 14 |
| Reinforcement Learning | MuJoCo Walker | Average Return | 5.81e+3 | 14 |
| Reinforcement Learning | MuJoCo Hopper | Average Return | 3.28e+3 | 14 |
| Continuous Control | MuJoCo v2 (test) | Ant Score | 1.78 | 12 |
| Continuous Control | Ant MuJoCo (test) | Worst-case Performance | 7.53e+3 | 12 |
| Continuous Control | HalfCheetah MuJoCo (test) | Worst-case Performance | 7.53e+3 | 12 |
| Continuous Control | Hopper (MuJoCo) (test) | Worst-case Performance | 3.39e+3 | 12 |
| Continuous Control | HumanoidStandup MuJoCo (test) | Worst-case Performance | 1.29e+5 | 12 |
Showing 10 of 46 rows
