Effective Reinforcement Learning Control using Conservative Soft Actor-Critic
About
Reinforcement Learning (RL) has shown great potential in complex control tasks, particularly when combined with deep neural networks within the Actor-Critic (AC) framework. In practice, however, balancing exploration, learning stability, and sample efficiency remains a significant challenge. Traditional methods such as Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) address these issues by incorporating entropy or relative entropy regularization, but often suffer from instability and low sample efficiency. In this paper, we propose the Conservative Soft Actor-Critic (CSAC) algorithm, which integrates both entropy and relative entropy regularization within the AC framework. CSAC improves exploration through entropy regularization while avoiding overly aggressive policy updates through relative entropy regularization. Evaluations on benchmark tasks and real-world robotic simulations demonstrate that CSAC offers significant improvements in stability and efficiency over existing methods, suggesting strong robustness and application potential for control in dynamic environments.
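The two regularizers described above can be sketched for a toy discrete policy. This is an illustrative sketch only, not the paper's implementation: the function name `csac_style_objective` and the weights `alpha` (entropy bonus, as in SAC) and `beta` (relative-entropy penalty against the previous policy) are assumed names for exposition.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def csac_style_objective(q_values, logits, old_logits, alpha=0.2, beta=0.1):
    """Sketch of a conservative soft policy objective (to be maximized).

    Combines three terms:
      - the expected Q-value under the current policy (exploitation),
      - an entropy bonus weighted by alpha (exploration, as in SAC),
      - a KL penalty to the previous policy weighted by beta
        (relative entropy regularization, discouraging aggressive updates).
    """
    pi = softmax(logits)          # current policy probabilities
    pi_old = softmax(old_logits)  # previous policy probabilities
    expected_q = np.dot(pi, q_values)
    entropy = -np.dot(pi, np.log(pi))
    kl = np.dot(pi, np.log(pi) - np.log(pi_old))  # KL(pi || pi_old) >= 0
    return expected_q + alpha * entropy - beta * kl
```

When the new policy equals the old one, the KL term vanishes and the objective reduces to the usual soft (entropy-regularized) value; any deviation from the old policy is charged at rate `beta`, which is what makes the update conservative.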
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | HalfCheetah v4 | Max Average Return | 1.17e+4 | 5 |
| Continuous Control | Walker2d v4 | Average Return | 4.11e+3 | 5 |
| Continuous Control | Ant v4 | Average Return | 5.54e+3 | 5 |
| Robot Control | QuadX-Waypoints v1 | Max Average Return | 530.3 | 5 |
| Robot Control | PandaReach v2 | Max Average Return | -2.04 | 5 |
| Continuous Control | Hopper v4 | Max Average Return | 3.46e+3 | 5 |
| Continuous Control | Walker2d v4 | Interactions (10^4 steps) | 45 | 5 |
| Continuous Control | Ant v4 | Interactions (10^4 steps) | 22.5 | 5 |