
Effective Reinforcement Learning Control using Conservative Soft Actor-Critic

About

Reinforcement Learning (RL) has shown great potential in complex control tasks, particularly when combined with deep neural networks within the Actor-Critic (AC) framework. In practical applications, however, balancing exploration, learning stability, and sample efficiency remains a significant challenge. Traditional methods such as Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) address these issues by incorporating entropy or relative entropy regularization, but they often suffer from instability and low sample efficiency. In this paper, we propose the Conservative Soft Actor-Critic (CSAC) algorithm, which integrates entropy and relative entropy regularization within the AC framework. CSAC improves exploration through entropy regularization while avoiding overly aggressive policy updates through relative entropy regularization. Evaluations on benchmark tasks and real-world robotic simulations demonstrate that CSAC offers significant improvements in stability and efficiency over existing methods. These findings suggest that CSAC is robust and has strong application potential for control tasks in dynamic environments.
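The abstract describes combining an entropy bonus (as in SAC) with a relative entropy (KL) penalty against the previous policy. A minimal sketch of such a doubly regularized policy objective is shown below for a discrete policy; the function name, coefficients alpha and beta, and the exact form of the objective are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def csac_policy_objective(q_values, new_probs, old_probs, alpha=0.2, beta=0.1):
    """Sketch of an entropy- and relative-entropy-regularized policy objective.

    Combines three terms:
      * expected Q-value under the new policy (exploitation),
      * an entropy bonus weighted by alpha (exploration, as in SAC),
      * a KL penalty to the previous policy weighted by beta,
        discouraging overly aggressive policy updates.
    """
    q_values = np.asarray(q_values, dtype=float)
    new_probs = np.asarray(new_probs, dtype=float)
    old_probs = np.asarray(old_probs, dtype=float)

    expected_q = np.sum(new_probs * q_values)
    entropy = -np.sum(new_probs * np.log(new_probs + 1e-12))
    kl_to_old = np.sum(new_probs * np.log((new_probs + 1e-12) /
                                          (old_probs + 1e-12)))

    # Maximize value and entropy; penalize divergence from the old policy.
    return expected_q + alpha * entropy - beta * kl_to_old
```

When the new policy equals the old one, the KL term vanishes and the objective reduces to the familiar SAC-style value-plus-entropy form; increasing beta makes updates more conservative.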

Zhiwei Shang, Xinyi Yuan, Wenjun Huang, Yunduan Cui, Di Chen, Meixin Zhu • 2025

Related benchmarks

Task                 Dataset              Metric                                Result    Rank
Continuous Control   HalfCheetah v4       Max Average Return                    1.17e+4   5
Continuous Control   Walker2d v4          Average Return                        4.11e+3   5
Continuous Control   Ant v4               Average Return                        5.54e+3   5
Robot Control        QuadX-Waypoints v1   Max Average Return                    530.3     5
Robot Control        PandaReach v2        Max Average Return                    -2.04     5
Continuous Control   Hopper v4            Max Average Return                    3.46e+3   5
Continuous Control   Walker2d v4          Number of Interactions (10^4 steps)   45        5
Continuous Control   Ant v4               Number of Interactions (10^4 steps)   22.5      5
