DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning
About
We present Distributional Soft Actor-Critic (DSAC), a distributional reinforcement learning (RL) algorithm that combines the strengths of distributional information about accumulated rewards with the entropy-driven exploration of the Soft Actor-Critic (SAC) algorithm. DSAC models the randomness in both actions and rewards, and it surpasses baseline performance on various continuous control tasks. Unlike standard approaches that solely maximize expected rewards, we propose a unified framework for risk-sensitive learning that optimizes a risk-related objective while balancing entropy to encourage exploration. Extensive experiments demonstrate DSAC's effectiveness in improving agent performance on both risk-neutral and risk-sensitive control tasks.
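The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the return distribution Z(s, a) is represented by N fixed quantiles (a QR-DQN-style parameterization), that the entropy bonus -alpha * log pi(a'|s') is added to every target quantile as in SAC's soft Bellman backup, and it uses CVaR purely as one example of a risk measure. All function names (`soft_quantile_targets`, `quantile_huber_loss`, `cvar`, `soft_policy_objective`) are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List


def soft_quantile_targets(reward: float, done: bool, next_quantiles: List[float],
                          next_log_prob: float, gamma: float = 0.99,
                          alpha: float = 0.2) -> List[float]:
    # Assumed soft distributional Bellman target: each quantile of Z(s', a')
    # receives the SAC entropy bonus -alpha * log pi(a'|s') before discounting.
    mask = 0.0 if done else 1.0  # no bootstrapping past a terminal state
    return [reward + gamma * mask * (z - alpha * next_log_prob)
            for z in next_quantiles]


def quantile_huber_loss(pred: List[float], target: List[float],
                        kappa: float = 1.0) -> float:
    # QR-DQN-style loss: predicted quantile i sits at fraction tau_i = (i + 0.5) / N
    # and is regressed toward every target sample with an asymmetric Huber penalty.
    n = len(pred)
    total = 0.0
    for i, theta in enumerate(pred):
        tau = (i + 0.5) / n
        for t in target:
            u = t - theta
            if abs(u) <= kappa:
                huber = 0.5 * u * u
            else:
                huber = kappa * (abs(u) - 0.5 * kappa)
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber / kappa
    # Sum over predicted quantiles, average over target samples.
    return total / len(target)


def cvar(quantiles: List[float], level: float) -> float:
    # Conditional value-at-risk: mean of the worst `level` fraction of outcomes.
    ordered = sorted(quantiles)
    k = max(1, math.ceil(level * len(ordered)))
    return sum(ordered[:k]) / k


def soft_policy_objective(quantiles: List[float], log_prob: float,
                          alpha: float = 0.2, risk: str = "neutral",
                          cvar_level: float = 0.25) -> float:
    # The policy maximizes a risk measure of Z(s, a) plus the entropy bonus.
    if risk == "neutral":
        value = sum(quantiles) / len(quantiles)  # expectation: SAC-style objective
    elif risk == "cvar":
        value = cvar(quantiles, cvar_level)       # risk-averse, lower-tail objective
    else:
        raise ValueError(f"unknown risk measure: {risk}")
    return value - alpha * log_prob


if __name__ == "__main__":
    z = [1.0, 2.0, 3.0, 4.0]  # toy quantile estimates of Z(s, a)
    print(soft_policy_objective(z, log_prob=-1.0, risk="neutral"))
    print(soft_policy_objective(z, log_prob=-1.0, risk="cvar", cvar_level=0.5))
```

Swapping the risk measure only changes how the quantiles are aggregated in `soft_policy_objective`; the critic's target and loss stay the same, which is one way a single framework can cover both risk-neutral and risk-sensitive control.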
Xiaoteng Ma, Junyao Chen, Li Xia, Jun Yang, Qianchuan Zhao, Zhengyuan Zhou • 2020
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | DeepMind Control Suite (DMC) | Cheetah Run | 753 | 15 |
| Continuous Control | MuJoCo v5 | Ant Score | 776 | 15 |
| Continuous Control | MuJoCo | Ant-v5 | 776 | 9 |
| Continuous Control | DeepMind Control Suite Vision Cheetah-Run (test) | AULC | 770.5 | 5 |
| Continuous Control | DMC Vision Finger-Turn Hard (test) | AULC | 661.1 | 5 |
| Continuous Control | DeepMind Control Suite Vision Quadruped-Run (test) | AULC | 550.2 | 5 |
| Continuous Control | DMC Vision Reacher-Hard (test) | AULC | 773.1 | 5 |
| Robot navigation | Risky PointMass (test) | Mean Return | -7.69 | 5 |
| Continuous Control | DMC Vision Walker-Run (test) | AULC | 509.5 | 5 |
| Continuous Control | DMC | Cheetah-run Score | 753 | 5 |