DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning
About
We present Distributional Soft Actor-Critic (DSAC), a distributional reinforcement learning (RL) algorithm that combines distributional modeling of accumulated rewards with the entropy-driven exploration of the Soft Actor-Critic (SAC) algorithm. DSAC models the randomness in both actions and rewards, surpassing baselines on a range of continuous control tasks. Unlike standard approaches that maximize only the expected return, we propose a unified framework for risk-sensitive learning that optimizes a risk-related objective while using an entropy term to encourage exploration. Extensive experiments demonstrate DSAC's effectiveness in improving agent performance on both risk-neutral and risk-sensitive control tasks.
Xiaoteng Ma, Junyao Chen, Li Xia, Jun Yang, Qianchuan Zhao, Zhengyuan Zhou • 2020
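To make the abstract's two ingredients concrete, here is a minimal sketch of a distributional critic trained by quantile regression plus an entropy-regularized, risk-sensitive actor objective. It is an illustration only, not the authors' reference implementation: the quantile parameterization, the CVaR risk measure, and all names and hyperparameters below (`N_QUANTILES`, `alpha`, `cvar_level`) are assumptions for the sketch, and the paper's actual choices may differ.

```python
# Minimal sketch (assumed details, not the paper's reference code) of:
# (1) a quantile-based distributional critic, and
# (2) an entropy-regularized, risk-sensitive actor loss (CVaR for illustration).
import torch
import torch.nn as nn

N_QUANTILES = 32  # hypothetical choice; the paper's setting may differ


class QuantileCritic(nn.Module):
    """Maps (state, action) to N_QUANTILES estimates of the return distribution."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, N_QUANTILES),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Standard quantile-regression Huber loss from distributional RL.

    pred, target: (batch, N_QUANTILES) quantile estimates.
    """
    taus = (torch.arange(N_QUANTILES, dtype=torch.float32) + 0.5) / N_QUANTILES
    # Pairwise TD errors between every target and every predicted quantile.
    td = target.unsqueeze(-2) - pred.unsqueeze(-1)           # (B, N_pred, N_target)
    huber = torch.where(td.abs() <= 1.0, 0.5 * td ** 2, td.abs() - 0.5)
    # Asymmetric weight |tau - 1{td < 0}|, tau indexed along the pred dimension.
    weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs()
    return (weight * huber).mean()


def risk_sensitive_actor_loss(quantiles: torch.Tensor,
                              log_prob: torch.Tensor,
                              alpha: float = 0.2,
                              cvar_level: float = 0.25) -> torch.Tensor:
    """Minimize alpha * log_prob - risk(Z), i.e. maximize a risk measure of the
    return distribution plus an entropy bonus, as in SAC-style updates.

    CVaR at `cvar_level` averages the worst quantiles; cvar_level=1.0 recovers
    the risk-neutral (expected-return) objective.
    """
    k = max(1, int(cvar_level * quantiles.shape[-1]))
    worst_k, _ = torch.topk(quantiles, k, dim=-1, largest=False)
    risk_value = worst_k.mean(dim=-1)                        # CVaR estimate
    return (alpha * log_prob - risk_value).mean()
```

Under these assumptions, `cvar_level` is the single knob that moves the agent along the risk spectrum: small values weight the left tail of the return distribution (risk-averse behavior on the Risky tasks below), while `cvar_level = 1.0` collapses to the usual expected-return objective.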
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | DeepMind Control Suite Vision Cheetah-Run (test) | AULC | 770.5 | 5 |
| Continuous Control | DMC Vision Finger-Turn Hard (test) | AULC | 661.1 | 5 |
| Continuous Control | DeepMind Control Suite Vision Quadruped-Run (test) | AULC | 550.2 | 5 |
| Continuous Control | DMC Vision Reacher-Hard (test) | AULC | 773.1 | 5 |
| Robot navigation | Risky PointMass (test) | Mean Return | -7.69 | 5 |
| Continuous Control | DMC Vision Walker-Run (test) | AULC | 509.5 | 5 |
| Robot navigation | Risky Ant (test) | Mean Return | -866.1 | 5 |
| Locomotion | DeepMind Control Suite Walker-Run | AULC | 637.6 | 4 |
| Soft Robot Control | EvoGym BidirectionalWalker-v0 | AULC | 4.68 | 4 |
| Locomotion | DeepMind Control Suite Dog-Walk | AULC | 468.3 | 4 |
*Showing 10 of 21 benchmark rows.*