Deep Reinforcement Learning in Parameterized Action Space
About
Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces. However, to the best of our knowledge, no previous work has succeeded in using deep neural networks in structured (parameterized) continuous action spaces. To fill this gap, this paper focuses on learning within the domain of simulated RoboCup soccer, which features a small set of discrete action types, each of which is parameterized with continuous variables. The best learned agent can score goals more reliably than the 2012 RoboCup champion agent. As such, this paper represents a successful extension of deep reinforcement learning to the class of parameterized action space MDPs.
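To make the action structure concrete, below is a minimal sketch (not the authors' exact architecture) of an actor network for a parameterized action space: one head scores the discrete action types while another produces the continuous parameters for every type, and the agent executes the highest-scoring type with only that type's parameters. The action set (Dash, Turn, Tackle, Kick), parameter counts, state dimension, and layer sizes are illustrative assumptions modeled on the HFO-style setup described above.

```python
import torch
import torch.nn as nn

# Discrete action types and the number of continuous parameters each one takes,
# e.g. Dash(power, direction), Turn(direction), Tackle(direction), Kick(power, direction).
ACTION_PARAM_DIMS = {"dash": 2, "turn": 1, "tackle": 1, "kick": 2}

class ParamActionActor(nn.Module):
    """Actor with a discrete-type head and a continuous-parameter head."""

    def __init__(self, state_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # One score per discrete action type.
        self.type_head = nn.Linear(hidden_dim, len(ACTION_PARAM_DIMS))
        # All continuous parameters, flattened; bounded to [-1, 1] and rescaled by the environment.
        self.param_head = nn.Linear(hidden_dim, sum(ACTION_PARAM_DIMS.values()))

    def forward(self, state: torch.Tensor):
        h = self.body(state)
        type_scores = self.type_head(h)           # shape: (batch, num_action_types)
        params = torch.tanh(self.param_head(h))   # shape: (batch, total_param_dims)
        return type_scores, params

if __name__ == "__main__":
    # 58 is an assumed HFO-style feature count, purely for illustration.
    actor = ParamActionActor(state_dim=58)
    state = torch.randn(1, 58)
    scores, params = actor(state)

    # Pick the argmax action type and slice out its continuous parameters.
    names = list(ACTION_PARAM_DIMS)
    offsets = [0]
    for d in ACTION_PARAM_DIMS.values():
        offsets.append(offsets[-1] + d)
    idx = scores.argmax(dim=-1).item()
    chosen_params = params[0, offsets[idx]:offsets[idx + 1]]
    print(names[idx], chosen_params.tolist())
```

In a DDPG-style setup, both heads can be trained end to end from the critic's gradient; the sketch only shows the forward pass and greedy action selection.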
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Half Field Offense | Half Field Offense (HFO) (evaluation) | P(Goal) | 0.923 | 7 |
| Maze Navigation | Maze Navigation Easy | Success Rate | 88.2 | 7 |
| Air-to-Air Combat | Air-to-Air Combat Hard | Success Rate | 61.8 | 7 |
| Maze Navigation | Maze Navigation Medium | Success Rate | 74.2 | 7 |
| Maze Navigation | Maze Navigation Hard | Success Rate | 38.8 | 7 |
| Air-to-Air Combat | Air-to-Air Combat Easy | Success Rate | 79.6 | 7 |
| Air-to-Air Combat | Air-to-Air Combat Medium | Success Rate | 68.6 | 7 |
| Platform Control | Platform (Evaluation) | Return | 28.4 | 5 |
| Robot Soccer | Robot Soccer Goal (Evaluation) | P(Goal) | 0.6 | 5 |
| Reinforcement Learning | Recommender hybrid 343^10 | Mean Return | 1.74e+3 | 3 |