
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning

About

Recent advances in CV and NLP have been driven largely by scaling up the number of network parameters, even though traditional theory suggests that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simplicity bias, guiding models toward simple, generalizable solutions. In deep RL, however, network design and scaling have received far less attention. Motivated by this gap, we present SimBa, an architecture designed to scale up parameters in deep RL by injecting a simplicity bias. SimBa consists of three components: (i) an observation normalization layer that standardizes inputs using running statistics, (ii) a residual feedforward block that provides a linear pathway from input to output, and (iii) layer normalization to control feature magnitudes. Scaling up parameters with SimBa consistently improves the sample efficiency of a range of deep RL algorithms, including off-policy, on-policy, and unsupervised methods. Moreover, SAC equipped solely with the SimBa architecture matches or surpasses state-of-the-art deep RL methods across DMC, MyoSuite, and HumanoidBench while remaining computationally efficient. These results demonstrate SimBa's broad applicability and effectiveness across diverse RL algorithms and environments.
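The three components listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the names (`RunningObsNorm`, `layer_norm`, `residual_block`, `SimBaStyleEncoder`) are hypothetical, and the pre-LayerNorm placement inside the residual block, the 4x hidden expansion, the ReLU activation, the final LayerNorm, and the random initialization are assumed design choices, not details stated in the abstract.

```python
import numpy as np


class RunningObsNorm:
    """Standardizes observations with a running mean and variance.

    Batches are merged with the parallel (Chan et al.) variance update, so the
    stored statistics equal the mean/variance of all observations seen so far.
    """

    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = 0
        self.eps = eps

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        b_mean, b_var, b_count = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        total = self.count + b_count
        delta = b_mean - self.mean
        # Combine the sums of squared deviations of the old data and the new batch.
        m2 = self.var * self.count + b_var * b_count + delta**2 * self.count * b_count / total
        self.mean = self.mean + delta * b_count / total
        self.var = m2 / total
        self.count = total

    def __call__(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + self.eps)


def layer_norm(x, eps=1e-5):
    # Per-sample normalization over the feature axis (no learnable scale/shift here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def residual_block(x, w1, b1, w2, b2):
    # Assumed pre-LN MLP branch; the identity term "x +" is the linear input-to-output pathway.
    h = np.maximum(layer_norm(x) @ w1 + b1, 0.0)  # ReLU
    return x + h @ w2 + b2


class SimBaStyleEncoder:
    """Hypothetical encoder: obs norm -> linear embedding -> residual blocks -> final LayerNorm."""

    def __init__(self, obs_dim, hidden=64, num_blocks=2, expansion=4, seed=0):
        rng = np.random.default_rng(seed)
        self.obs_norm = RunningObsNorm(obs_dim)
        self.w_in = rng.normal(0.0, 1.0 / np.sqrt(obs_dim), (obs_dim, hidden))
        self.b_in = np.zeros(hidden)
        self.blocks = []
        for _ in range(num_blocks):
            w1 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, expansion * hidden))
            w2 = rng.normal(0.0, 1.0 / np.sqrt(expansion * hidden), (expansion * hidden, hidden))
            self.blocks.append((w1, np.zeros(expansion * hidden), w2, np.zeros(hidden)))

    def __call__(self, obs):
        x = self.obs_norm(obs) @ self.w_in + self.b_in
        for w1, b1, w2, b2 in self.blocks:
            x = residual_block(x, w1, b1, w2, b2)
        return layer_norm(x)  # keeps output feature magnitudes controlled
```

In an actor or critic, a linear head would sit on top of these features, and the observation statistics would typically be updated from collected transitions. Scaling up parameters in this sketch means raising `hidden` or `num_blocks`; the identity path and the normalization layers are what the abstract credits with keeping larger networks biased toward simple solutions.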

Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno (2024)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Locomotion | Dog & Humanoid suite | IQM | 0.773 | 32 |
| Dexterous Manipulation | MyoSuite | IQM | 0.952 | 28 |
| Humanoid Locomotion and Manipulation | HumanoidBench | IQM | 0.521 | 28 |
| Continuous Control | DeepMind Control (DMC) Suite (100k steps) | IQM | 0.12 | 8 |
| Continuous Control | DeepMind Control (DMC) Suite (200k steps) | IQM | 26.3 | 8 |
| Continuous Control | DeepMind Control (DMC) Suite (500k steps) | IQM | 52.2 | 8 |
| Continuous Control | DeepMind Control (DMC) Suite (1M steps) | IQM | 69.1 | 8 |
| Locomotion | HumanoidBench 1.0 (test) | Balance Hard | 145.9 | 7 |
| Reinforcement Learning | DeepMind Control Suite (DMC) Hard Tasks (test) | Dog Run | 544.9 | 7 |
| Reinforcement Learning | DeepMind Control Suite Easy & Medium | Acrobot Swingup | 390.8 | 7 |

(10 of 11 rows shown)
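Most rows in the table report IQM (interquartile mean), a robust aggregate that averages only the middle 50% of scores, so a few outlier runs or tasks cannot dominate it. The Python sketch below shows the computation, assuming the convention of `scipy.stats.trim_mean(scores, 0.25)`, which trims floor(n/4) scores from each end. The function name `iqm` is hypothetical, and the page does not state how its scores were normalized or pooled across tasks and seeds.

```python
def iqm(scores):
    """Interquartile mean: mean of the scores left after dropping the lowest and highest 25%."""
    xs = sorted(scores)
    n = len(xs)
    if n == 0:
        raise ValueError("iqm() requires at least one score")
    k = int(0.25 * n)  # scores trimmed from each end (rounded down)
    middle = xs[k:n - k]  # never empty, since k <= n/4
    return sum(middle) / len(middle)


print(iqm([1, 2, 3, 4, 5, 6, 7, 8]))  # keeps 3, 4, 5, 6 -> 4.5
```

Compared with the plain mean, IQM discards extreme scores; compared with the median, it still uses half of the data, which tends to give tighter estimates when only a few seeds per task are available.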
