Reinforcement Learning with Latent Flow

About

Temporal information is essential to learning effective policies with Reinforcement Learning (RL). However, current state-of-the-art RL algorithms either assume that such information is given as part of the state space or, when learning from pixels, use the simple heuristic of frame-stacking to implicitly capture the temporal information present in the image observations. This heuristic contrasts with the current paradigm in video classification architectures, which explicitly encode temporal information through methods such as optical flow and two-stream architectures to achieve state-of-the-art performance. Inspired by leading video classification architectures, we introduce the Flow of Latents for Reinforcement Learning (Flare), a network architecture for RL that explicitly encodes temporal information through latent vector differences. We show that Flare (i) recovers optimal performance in state-based RL without explicit access to the state velocity, using only positional state information; (ii) achieves state-of-the-art performance on challenging pixel-based continuous control tasks within the DeepMind Control benchmark suite, namely quadruped walk, hopper hop, finger turn hard, pendulum swing, and walker run; (iii) is the most sample-efficient model-free pixel-based RL algorithm, outperforming the prior model-free state of the art by 1.9X and 1.5X on the 500k and 1M step benchmarks, respectively; and (iv) when applied on top of Rainbow DQN, outperforms this state-of-the-art baseline on 5 of 8 challenging Atari games at the 100M time step benchmark.
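The core idea of encoding temporal information "through latent vector differences" can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each frame is encoded independently by a shared encoder (here the hypothetical callable `encode`, standing in for a CNN), computes the latent flow as elementwise differences between consecutive latents, and fuses everything by plain concatenation of all latents followed by all differences. The exact fusion used in Flare may differ.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def latent_flow_features(frames: Sequence[Sequence[float]],
                         encode: Callable[[Sequence[float]], Vector]) -> Vector:
    """Fuse per-frame latents with their consecutive differences.

    Assumptions (illustrative only): `encode` is a hypothetical shared
    per-frame encoder, and fusion is concatenation of [z_1..z_k] followed
    by the latent flow [z_2 - z_1, ..., z_k - z_{k-1}].
    """
    if len(frames) < 2:
        raise ValueError("latent flow needs at least two frames")
    latents = [encode(f) for f in frames]  # z_1, ..., z_k, one per frame
    dim = len(latents[0])
    if any(len(z) != dim for z in latents):
        raise ValueError("all latents must have the same dimension")
    # Latent flow: elementwise difference between each pair of consecutive latents.
    flows = [[b - a for a, b in zip(prev, curr)]
             for prev, curr in zip(latents, latents[1:])]
    fused: Vector = []
    for z in latents:
        fused.extend(z)  # positional content of every frame
    for d in flows:
        fused.extend(d)  # explicit temporal content
    return fused


if __name__ == "__main__":
    # State-based case: "frames" are positions only and the encoder is the identity.
    positions = [[0.0, 1.0], [0.5, 1.0], [1.5, 0.5]]
    print(latent_flow_features(positions, encode=lambda f: list(f)))
```

With an identity encoder over position-only states, each flow entry is the change in position between steps; dividing by the control timestep would give a finite-difference velocity estimate, which is one intuition for why explicit differences can stand in for missing velocity information.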

Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin • 2021

Related benchmarks

Task	Dataset	Result
Continuous Control	DeepMind Control (DMC) Suite (1M steps)	IQM 54.62	14
Visual Reinforcement Learning	Atari 10	Assault Score 846	10
Driving Policy	CARLA JW scenario	Episode Reward 136	7
Driving Policy	CARLA HW scenario	Episode Reward 138	7
Driving Policy	CARLA HB scenario	Episode Reward 82	7
Visual Reinforcement Learning	DeepMind Control Suite (500K steps)	Quadruped Walk Score 296	6
Visual Reinforcement Learning	DMControl Walker Run (test)	Environment Reward 426	5
Visual Reinforcement Learning	DMControl Hopper, Hop (test)	ER 90	5
Visual Reinforcement Learning	DMControl Pendulum, Swingup (test)	Episode Reward (ER) 242	5
Finger Turn hard	DMControl (test)	Score 661	4
