Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

About

A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator. This is because IRL typically seeks a reward function that makes the demonstrator appear near-optimal, rather than inferring the underlying intentions of the demonstrator that may have been poorly executed in practice. In this paper, we introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations. When combined with deep reinforcement learning, T-REX outperforms state-of-the-art imitation learning and IRL methods on multiple Atari and MuJoCo benchmark tasks and achieves performance that is often more than twice the performance of the best demonstration. We also demonstrate that T-REX is robust to ranking noise and can accurately extrapolate intention by simply watching a learner noisily improve at a task over time.
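The abstract does not spell out the training objective, so the following is only a minimal sketch of one common way to learn a reward from ranked trajectories: a pairwise (Bradley-Terry style) cross-entropy loss that pushes the predicted return of the higher-ranked trajectory above that of the lower-ranked one. The linear reward model and the names `predicted_return`, `pairwise_ranking_loss`, and `loss_gradient` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def predicted_return(reward_params, trajectory):
    """Sum of per-step rewards under an assumed linear model r(s) = w . s.

    trajectory: array of shape (T, d), one observation per row.
    """
    return float(np.sum(trajectory @ reward_params))


def pairwise_ranking_loss(reward_params, worse, better):
    """Cross-entropy loss for the event 'better outranks worse'.

    Equals -log(exp(R_better) / (exp(R_worse) + exp(R_better))),
    computed with logaddexp for numerical stability.
    """
    r_worse = predicted_return(reward_params, worse)
    r_better = predicted_return(reward_params, better)
    return float(np.logaddexp(r_worse, r_better) - r_better)


def loss_gradient(reward_params, worse, better):
    """Analytic gradient of the loss w.r.t. the linear reward weights."""
    r_worse = predicted_return(reward_params, worse)
    r_better = predicted_return(reward_params, better)
    # Probability the model currently assigns to the wrong ordering.
    p_worse = np.exp(r_worse - np.logaddexp(r_worse, r_better))
    return p_worse * (worse.sum(axis=0) - better.sum(axis=0))


# Toy usage: two hypothetical 2-step trajectories with 2-D observations.
worse = np.array([[0.0, 1.0], [0.0, 1.0]])
better = np.array([[1.0, 0.0], [1.0, 0.0]])

w = np.zeros(2)
for _ in range(50):
    w -= 0.1 * loss_gradient(w, worse, better)  # plain gradient descent
```

After a few gradient steps the learned weights assign a higher return to `better` than to `worse`. A reward fitted this way can then be handed to a standard reinforcement learning algorithm, which is how the abstract describes T-REX being combined with deep RL.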

Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum • 2019

Related benchmarks

Task	Dataset	Result
Reinforcement Learning	Hopper v5	Average Return 2.86e+3	101
Reinforcement Learning	Ant v5	Average Return 3.66e+3	57
Reinforcement Learning	Halfcheetah v5	Average Return 9.12e+3	47
Reinforcement Learning	Walker2D v5	Average Return 4.42e+3	45
Multi-Agent Reinforcement Learning	StarCraft Multi-Agent Challenge (SMAC)	1c3s5z Win Rate 64.76	13
Reinforcement Learning	AdroitHandDoor v1	Average Return 1.66e+3	12
Reinforcement Learning	AdroitHandRelocate v1	Average Return 29	10
Reinforcement Learning	18 Confounded Environments Aggregate	Normalized Mean 0.96	5
Multi-agent Decision Making	SMAC 1c3s unseen (test)	Win Rate 23.53	3
Multi-agent Decision Making	SMAC 1c_vs_32zg unseen (test)	Win Rate 11.41	3

Showing 10 of 19 rows
