AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents

About

We introduce AMAGO, an in-context Reinforcement Learning (RL) agent that uses sequence models to tackle the challenges of generalization, long-term memory, and meta-learning. Recent works have shown that off-policy learning can make in-context RL with recurrent policies viable. Nonetheless, these approaches require extensive tuning and limit scalability by creating key bottlenecks in agents' memory capacity, planning horizon, and model size. AMAGO revisits and redesigns the off-policy in-context approach to successfully train long-sequence Transformers over entire rollouts in parallel with end-to-end RL. Our agent is scalable and applicable to a wide range of problems, and we demonstrate its strong performance empirically in meta-RL and long-term memory domains. AMAGO's focus on sparse rewards and off-policy data also allows in-context learning to extend to goal-conditioned problems with challenging exploration. When combined with a multi-goal hindsight relabeling scheme, AMAGO can solve a previously difficult category of open-world domains, where agents complete many possible instructions in procedurally generated environments.
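The abstract mentions a multi-goal hindsight relabeling scheme without describing it. The sketch below is a generic illustration of that idea and is not AMAGO's actual implementation. It treats outcomes the agent reached in a stored trajectory as if they had been the instructed goals, then recomputes a sparse reward for that goal sequence. The function name `relabel_multi_goal`, the list-of-outcomes trajectory format, and the 0/1 reward convention are all assumptions made for this example.

```python
import random
from typing import List, Optional, Sequence, Tuple


def relabel_multi_goal(
    achieved: Sequence[Optional[str]],
    k: int,
    rng: random.Random,
) -> Tuple[List[str], List[float]]:
    """Hypothetical helper: pick up to k achieved outcomes as hindsight goals.

    achieved[t] is the outcome reached at timestep t, or None if nothing
    notable happened. Returns the relabeled goal sequence and the sparse
    rewards the trajectory would have earned under those goals.
    """
    # Timesteps at which some outcome was actually achieved.
    hits = [t for t, outcome in enumerate(achieved) if outcome is not None]

    # Sample up to k of them and keep time order, so the relabeled goal
    # sequence is one the trajectory really completes in order.
    chosen = sorted(rng.sample(hits, min(k, len(hits))))
    goals = [achieved[t] for t in chosen]

    # Replay the trajectory against the new goals: reward 1.0 the first
    # time the current goal is reached, then advance to the next goal.
    rewards = [0.0] * len(achieved)
    next_goal = 0
    for t, outcome in enumerate(achieved):
        if next_goal < len(goals) and outcome == goals[next_goal]:
            rewards[t] = 1.0
            next_goal += 1
    return goals, rewards


# Example: with k larger than the number of achieved outcomes,
# every outcome becomes a goal, so the result is deterministic.
trajectory = [None, "wood", None, "stone", None, "wood"]
goals, rewards = relabel_multi_goal(trajectory, k=10, rng=random.Random(0))
# goals   -> ['wood', 'stone', 'wood']
# rewards -> [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Relabeling of this kind turns a trajectory that failed its original instruction into a successful example for other instructions, which is why it is commonly paired with sparse rewards and off-policy data.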

Jake Grigsby, Linxi Fan, Yuke Zhu • 2023

Related benchmarks

Task                                Dataset                                        Metric                Result  Rank
Multi-Agent Reinforcement Learning  SMAC 3s5z vs 3s6z v2                           Win Rate              0.775   15
Multi-Agent Reinforcement Learning  SMAC 5m_vs_6m v2                               Win Rate              49.5    15
Multi-Agent Reinforcement Learning  SMAC corridor v2                               Win Rate              76.3    15
Push                                Meta-World ML-1 (test)                         Success Rate          0.87    12
Push                                MetaWorld ML1 Push OOD (interpolation)         Average Success Rate  98      9
Push                                MetaWorld ML1 Push-OOD-Extra (extrapolation)   Average Success Rate  83      9
Reach                               MetaWorld ML1 Reach-OOD (interpolation)        Average Success Rate  93      9
Reach                               MetaWorld ML1 Reach                            Average Success Rate  71      9
Reach                               MetaWorld ML1 Reach-OOD-Extra (extrapolation)  Success Rate          43      9
Actuator Inversion                  BallInCup (train)                              AER                   728     8
Showing 10 of 48 rows
