A Generalist Agent

About

Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.

Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas • 2022

Related benchmarks

Task	Dataset	Metric	Result
Robotic Manipulation	Meta-World	Average Success Rate	87	27
Robotic Manipulation	VIMA-Bench 1.0 (test)	L1 Score	57	14
Robotic Manipulation	VIMA-Bench	Task 1 Score	50.7	13
Long-Horizon Robotics Manipulation	Paint-block (Seen)	Success Rate	31.2	8
Long-Horizon Robotics Manipulation	Paint-block (Unseen)	Success Rate	28.6	8
Long-Horizon Robotics Manipulation	Object-arrange (Seen)	Success Rate	37.9	8
Long-Horizon Robotics Manipulation	Object-arrange (Unseen)	Success Rate	36.5	8
Long-Horizon Robotics Manipulation	Kitchen-tasks (Seen)	Success Rate	0.702	7
Long-Horizon Robotics Manipulation	Kitchen-tasks (Unseen)	Success Rate	66.8	7
Offline Multi-Agent Reinforcement Learning	SMAC	3s5z Win Rate	72	5

Showing 10 of 29 rows
