Generalizable Visual Reinforcement Learning with Segment Anything Model

About

Learning policies that generalize to unseen environments is a fundamental challenge in visual reinforcement learning (RL). While most current methods focus on acquiring robust visual representations through auxiliary supervision, pre-training, or data augmentation, the potential of modern vision foundation models remains underexploited. In this work, we introduce Segment Anything Model for Generalizable visual RL (SAM-G), a novel framework that leverages the promptable segmentation ability of the Segment Anything Model (SAM) to enhance the generalization capabilities of visual RL agents. We use image features from DINOv2 and SAM to find correspondences that serve as point prompts to SAM, which then produces high-quality masked images that are fed directly to the agents. Evaluated across 8 DMControl tasks and 3 Adroit tasks, SAM-G significantly improves visual generalization by changing only the agents' observations, not their architecture. Notably, SAM-G achieves 44% and 29% relative improvements on the challenging video hard setting on DMControl and Adroit, respectively, compared to state-of-the-art methods. Video and code: https://yanjieze.com/SAM-G/
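
The correspondence step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `find_point_prompt` and `apply_mask`, the use of cosine similarity, and the toy feature map are all assumptions made for illustration. In a real pipeline the features would come from DINOv2 and SAM, and the mask would be predicted by SAM from the selected point; both of those calls are omitted here.

```python
import numpy as np

def find_point_prompt(ref_feature, feature_map):
    """Return (row, col) of the patch most similar to a reference feature.

    ref_feature: (C,) feature vector of a reference point on the agent
                 (hypothetical; would come from a vision backbone).
    feature_map: (H, W, C) per-patch features of the new observation.
    """
    h, w, c = feature_map.shape
    flat = feature_map.reshape(-1, c)
    # Cosine similarity between the reference vector and every patch.
    flat_norm = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    ref_norm = ref_feature / (np.linalg.norm(ref_feature) + 1e-8)
    similarity = flat_norm @ ref_norm
    best = int(np.argmax(similarity))
    return divmod(best, w)

def apply_mask(image, mask):
    """Zero out every pixel outside a binary mask.

    image: (H, W, 3) observation; mask: (H, W) boolean array.
    """
    return image * mask[..., None].astype(image.dtype)

# Toy data: each of the 16 patches gets a distinct one-hot feature.
feature_map = np.eye(16).reshape(4, 4, 16)
# A slightly perturbed copy of the feature at patch (2, 1).
ref_feature = feature_map[2, 1] + 0.1
point = find_point_prompt(ref_feature, feature_map)

# Placeholder mask standing in for a segmentation model's output.
mask = np.zeros((4, 4), dtype=bool)
mask[point] = True
image = np.ones((4, 4, 3), dtype=np.uint8)
masked = apply_mask(image, mask)
```

Here `point` is the patch location that would be passed to a segmentation model as a point prompt, and `masked` is the kind of observation the agent would receive in place of the raw image.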

Ziyu Wang, Yanjie Ze, Yifei Sun, Zhecheng Yuan, Huazhe Xu • 2023

Related benchmarks

Task            Dataset                                  Result                  Rank
LiftPegUpright  ManiSkill3 Medium Table Color (test)     Success Rate 39         7
LiftPegUpright  ManiSkill3 Easy Camera Pose v1 (test)    Success Rate 25         7
LiftPegUpright  ManiSkill3 Easy Lighting Color (test)    Success Rate 42         7
LiftPegUpright  ManiSkill3 Hard Lighting Direction       Success Rate 40         7
LiftPegUpright  ManiSkill3 Hard Table Texture v1 (test)  Success Rate 22         7
LiftPegUpright  ManiSkill Medium Table Texture 3         Success Rate 43         7
LiftPegUpright  ManiSkill3 Easy Table Texture v1 (test)  Success Rate 41         7
LiftPegUpright  ManiSkill3 Easy Ground Color (test)      Success Rate 43         7
LiftPegUpright  ManiSkill3 Easy Mo Texture v1 (test)     Success Rate 31         7
LiftPegUpright  ManiSkill3 Hard Table Color v1 (test)    Normalized Return 0.41  7
Showing 10 of 232 rows
...
