SegDAC: Visual Generalization in Reinforcement Learning via Dynamic Object Tokens

About

Visual reinforcement learning policies trained on pixel observations often struggle to generalize when visual conditions change at test time. Object-centric representations are a promising alternative, but most approaches use fixed-size slot representations, require image reconstruction, or need auxiliary losses to learn object decompositions. As a result, it remains unclear how to learn RL policies directly from object-level inputs without these constraints. We propose SegDAC, a Segmentation-Driven Actor-Critic that operates on a variable-length set of object token embeddings. At each timestep, text-grounded segmentation produces object masks from which spatially aware token embeddings are extracted. A transformer-based actor-critic processes these dynamic tokens, using segment positional encoding to preserve spatial information across objects. We ablate these design choices and show that both segment positional encoding and variable-length processing are individually necessary for strong performance. We evaluate SegDAC on 8 ManiSkill3 manipulation tasks under 12 visual perturbation types across 3 difficulty levels. SegDAC improves over prior visual generalization methods by 15% on easy, 66% on medium, and 88% on the hardest settings. SegDAC matches the sample efficiency of the state-of-the-art visual RL methods while achieving improved generalization under visual changes. Project Page: https://segdac.github.io/

Alexandre Brown, Glen Berseth • 2025
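
To make the idea of a variable-length set of object tokens more concrete, here is a minimal sketch of turning segmentation masks into one token per segment. It is not the authors' implementation: the masked average pooling, the use of a normalized mask centroid as the spatial signal, and the names `segment_token` and `frame_tokens` are all assumptions made for illustration. The abstract only states that spatially aware token embeddings are extracted from object masks.

```python
from typing import List, Tuple

Mask = List[List[int]]                 # H x W binary mask (1 = pixel belongs to the segment)
FeatureMap = List[List[List[float]]]   # H x W x C per-pixel features
Token = Tuple[List[float], Tuple[float, float]]


def segment_token(features: FeatureMap, mask: Mask) -> Token:
    """Average the features under one mask; return (embedding, normalized centroid)."""
    height, width = len(mask), len(mask[0])
    channels = len(features[0][0])
    total = [0.0] * channels
    count = row_sum = col_sum = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c]:
                count += 1
                row_sum += r
                col_sum += c
                for k in range(channels):
                    total[k] += features[r][c][k]
    if count == 0:
        raise ValueError("empty mask")
    embedding = [value / count for value in total]
    # Hypothetical positional signal: mask centroid scaled to [0, 1] on each axis.
    centroid = (
        row_sum / count / max(height - 1, 1),
        col_sum / count / max(width - 1, 1),
    )
    return embedding, centroid


def frame_tokens(features: FeatureMap, masks: List[Mask]) -> List[Token]:
    """One token per non-empty mask, so the output length follows the segment count."""
    return [segment_token(features, m) for m in masks if any(any(row) for row in m)]


# Toy 2 x 2 frame with a single feature channel and three candidate masks.
features = [[[1.0], [3.0]], [[5.0], [7.0]]]
masks = [
    [[1, 1], [0, 0]],  # top row
    [[0, 0], [0, 1]],  # bottom-right pixel
    [[0, 0], [0, 0]],  # empty mask, skipped
]
tokens = frame_tokens(features, masks)
```

Because the number of tokens depends on how many segments are found in each frame, a downstream model would need to handle sets of varying size, for example by padding and masking within a batch. That part is left out of this sketch.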

Related benchmarks

Task | Dataset | Result | Rank
LiftPegUpright | ManiSkill3 Medium Lighting Direction v1 (test) | Success Rate 42 | 7
LiftPegUpright | ManiSkill3 Hard Mo Texture v1 (test) | Return 28 | 7
LiftPegUpright | ManiSkill3 (Hard Ground Color Test) | Success Rate 38 | 7
LiftPegUpright | ManiSkill3 Easy Mo Color v1 (test) | Success Rate 40 | 7
LiftPegUpright | ManiSkill Medium Mo Texture 3 (test) | Success Rate 30 | 7
LiftPegUpright | ManiSkill3 Medium Lighting Color v1 (test) | Success Rate 43 | 7
LiftPegUpright | ManiSkill3 Hard Mo Color (test) | Success Rate 39 | 7
LiftPegUpright | ManiSkill Easy Camera Fov v3 (test) | Success Rate 27 | 7
LiftPegUpright | ManiSkill3 Hard Ground Texture v1 (test) | Success Rate 40 | 7
PickCube | ManiSkill3 Medium Table Color (test) | Success Rate 17 | 7
Showing 10 of 232 rows
