Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few Demonstrations

About

Designing reward functions that generalize beyond controlled laboratory settings remains a fundamental challenge in reinforcement learning for robotics. In open-world manipulation problems, a single task can appear in numerous variants through different object instances, positions, and camera viewpoints. Recent vision-based reward models tend to memorize specific pixel distributions and fail to generalize beyond their training conditions. To address this, we propose a framework that learns invariant symbolic reward functions from as few as five demonstrations. The insight is to shift from visual feature-fitting to the discovery of behavioral invariants: task-level properties that remain constant across diverse visual instantiations. The framework has two coupled components: a structural reward formulation that encodes task-level strategies and physical constraints while preserving optimal policy invariance, and a hybrid symbolic-numerical procedure that distills these invariants from demonstrations without online interaction. Experiments on eight Meta-World tasks and three Franka manipulation tasks demonstrate that our method achieves stronger process alignment and policy rollout ranking abilities compared to baselines, accelerating downstream policy learning. Three real-world out-of-distribution experiments further show that the same learned reward generalizes zero-shot to position, viewpoint, and object variations, enabling a single reward representation to be reused across diverse task variants in practice.
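The abstract does not spell out how the structural reward preserves optimal-policy invariance. One well-known construction with that property is potential-based reward shaping (Ng et al., 1999), which adds the term γΦ(s') − Φ(s) to a base reward. The sketch below illustrates that general technique only; it is not the authors' formulation. The names `shaped_reward` and `neg_distance_to_goal`, and the distance-based potential itself, are hypothetical choices made for illustration.

```python
from typing import Callable, Sequence

State = Sequence[float]
Potential = Callable[[State], float]


def shaped_reward(
    base_reward: float,
    potential: Potential,
    state: State,
    next_state: State,
    gamma: float = 0.99,
) -> float:
    """Potential-based shaping: r + gamma * phi(s') - phi(s).

    Adding a term of this form leaves the set of optimal policies
    unchanged (Ng et al., 1999). This is a generic illustration,
    not the reward formulation from the paper.
    """
    return base_reward + gamma * potential(next_state) - potential(state)


def neg_distance_to_goal(goal: State) -> Potential:
    """Hypothetical potential: negative Euclidean distance to a goal."""
    def phi(state: State) -> float:
        return -(sum((a - b) ** 2 for a, b in zip(state, goal)) ** 0.5)
    return phi


# Moving from (3, 4) to the goal (0, 0) with gamma = 1 and no base reward
# yields a shaping bonus equal to the distance covered.
phi = neg_distance_to_goal([0.0, 0.0])
bonus = shaped_reward(0.0, phi, [3.0, 4.0], [0.0, 0.0], gamma=1.0)
```

With γ = 1 the shaping terms telescope along a trajectory, so the total bonus depends only on the first and last states, which is why the ranking of policies is unaffected.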

Tengye Xu, Yangting Sun, Ziju Shen, Guanqi Chen, Zhen Fu, Chen Yizhou, Hua Chen, Jia Pan • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Box Open | Real-world Franka Emika | Success Rate | 1 | 9 |
| Bulb-Unscrew | Real-world Franka Emika | Success Rate | 9 | 9 |
| Peg-Insert | Real-world Franka Emika | Success Rate | 100 | 9 |
| Robotic Manipulation | Real-world Box-Open Position OOD v1 | Success Rate | 100 | 6 |
| Robotic Manipulation | Real-world Box-Open Object OOD v1 | Success Rate | 90 | 6 |
| Reward Model Evaluation | Meta-World (train) | Procedural Alignment Correlation (ρ) | 0.97 | 5 |
| Reward Model Evaluation | Meta-World Position OOD | Process Alignment ρ | 0.85 | 5 |
| Reward Model Evaluation | Meta-World Viewpoint OOD | Process Alignment ρ | 0.88 | 5 |
| Reward Model Evaluation | Meta-World Object OOD | Process Alignment Correlation (ρ) | 0.81 | 5 |
| Robotic Manipulation | Real-world Box-Open Viewpoint OOD v1 | Success Rate | 15 | 3 |