ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations

About

We introduce ReWiND, a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. Standard reinforcement learning (RL) and imitation learning methods require expert supervision through human-designed reward functions or demonstrations for every new task. In contrast, ReWiND starts from a small demonstration dataset to learn: (1) a data-efficient, language-conditioned reward function that labels the dataset with rewards, and (2) a language-conditioned policy pre-trained with offline RL using these rewards. Given an unseen task variation, ReWiND fine-tunes the pre-trained policy using the learned reward function, requiring minimal online interaction. We show that ReWiND's reward model generalizes effectively to unseen tasks, outperforming baselines by up to 2.4x in reward generalization and policy alignment metrics. Finally, we demonstrate that ReWiND enables sample-efficient adaptation to new tasks, beating baselines by 2x in simulation and improving real-world pretrained bimanual policies by 5x, taking a step towards scalable, real-world robot learning. See website at https://rewind-reward.github.io/.

Jiahui Zhang, Yusen Luo, Abrar Anwar, Sumedh Anand Sontakke, Joseph J Lim, Jesse Thomason, Erdem Biyik, Jesse Zhang • 2025

Related benchmarks

Task	Dataset	Metric	Result
Reward alignment	RBM-EVAL ID	Pearson r (VOC)	0.46	14
Reward alignment	RBM-EVAL OOD	Pearson r (VOC)	0.51	14
Box Open	Real-world Franka Emika	Success Rate	1	9
Peg-Insert	Real-world Franka Emika	Success Rate	100	9
Bulb-Unscrew	Real-world Franka Emika	Success Rate	0.00e+0	9
Reward Prediction	10-task benchmark S1 classic	Demo L (MSE)	0.026	8
Reward rollout alignment	10-task benchmark T1: Folding Shorts	Rollout ρ	0.167	8
Trajectory Ranking	RBM OOD 1.0 (test)	Kendall's Tau-a	0.01	8
Reward Prediction	S2 10-task benchmark unconventional	Demo L (MSE)	0.044	7
Reward Prediction	10-task benchmark Overall	Demo L (MSE)	0.036	7

Showing 10 of 20 rows
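
The Trajectory Ranking row reports Kendall's Tau-a. For readers unfamiliar with that statistic, here is a minimal sketch of its textbook definition: (concordant pairs minus discordant pairs) divided by the total number of pairs, with tied pairs counting as neither. The function name `kendall_tau_a` and the sample scores are hypothetical. This is not the benchmark's evaluation code, and this page does not say how the benchmark pairs predicted scores with reference rankings.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's Tau-a between two equal-length score sequences.

    Tau-a = (concordant - discordant) / (n * (n - 1) / 2).
    Tied pairs add nothing to the numerator but still count in the denominator.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells us whether the pair is ordered
        # the same way (positive) or the opposite way (negative) in x and y.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1

    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical example: reference ranks vs. scores from some reward model.
reference = [1, 2, 3, 4]
predicted = [0.1, 0.6, 0.4, 0.9]
tau = kendall_tau_a(reference, predicted)
```

A value near 1 means the two orderings mostly agree, a value near -1 means they are mostly reversed, and a value near 0 (such as the 0.01 in the table) means little ordinal agreement.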
