
Noise-conditioned Energy-based Annealed Rewards (NEAR): A Generative Framework for Imitation Learning from Observation

About

This paper introduces a new imitation learning framework based on energy-based generative models, capable of learning complex, physics-dependent robot motion policies from state-only expert motion trajectories. Our algorithm, called Noise-conditioned Energy-based Annealed Rewards (NEAR), constructs several perturbed versions of the expert's motion data distribution and learns smooth, well-defined representations of the data distribution's energy function using denoising score matching. We propose to use these learnt energy functions as reward functions to learn imitation policies via reinforcement learning. We also present a strategy to gradually switch between the learnt energy functions, ensuring that the learnt rewards are always well-defined in the manifold of policy-generated samples. We evaluate our algorithm on complex humanoid tasks such as locomotion and martial arts and compare it with state-only adversarial imitation learning algorithms like Adversarial Motion Priors (AMP). Our framework sidesteps the optimisation challenges of adversarial imitation learning techniques and produces results comparable to AMP in several quantitative metrics across multiple imitation settings.
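The perturbation and denoising score matching steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the geometric noise schedule, the sigma-squared loss weighting, and the sign convention (higher energy means more expert-like, so the energy can serve directly as a reward) are all assumptions made for illustration. A toy Gaussian "expert" distribution stands in for motion data so that the energy and its gradient are available in closed form; the annealing strategy for switching between noise levels is not covered here.

```python
import numpy as np

def noise_scales(sigma_max, sigma_min, num_levels):
    # Assumed schedule: geometrically spaced noise levels, largest first.
    return np.geomspace(sigma_max, sigma_min, num_levels)

def perturb(x, sigma, rng):
    # One perturbed version of the data: add isotropic Gaussian noise.
    return x + sigma * rng.standard_normal(x.shape)

def gaussian_energy(x, sigma, data_std=1.0):
    # Toy stand-in for a learnt energy: log-density (up to a constant) of
    # N(0, data_std^2 I) data after perturbation with noise level sigma.
    # Assumed convention: higher energy = closer to the expert data.
    var = data_std**2 + sigma**2
    return -0.5 * np.sum(x**2, axis=-1) / var

def gaussian_score(x, sigma, data_std=1.0):
    # Gradient of gaussian_energy with respect to x.
    return -x / (data_std**2 + sigma**2)

def dsm_loss(score_fn, x, sigma, rng):
    # Denoising score matching: the score at the noisy sample should point
    # back toward the clean sample, i.e. match -(x_noisy - x) / sigma^2.
    x_noisy = perturb(x, sigma, rng)
    target = -(x_noisy - x) / sigma**2
    pred = score_fn(x_noisy, sigma)
    # sigma^2 weighting (an assumption) keeps losses comparable across levels.
    return 0.5 * sigma**2 * np.mean(np.sum((pred - target) ** 2, axis=-1))
```

In a reinforcement learning loop, a reward for a policy-generated state could then be read off as `gaussian_energy(state, sigma)` for the currently active noise level, with a trained network replacing the closed-form toy energy.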

Anish Abhijit Diwan, Julen Urain, Jens Kober, Jan Peters • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robot Manipulation | Meta-World | Button Success Rate: 76.8 | 9 |
| Robot Manipulation | RoboMimic ph | Lift Success Rate: 93.6 | 9 |
| Imitation Learning | Imitation Learning Tasks Aggregate | IQM: 9.86 | 7 |
| Inverse Reinforcement Learning | Ant | Normalized Performance: 46 | 6 |
| Inverse Reinforcement Learning | Half Cheetah | Normalized Performance: 9 | 6 |
| Inverse Reinforcement Learning | Hopper | Normalized Performance: 22 | 6 |
| Inverse Reinforcement Learning | Point Maze | Normalized Performance: 0.28 | 6 |
| Inverse Reinforcement Learning | Point Maze Flipped | Normalized Performance: 29 | 3 |
| Inverse Reinforcement Learning | Ant Disabled | Normalized Performance: 0.33 | 3 |
| Inverse Reinforcement Learning | Half Cheetah Windy | Normalized Performance: 10 | 3 |
Showing 10 of 11 rows
