
Distributional Inverse Reinforcement Learning

About

We propose a distributional framework for offline Inverse Reinforcement Learning (IRL) that jointly models uncertainty over reward functions and the full distribution of returns. Unlike conventional IRL approaches, which recover a deterministic reward estimate or match only expected returns, our method captures richer structure in expert behavior, particularly in the reward distribution. It does so by minimizing first-order stochastic dominance (FSD) violations, which integrates distortion risk measures (DRMs) into policy learning and enables the recovery of both reward distributions and distribution-aware policies. This formulation is well suited to behavior analysis and risk-aware imitation learning. Theoretical analysis shows that the algorithm converges with $\mathcal{O}(\varepsilon^{-2})$ iteration complexity. Empirical results on synthetic benchmarks, real-world neurobehavioral data, and MuJoCo control tasks demonstrate that our method recovers expressive reward representations and achieves state-of-the-art performance.
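
To make the two quantities named in the abstract concrete, here is a minimal sketch of an empirical FSD violation and a distortion risk measure computed from return samples. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`fsd_violation`, `distortion_risk_measure`, `cvar_distortion`) are hypothetical, the violation is taken to be the integrated positive part of the CDF gap, and the lower-tail CVaR distortion is one common choice of DRM rather than the one used in the paper.

```python
from bisect import bisect_right


def fsd_violation(x, y):
    """Empirical violation of 'x first-order dominates y'.

    Assumed definition: integral of max(F_x(t) - F_y(t), 0) dt over the
    merged sample grid. It is zero when F_x(t) <= F_y(t) everywhere.
    """
    xs, ys = sorted(x), sorted(y)
    grid = sorted(set(xs + ys))
    total = 0.0
    for lo, hi in zip(grid, grid[1:]):
        # Empirical CDFs are constant on [lo, hi), so evaluate them at lo.
        fx = bisect_right(xs, lo) / len(xs)
        fy = bisect_right(ys, lo) / len(ys)
        total += max(fx - fy, 0.0) * (hi - lo)
    return total


def distortion_risk_measure(samples, g):
    """Weighted sum of order statistics with weights g(i/n) - g((i-1)/n).

    g is a distortion function on [0, 1] with g(0) = 0 and g(1) = 1;
    the identity g(u) = u gives back the sample mean.
    """
    s = sorted(samples)
    n = len(s)
    return sum(v * (g((i + 1) / n) - g(i / n)) for i, v in enumerate(s))


def cvar_distortion(alpha):
    """Distortion for lower-tail CVaR: average of the worst alpha fraction."""
    return lambda u: min(u / alpha, 1.0)


expert_returns = [2.0, 3.0, 4.0]   # made-up numbers for illustration
policy_returns = [1.0, 2.0, 3.0]   # made-up numbers for illustration

no_violation = fsd_violation(expert_returns, policy_returns)
violation = fsd_violation(policy_returns, expert_returns)
mean_value = distortion_risk_measure([1.0, 2.0, 3.0, 4.0], lambda u: u)
cvar_value = distortion_risk_measure([1.0, 2.0, 3.0, 4.0], cvar_distortion(0.5))
```

In this toy setup the expert samples dominate the policy samples, so the violation is zero in one direction and positive in the other. Swapping the distortion function `g` changes how heavily low returns are weighted, which is the sense in which a DRM encodes a risk attitude.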

Feiyang Wu, Ye Zhao, Anqi Wu • 2025

Related benchmarks

| Task | Dataset | Result (Return) | Rank |
|---|---|---|---|
| Inverse Reinforcement Learning | D4RL walker2d | 1.53e+3 | 6 |
| Inverse Reinforcement Learning | D4RL halfcheetah-medium-expert | 1.12e+4 | 6 |
| Inverse Reinforcement Learning | D4RL HalfCheetah | 3.47e+3 | 6 |
| Inverse Reinforcement Learning | D4RL hopper | 886 | 6 |
| Inverse Reinforcement Learning | D4RL hopper-medium-expert | 3.41e+3 | 5 |
| Inverse Reinforcement Learning | D4RL walker2d-medium-expert | 4.57e+3 | 5 |
