An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model

About

We study the problem of estimating Dynamic Discrete Choice (DDC) models, also known as offline Maximum Entropy-Regularized Inverse Reinforcement Learning (offline MaxEnt-IRL) in machine learning. The objective is to recover reward or $Q^*$ functions that govern agent behavior from offline behavior data. In this paper, we propose a globally convergent gradient-based method for solving these problems without the restrictive assumption of linearly parameterized rewards. The novelty of our approach lies in introducing the Empirical Risk Minimization (ERM) based IRL/DDC framework, which circumvents the need for explicit state transition probability estimation in the Bellman equation. Furthermore, our method is compatible with non-parametric estimation techniques such as neural networks. Therefore, the proposed method has the potential to scale to high-dimensional, infinite state spaces. A key theoretical insight underlying our approach is that the Bellman residual satisfies the Polyak-Łojasiewicz (PL) condition -- a property that, while weaker than strong convexity, is sufficient to ensure fast global convergence. Through a series of synthetic experiments, we demonstrate that our approach consistently outperforms benchmark methods and state-of-the-art alternatives.
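
For reference, the PL condition mentioned above can be stated in its standard generic form. This is a sketch for a generic differentiable objective $f$ with minimum value $f^*$, smoothness constant $L$, and PL constant $\mu > 0$; these symbols are assumptions chosen for illustration and are not necessarily the paper's notation or its exact theorem.

$$\frac{1}{2}\,\|\nabla f(\theta)\|^2 \;\ge\; \mu\,\bigl(f(\theta) - f^*\bigr) \quad \text{for all } \theta.$$

If $f$ is additionally $L$-smooth, gradient descent with step size $1/L$, that is $\theta_{k+1} = \theta_k - \tfrac{1}{L}\nabla f(\theta_k)$, satisfies

$$f(\theta_k) - f^* \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\bigl(f(\theta_0) - f^*\bigr).$$

This is a linear convergence rate that does not require convexity: under the PL condition every stationary point is a global minimizer, which is why the property can stand in for strong convexity in a global convergence argument.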

Enoch H. Kang, Hema Yoganarasimhan, Lalit Jain • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Reward Estimation | Standard bus engine replacement simulation without dummy variables | MAPE: 0.12 | 34 |
| Imitation Learning | CartPole v1 (test) | Optimality (%): 100 | 15 |
| Imitation Learning | Acrobot v1 (test) | Optimality (%): 103.7 | 15 |
| Imitation Learning | Lunar Lander v2 (test) | Optimality (%): 107.3 | 15 |