Scalable Kernel Inverse Optimization
About
Inverse Optimization (IO) is a framework for learning the unknown objective function of an expert decision-maker from a past dataset. In this paper, we extend the hypothesis class of IO objective functions to a reproducing kernel Hilbert space (RKHS), thereby enhancing feature representation to an infinite-dimensional space. We demonstrate that a variant of the representer theorem holds for a specific training loss, allowing the reformulation of the problem as a finite-dimensional convex optimization program. To address scalability issues commonly associated with kernel methods, we propose the Sequential Selection Optimization (SSO) algorithm to efficiently train the proposed Kernel Inverse Optimization (KIO) model. Finally, we validate the generalization capabilities of the proposed KIO model and the effectiveness of the SSO algorithm through learning-from-demonstration tasks on the MuJoCo benchmark.
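The representer-theorem reduction described above can be illustrated with a standard kernel method. The sketch below is a generic analogue only: it uses kernel ridge regression on toy data (not the paper's KIO training loss), and a greedy coordinate-update loop that is merely in the spirit of sequential selection, not the SSO algorithm itself. All names and data here are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Toy 1-D regression data standing in for expert demonstrations (hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(15, 1))
y = np.sin(X[:, 0])

n, lam = len(X), 0.1
K = rbf_kernel(X, X)

# Representer theorem: the RKHS minimizer has the form
# f(.) = sum_i alpha_i * k(x_i, .), so the infinite-dimensional problem
# collapses to a finite-dimensional one in alpha; for ridge regression
# that is the linear system (K + lam*I) alpha = y.
alpha_closed = np.linalg.solve(K + lam * np.eye(n), y)

# A greedy coordinate-descent analogue of "sequential selection": repeatedly
# pick the coordinate with the largest gradient magnitude of the convex
# quadratic objective 0.5*||K a - y||^2 + 0.5*lam*a'Ka and minimize along it.
A = K @ K + lam * K          # Hessian of the objective above (positive diagonal)
b = K @ y
alpha = np.zeros(n)
for _ in range(5000):
    grad = A @ alpha - b
    i = int(np.argmax(np.abs(grad)))
    alpha[i] -= grad[i] / A[i, i]   # exact line search along coordinate i

# Both routes yield (nearly) the same fitted function on the training inputs.
pred_closed, pred_cd = K @ alpha_closed, K @ alpha
```

The closed-form solve and the coordinate-wise loop recover essentially the same predictor; working one coordinate (or small block) at a time is what lets decomposition schemes of this kind avoid factorizing the full Gram matrix, which is the scalability concern kernel methods face.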
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 46.4 | 117 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score | 79.6 | 115 |
| Locomotion Control | D4RL walker2d-medium-expert | Normalized Return | 100.1 | 23 |
| Continuous Control | D4RL hopper-medium | Normalized Return | 50.2 | 19 |
| Continuous Control | D4RL walker2d-medium | Normalized Return | 74.6 | 14 |
| Continuous Control | D4RL hopper-expert | Normalized Return | 109.9 | 5 |
| Continuous Control | D4RL walker2d-expert | Normalized Return | 108.5 | 5 |
| Continuous Control | D4RL halfcheetah-medium | Normalized Return | 39.0 | 5 |
| Continuous Control | D4RL halfcheetah-expert | Normalized Return | 84.4 | 5 |