How to Solve Contextual Goal-Oriented Problems with Offline Datasets?
About
We present a novel method, Contextual goal-Oriented Data Augmentation (CODA), which uses commonly available unlabeled trajectories and context-goal pairs to solve Contextual Goal-Oriented (CGO) problems. By carefully constructing an action-augmented MDP that is equivalent to the original MDP, CODA creates a fully labeled transition dataset under training contexts without additional approximation error. We provide a novel theoretical analysis demonstrating CODA's capability to solve CGO problems in the offline data setup. Empirical results show that CODA outperforms baseline methods across various context-goal relationships in CGO problems. This approach offers a promising direction for solving CGO problems using offline datasets.
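The data-augmentation idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one plausible reading of the action-augmented MDP: every unlabeled dynamics transition is paired with a sampled training context and given reward 0, and every context-goal pair becomes a transition that takes a fictitious action from the goal to an absorbing terminal state with reward 1. The names `Transition`, `augment_dataset`, `FICTITIOUS_ACTION`, and `TERMINAL_STATE` are hypothetical, as are the specific reward values and the uniform context sampling.

```python
import random
from dataclasses import dataclass
from typing import Hashable, List, Sequence, Tuple

# Hypothetical placeholders for the extra action and absorbing state
# of the action-augmented MDP (assumed names, not from the source).
FICTITIOUS_ACTION = "a_plus"
TERMINAL_STATE = "s_plus"


@dataclass(frozen=True)
class Transition:
    state: Hashable
    context: Hashable
    action: Hashable
    reward: float
    next_state: Hashable
    done: bool


def augment_dataset(
    dynamics: Sequence[Tuple[Hashable, Hashable, Hashable]],
    context_goal_pairs: Sequence[Tuple[Hashable, Hashable]],
    rng: random.Random,
) -> List[Transition]:
    """Build a fully labeled dataset from unlabeled (s, a, s') triples
    and (context, goal) pairs. Assumed labeling scheme, see lead-in."""
    contexts = [c for c, _ in context_goal_pairs]
    data: List[Transition] = []

    # Unlabeled trajectory transitions: attach a sampled training
    # context and a reward of 0 (assumption).
    for s, a, s_next in dynamics:
        c = rng.choice(contexts)
        data.append(Transition(s, c, a, 0.0, s_next, False))

    # Context-goal pairs: the fictitious action moves from the goal to
    # the terminal state and is the only rewarded transition (assumption).
    for c, g in context_goal_pairs:
        data.append(
            Transition(g, c, FICTITIOUS_ACTION, 1.0, TERMINAL_STATE, True)
        )
    return data


if __name__ == "__main__":
    dyn = [(0, "right", 1), (1, "right", 2)]
    pairs = [("ctx_a", 2), ("ctx_b", 1)]
    for t in augment_dataset(dyn, pairs, random.Random(0)):
        print(t)
```

Under these assumptions, the resulting dataset could be passed to any standard offline RL algorithm that consumes (state, context, action, reward, next state) tuples, since every transition now carries a reward label.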
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Goal Reaching | AntMaze large-play v2 | Success Rate | 60 | 10 |
| Goal Reaching | AntMaze medium-play v2 | Success Rate | 76.8 | 10 |
| Goal Reaching | AntMaze umaze v2 | Success Rate | 94.8 | 6 |
| Goal Reaching | AntMaze Medium-Diverse v2 | Success Rate | 84.5 | 6 |
| Goal Reaching | AntMaze large-diverse v2 | Success Rate | 36.8 | 6 |
| Goal Reaching | AntMaze umaze-diverse v2 | Success Rate | 72.8 | 6 |
| Goal-reaching Navigation | Four Rooms medium-play v1 (test) | Average Success Rate | 0.787 | 4 |
| Goal-reaching Navigation | Four Rooms large-diverse v1 (test) | Success Rate | 72.2 | 4 |
| Offline Context-conditioned Goal-oriented (CGO) Reinforcement Learning | Random Cells (medium-diverse) | Success Rate | 72.5 | 4 |
| Offline Context-conditioned Goal-oriented (CGO) Reinforcement Learning | Random Cells (large-play) | Success Rate | 60.2 | 4 |