
Hitting Time Isomorphism for Multi-Stage Planning with Foundation Policies

About

We present a new operator-theoretic representation learning framework for offline reinforcement learning that recovers the directed temporal geometry of a controlled Markov process from hitting-time observations. While prior art often produces symmetric distances or fails to satisfy the triangle inequality, our framework learns a Hilbert-space displacement geometry in which expected hitting times are realized as linear functionals of latent displacements. We prove that this representation exists under latent linear closure and is uniquely identifiable up to a bounded linear isomorphism. For finite-dimensional implementations, we show that the global hitting-time error is bounded by the one-step transition error amplified by the environment's transient spectral radius. Furthermore, we provide finite-sample guarantees that account for approximation, statistical complexity, and trajectory-label mismatch. Building on this theory, we propose Isomorphic Embedding Learning (IEL), a new goal-agnostic foundation policy learning algorithm that anchors a HILP-style consistency objective with explicit hitting-time regression, so that the learned geometry reflects actual decision-time progress. This asymmetric and compositional structure enables robust graph-based multi-stage planning for long-horizon navigation. Our experiments demonstrate that IEL improves on the state of the art in learning foundation policies from offline maze locomotion data. Our code is available at https://github.com/MagnusBoock/IEL
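To make the notion of a directed temporal geometry concrete, the following is a minimal sketch of how expected hitting times behave in a small, fixed Markov chain. It is not the IEL algorithm and is not taken from the linked repository: the 3-state transition matrix `P` and the helper `expected_hitting_times` are hypothetical, chosen only for illustration. The sketch uses the standard first-step equations, h(s) = 1 + sum over s' of P(s, s') h(s') with h(goal) = 0, and shows that the resulting quantity is asymmetric while still obeying the triangle inequality.

```python
import numpy as np


def expected_hitting_times(P, goal):
    """Expected number of steps to first reach `goal` from every state.

    Solves (I - Q) h = 1, where Q is P restricted to the non-goal states.
    Assumes the goal is reachable from every state, so I - Q is invertible.
    """
    n = P.shape[0]
    others = [s for s in range(n) if s != goal]
    Q = P[np.ix_(others, others)]  # transitions among non-goal states
    h = np.zeros(n)                # h[goal] stays 0 by definition
    h[others] = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    return h


# Hypothetical 3-state chain with a strong drift 0 -> 1 -> 2 -> 0.
P = np.array([
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
    [0.9, 0.0, 0.1],
])

# H[s, g] is the expected hitting time from state s to goal g.
H = np.stack([expected_hitting_times(P, g) for g in range(3)], axis=1)

print(np.round(H, 3))
print("asymmetric:", not np.isclose(H[0, 1], H[1, 0]))
```

In this chain, going from state 0 to state 1 takes about 1.11 steps in expectation, while the return trip from 1 to 0 has to pass through state 2 and takes about 2.22 steps. A symmetric distance cannot represent this difference, which is the kind of directedness the abstract refers to.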

Magnus Victor Boock, Abdullah Akgül, Mustafa Mert Çelikok, Melih Kandemir • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Offline Reinforcement Learning | Kitchen Partial | Normalized Score: 60.4 | 69 |
| Offline goal-conditioned RL | antmaze large-diverse | Mean Normalized Return: 71.8 | 7 |
| Offline goal-conditioned RL | antmaze large-play | Mean Normalized Return: 63 | 7 |
| Offline goal-conditioned RL | AntMaze Ultra-Diverse | Mean Normalized Return: 79 | 7 |
| Offline goal-conditioned RL | AntMaze-Ultra-Play | Mean Normalized Return: 73.8 | 7 |
| Offline goal-conditioned RL | kitchen mixed | Mean Normalized Return: 57.8 | 7 |
