Hitting Time Isomorphism for Multi-Stage Planning with Foundation Policies

About

We present a new operator-theoretic representation learning framework for offline reinforcement learning that recovers the directed temporal geometry of a controlled Markov process from hitting-time observations. Whereas prior methods often learn symmetric distances or violate the triangle inequality, our framework learns a Hilbert-space displacement geometry in which expected hitting times are realized as linear functionals of latent displacements. We prove that this representation exists under latent linear closure and is uniquely identifiable up to a bounded linear isomorphism. For finite-dimensional implementations, we show that the global hitting-time error is bounded by the one-step transition error amplified by the environment's transient spectral radius. We further provide finite-sample guarantees that account for approximation error, statistical complexity, and trajectory-label mismatch. Building on this theory, we propose Isomorphic Embedding Learning (IEL), a goal-agnostic foundation policy learning algorithm that anchors a HILP-style consistency objective with explicit hitting-time regression so that the learned geometry reflects actual decision-time progress. The resulting asymmetric, compositional structure enables robust graph-based multi-stage planning for long-horizon navigation. Our experiments show that IEL improves the state of the art in learning foundation policies from offline maze locomotion data. Our code is available at https://github.com/MagnusBoock/IEL
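The central quantity here is the expected hitting time: the expected number of steps a process needs to first reach a goal state. As a minimal, hypothetical illustration (this is not the IEL algorithm; the 4-state chain and the names `P`, `hitting_times`, and `hitting_time_matrix` are assumptions made for this example), the sketch below computes exact hitting times on a small directed ring by iterating h ← 1 + Qh, where Q is the transition matrix with the goal row and column removed. The iteration only converges when the spectral radius ρ(Q) is below 1. The resulting matrix is asymmetric (2 steps from state 0 to state 1, but 5 steps back), yet it still satisfies the triangle inequality. These are the two properties the abstract contrasts with symmetric learned distances.

```python
from typing import List

# Hypothetical toy environment (not from the paper): a 4-state directed ring.
# States 0-2 advance to the next state with probability 0.5 or stay put;
# state 3 always jumps back to state 0.
P: List[List[float]] = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],
]


def hitting_times(P: List[List[float]], goal: int,
                  tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Expected steps to first reach `goal` from every state.

    Iterates h <- 1 + Q h on the non-goal states (Q = P without the goal row
    and column). Converges when the goal is reachable from every state,
    i.e. when the spectral radius of Q is below 1.
    """
    n = len(P)
    h = [0.0] * n
    for _ in range(max_iter):
        new_h = [0.0] * n
        for s in range(n):
            if s == goal:
                continue  # h(goal, goal) = 0 by definition
            # Transitions into the goal add nothing because h[goal] stays 0.
            new_h[s] = 1.0 + sum(P[s][t] * h[t] for t in range(n))
        delta = max(abs(a - b) for a, b in zip(new_h, h))
        h = new_h
        if delta < tol:
            return h
    raise RuntimeError("no convergence: goal may be unreachable from some state")


def hitting_time_matrix(P: List[List[float]]) -> List[List[float]]:
    """H[s][g] = expected hitting time from state s to goal g."""
    n = len(P)
    by_goal = [hitting_times(P, g) for g in range(n)]
    return [[by_goal[g][s] for g in range(n)] for s in range(n)]


H = hitting_time_matrix(P)
n = len(P)
# Directed: reaching 1 from 0 is cheaper than going back from 1 to 0.
asymmetric = any(abs(H[a][b] - H[b][a]) > 1e-6 for a in range(n) for b in range(n))
# Quasimetric: expected hitting times obey the triangle inequality.
triangle_ok = all(H[a][c] <= H[a][b] + H[b][c] + 1e-6
                  for a in range(n) for b in range(n) for c in range(n))
print(round(H[0][1], 6), round(H[1][0], 6))  # -> 2.0 5.0
print(asymmetric, triangle_ok)  # -> True True
```

The iteration unrolls the Neumann series h = (I - Q)^{-1} 1 = (I + Q + Q^2 + ...) 1. A perturbation of Q is therefore propagated through every power of Q. For a nonnegative Q, the norm of (I - Q)^{-1} is at least 1/(1 - ρ(Q)), which gives one intuition for why one-step errors grow as the transient spectral radius approaches 1.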

Magnus Victor Boock, Abdullah Akgül, Mustafa Mert Çelikok, Melih Kandemir • 2026

Related benchmarks

Task	Dataset	Metric	Result
Offline Reinforcement Learning	Kitchen Partial	Normalized Score	60.4	69
Offline Goal-Conditioned RL	AntMaze Large-Diverse	Mean Normalized Return	71.8	7
Offline Goal-Conditioned RL	AntMaze Large-Play	Mean Normalized Return	63	7
Offline Goal-Conditioned RL	AntMaze Ultra-Diverse	Mean Normalized Return	79	7
Offline Goal-Conditioned RL	AntMaze Ultra-Play	Mean Normalized Return	73.8	7
Offline Goal-Conditioned RL	Kitchen Mixed	Mean Normalized Return	57.8	7
