Provably Efficient Offline-to-Online Value Adaptation with General Function Approximation
About
We study value adaptation in offline-to-online reinforcement learning under general function approximation. Starting from an imperfect offline-pretrained $Q$-function, the learner aims to adapt it to the target environment using only a limited amount of online interaction. We first characterize the difficulty of this setting by establishing a minimax lower bound, showing that even when the pretrained $Q$-function is close to the optimal value function $Q^\star$, online adaptation can be no more efficient than pure online RL on certain hard instances. On the positive side, under a novel structural condition on the offline-pretrained value functions, we propose O2O-LSVI, an adaptation algorithm whose problem-dependent sample complexity provably improves over pure online RL. Finally, we complement our theory with neural-network experiments that demonstrate the practical effectiveness of the proposed method.
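To make the offline-to-online idea concrete, the sketch below warm-starts a linear $Q$-estimate from an offline-pretrained weight vector and then refines it with least-squares value iteration on a small batch of online transitions. This is only an illustrative toy, not the paper's O2O-LSVI: the feature map, environment, pretrained weights, and hyperparameters are all hypothetical stand-ins.

```python
import numpy as np

# Illustrative sketch (not the paper's O2O-LSVI): warm-start a linear Q-estimate
# from an offline-pretrained weight vector, then refine it with least-squares
# value iteration (LSVI) on online transitions. All names and constants below
# (feature dimension, action count, pretrained weights) are hypothetical.

rng = np.random.default_rng(0)
d, n_actions, gamma = 8, 4, 0.95   # assumed feature dim / action count / discount
lam = 1.0                          # ridge regularization for the LSVI regression

def phi(s, a):
    """Hypothetical state-action feature map (one-hot hash into d dimensions)."""
    v = np.zeros(d)
    v[hash((int(s), int(a))) % d] = 1.0
    return v

# Stand-in for an imperfect offline-pretrained Q-function (linear weights).
w_offline = rng.normal(size=d)

def q_value(w, s, a):
    return phi(s, a) @ w

def lsvi_adapt(w_init, transitions, n_iters=20):
    """Fitted least-squares value iteration, warm-started at the pretrained weights."""
    w = w_init.copy()
    X = np.stack([phi(s, a) for (s, a, r, s2) in transitions])
    for _ in range(n_iters):
        # Regression targets: reward plus discounted greedy value under current w.
        y = np.array([r + gamma * max(q_value(w, s2, b) for b in range(n_actions))
                      for (_, _, r, s2) in transitions])
        A = X.T @ X + lam * np.eye(d)
        w = np.linalg.solve(A, X.T @ y)
    return w

# Toy online data: (state, action, reward, next_state) tuples from a mock environment.
online_data = [(rng.integers(10), rng.integers(n_actions), rng.normal(), rng.integers(10))
               for _ in range(200)]

w_adapted = lsvi_adapt(w_offline, online_data)
print("pretrained Q(0, 0):", q_value(w_offline, 0, 0))
print("adapted    Q(0, 0):", q_value(w_adapted, 0, 0))
```

In the actual algorithm the regression would be over general function classes rather than fixed linear features, and exploration would be handled explicitly; the sketch only shows the warm-start-then-refit structure.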
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Reinforcement Learning | AntMaze umaze (D4RL) | Average Episodic Return | 85.8 | 12 |
| Reinforcement Learning | AntMaze large-play (D4RL) | Average Episodic Return | 35.3 | 12 |
| Reinforcement Learning | AntMaze medium-play (D4RL) | D4RL Score | 70.3 | 4 |