
Provably Efficient Offline-to-Online Value Adaptation with General Function Approximation

About

We study value adaptation in offline-to-online reinforcement learning under general function approximation. Starting from an imperfect offline-pretrained $Q$-function, the learner aims to adapt it to the target environment using only a limited amount of online interaction. We first characterize the difficulty of this setting by establishing a minimax lower bound, showing that even when the pretrained $Q$-function is close to the optimal $Q^\star$, online adaptation can be no more efficient than pure online RL on certain hard instances. On the positive side, under a novel structural condition on the offline-pretrained value functions, we propose O2O-LSVI, an adaptation algorithm with problem-dependent sample complexity that provably improves over pure online RL. Finally, we complement our theory with neural-network experiments demonstrating the practical effectiveness of the proposed method.
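The abstract does not spell out the O2O-LSVI algorithm itself, but the general offline-to-online idea it describes can be illustrated concretely: warm-start online value learning from an imperfect pretrained $Q$-function and repair it with limited interaction. The sketch below does this with plain tabular $\epsilon$-greedy Q-learning on a toy chain MDP; the environment, hyperparameters, and function names are all assumptions for illustration, not the paper's method.

```python
import numpy as np

# Illustrative sketch only: not the paper's O2O-LSVI. A noisy copy of Q*
# plays the role of the imperfect offline-pretrained Q-function, and online
# Q-learning plays the role of the online adaptation phase.

N_STATES, N_ACTIONS = 6, 2          # chain states 0..5; action 0 = left, 1 = right
GOAL = N_STATES - 1

def step(s, a):
    """Deterministic chain dynamics; reward 1 only upon reaching the goal."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def value_iteration(gamma=0.9, iters=100):
    """Ground-truth Q*, used here only to fabricate an 'offline' Q-function."""
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(iters):
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                s2, r, done = step(s, a)
                q[s, a] = r + (0.0 if done else gamma * q[s2].max())
    return q

def adapt_online(q_init, episodes=1000, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Fine-tune the given Q-table with epsilon-greedy online Q-learning."""
    rng = np.random.default_rng(seed)
    q = q_init.copy()
    for _ in range(episodes):
        s = 0
        for _ in range(50):             # cap episode length
            greedy = int(q[s].argmax())
            a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else greedy
            s2, r, done = step(s, a)
            target = r if done else r + gamma * q[s2].max()
            q[s, a] += alpha * (target - q[s, a])
            s = s2
            if done:
                break
    return q

# Imperfect offline Q: near-optimal but corrupted by noise, mirroring the
# setting in the abstract; online interaction then repairs it.
rng = np.random.default_rng(1)
q_offline = value_iteration() + rng.normal(0.0, 0.2, size=(N_STATES, N_ACTIONS))
q_adapted = adapt_online(q_offline)

# Greedy rollout under the adapted Q should reach the goal in N_STATES - 1 steps.
s, steps = 0, 0
while s != GOAL and steps < 20:
    s, _, _ = step(s, int(q_adapted[s].argmax()))
    steps += 1
print(steps)
```

The warm start matters here because the noisy offline values already rank most actions correctly, so the online phase needs only to correct a few flipped action gaps rather than learn the value function from scratch, which is the intuition the paper's problem-dependent sample complexity formalizes.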

Shangzhe Li, Weitong Zhang • 2026

Related benchmarks

Task                     Dataset                      Metric                    Result   Rank
Reinforcement Learning   AntMaze umaze (D4RL)         Average Episodic Return   85.8     12
Reinforcement Learning   AntMaze large-play (D4RL)    Average Episodic Return   35.3     12
Reinforcement Learning   antmaze medium-play          D4RL Score                70.3     4
