
Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates

About

While Large Language Model (LLM) agents excel at general tasks, they inherently struggle with continual adaptation because their weights are frozen after deployment. Conventional reinforcement learning (RL) offers a remedy but incurs prohibitive computational costs and risks catastrophic forgetting. We introduce Just-In-Time Reinforcement Learning (JitRL), a training-free framework that enables test-time policy optimization without any gradient updates. JitRL maintains a dynamic, non-parametric memory of experiences and retrieves relevant trajectories to estimate action advantages on the fly. These estimates are then used to directly modulate the LLM's output logits. We theoretically prove that this additive update rule is the exact closed-form solution to the KL-constrained policy optimization objective. Extensive experiments on WebArena and Jericho demonstrate that JitRL establishes a new state of the art among training-free methods. Crucially, JitRL outperforms computationally expensive fine-tuning methods (e.g., WebRL) while reducing monetary costs by over 30 times, offering a scalable path toward continual-learning agents. The code is available at https://github.com/liushiliushi/JitRL.
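The additive logit update described in the abstract can be sketched concretely. The closed-form solution to the KL-constrained objective max_pi E_pi[A] - beta * KL(pi || pi_0) is pi(a) proportional to pi_0(a) * exp(A(a)/beta), which is exactly what adding scaled advantages to the base logits and re-normalizing implements. The following is a minimal illustrative sketch, not code from the JitRL repository; the function name, the toy values, and the assumption that advantages arrive as a per-action vector are all ours.

```python
import numpy as np

def jit_adjusted_distribution(logits, advantages, beta=1.0):
    """Apply the additive update logits + A/beta, then softmax.

    This realizes pi(a) proportional to pi_0(a) * exp(A(a)/beta),
    the closed-form optimum of the KL-constrained objective.
    `advantages` would come from retrieved trajectories in memory;
    here it is just a plain array (illustrative assumption).
    """
    adjusted = logits + advantages / beta
    # Softmax with max-subtraction for numerical stability.
    z = adjusted - adjusted.max()
    probs = np.exp(z)
    return probs / probs.sum()

# Hypothetical toy example with 3 candidate actions.
base_logits = np.array([2.0, 1.0, 0.5])   # frozen LLM's logits
advantages  = np.array([0.0, 1.5, -0.5])  # estimated from memory
probs = jit_adjusted_distribution(base_logits, advantages, beta=1.0)
```

With beta large, the update vanishes and the agent falls back to the frozen policy; with beta small, the retrieved advantages dominate, which matches the usual role of the KL constraint's temperature.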

Yibo Li, Zijie Lin, Ailin Deng, Xuan Zhang, Yufei He, Shuo Ji, Tri Cao, Bryan Hooi • 2026

Related benchmarks

Task                      Dataset                  Result                            Rank
Web navigation            WebArena                 Overall Avg Success Rate: 46.98   23
Text-based Game Playing   Jericho Library (test)   Average Score: 25.9               7
Text-based Game Playing   Jericho Zork1 (test)     Average Score: 53                 7
Text-based Game Playing   Jericho Zork3 (test)     Average Score: 3.1                7
Web Agent Task Success    WebArena-Lite (test)     Admin Success Rate: 65.71         3
