SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training

About

Large Language Model (LLM) agents have shown strong results on multi-turn tool-use tasks, yet they operate in isolation during training, failing to leverage experiences accumulated across episodes. Existing experience-augmented methods address this by organizing trajectories into retrievable libraries, but they retrieve experiences only once based on the initial task description and hold them constant throughout the episode. In multi-turn settings where observations change at every step, this static retrieval becomes increasingly mismatched as episodes progress. We propose SLEA-RL (Step-Level Experience-Augmented Reinforcement Learning), a framework that retrieves relevant experiences at each decision step conditioned on the current observation. SLEA-RL operates through three components: (i) step-level observation clustering that groups structurally equivalent environmental states for efficient cluster-indexed retrieval; (ii) a self-evolving experience library that distills successful strategies and failure patterns through score-based admission and rate-limited extraction; and (iii) policy optimization with step-level credit assignment for fine-grained advantage estimation across multi-turn episodes. The experience library evolves alongside the policy through semantic analysis rather than gradient updates. Experiments on long-horizon multi-turn agent benchmarks demonstrate that SLEA-RL achieves superior performance compared to various reinforcement learning baselines.
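The difference between retrieving once per episode and retrieving at every step can be illustrated with a minimal Python sketch. It is not the authors' implementation: `cluster_key`, `ExperienceLibrary`, `build_prompts`, the digit-masking heuristic, and the `min_score` and `top_k` parameters are all hypothetical names and choices made for illustration. The sketch covers only cluster-indexed retrieval and score-based admission; rate-limited extraction and step-level credit assignment are left out.

```python
import re
from collections import defaultdict


def cluster_key(observation: str) -> str:
    # Hypothetical stand-in for observation clustering: lowercase the text and
    # mask digits so "drawer 1" and "drawer 2" land in the same cluster.
    return re.sub(r"\d+", "#", observation.lower()).strip()


class ExperienceLibrary:
    """Toy experience store indexed by observation cluster (illustrative only)."""

    def __init__(self, min_score: float = 0.5, top_k: int = 2):
        self.min_score = min_score  # assumed admission threshold
        self.top_k = top_k          # assumed number of experiences per step
        self._by_cluster = defaultdict(list)  # cluster key -> [(score, text)]

    def admit(self, observation: str, experience: str, score: float) -> bool:
        # Score-based admission: drop experiences below the threshold.
        if score < self.min_score:
            return False
        self._by_cluster[cluster_key(observation)].append((score, experience))
        return True

    def retrieve(self, observation: str) -> list:
        # Look up the cluster of the *current* observation, best scores first.
        entries = self._by_cluster.get(cluster_key(observation), [])
        ranked = sorted(entries, key=lambda e: e[0], reverse=True)
        return [text for _, text in ranked[: self.top_k]]


def build_prompts(task: str, observations: list, library: ExperienceLibrary) -> list:
    # Step-level retrieval: query the library again at every decision step
    # instead of once from the initial task description.
    return [
        {"task": task, "observation": obs, "experiences": library.retrieve(obs)}
        for obs in observations
    ]


library = ExperienceLibrary(min_score=0.5, top_k=2)
library.admit("You open drawer 1. It is empty.",
              "Empty drawer: move on to the next container.", 0.9)
library.admit("You are in the kitchen.", "Low-value note.", 0.2)  # rejected

prompts = build_prompts(
    "find a key",
    ["You are in the kitchen.", "You open drawer 2. It is empty."],
    library,
)
```

In this toy run the second step retrieves the experience recorded for "drawer 1" because both observations share a cluster key, while the first step retrieves nothing because the only candidate for its cluster was rejected at admission.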

Prince Zizhuang Wang, Shuli Jiang • 2026

Related benchmarks

Task	Dataset	Result
Single-hop Question Answering	PopQA	--	186
Embodied Task	AlfWorld	Overall Success Rate 93.5	169
Multi-hop QA	HotpotQA	--	143
Single-hop Question Answering	TriviaQA	--	133
Multi-hop QA	MuSiQue	EM 77.2	95
Interactive web-based shopping tasks	Webshop	Score 88.7	60
Multi-hop Question Answering	Multi-Hop QA (HotpotQA, 2Wiki, Musique, Bamboogle)	HotpotQA Score 59.8	54
Multi-hop QA	Bamboogle	Accuracy (%) 33.2	25
Multi-hop QA	2Wiki	Accuracy 70.5	17
Single-hop QA	NQ	Accuracy 48.5	17

Showing 10 of 11 rows
