
Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents

About

LLM-driven agents demonstrate strong performance in sequential decision-making but often rely on on-the-fly reasoning, re-deriving solutions even in recurring scenarios. This insufficient experience reuse leads to computational redundancy and execution instability. To bridge this gap, we propose Skill-Pro, a framework that enables agents to autonomously learn reusable procedural skills from interaction experiences without parameter updates. By formalizing a Skill-MDP, Skill-Pro transforms passive episodic narratives into executable Skills defined by activation, execution, and termination conditions to ensure executability. To achieve reliable reusability without capability degradation, we introduce Non-Parametric PPO, which leverages semantic gradients for high-quality candidate generation and a PPO Gate for robust Skill verification. Through score-based maintenance, Skill-Pro sustains compact, high-quality procedural memory. Experimental results across in-domain, cross-task, and cross-agent scenarios demonstrate that Skill-Pro achieves superior reuse rates and significant performance gains with extreme memory compression. Visualized evolutionary trajectories and Skill distributions further reveal how Skill-Pro transparently accumulates, refines, and reuses procedural knowledge to facilitate long-term autonomy.
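The abstract describes a Skill as having activation, execution, and termination conditions, and a procedural memory kept compact through score-based maintenance. The following is a minimal sketch of how such a structure might look, not the authors' implementation. All names (`Skill`, `select_skill`, `update_score`, `prune`), the dictionary state representation, and the moving-average score update are assumptions made for illustration. The semantic-gradient candidate generation and the PPO Gate are not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, object]  # hypothetical state representation


@dataclass
class Skill:
    name: str
    activation: Callable[[State], bool]   # when the skill may start
    execution: Callable[[State], str]     # which action to take next
    termination: Callable[[State], bool]  # when the skill should stop
    score: float = 0.0                    # running usefulness estimate (assumed)


def select_skill(skills: List[Skill], state: State) -> Optional[Skill]:
    """Return the highest-scoring skill whose activation condition holds, or None."""
    active = [s for s in skills if s.activation(state)]
    return max(active, key=lambda s: s.score) if active else None


def update_score(skill: Skill, reward: float, lr: float = 0.5) -> None:
    """Move the score toward the observed reward (assumed moving-average rule)."""
    skill.score += lr * (reward - skill.score)


def prune(skills: List[Skill], capacity: int) -> List[Skill]:
    """Keep only the `capacity` highest-scoring skills."""
    return sorted(skills, key=lambda s: s.score, reverse=True)[:capacity]


# Usage: two toy skills over a toy state.
open_door = Skill(
    name="open_door",
    activation=lambda s: s.get("door") == "closed",
    execution=lambda s: "open door",
    termination=lambda s: s.get("door") == "open",
    score=0.2,
)
pick_key = Skill(
    name="pick_key",
    activation=lambda s: s.get("key") == "on_table",
    execution=lambda s: "take key",
    termination=lambda s: s.get("key") == "held",
    score=0.1,
)

memory = [open_door, pick_key]
state = {"door": "closed", "key": "held"}

chosen = select_skill(memory, state)   # only open_door is active in this state
action = chosen.execution(state)
update_score(chosen, reward=1.0)       # the skill succeeded, so its score rises
memory = prune(memory, capacity=1)     # the lower-scoring skill is dropped
```

In this sketch, reuse corresponds to `select_skill` returning a stored Skill instead of the agent reasoning from scratch, and memory compression corresponds to `prune` discarding low-scoring entries.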

Qirui Mi, Zhijian Ma, Mengyue Yang, Haoxuan Li, Yisen Wang, Haifeng Zhang, Jun Wang • 2026

Related benchmarks

Task                               Dataset                      Metric                 Result  Rank
Interactive Instruction Following  ALFWorld (train)             Success Rate           90      9
Interactive Instruction Following  ALFWorld OOD                 Success Rate           90.9    9
Strategic game playing             Mastermind Hard              Average Return         0.463   9
Strategic game playing             Mastermind Extreme           Average Return         0.333   9
Experience Reuse                   Mastermind v0                Reuse Rate             92.5    6
Experience Reuse                   Mastermind Hard v0           Experience Reuse Rate  82.5    6
Experience Reuse                   Mastermind Extreme v0        Experience Reuse Rate  90      6
Experience Reuse                   Mastermind Gemma-3-4B agent  Experience Reuse Rate  85      6
Experience Reuse                   Mastermind Qwen3-32B agent   Experience Reuse Rate  87.5    6