Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents

About

LLM-driven agents excel at sequential decision-making but often rely on on-the-fly reasoning, re-deriving solutions even in recurring scenarios. This insufficient experience reuse leads to computational redundancy and instability. To bridge this gap, we propose Skill-Pro, a framework enabling agents to autonomously learn reusable procedural skills from interaction experiences without parameter updates. By formalizing a Skill-MDP, Skill-Pro transforms passive episodic narratives into executable Skills defined by activation, execution, and termination conditions to ensure executability. To achieve reliable reusability without capability degradation, we introduce Non-Parametric PPO, which leverages semantic gradients for high-quality candidate generation and a PPO Gate for robust Skill verification. Through score-based maintenance, Skill-Pro sustains compact, high-quality procedural memory. Experimental results across in-domain, cross-task, and cross-agent scenarios demonstrate that Skill-Pro achieves superior reuse rates and significant gains with extreme memory compression. Visualized evolutionary trajectories and Skill distributions further reveal how Skill-Pro transparently accumulates, refines, and reuses procedural knowledge to facilitate long-term autonomy.
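To make the abstract's notion of a Skill more concrete, the following is a minimal sketch of a Skill record with activation, execution, and termination conditions, plus a score-based memory that stays compact by dropping the lowest-scoring entries. It is an illustration only, not the authors' implementation: the names `Skill`, `SkillMemory`, `select`, and `update`, the dictionary state format, the capacity limit, and the moving-average score update are all assumptions, and the sketch does not attempt to model Non-Parametric PPO or the PPO Gate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, object]  # hypothetical observation format (assumption)


@dataclass
class Skill:
    name: str
    activation: Callable[[State], bool]   # may the Skill start in this state?
    execution: Callable[[State], str]     # action to emit while the Skill is active
    termination: Callable[[State], bool]  # should the Skill stop in this state?
    score: float = 0.0                    # running usefulness estimate (assumption)


class SkillMemory:
    """Bounded Skill store that keeps only the highest-scoring entries."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.skills: List[Skill] = []

    def add(self, skill: Skill) -> None:
        # Insert, then prune to capacity by score (hypothetical maintenance rule).
        self.skills.append(skill)
        self.skills.sort(key=lambda s: s.score, reverse=True)
        del self.skills[self.capacity:]

    def select(self, state: State) -> Optional[Skill]:
        # Return the best-scoring Skill whose activation condition holds, if any.
        applicable = [s for s in self.skills if s.activation(state)]
        return max(applicable, key=lambda s: s.score) if applicable else None

    def update(self, skill: Skill, reward: float, lr: float = 0.5) -> None:
        # Exponential moving average of observed reward (assumed update rule).
        skill.score += lr * (reward - skill.score)
```

In this sketch an agent would call `select` at each step, fall back to ordinary reasoning when it returns `None`, and call `update` after a Skill terminates so that Skills which stop paying off lose priority to better ones.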

Qirui Mi, Zhijian Ma, Mengyue Yang, Haoxuan Li, Yisen Wang, Haifeng Zhang, Jun Wang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Online Shopping	WebShop (test)	Score	38.7	59
Interactive agent-based task completion	ALFWorld Unseen (val)	Pick Success Rate	66.7	11
Interactive Instruction Following	ALFWorld (train)	Success Rate	90	9
Interactive Instruction Following	ALFWorld OOD	Success Rate	90.9	9
Strategic game playing	Mastermind Hard	Average Return	0.463	9
Strategic game playing	Mastermind Extreme	Average Return	0.333	9
Experience Reuse	Mastermind v0	Reuse Rate	92.5	6
Experience Reuse	Mastermind Hard v0	Experience Reuse Rate	82.5	6
Experience Reuse	Mastermind Extreme v0	Experience Reuse Rate	90	6
Experience Reuse	Mastermind Gemma-3-4B agent	Experience Reuse Rate	85	6

Showing 10 of 11 rows
