Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents
About
LLM-driven agents demonstrate strong performance in sequential decision-making but often rely on on-the-fly reasoning, re-deriving solutions even in recurring scenarios. This insufficient experience reuse leads to computational redundancy and execution instability. To bridge this gap, we propose Skill-Pro, a framework that enables agents to autonomously learn reusable procedural skills from interaction experiences without parameter updates. By formalizing a Skill-MDP, Skill-Pro transforms passive episodic narratives into executable Skills defined by activation, execution, and termination conditions to ensure executability. To achieve reliable reusability without capability degradation, we introduce Non-Parametric PPO, which leverages semantic gradients for high-quality candidate generation and a PPO Gate for robust Skill verification. Through score-based maintenance, Skill-Pro sustains compact, high-quality procedural memory. Experimental results across in-domain, cross-task, and cross-agent scenarios demonstrate that Skill-Pro achieves superior reuse rates and significant performance gains with extreme memory compression. Visualized evolutionary trajectories and Skill distributions further reveal how Skill-Pro transparently accumulates, refines, and reuses procedural knowledge to facilitate long-term autonomy.
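To make the Skill structure concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Skill can be modeled as three callables (activation, execution, termination) plus a scalar score, and that score-based maintenance amounts to keeping the highest-scoring Skills up to a fixed capacity. The names `Skill`, `SkillLibrary`, `State`, `capacity`, and the moving-average score update are all hypothetical and chosen for illustration; the abstract does not specify them, and the semantic-gradient and PPO Gate steps are not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical state representation: a plain dict of observations.
State = Dict[str, object]


@dataclass
class Skill:
    """A procedural skill with the three conditions named in the abstract."""
    name: str
    activation: Callable[[State], bool]   # may the skill start in this state?
    execution: Callable[[State], str]     # action to emit while the skill runs
    termination: Callable[[State], bool]  # should the skill stop in this state?
    score: float = 0.0                    # assumed running usefulness estimate


class SkillLibrary:
    """Assumed score-based maintenance: keep only the top `capacity` skills."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.skills: List[Skill] = []

    def _prune(self) -> None:
        # Sort by score (highest first) and drop everything past capacity.
        self.skills.sort(key=lambda s: s.score, reverse=True)
        del self.skills[self.capacity:]

    def add(self, skill: Skill) -> None:
        self.skills.append(skill)
        self._prune()

    def select(self, state: State) -> Optional[Skill]:
        # Return the highest-scoring skill whose activation condition holds.
        for skill in self.skills:
            if skill.activation(state):
                return skill
        return None

    def update_score(self, skill: Skill, reward: float, lr: float = 0.5) -> None:
        # Exponential moving average toward the observed reward (an assumption).
        skill.score += lr * (reward - skill.score)
        self._prune()
```

Under these assumptions, an agent would call `select` at each step, fall back to ordinary reasoning when it returns `None`, and call `update_score` after the episode so that rarely useful Skills are eventually pruned.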
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Interactive Instruction Following | ALFWorld (train) | Success Rate | 90 | 9 |
| Interactive Instruction Following | ALFWorld OOD | Success Rate | 90.9 | 9 |
| Strategic game playing | Mastermind Hard | Average Return | 0.463 | 9 |
| Strategic game playing | Mastermind Extreme | Average Return | 0.333 | 9 |
| Experience Reuse | Mastermind v0 | Reuse Rate | 92.5 | 6 |
| Experience Reuse | Mastermind Hard v0 | Experience Reuse Rate | 82.5 | 6 |
| Experience Reuse | Mastermind Extreme v0 | Experience Reuse Rate | 90 | 6 |
| Experience Reuse | Mastermind Gemma-3-4B agent | Experience Reuse Rate | 85 | 6 |
| Experience Reuse | Mastermind Qwen3-32B agent | Experience Reuse Rate | 87.5 | 6 |