Skill Reuse as Compression in Agentic RL
About
Large language model agents trained with reinforcement learning (RL) often learn brittle, task-specific shortcuts. We hypothesize that agents generalize better when their successful trajectories are structurally compressible, that is, when they decompose into a small set of reusable abstract patterns. To formalize this, we introduce ReuseRL, which grounds agentic RL in the Minimum Description Length (MDL) principle. ReuseRL extracts a shared skill dictionary from successful trajectories and augments the RL objective with a segmentation cost, explicitly penalizing idiosyncratic behaviors that encode poorly. We prove a PAC-Bayes generalization bound for this compression penalty. Across ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, ReuseRL improves in- and out-of-distribution success over vanilla GRPO and strong round-length baselines.
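To make the idea of a segmentation cost concrete, here is a minimal sketch of how a trajectory might be scored against a skill dictionary. It is not ReuseRL's actual formulation, which the abstract does not spell out. It assumes a trajectory is a sequence of discrete action tokens, that each dictionary skill has a fixed code length in bits, and that any action not covered by a skill is encoded literally at a higher cost. The names `segmentation_cost`, `shaped_reward`, `skill_costs`, `literal_cost`, and `beta` are all hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Skill = Tuple[str, ...]


def segmentation_cost(
    trajectory: Sequence[str],
    skill_costs: Dict[Skill, float],  # hypothetical: code length (bits) per skill
    literal_cost: float,              # hypothetical: bits to encode one raw action
) -> float:
    """Cheapest encoding of `trajectory` as a sequence of skills and literal actions.

    Dynamic program over prefixes: best[i] is the minimum cost of encoding
    the first i actions.
    """
    n = len(trajectory)
    best = [math.inf] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Option 1: encode the last action literally.
        best[end] = best[end - 1] + literal_cost
        # Option 2: end the prefix with a dictionary skill that matches exactly.
        for skill, cost in skill_costs.items():
            k = len(skill)
            if k <= end and tuple(trajectory[end - k:end]) == skill:
                best[end] = min(best[end], best[end - k] + cost)
    return best[n]


def shaped_reward(
    task_reward: float,
    trajectory: Sequence[str],
    skill_costs: Dict[Skill, float],
    literal_cost: float,
    beta: float,  # hypothetical penalty weight
) -> float:
    """Task reward minus a weighted description-length penalty (an assumed form)."""
    return task_reward - beta * segmentation_cost(trajectory, skill_costs, literal_cost)


# Toy dictionary: two 2-step skills at 1 bit each; raw actions cost 3 bits.
skills = {("goto", "take"): 1.0, ("goto", "put"): 1.0}
reusable = ["goto", "take", "goto", "put"]       # fully covered by two skills
idiosyncratic = ["look", "goto", "open", "put"]  # no skill matches

print(segmentation_cost(reusable, skills, 3.0))       # 2.0
print(segmentation_cost(idiosyncratic, skills, 3.0))  # 12.0
```

Under these assumptions, two trajectories with the same task reward receive different shaped rewards: the one that decomposes into dictionary skills pays a small penalty, while the one that must be spelled out action by action pays a large one. How the dictionary is extracted and how code lengths are assigned are left open here.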
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Interactive Task Completion | ALFWorld 140 scenes (IID) | Success Count @ 7 Steps | 97.14 | 5 |
| Interactive Task Completion | ALFWorld 134 scenes (OOD) | SC@7 | 93.28 | 5 |
| Mathematical Reasoning | Countdown-Stepwise (test) | Pass@1 | 80.37 | 5 |
| Text-based Game Playing | TW-Cooking 1000 (test) | Pass@1 | 83.5 | 5 |