Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents

About

LLM agents benefit from reusable skills, yet test-time tasks often require guidance more specific than a static skill library can provide. We propose SkillTTA, a Test-Time Adaptive Skill Synthesis method that retrieves a small set of training trajectories relevant to the current task and synthesizes them into a temporary, task-specific textual skill. The solver model is kept fixed, so adaptation happens entirely through generated context rather than parameter updates. We evaluate the method on SpreadsheetBench, ALFWorld, and BigCodeBench. Compared with static trajectory-to-skill synthesis using GPT-5.5, task-specific skills improve SpreadsheetBench Pass@1 from 0.397 to 0.505 and BigCodeBench Pass@1 from 0.517 to 0.651. On ALFWorld, the method comes within four points of the success rate of a heavier memory-learning baseline while producing the shortest successful trajectories among reported methods. Ablations on SpreadsheetBench further show that synthesized skills outperform raw trajectory prompting, that the number of retrieved trajectories (top-k) should stay small, and that failed trajectories are especially useful because they expose recurring evaluator-facing mistakes.

Jingxing Wang, Chenyu Zhou, Zhihui Fu, Jun Wang, Weiwen Liu, Weinan Zhang, Jianghao Lin • 2026
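
The retrieve-then-synthesize loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Trajectory` record, the token-overlap (Jaccard) retriever, and the prompt wording are all assumptions made for the example, and the abstract does not say which retriever or synthesis prompt SkillTTA actually uses.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical record of one training run (fields are assumptions)."""
    task: str      # task description from the training set
    steps: str     # textual log of the agent's actions
    success: bool  # whether the evaluator accepted the result


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for whatever retriever is really used."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(task: str, pool: list[Trajectory], k: int = 2) -> list[Trajectory]:
    """Return the k training trajectories most similar to the current task."""
    return sorted(pool, key=lambda t: jaccard(task, t.task), reverse=True)[:k]


def build_synthesis_prompt(task: str, retrieved: list[Trajectory]) -> str:
    """Assemble the text a synthesizer model would turn into a temporary skill."""
    parts = [f"Current task: {task}", "Relevant past trajectories:"]
    for i, t in enumerate(retrieved, 1):
        # Failed runs are kept and labeled, since they expose mistakes to avoid.
        outcome = "SUCCESS" if t.success else "FAILURE"
        parts.append(f"[{i}] ({outcome}) {t.task}\n{t.steps}")
    parts.append("Write a short, task-specific skill: what to do and which mistakes to avoid.")
    return "\n".join(parts)


# Toy data, invented for illustration only.
pool = [
    Trajectory("sum column B in the sales sheet", "read B2:B10; wrote total to B11", True),
    Trajectory("sort rows by date in the log sheet", "sorted header row too", False),
    Trajectory("put the mug in the cabinet", "go to cabinet 1; put mug", True),
]
task = "sum column C in the sales sheet"
retrieved = retrieve(task, pool, k=2)
prompt = build_synthesis_prompt(task, retrieved)
```

In a full pipeline the prompt would be sent to a synthesizer model, and the returned skill text would be placed in the fixed solver's context for this one task and then discarded. Keeping `k` small and including labeled failures mirrors the ablation findings reported in the abstract.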

Related benchmarks

Task                         Dataset           Result                     Rank
Interactive Decision-making  ALFWorld          Overall Success Rate 87.9  295
Code Generation              BigCodeBench      Pass@1 65.1                7
Spreadsheet Reasoning        SpreadsheetBench  Pass@1 50.5                7