SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

About

Long-horizon LLM agents generate traces that could become reusable experience, but raw trajectories are noisy, local, and hard to govern. Agent Skills offer a structured artifact for combining procedural guidance, executable resources, and applicability boundaries. Yet open skill ecosystems contain redundant, uneven, environment-sensitive artifacts, and indiscriminate updates can pollute future context. We present SkillsVote, a lifecycle-governance framework for Agent Skills across collection, recommendation, attribution, and evolution. SkillsVote profiles a million-scale open-source corpus for environment requirements, quality, and verifiability, and synthesizes tasks for verifiable skills. Before execution, it performs agentic library search over structured skill folders to expose instructional context. After execution, it decomposes trajectories into skill-linked subtasks, attributes outcomes to skill-guided execution, agent exploration, environment, and result signals, and admits only successful reusable discoveries to evidence-gated updates. Experiments on Terminal-Bench 2.0 and SWE-Bench Pro show that SkillsVote improves agent performance on challenging agentic coding benchmarks. The gains arise from two complementary pathways: online evolution over task streams at test time and offline transfer via frozen libraries built from either historical trajectories or curated open-source skills.
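
The post-execution step described above (attribute each subtask, then admit only successful, reusable discoveries) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Attribution` labels are only named after the four signals listed in the abstract, and the `Subtask` fields, the `gate_updates` function, the reading of "discoveries" as exploration-attributed subtasks, and the `min_support` threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Attribution(Enum):
    """Hypothetical labels mirroring the four signals named in the abstract."""
    SKILL_GUIDED = "skill_guided"
    AGENT_EXPLORATION = "agent_exploration"
    ENVIRONMENT = "environment"
    RESULT_SIGNAL = "result_signal"


@dataclass(frozen=True)
class Subtask:
    """One skill-linked slice of a trajectory (all fields are assumptions)."""
    description: str
    skill_id: Optional[str]      # None when no library skill was involved
    attribution: Attribution
    succeeded: bool
    reusable: bool


def gate_updates(subtasks: List[Subtask], min_support: int = 2) -> Dict[str, List[Subtask]]:
    """Return candidate updates that pass a simple evidence gate.

    Assumption: a "discovery" is a subtask attributed to agent exploration.
    It is kept only if it succeeded, was judged reusable, and was observed
    at least `min_support` times under the same description.
    """
    support: Dict[str, List[Subtask]] = defaultdict(list)
    for st in subtasks:
        is_discovery = st.attribution is Attribution.AGENT_EXPLORATION
        if is_discovery and st.succeeded and st.reusable:
            support[st.description].append(st)
    return {desc: items for desc, items in support.items() if len(items) >= min_support}


if __name__ == "__main__":
    trace = [
        Subtask("pin dependency before build", None, Attribution.AGENT_EXPLORATION, True, True),
        Subtask("pin dependency before build", None, Attribution.AGENT_EXPLORATION, True, True),
        Subtask("retry flaky download", None, Attribution.ENVIRONMENT, True, False),
        Subtask("apply lint skill", "lint-fix", Attribution.SKILL_GUIDED, True, True),
    ]
    print(sorted(gate_updates(trace)))
```

Keeping the gate as a pure function over attributed subtasks makes it easy to tighten the criteria (for example, raising `min_support`) without touching how trajectories are decomposed.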

Hongyi Liu, Haoyan Yang, Tao Jiang, Bo Tang, Feiyu Xiong, Yuyu Luo, Zhiyu Li • 2026

Related benchmarks

Task	Dataset	Result
Software Engineering	SWE-Bench Pro (public)	Resolve Rate (Pass@1): 50.2	19
Terminal Task Execution	Terminal-Bench 2.0 (full)	Overall avg@5 Accuracy: 58.9	6
Skill retrieval	SkillResolve-Bench 1.0 (test)	Recall@3: 67.6	5
Skill retrieval	SkillResolve-Bench 1.0 (held-out pairs)	R@20: 85.3	5
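
For readers unfamiliar with the metric names in the table, the sketch below shows the conventional definitions of Recall@k and an avg@n accuracy. It assumes the standard formulations (fraction of relevant items found in the top k results; accuracy averaged over n runs per task, then over tasks); the benchmarks' exact scoring rules are not given here and may differ. The function names and the sample inputs are hypothetical.

```python
from typing import Iterable, List, Sequence


def recall_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of relevant items that appear among the top-k ranked items."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = relevant_set & set(ranked[:k])
    return len(hits) / len(relevant_set)


def avg_at_n(runs_per_task: List[List[bool]]) -> float:
    """Mean success rate: average over the runs of each task, then over tasks."""
    per_task = [sum(runs) / len(runs) for runs in runs_per_task]
    return sum(per_task) / len(per_task)


if __name__ == "__main__":
    # Hypothetical retrieval result: two relevant skills, one found in the top 3.
    print(recall_at_k(["a", "b", "c", "d"], {"a", "d"}, k=3))
    # Hypothetical outcomes for two tasks, five runs each.
    print(avg_at_n([[True, True, False, False, True], [False] * 5]))
```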

