CoEvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification

About

Anthropic proposes the concept of skills for LLM agents to tackle multi-step professional tasks that simple tool invocations cannot address. A tool is a single, self-contained function, whereas a skill is a structured bundle of interdependent multi-file artifacts. Currently, skill generation is labor-intensive because skills are authored manually, and it may also suffer from human-machine cognitive misalignment, which can degrade agent performance, as evidenced by evaluations on SkillsBench. We therefore aim to enable agents to generate skills autonomously. However, existing self-evolving methods designed for tools cannot be applied directly to skills, because skills are more complex. To address these issues, we propose CoEvoSkills, a self-evolving skills framework that enables agents to autonomously construct complex, multi-file skill packages. Specifically, CoEvoSkills couples a Skill Generator that iteratively refines skills with a Surrogate Verifier that co-evolves to provide informative and actionable feedback without access to ground-truth test content. On SkillsBench, CoEvoSkills achieves the highest pass rate among five baselines on both Claude Code and Codex, and it also generalizes well to six additional LLMs.
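The generator/verifier coupling described above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Skill`, `Feedback`, `FileChecklistVerifier`, `toy_generate`, and `co_evolve`, the file names, and the rule by which the verifier evolves are all hypothetical. A real system would presumably back both roles with an LLM, and the abstract does not specify how the Surrogate Verifier is updated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Skill:
    """A skill modeled as a bundle of files (path -> content), unlike a single-function tool."""
    files: Dict[str, str] = field(default_factory=dict)


@dataclass
class Feedback:
    passed: bool
    notes: List[str]


@dataclass
class FileChecklistVerifier:
    """Toy surrogate verifier (assumption): checks that required files exist.

    It never sees ground-truth tests; it only inspects the skill itself.
    """
    required: List[str]

    def check(self, skill: Skill) -> Feedback:
        missing = [p for p in self.required if p not in skill.files]
        return Feedback(passed=not missing,
                        notes=[f"missing file: {p}" for p in missing])

    def evolve(self, skill: Skill, feedback: Feedback) -> "FileChecklistVerifier":
        # Hypothetical evolution rule: become stricter once by also
        # requiring a self-check script.
        extra = "tests/check.py"
        if extra in self.required:
            return self
        return FileChecklistVerifier(self.required + [extra])


def toy_generate(task: str, skill: Skill, notes: List[str]) -> Skill:
    """Toy skill generator (assumption): adds whatever files the feedback asks for."""
    files = dict(skill.files)
    files.setdefault("SKILL.md", f"# Skill for: {task}\n")
    for note in notes:
        path = note.split(": ", 1)[1]
        files[path] = "# placeholder\n"
    return Skill(files)


def co_evolve(task: str,
              generate: Callable[[str, Skill, List[str]], Skill],
              verifier: FileChecklistVerifier,
              max_rounds: int = 5) -> Skill:
    """Alternate skill refinement and verifier updates until the verifier passes."""
    skill, notes = Skill(), []
    for _ in range(max_rounds):
        skill = generate(task, skill, notes)         # refine the skill using feedback
        feedback = verifier.check(skill)             # surrogate verification
        if feedback.passed:
            break
        notes = feedback.notes                       # actionable feedback for the next round
        verifier = verifier.evolve(skill, feedback)  # the verifier co-evolves
    return skill


if __name__ == "__main__":
    result = co_evolve("demo task", toy_generate,
                       FileChecklistVerifier(["SKILL.md", "scripts/run.py"]))
    print(sorted(result.files))
```

In this toy run the generator first produces only `SKILL.md`, the verifier reports the missing script and then tightens its own checklist, and the loop ends once the skill satisfies the evolved verifier.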

Hanrong Zhang, Shicheng Fan, Henry Peng Zou, Yankai Chen, Zhenting Wang, Jiayu Zhou, Chengze Li, Wei-Chieh Huang, Yifei Yao, Kening Zheng, Xue Liu, Xiaoxiao Li, Philip S. Yu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Skill Evolution | S²-Bench | Accuracy: 52.4 | 40 |
| Skill Evolution | SL-Bench | Accuracy: 56 | 40 |
| Skill Evolution | SRA-Bench | Accuracy: 74.9 | 40 |
| Agent task execution | SkillsBench 1.0 (test) | Pass Rate (With Skills): 71.1 | 8 |
