Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry
About
Autonomous AI agents increasingly extend their capabilities through Agent Skills: modular filesystem packages whose SKILL.md files describe when and how agents should use them. While this design enables scalable, on-demand capability expansion, it also introduces a semantic supply-chain risk in which natural-language metadata and instructions can affect which skills are admitted, surfaced, selected, and loaded. We study SKILL.md-only attacks across three registry-facing stages of the Agent Skill lifecycle, using real ClawHub skills and realistic registry mechanisms. In Discovery, short textual triggers can manipulate embedding-based retrieval and improve adversarial skill visibility, achieving up to an 86% pairwise win rate and 80% Top-10 placement. In Selection, description-only framing biases agents toward functionally equivalent adversarial variants, which are selected in 77.6% of paired trials on average. In Governance, semantic evasion strategies cause malicious skills to avoid a blocking verdict in 36.5% to 100% of cases. Overall, our results show that SKILL.md is not passive documentation but operational text that shapes which third-party capabilities agents find, trust, and use.
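To make the Discovery metrics concrete, here is a minimal sketch of how a pairwise win rate and a Top-k placement rate could be computed for an embedding-based retriever. It assumes that a "win" means the adversarial description scores a strictly higher cosine similarity to the query than the original description, and that Top-k placement means fewer than k other skills outscore the candidate; the abstract does not spell out either definition, so both are assumptions. The function names and the 2-D toy vectors are hypothetical stand-ins for real embeddings.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_win_rate(queries, original, adversarial):
    """Fraction of queries where the adversarial variant outscores the original.

    Assumed definition: strict inequality on cosine similarity.
    """
    wins = sum(
        1 for q in queries if cosine(q, adversarial) > cosine(q, original)
    )
    return wins / len(queries)


def top_k_rate(queries, candidate, corpus, k=10):
    """Fraction of queries for which the candidate ranks in the top k.

    `corpus` holds the embeddings of the other skills in the registry.
    Assumed definition: the candidate is in the top k when fewer than k
    corpus entries score strictly higher than it.
    """
    hits = 0
    for q in queries:
        target = cosine(q, candidate)
        better = sum(1 for doc in corpus if cosine(q, doc) > target)
        if better < k:
            hits += 1
    return hits / len(queries)


# Toy 2-D "embeddings" (hypothetical data, for illustration only).
queries = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
original = (0.0, 1.0)
adversarial = (1.0, 0.5)
corpus = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

win_rate = pairwise_win_rate(queries, original, adversarial)
top2_rate = top_k_rate(queries, adversarial, corpus, k=2)
print(f"pairwise win rate: {win_rate:.3f}")
print(f"top-2 placement rate: {top2_rate:.3f}")
```

In a real evaluation the vectors would come from the registry's embedding model, and the corpus would contain every competing skill description rather than three toy points.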
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Adversarial Attack | Skill Ranking Evaluation Set (test) | Top-3 Success Rate | 34.29 | 6 |
| Discovery Manipulation | ClawHub | Top-3 Accuracy | 56 | 5 |
| Black-box beam-search-based attack | Skill Ranking Target: BAAI/bge-base-en-v1.5 | Top-3 Success Rate | 19.8 | 3 |
| Black-box beam-search-based attack | Skill Ranking Target: BAAI/bge-small-en-v1.5 | Top-3 Success Rate | 14.85 | 3 |
| Black-box beam-search-based attack | Skill Ranking Target: OpenAI text-embedding-3-small | Top-3 Success Rate | 56 | 3 |
| Discovery Manipulation | ClawHub 0-day | Win-Rate | 94 | 2 |
| Discovery Manipulation | ClawHub average-day | Win-Rate | 74.14 | 1 |