Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry

About

Autonomous AI agents increasingly extend their capabilities through Agent Skills: modular filesystem packages whose SKILL.md files describe when and how agents should use them. While this design enables scalable, on-demand capability expansion, it also introduces a semantic supply-chain risk in which natural-language metadata and instructions can affect which skills are admitted, surfaced, selected, and loaded. We study SKILL.md-only attacks across three registry-facing stages of the Agent Skill lifecycle, using real ClawHub skills and realistic registry mechanisms. In Discovery, short textual triggers can manipulate embedding-based retrieval and improve adversarial skill visibility, achieving up to 86% pairwise win rate and 80% Top-10 placement. In Selection, description-only framing biases agents toward functionally equivalent adversarial variants, which are selected in 77.6% of paired trials on average. In Governance, semantic evasion strategies cause malicious skills to avoid a blocking verdict in 36.5%-100% of cases. Overall, our results show that SKILL.md is not passive documentation but operational text that shapes which third-party capabilities agents find, trust, and use.
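
The Discovery metrics quoted above can be made concrete with a small sketch. The abstract does not spell out how pairwise win rate or Top-10 placement are computed, so the definitions below are an assumed reading: a "win" is a query where the adversarial variant's retrieval score is strictly higher than the original skill's, and a Top-k hit is a query where the adversarial skill ranks within the first k entries among competing skills. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def pairwise_win_rate(adv_scores: Sequence[float],
                      base_scores: Sequence[float]) -> float:
    """Fraction of queries where the adversarial variant outscores the original.

    Assumed definition: ties count as losses for the adversarial variant.
    """
    if not adv_scores or len(adv_scores) != len(base_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    wins = sum(a > b for a, b in zip(adv_scores, base_scores))
    return wins / len(adv_scores)


def top_k_rate(adv_scores: Sequence[float],
               competitor_scores: Sequence[Sequence[float]],
               k: int = 10) -> float:
    """Fraction of queries where the adversarial skill ranks in the top k.

    Rank is 1 plus the number of competitors with a strictly higher score.
    """
    if not adv_scores or len(adv_scores) != len(competitor_scores):
        raise ValueError("need one competitor list per query")
    hits = 0
    for adv, others in zip(adv_scores, competitor_scores):
        rank = 1 + sum(o > adv for o in others)
        hits += rank <= k
    return hits / len(adv_scores)


# Toy similarity scores for four queries (illustrative values only).
adv = [0.82, 0.61, 0.90, 0.55]
base = [0.70, 0.65, 0.75, 0.50]
competitors = [[0.9, 0.8], [0.7, 0.6, 0.5], [0.1], [0.6, 0.7, 0.8]]

win_rate = pairwise_win_rate(adv, base)
top2 = top_k_rate(adv, competitors, k=2)
```

With these toy values the adversarial variant wins three of four pairwise comparisons and lands in the top 2 for three of four queries, so both rates come out to 0.75.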

Shoumik Saha, Kazem Faghih, Soheil Feizi • 2026

Related benchmarks

Task	Dataset	Result
Adversarial Attack	Skill Ranking Evaluation Set (test)	Top-3 Success Rate: 34.29	6
Discovery Manipulation	ClawHub	Top-3 Accuracy: 56	5
Black-box beam-search-based attack	Skill Ranking Target: BAAI/bge-base-en-v1.5	Top-3 Success Rate: 19.8	3
Black-box beam-search-based attack	Skill Ranking Target: BAAI/bge-small-en-v1.5	Top-3 Success Rate: 14.85	3
Black-box beam-search-based attack	Skill Ranking Target: OpenAI text-embedding-3-small	Top-3 Success Rate: 56	3
Discovery Manipulation	ClawHub 0-day	Win-Rate: 94	2
Discovery Manipulation	ClawHub average-day	Win-Rate: 74.14	1
