Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software
About
LLMs are increasingly used for code generation, but their outputs often follow recurring templates that can induce predictable vulnerabilities. We study vulnerability persistence in LLM-generated software and introduce the Feature–Security Table (FSTab), which has two components. First, FSTab enables a black-box attack that predicts likely backend vulnerabilities from observable frontend features and knowledge of the source LLM, without access to the backend or source code. Second, FSTab provides a model-centric evaluation that quantifies how consistently a model reproduces the same vulnerabilities across programs, semantics-preserving rephrasings, and application domains. We evaluate FSTab on state-of-the-art code LLMs, including GPT-5.2, Claude-4.5 Opus, and Gemini-3 Pro, across diverse application domains. Our results show strong cross-domain transfer: even when the target domain is excluded from training, FSTab achieves up to 94% attack success and 93% vulnerability coverage on the Internal Tools domain (Claude-4.5 Opus). These findings expose an underexplored attack surface in LLM-generated software and highlight the security risks of LLM-based code generation. Our code is available at https://github.com/fstabicml2026/FSTab.
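To make the two components concrete, here is a minimal toy sketch, not the authors' FSTab implementation. It assumes a table can be approximated by counting how often each backend vulnerability co-occurs with each (model, frontend feature) pair in previously generated apps. The function names (`build_table`, `predict`, `attack_metrics`, `recurrence`), the feature and vulnerability labels, and the definitions of "attack success" (at least one predicted vulnerability is real) and "coverage" (fraction of real vulnerabilities predicted) are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record for one generated app: (source model, frontend features, backend vulnerabilities).
App = Tuple[str, Set[str], Set[str]]


def build_table(apps: Iterable[App]) -> Dict[Tuple[str, str], Counter]:
    """Count how often each vulnerability co-occurs with each (model, feature) pair."""
    table: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for model, features, vulns in apps:
        for feature in features:
            table[(model, feature)].update(vulns)
    return table


def predict(table: Dict[Tuple[str, str], Counter], model: str,
            features: Set[str], k: int = 3) -> List[str]:
    """Black-box guess: rank vulnerabilities by summed co-occurrence over observed features."""
    scores: Counter = Counter()
    for feature in features:
        # Unseen (model, feature) pairs contribute nothing.
        scores.update(table.get((model, feature), Counter()))
    return [vuln for vuln, _ in scores.most_common(k)]


def attack_metrics(predicted: List[str], actual: Set[str]) -> Tuple[bool, float]:
    """Assumed metrics: success if any prediction is real; coverage = share of real vulns predicted."""
    hits = set(predicted) & actual
    coverage = len(hits) / len(actual) if actual else 0.0
    return bool(hits), coverage


def recurrence(vuln_sets: List[Set[str]]) -> Dict[str, float]:
    """Fraction of generations (e.g. rephrased prompts) in which each vulnerability appears."""
    counts = Counter(v for s in vuln_sets for v in s)
    n = len(vuln_sets)
    return {v: c / n for v, c in counts.items()}


if __name__ == "__main__":
    # Toy training data for a hypothetical "model-A".
    train = [
        ("model-A", {"login_form", "file_upload"}, {"weak_password_hashing", "unrestricted_upload"}),
        ("model-A", {"login_form", "search_bar"}, {"weak_password_hashing", "sql_injection"}),
        ("model-A", {"file_upload"}, {"unrestricted_upload"}),
    ]
    table = build_table(train)
    guess = predict(table, "model-A", {"login_form", "file_upload"}, k=2)
    print(sorted(guess))
    print(attack_metrics(guess, {"weak_password_hashing", "csrf"}))
    print(sorted(recurrence([{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]).items()))
```

Summing raw co-occurrence counts is the simplest possible scoring rule; a real system would likely normalize by feature frequency or learn weights, and the cross-domain setting in the abstract would correspond to building the table from apps outside the target domain.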
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Vulnerability Attack Analysis | WebGenBench E-commerce target-domain and cross-domain 1.0 | -- | 12 |
| Vulnerability Attack Analysis | WebGenBench Internal Tools target-domain and cross-domain 1.0 | -- | 12 |
| Vulnerability Attack Analysis | WebGenBench Social Media target-domain and cross-domain 1.0 | -- | 12 |
| Vulnerability Attack Analysis | WebGenBench Blogging target-domain and cross-domain 1.0 | -- | 12 |
| Vulnerability Attack Analysis | WebGenBench Dashboards target-domain and cross-domain 1.0 | -- | 12 |
| Code-LLM Vulnerability Recurrence Evaluation | FSTab LLM-generated software | -- | 6 |
| Vulnerability Attack Performance | E2E (dev) | -- | 6 |
| Vulnerability Attack Performance | E2E Cross-domain (dev) | -- | 6 |