VulStyle: A Multi-Modal Pre-Training for Code Stylometry-Augmented Vulnerability Detection

About

We present VulStyle, a multi-modal software vulnerability detection model that jointly encodes function-level source code, non-terminal Abstract Syntax Tree (AST) structure, and code stylometry (CStyle) features. Prior work in code representation primarily relies on token-level models or full ASTs, which either miss stylistic cues indicative of risky programming practices or incur high structural overhead. Our approach selects only non-terminal AST nodes, reducing input complexity while preserving semantic hierarchy, and integrates syntactic and lexical CStyle features as auxiliary vulnerability signals. VulStyle is pre-trained using masked language modeling on 4.9M functions across seven programming languages, and fine-tuned on five benchmark datasets: Devign, BigVul, DiverseVul, REVEAL, and VulDeePecker. VulStyle achieves state-of-the-art performance on BigVul and VulDeePecker, improving F1 by 4-48% over strong transformer baselines, and attains competitive or best-average performance across all benchmarks. We contribute an ablation study isolating the effect of CStyle and AST structure, an error case analysis, and a threat model situating the detection task in attacker-realistic scenarios.

Chidera Biringa, Ajmal Abbas, Vishnu Selvaraj, Gokhan Kul • 2026
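
The two auxiliary inputs named in the abstract, a sequence of non-terminal AST nodes and a handful of stylometry features, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes Python's standard `ast` module as a stand-in parser for a single language, treats a node as non-terminal when it has at least one child other than a context or operator marker, and uses hypothetical feature names (`num_lines`, `mean_line_length`, `max_ast_depth`) chosen only for illustration.

```python
import ast

# Marker nodes (Load, Store, Add, Lt, ...) carry no structure of their own,
# so this sketch does not count them as children. This is an assumption.
_MARKERS = (ast.expr_context, ast.operator, ast.unaryop, ast.boolop, ast.cmpop)


def _children(node):
    """Child nodes of `node`, ignoring context/operator markers."""
    return [c for c in ast.iter_child_nodes(node) if not isinstance(c, _MARKERS)]


def non_terminal_types(source):
    """Pre-order list of type names for nodes that have at least one child."""
    out = []

    def visit(node):
        kids = _children(node)
        if kids:  # leaves such as Name and Constant are skipped
            out.append(type(node).__name__)
        for kid in kids:
            visit(kid)

    visit(ast.parse(source))
    return out


def style_features(source):
    """A few illustrative lexical and syntactic features (hypothetical names)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]

    def depth(node):
        return 1 + max((depth(c) for c in ast.iter_child_nodes(node)), default=0)

    return {
        "num_lines": len(lines),
        "mean_line_length": sum(len(ln) for ln in lines) / max(len(lines), 1),
        "max_ast_depth": depth(ast.parse(source)),
    }


SAMPLE = '''
def copy(buf, n):
    out = [0] * n
    for i in range(n):
        out[i] = buf[i]
    return out
'''

if __name__ == "__main__":
    print(non_terminal_types(SAMPLE))
    print(style_features(SAMPLE))
```

Dropping leaf nodes shortens the structural input while keeping the nesting of statements and expressions, which is the trade-off the abstract describes. How the resulting sequence and features would be fused with the token stream is not specified here and is left out of the sketch.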

Related benchmarks

Task	Dataset	Metric	Result
Vulnerability Detection	BigVul	Precision	95.38	42
Vulnerability Detection	VulDeePecker	F1 Score	97.76	12
Vulnerability Detection	Reveal	Accuracy	85.22	12
Vulnerability Detection	Devign	True Positives (TP)	780	4
Vulnerability Detection	DiverseVul	True Positives (TP)	868	4
