VulStyle: A Multi-Modal Pre-Training for Code Stylometry-Augmented Vulnerability Detection

About

We present VulStyle, a multi-modal software vulnerability detection model that jointly encodes function-level source code, non-terminal Abstract Syntax Tree (AST) structure, and code stylometry (CStyle) features. Prior work in code representation relies primarily on token-level models or full ASTs, which often miss stylistic cues indicative of risky programming practices or incur high structural overhead. Our approach selects only non-terminal AST nodes, reducing input complexity while preserving semantic hierarchy, and integrates syntactic and lexical CStyle features as auxiliary vulnerability signals. VulStyle is pre-trained using masked language modeling on 4.9M functions across seven programming languages and fine-tuned on five benchmark datasets: Devign, BigVul, DiverseVul, REVEAL, and VulDeePecker. VulStyle achieves state-of-the-art performance on BigVul and VulDeePecker, improving F1 by 4-48% over strong transformer baselines, and attains competitive or best-average performance across all benchmarks. We contribute an ablation study isolating the effect of CStyle and AST structure, an error case analysis, and a threat model situating the detection task in attacker-realistic scenarios.

Chidera Biringa, Ajmal Abbas, Vishnu Selvaraj, Gokhan Kul • 2026
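
The two auxiliary inputs named in the abstract, non-terminal AST nodes and lexical stylometry features, can be illustrated with a minimal sketch. This is not the authors' implementation: the listing does not say which parser, languages, or feature set VulStyle uses, so the example assumes Python's standard `ast` and `tokenize` modules as stand-ins, treats any node with at least one child as "non-terminal", and uses three hypothetical features (`avg_line_length`, `comment_ratio`, `avg_identifier_length`) chosen only for illustration.

```python
import ast
import io
import keyword
import tokenize


def non_terminal_node_types(source: str) -> list[str]:
    """Return the type names of AST nodes that have at least one child,
    in pre-order. Leaf nodes are dropped (assumed meaning of 'non-terminal')."""
    kept: list[str] = []

    def visit(node: ast.AST) -> None:
        children = list(ast.iter_child_nodes(node))
        if children:  # keep only nodes that have children
            kept.append(type(node).__name__)
            for child in children:
                visit(child)

    visit(ast.parse(source))
    return kept


def lexical_style_features(source: str) -> dict[str, float]:
    """Compute a few hypothetical lexical stylometry features."""
    non_blank = [line for line in source.splitlines() if line.strip()]
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    # Identifiers are NAME tokens that are not language keywords.
    names = [
        tok.string
        for tok in tokens
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)
    ]
    comments = [tok for tok in tokens if tok.type == tokenize.COMMENT]
    n_lines = max(len(non_blank), 1)
    return {
        "avg_line_length": sum(len(line) for line in non_blank) / n_lines,
        "comment_ratio": len(comments) / n_lines,
        "avg_identifier_length": sum(len(n) for n in names) / max(len(names), 1),
    }


SAMPLE = "def add(a, b):\n    return a + b  # sum\n"

if __name__ == "__main__":
    print(non_terminal_node_types(SAMPLE))
    print(lexical_style_features(SAMPLE))
```

In a setup like the one the abstract describes, the node-type sequence and the feature vector would be encoded alongside the function's source tokens. How VulStyle actually combines the three inputs is not stated in this listing.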

Related benchmarks

Task                     Dataset       Metric               Result  Rank
Vulnerability Detection  BigVul        Precision            95.38   42
Vulnerability Detection  VulDeePecker  F1 Score             97.76   12
Vulnerability Detection  Reveal        Accuracy             85.22   12
Vulnerability Detection  Devign        True Positives (TP)  780     4
Vulnerability Detection  DiverseVul    True Positives (TP)  868     4