Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents

About

Detecting vulnerabilities in source code remains a critical yet challenging task, especially when benign and vulnerable functions share significant similarities. In this work, we introduce VulTrial, a courtroom-inspired multi-agent framework designed to identify vulnerable code and to explain its findings. It employs four role-specific agents: a security researcher, a code author, a moderator, and a review board. Using GPT-4o as the base LLM, VulTrial almost doubles the efficacy of the prior best-performing baselines. Additionally, we show that role-specific instruction tuning with small quantities of data significantly boosts VulTrial's efficacy further. Our extensive experiments demonstrate the efficacy of VulTrial across different LLMs, including an open-source, in-house-deployable model (LLaMA-3.1-8B), as well as the high quality of its generated explanations and its ability to uncover multiple confirmed zero-day vulnerabilities in the wild.
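
The four-role setup can be pictured with a small sketch. The abstract only names the roles, so everything else here is an assumption made for illustration: the order in which the agents speak, the prompt wording, and the names `LLM`, `ROLES`, `Trial`, `run_trial`, and `verdict` are all hypothetical and are not taken from VulTrial's implementation. The model is passed in as a plain function so the sketch runs without any API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]

# Role names come from the abstract; the speaking order is an assumption.
ROLES = ("security researcher", "code author", "moderator", "review board")


@dataclass
class Trial:
    code: str
    # (role, reply) pairs in the order the agents spoke.
    transcript: List[Tuple[str, str]] = field(default_factory=list)


def run_trial(code: str, llm: LLM) -> Trial:
    """Let each role speak once, seeing the code and all prior statements."""
    trial = Trial(code=code)
    for role in ROLES:
        history = "\n".join(f"[{r}] {t}" for r, t in trial.transcript)
        prompt = (
            f"You are the {role}.\n"
            f"Code under review:\n{code}\n"
            f"Prior statements:\n{history}"
        )
        trial.transcript.append((role, llm(prompt)))
    return trial


def verdict(trial: Trial) -> str:
    """Treat the last speaker's reply as the final decision (an assumption)."""
    return trial.transcript[-1][1]
```

Calling `run_trial(source, my_model)` with any prompt-to-text function yields a transcript with one entry per role; swapping in a different model per role would only require passing a mapping from role to function instead of a single `llm`.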

Ratnadira Widyasari, Martin Weyssow, Ivana Clairine Irsan, Han Wei Ang, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, Hong Jin Kang, David Lo • 2025

Related benchmarks

Task	Dataset	Result	Rank
Vulnerability Detection	PrimeVul (test)	F1 Score: 56.18	38
Vulnerability Detection	PrimeVul Paired (test)	Pair-Correct Count: 96	22
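
For readers unfamiliar with the paired metric, the following is a minimal sketch of how a pair-correct count could be computed. It assumes the usual reading of the PrimeVul paired setting, in which each pair holds a vulnerable function and its benign counterpart and a pair counts as correct only when both are labeled correctly. The function name `pair_correct_count` and the input layout are hypothetical, and the example data are made up, not results from the paper.

```python
from typing import Iterable, Tuple


def pair_correct_count(pairs: Iterable[Tuple[bool, bool]]) -> int:
    """Count pairs where both functions are classified correctly.

    Each item is (flagged_vulnerable_version, flagged_benign_version),
    where True means the detector predicted "vulnerable".
    A pair is correct when the vulnerable version is flagged
    and the benign version is not.
    """
    return sum(1 for vuln_pred, benign_pred in pairs if vuln_pred and not benign_pred)


# Illustrative predictions only.
example = [
    (True, False),   # both correct
    (True, True),    # benign version wrongly flagged
    (False, False),  # vulnerable version missed
    (True, False),   # both correct
]
print(pair_correct_count(example))
```

Under this reading, a detector that flags everything, or nothing, scores zero, which is why the paired count is a stricter signal than F1 when the two versions of a function are nearly identical.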
