
Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence

About

Large language models (LLMs) are increasingly used to help security analysts manage the surge of cyber threats, automating tasks from vulnerability assessment to incident response. Yet in operational CTI workflows, reliability gaps remain substantial. Existing explanations often point to generic model issues (e.g., hallucination), but we argue the dominant bottleneck is the threat landscape itself: CTI is heterogeneous, volatile, and fragmented. Under these conditions, evidence is intertwined, crowdsourced, and temporally unstable, properties that standard LLM-based studies rarely capture. In this paper, we present a comprehensive empirical study of LLM vulnerabilities in CTI reasoning. We introduce a human-in-the-loop categorization framework that robustly labels failure modes across the CTI lifecycle, avoiding the brittleness of automated "LLM-as-a-judge" pipelines. We identify three domain-specific cognitive failures: spurious correlations from superficial metadata, contradictory knowledge from conflicting sources, and constrained generalization to emerging threats. We validate these mechanisms via causal interventions and show that targeted defenses significantly reduce failure rates. Together, these results offer a concrete roadmap for building resilient, domain-aware CTI agents.

Yuqiao Meng, Luoxi Tang, Feiyang Yu, Jinyuan Jia, Guanhua Yan, Ping Yang, Zhaohan Xi • 2025

Related benchmarks

Task                         Dataset        Metric      Result  Rank
Affected Systems             CTI Benchmark  F1 Score    6.8     15
Attack Infrastructure        CTI Benchmark  F1 Score    7.1     15
Campaign Attribution         CTI Benchmark  Accuracy    7.5     15
Campaign Escalation          CTI Benchmark  AUC         7.8     15
Countermeasure Ranking       CTI Benchmark  NDCG        8.5     15
Defensive Playbook Gen       CTI Benchmark  BLEU        9.2     15
Event Timeline Construction  CTI Benchmark  BLEU        9.5     15
Evidence Weighting           CTI Benchmark  BLEU Score  12.5    15
Exploit Likelihood           CTI Benchmark  AUC         4.5     15
False Flag Detection         CTI Benchmark  F1 Score    11.5    15
Showing 10 of 32 rows
