
Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report

About

Large language models (LLMs) have shown remarkable success across many domains, yet their integration into cybersecurity applications remains limited due to a lack of general-purpose cybersecurity data, representational complexity, and safety and regulatory concerns. To address this gap, we previously introduced Foundation-Sec-8B, a cybersecurity-focused LLM suitable for fine-tuning on downstream tasks. That model, however, was not designed for chat-style interactions or instruction-following. In this report, we release Foundation-Sec-8B-Instruct: a model specifically trained for general-purpose cybersecurity dialogue. Built on Foundation-Sec-8B, it combines domain-specific knowledge with instruction-following, conversational capabilities, and alignment with human preferences to produce high-quality, relevant responses. Comprehensive evaluations show that Foundation-Sec-8B-Instruct outperforms Llama 3.1-8B-Instruct on a range of cybersecurity tasks while matching its instruction-following performance. It is also competitive with GPT-4o-mini on cyber threat intelligence and instruction-following tasks. We envision Foundation-Sec-8B-Instruct becoming an indispensable assistant in the daily workflows of cybersecurity professionals. We release the model publicly at https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Instruct.
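The released checkpoint is a standard chat model on the Hugging Face Hub, so it can be queried through the usual `transformers` chat interface. The sketch below is a minimal example, not taken from the report: the model ID comes from the release URL above, while the system prompt, generation parameters, and helper names are illustrative assumptions.

```python
# Minimal sketch of querying Foundation-Sec-8B-Instruct with the
# Hugging Face `transformers` chat API. Requires `pip install transformers`
# plus enough memory for an 8B model; only the model ID is from the report,
# the rest is a generic chat-completion pattern.


def build_messages(question: str) -> list[dict]:
    """Assemble a chat-format conversation (system + user turn)."""
    return [
        {"role": "system", "content": "You are a cybersecurity assistant."},
        {"role": "user", "content": question},
    ]


def ask(question: str, max_new_tokens: int = 256) -> str:
    """Run one chat turn against the released model (downloads ~8B weights)."""
    # Imported lazily so build_messages() works without the heavy dependency.
    from transformers import pipeline

    chat = pipeline(
        "text-generation",
        model="fdtn-ai/Foundation-Sec-8B-Instruct",
    )
    out = chat(build_messages(question), max_new_tokens=max_new_tokens)
    # The pipeline returns the full conversation; the assistant reply is last.
    return out[0]["generated_text"][-1]["content"]
```

For example, `ask("Summarize CVE scoring with CVSS.")` would return the model's answer as a string; hardware permitting, the same pattern works for any cybersecurity question.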

Sajana Weerawardhena, Paul Kassianik, Blaine Nelson, Baturay Saglam, Anu Vellore, Aman Priyanshu, Supriti Vijay, Massimo Aufiero, Arthur Goldblatt, Fraser Burch, Ed Li, Jianliang He, Dhruv Kedia, Kojin Oshiba, Zhouran Yang, Yaron Singer, Amin Karbasi • 2025

Related benchmarks

Task                            | Dataset         | Result            | Rank
Reasoning                       | BBH             | Accuracy: 66.7    | 507
Mathematical Reasoning          | GSM8K           | Accuracy: 84.8    | 358
Instruction Following           | IFEval          | --                | 292
Instruction Following           | AlpacaEval 2.0  | LC Win Rate: 33.1 | 281
Multi-hop Question Answering    | 2WikiMultihopQA | --                | 278
Knowledge                       | MMLU            | Accuracy: 66      | 71
Mathematical Reasoning          | MATH            | Score: 0.436      | 50
Knowledge                       | GPQA            | Accuracy: 31.9    | 34
Coding                          | HumanEval       | Mean Score: 0.823 | 28
Long-context Question Answering | HotpotQA        | Mean Score: 58.4  | 21

(Showing 10 of 29 rows.)
