Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report
About
Large language models (LLMs) have shown remarkable success across many domains, yet their integration into cybersecurity applications remains limited due to a lack of general-purpose cybersecurity data, representational complexity, and safety and regulatory concerns. To address this gap, we previously introduced Foundation-Sec-8B, a cybersecurity-focused LLM suitable for fine-tuning on downstream tasks. That model, however, was not designed for chat-style interactions or instruction-following. In this report, we release Foundation-Sec-8B-Instruct: a model specifically trained for general-purpose cybersecurity dialogue. Built on Foundation-Sec-8B, it combines domain-specific knowledge with instruction-following, conversational capabilities, and alignment with human preferences to produce high-quality, relevant responses. Comprehensive evaluations show that Foundation-Sec-8B-Instruct outperforms Llama 3.1-8B-Instruct on a range of cybersecurity tasks while matching its instruction-following performance. It is also competitive with GPT-4o-mini on cyber threat intelligence and instruction-following tasks. We envision Foundation-Sec-8B-Instruct becoming an indispensable assistant in the daily workflows of cybersecurity professionals. We release the model publicly at https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Instruct.
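Because the checkpoint is published on Hugging Face, one typical way to try it is through the `transformers` library. The sketch below is illustrative rather than an official recipe: the model ID is taken from the release URL above, while the chat-template call, system prompt, and generation settings are assumptions about a Llama-3.1-style instruct checkpoint and may differ from the model card's recommendations.

```python
# Minimal sketch: querying Foundation-Sec-8B-Instruct via Hugging Face transformers.
# Assumes the checkpoint ships a Llama-3.1-style chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "fdtn-ai/Foundation-Sec-8B-Instruct"  # from the release URL above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # an 8B model in bf16 fits on a single modern GPU
    device_map="auto",
)

# Example cybersecurity-flavored conversation (prompt content is illustrative).
messages = [
    {"role": "system", "content": "You are a cybersecurity assistant."},
    {"role": "user", "content": "Summarize what CVE-2021-44228 (Log4Shell) allows an attacker to do."},
]

# Render the conversation with the model's chat template and generate a reply.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```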
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Reasoning | BBH | Accuracy | 66.7 | 507 |
| Mathematical Reasoning | GSM8K | Accuracy | 84.8 | 358 |
| Instruction Following | IFEval | -- | -- | 292 |
| Instruction Following | AlpacaEval 2.0 | LC Win Rate | 33.1 | 281 |
| Multi-hop Question Answering | 2WikiMultihopQA | -- | -- | 278 |
| Knowledge | MMLU | Accuracy | 66 | 71 |
| Mathematical Reasoning | MATH | Score | 0.436 | 50 |
| Knowledge | GPQA | Accuracy | 31.9 | 34 |
| Coding | HumanEval | Mean Score | 0.823 | 28 |
| Long-context Question Answering | HotpotQA | Mean Score | 58.4 | 21 |