Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report
About
As transformer-based large language models (LLMs) increasingly permeate society, they have revolutionized domains such as software engineering, creative writing, and digital arts. However, their adoption in cybersecurity remains limited due to challenges such as the scarcity of specialized training data and the complexity of representing cybersecurity-specific knowledge. To address these gaps, we present Foundation-Sec-8B, a cybersecurity-focused LLM built on the Llama 3.1 architecture and enhanced through continued pretraining on a carefully curated cybersecurity corpus. We evaluate Foundation-Sec-8B across both established and new cybersecurity benchmarks, showing that it matches Llama 3.1-70B and GPT-4o-mini on certain cybersecurity-specific tasks. By releasing our model to the public, we aim to accelerate the progress and adoption of AI-driven tools in both public and private cybersecurity contexts.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Fine-grained Hallucination Detection | InFi-Check-FG (test) | BAcc (Normalized) | 27.69 | 30 |
| Veracity Assessment | FactCheck-Bench | Macro-F1 | 69.8 | 26 |
| Fact Checking | InFi-Check-FG 1.0 (test) | PredE | 18.82 | 18 |
| Hallucination Detection | FRANK | Balanced Acc | 71.49 | 18 |
| Cybersecurity Knowledge and Malware Extraction Analysis | SECURE | KCV | 84.38 | 17 |
| Cybersecurity Knowledge Question Answering | MMLU CSec | CSec Score | 80 | 17 |
| Overall Cybersecurity Performance | Cybersecurity Multi-Benchmark Suite | Overall Mean Score | 76.9 | 17 |
| Cybersecurity Knowledge Evaluation | CyMtc (500) | CyMtc (500) Score | 86.6 | 17 |
| Cybersecurity Multiple Choice Question Answering | RedSage-MCQ 0-shot (test) | Macro Accuracy | 78.51 | 17 |
| Cybersecurity Threat Intelligence Analysis | CTI-Bench | MCQ Score | 62.4 | 17 |
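Several rows above report balanced accuracy (the unweighted mean of per-class recall, which prevents a majority class from dominating the score). As a minimal pure-Python sketch of that metric, using hypothetical labels rather than any benchmark's actual data:

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy: average the recall of each class, weighting
    every class equally regardless of how many examples it has."""
    totals = defaultdict(int)  # examples per true class
    hits = defaultdict(int)    # correct predictions per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Hypothetical two-class hallucination-detection labels for illustration.
y_true = ["hallucination", "hallucination", "faithful", "faithful", "faithful"]
y_pred = ["hallucination", "faithful", "faithful", "faithful", "hallucination"]
print(round(balanced_accuracy(y_true, y_pred), 4))  # → 0.5833
```

Here plain accuracy would be 3/5 = 0.6, while balanced accuracy averages the per-class recalls (0.5 and 0.667) to 0.5833, reflecting the weaker minority-class performance.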