RedSage: A Cybersecurity Generalist LLM

About

Cybersecurity operations demand assistant LLMs that support diverse workflows without exposing sensitive data. Existing solutions either rely on proprietary APIs with privacy risks or on open models lacking domain adaptation. To bridge this gap, we curate 11.8B tokens of cybersecurity-focused continual pretraining data via large-scale web filtering and manual collection of high-quality resources, spanning 28.6K documents across frameworks, offensive techniques, and security tools. Building on this, we design an agentic augmentation pipeline that simulates expert workflows to generate 266K multi-turn cybersecurity samples for supervised fine-tuning. Combined with general open-source LLM data, these resources enable the training of RedSage, an open-source, locally deployable cybersecurity assistant with domain-aware pretraining and post-training. To rigorously evaluate the models, we introduce RedSage-Bench, a benchmark with 30K multiple-choice and 240 open-ended Q&A items covering cybersecurity knowledge, skills, and tool expertise. RedSage is further evaluated on established cybersecurity benchmarks (e.g., CTI-Bench, CyberMetric, SECURE) and general LLM benchmarks to assess broader generalization. At the 8B scale, RedSage achieves consistently better results, surpassing the baseline models by up to +5.59 points on cybersecurity benchmarks and +5.05 points on Open LLM Leaderboard tasks. These findings demonstrate that domain-aware agentic augmentation and pre/post-training can not only enhance cybersecurity-specific expertise but also help to improve general reasoning and instruction-following. All models, datasets, and code are publicly available.

Naufal Suryanto, Muzammal Naseer, Pengfei Li, Syed Talal Wasim, Jinhui Yi, Juergen Gall, Paolo Ceravolo, Ernesto Damiani • 2026

Related benchmarks

Task	Dataset	Metric	Result
Cybersecurity Knowledge Question Answering	MMLU CSec	CSec Score	88	21
Cybersecurity Knowledge Evaluation	CyMtc (500)	CyMtc (500) Score	93.8	17
Cybersecurity Multiple Choice Question Answering	RedSage-MCQ 0-shot (test)	Macro Accuracy	85.73	17
Cybersecurity Threat Intelligence Analysis	CTI-Bench	MCQ Score	71.04	17
General Language Understanding and Reasoning	Open LLM Leaderboard Lighteval (test)	Mean Accuracy	74.33	17
Overall Cybersecurity Performance	Cybersecurity Multi-Benchmark Suite	Overall Mean Score	84.56	17
Cybersecurity Benchmarking	ScBen En	En	83.62	17
Cybersecurity Evaluation	ScEva	MCQ Score	76.1	17
Cybersecurity Knowledge and Malware Extraction Analysis	SECURE	KCV	87.2	17
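
The table reports a Macro Accuracy for RedSage-MCQ without defining it. The sketch below shows one common reading of that metric, assuming it means the unweighted mean of per-category accuracies. The category labels (knowledge, skills, tools) follow the abstract's description of the benchmark; the function name, record layout, and toy answers are hypothetical and are not taken from the RedSage code or data.

```python
from collections import defaultdict


def macro_accuracy(records):
    """Return (macro accuracy, per-category accuracy).

    `records` is an iterable of (category, predicted, gold) tuples.
    Assumption: macro accuracy = unweighted mean of per-category accuracy.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, predicted, gold in records:
        total[category] += 1
        correct[category] += int(predicted == gold)
    if not total:
        raise ValueError("no records to score")
    per_category = {c: correct[c] / total[c] for c in total}
    macro = sum(per_category.values()) / len(per_category)
    return macro, per_category


# Hypothetical toy answers, not real benchmark items.
records = [
    ("knowledge", "A", "A"),
    ("knowledge", "B", "C"),
    ("skills", "D", "D"),
    ("tools", "A", "A"),
    ("tools", "B", "B"),
    ("tools", "C", "A"),
    ("tools", "D", "D"),
]

macro, per_category = macro_accuracy(records)
micro = sum(p == g for _, p, g in records) / len(records)
print(f"macro={macro:.4f} micro={micro:.4f} per_category={per_category}")
```

Under this reading, each category counts equally regardless of how many items it holds, so a large category cannot dominate the score the way it can in plain (micro) accuracy.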
