
Semantic-Aware Parsing for Security Logs

About

Security logs are foundational to threat detection and post-incident investigation, yet analysts often struggle to fully leverage them because they are heterogeneous and unstructured. The standard practice of manually writing parsers to normalize the data in security event management systems is time-consuming and costly due to the long tail of log formats. Meanwhile, using large language models (LLMs) to query raw logs without explicit parsing is impractical at scale. In this paper, we introduce Matryoshka, an end-to-end system leveraging LLMs to automatically generate semantically aware structured log parsers without labeled examples or human intervention. Matryoshka achieves this by directly inferring log syntax, variable naming, and normalization to common security-specific schemas (e.g., OCSF [1]) from unlabeled log line samples, then generating deterministic parsers and mapping rules that can be efficiently applied during data ingestion. This approach provides analysts with semantically rich data representations at scale, facilitating rapid and precise log search without the traditional burden of manual parser construction. We evaluate Matryoshka's capabilities on both established template generation datasets and new datasets curated to measure end-to-end performance on a realistic distribution of log types. Our experiments show that Matryoshka outperforms prior work on syntax parsing while matching human-generated parsers in both side-by-side comparisons and retrieval for security-relevant queries. These results demonstrate that Matryoshka significantly reduces manual effort by automatically extracting and organizing valuable security data, moving us closer to fully automated, AI-driven analytics.
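To make the idea of a deterministic parser plus mapping rules concrete, here is a minimal Python sketch of what such an artifact could look like once applied at ingestion. It is not output from Matryoshka: the regular expression, the variable names, the sample SSH log line, and the OCSF-style dotted field paths are all assumptions chosen for illustration.

```python
import re

# Hypothetical template for an OpenSSH "Failed password" line. The pattern and
# the variable names are illustrative, not something Matryoshka generated.
SSH_FAILED = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"sshd\[(?P<pid>\d+)\]:\sFailed password for (?:invalid user )?(?P<user>\S+)\s"
    r"from (?P<src_ip>\S+) port (?P<src_port>\d+)"
)

# Hypothetical mapping rules from template variables to OCSF-style dotted
# paths. The attribute names are assumed, not checked against the schema.
FIELD_MAP = {
    "timestamp": "time",
    "host": "device.hostname",
    "pid": "process.pid",
    "user": "user.name",
    "src_ip": "src_endpoint.ip",
    "src_port": "src_endpoint.port",
}
INT_FIELDS = {"pid", "src_port"}


def parse_line(line):
    """Apply the template to one log line; return a normalized record or None."""
    match = SSH_FAILED.match(line)
    if match is None:
        return None
    record = {}
    for var, value in match.groupdict().items():
        # Rename each extracted variable and cast numeric fields to int.
        record[FIELD_MAP[var]] = int(value) if var in INT_FIELDS else value
    return record


def where(records, field, value):
    """Simple WHERE-style filter over normalized records."""
    return [r for r in records if r.get(field) == value]


sample = ("Jan 10 13:55:36 web01 sshd[2212]: Failed password for invalid user "
          "admin from 203.0.113.7 port 51234 ssh2")
record = parse_line(sample)
hits = where([record], "src_endpoint.ip", "203.0.113.7")
```

Because the template and mapping are plain data applied by ordinary code, no model call is needed per log line, and a simple WHERE-style query reduces to an equality check on a normalized field.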

Julien Piet, Vivian Fang, Rishi Khare, Scott Coull, Vern Paxson, Raluca Ada Popa, David Wagner • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Querying security logs | Cron logs, simple WHERE queries | Macro F1 | 100 | 4
Querying security logs | Audit logs, simple WHERE queries | Macro F1 | 99.8 | 2
Querying security logs | DHCP logs, simple WHERE queries | Macro F1 | 100 | 2
Querying security logs | Puppet logs, simple WHERE queries | Macro F1 | 99.8 | 2
Querying security logs | SSH logs, simple WHERE queries | Macro F1 | 99.9 | 2
Querying security logs | Security Log Suite, macro average | Macro F1 | 99.9 | 2
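
For readers unfamiliar with the metric, macro F1 is the unweighted mean of per-item F1 scores. This listing does not say how the scores above were aggregated, so the sketch below assumes one F1 score per query, computed over sets of returned log-line identifiers, and averaged across queries. The function names and the empty-set convention are illustrative choices.

```python
def f1(expected, returned):
    """F1 between the expected and returned result sets of a single query."""
    expected, returned = set(expected), set(returned)
    if not expected and not returned:
        # Assumed convention: an empty result that should be empty is a perfect match.
        return 1.0
    true_positives = len(expected & returned)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(returned)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def macro_f1(pairs):
    """Unweighted mean of per-query F1 over (expected, returned) pairs."""
    scores = [f1(expected, returned) for expected, returned in pairs]
    return sum(scores) / len(scores)


# Two hypothetical queries: one answered exactly, one missing a matching line.
score = macro_f1([({1, 2}, {1, 2}), ({1, 2}, {1})])
```

Under this averaging every query counts equally, regardless of how many log lines it matches.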
