When Entropy Is Not Enough: Multi-Modal Classification of Encrypted and Compressed Data Fragments

About

Reliable identification of encrypted data fragments is essential in cybersecurity, with applications to ransomware detection, digital forensics, and large-scale data analysis. Distinguishing encrypted from compressed fragments is particularly challenging, as short fragments lack structural data and exhibit low statistical redundancy. Traditional statistical methods based on byte-level distributions show limited effectiveness on this task. Recent machine learning approaches improve performance by learning subtle patterns from raw bytes, but they predominantly rely on single-modal representations, implicitly assuming that a single view of the data is sufficient for accurate classification. This paper shows that this assumption becomes a fundamental limitation in low-information settings, where only small fragments of data are available (512 to 2048 bytes). We propose Triumvir, a multi-modal, uncertainty-aware ensemble architecture that integrates statistical, sequential, and spatial representations of raw byte fragments. Extensive experimental analysis demonstrates that Triumvir consistently outperforms state-of-the-art methods, with gains of up to +4.5 percentage points (pp) in binary and +6.4 pp in multiclass classification. Ablation studies confirm that combining modalities is critical, yielding improvements of up to +5 pp over partial configurations.
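To illustrate why byte-level statistics alone struggle on this task, the following is a minimal sketch (not the authors' method) that computes the Shannon entropy of the byte histogram for two 2048-byte fragments. The helper name `byte_entropy` is hypothetical, `os.urandom` output is used only as a stand-in for ciphertext, and a slice of a zlib stream is used as a stand-in for a compressed fragment; both choices are assumptions made for illustration.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Stand-in for an encrypted fragment (assumption): uniformly random bytes.
random_like = os.urandom(2048)

# Stand-in for a compressed fragment (assumption): a 2048-byte slice taken
# from the middle of a zlib stream, skipping the first 64 bytes.
text = " ".join(str(i * i) for i in range(20000)).encode()
compressed = zlib.compress(text, 9)[64:64 + 2048]

print(f"random-like: {byte_entropy(random_like):.3f} bits/byte")
print(f"compressed:  {byte_entropy(compressed):.3f} bits/byte")
```

Both values typically land close to the 8 bits/byte ceiling, so a single entropy threshold leaves little margin for separating the two classes at this fragment size.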

Fabio De Gaspari, Dorjan Hitaj, Samuele Salaris, Luigi V. Mancini • 2026

Related benchmarks

Task | Dataset | Result | Rank
Multiclass Classification | Multiclass File Fragment Dataset (test) | Accuracy: 91.6 | 10
Binary Classification | File Fragment Dataset 2KB (test) | Accuracy (bz2): 99.6 | 5