FOCA: Multimodal Malware Classification via Hyperbolic Cross-Attention

About

In this work, we introduce FOCA, a novel multimodal framework for malware classification that jointly leverages audio and visual modalities. Unlike conventional Euclidean fusion methods, FOCA is the first to exploit the intrinsic hierarchical relationships between audio and visual representations in hyperbolic space. Raw binaries are first transformed into both audio and visual representations, which are then processed by three key components: (i) a hyperbolic projection module that maps Euclidean embeddings into the Poincaré ball, (ii) a hyperbolic cross-attention mechanism that aligns multimodal dependencies under curvature-aware constraints, and (iii) a Möbius addition-based fusion layer. Comprehensive experiments on two benchmark datasets, Mal-Net and CICMalDroid 2020, show that FOCA consistently outperforms unimodal models, surpasses most Euclidean multimodal baselines, and achieves state-of-the-art performance relative to existing work.
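Components (i) and (iii) rest on two standard Poincaré-ball operations: the exponential map at the origin and Möbius addition. The sketch below illustrates those textbook formulas in plain Python; it is not the authors' implementation. The function names `expmap0` and `mobius_add`, the curvature parameter `c = 1.0`, and the toy embedding values are all assumptions made for illustration, and the hyperbolic cross-attention step (ii) is not covered.

```python
import math


def dot(x, y):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def expmap0(v, c=1.0):
    """Exponential map at the origin: sends a Euclidean vector into the
    Poincare ball of curvature -c (assumed form of a 'hyperbolic projection')."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * a for a in v]


def mobius_add(x, y, c=1.0):
    """Mobius addition of two points x and y inside the Poincare ball."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    coef_x = 1 + 2 * c * xy + c * y2
    coef_y = 1 - c * x2
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]


# Hypothetical Euclidean embeddings for the two modalities (toy values).
audio_emb = [0.5, -1.2, 0.3]
visual_emb = [1.0, 0.4, -0.7]

a = expmap0(audio_emb)    # audio embedding projected into the ball
v = expmap0(visual_emb)   # visual embedding projected into the ball
fused = mobius_add(a, v)  # fused point, still inside the unit ball
```

Because Möbius addition maps pairs of points in the ball back into the ball, the fused representation remains a valid hyperbolic point. Unlike vector addition, the operation is neither commutative nor associative in general, so the order of the two modalities matters.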

Nitin Choudhury, Bikrant Bikram Pratap Maurya, Orchid Chetia Phukan, Arun Balaji Buduru • 2026

Related benchmarks

Task                     Dataset             Result             Rank
Malware Classification   Mal-Net             Accuracy: 82.84    38
Malware Classification   CICMalDroid 2020    Accuracy: 0.991    38