FOCA: Multimodal Malware Classification via Hyperbolic Cross-Attention

About

In this work, we introduce FOCA, a novel multimodal framework for malware classification that jointly leverages audio and visual modalities. Unlike conventional Euclidean-based fusion methods, FOCA is the first to exploit the intrinsic hierarchical relationships between audio and visual representations within hyperbolic space. To achieve this, raw binaries are transformed into both audio and visual representations, which are then processed through three key components: (i) a hyperbolic projection module that maps Euclidean embeddings into the Poincare ball, (ii) a hyperbolic cross-attention mechanism that aligns multimodal dependencies under curvature-aware constraints, and (iii) a Mobius addition-based fusion layer. Comprehensive experiments on two benchmark datasets-Mal-Net and CICMalDroid2020- show that FOCA consistently outperforms unimodal models, surpasses most Euclidean multimodal baselines, and achieves state-of-the-art performance over existing works.

Nitin Choudhury, Bikrant Bikram Pratap Maurya, Orchid Chetia Phukan, Arun Balaji Buduru• 2026

Related benchmarks

Task	Dataset	Result	Rank
Malware Classification	CICMalDroid 2020	Accuracy0.991		46
Malware Classification	Mal-Net	Accuracy82.84		38

Showing 2 of 2 rows

Other info

Follow for update

@wizwand_team Discord