Missing-Aware Multimodal Fusion for Unified Microservice Incident Management
About
Automated incident management is critical for microservice reliability. While recent unified frameworks leverage multimodal data for joint optimization, they unrealistically assume perfect data completeness. In practice, network fluctuations and agent failures frequently cause missing modalities. Existing approaches relying on static placeholders introduce imputation noise that masks anomalies and degrades performance. To address this, we propose ARMOR, a robust self-supervised framework designed for missing modality scenarios. ARMOR features: (i) a modality-specific asymmetric encoder that isolates distribution disparities among metrics, logs, and traces; and (ii) a missing-aware gated fusion mechanism utilizing learnable placeholders and dynamic bias compensation to prevent cross-modal interference from incomplete inputs. By employing self-supervised auto-regression with mask-guided reconstruction, ARMOR jointly optimizes anomaly detection (AD), failure triage (FT), and root cause localization (RCL). AD and RCL require no fault labels, while FT relies solely on failure-type annotations for the downstream classifier. Extensive experiments demonstrate that ARMOR achieves state-of-the-art performance under complete data conditions and maintains robust diagnostic accuracy even with severe modality loss.
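The missing-aware gated fusion idea described above can be illustrated with a small sketch. This is not ARMOR's actual architecture, which the abstract does not specify in detail. It assumes one scalar gate per modality, a placeholder vector substituted for a missing modality, and an additive bias applied to the gate only when the modality is missing. All names (`gated_fusion`, `placeholders`, `gate_weights`, `missing_bias`) are hypothetical, and the values that would be learned during training are fixed constants here.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(
    embeddings: Dict[str, Optional[Vector]],
    placeholders: Dict[str, Vector],
    gate_weights: Dict[str, Vector],
    missing_bias: Dict[str, float],
) -> Vector:
    """Fuse per-modality embeddings; None marks a missing modality.

    Assumption: placeholders, gate_weights, and missing_bias stand in
    for parameters that a real model would learn.
    """
    dim = len(next(iter(placeholders.values())))
    fused = [0.0] * dim
    total_gate = 0.0
    for name, placeholder in placeholders.items():
        observed = embeddings.get(name)
        is_missing = observed is None
        # Substitute the placeholder vector when the modality is absent.
        vec = placeholder if is_missing else observed
        # Gate logit: linear score of the vector.
        logit = sum(w * v for w, v in zip(gate_weights[name], vec))
        # Bias compensation applies only to missing modalities.
        if is_missing:
            logit += missing_bias[name]
        gate = sigmoid(logit)
        total_gate += gate
        for i in range(dim):
            fused[i] += gate * vec[i]
    # Normalize so the output is a gate-weighted average.
    return [x / total_gate for x in fused]


# Hypothetical parameters for the three modalities named in the abstract.
modalities = ("metrics", "logs", "traces")
placeholders = {m: [0.0, 0.0] for m in modalities}
gate_weights = {m: [0.0, 0.0] for m in modalities}
missing_bias = {m: -4.0 for m in modalities}

# Traces are missing; metrics and logs are observed.
fused = gated_fusion(
    {"metrics": [1.0, 2.0], "logs": [3.0, 4.0], "traces": None},
    placeholders,
    gate_weights,
    missing_bias,
)
```

With a strongly negative missing bias, the gate for the absent modality approaches zero, so the placeholder contributes little to the fused vector and the result stays close to the average of the observed modalities. A static zero-filled input with no gate adjustment would instead pull the fused vector toward the placeholder.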
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Root Cause Localization | D1 (complete data conditions) | Top-1 Score | 82.1 | 7 |
| Root Cause Localization | D2 (complete data conditions) | Top-1 Accuracy | 81.5 | 7 |
| Anomaly Detection | D1 (complete data conditions) | Precision | 92.5 | 6 |
| Anomaly Detection | D2 (complete data conditions) | Precision | 99.3 | 6 |
| Failure Triage | D1 (complete data conditions) | Precision | 94.6 | 6 |
| Failure Triage | D2 (complete data conditions) | Precision | 88.2 | 6 |
| Anomaly Detection | D1 (test) | Execution Time (s) | 5.23 | 2 |
| Anomaly Detection | D2 (test) | Execution Time (s) | 6.71 | 2 |
| Failure Triage | D1 (test) | Execution Time (s) | 1.56 | 2 |
| Failure Triage | D2 (test) | Execution Time (s) | 1.45 | 2 |