
BALM: A Model-Agnostic Framework for Balanced Multimodal Learning under Imbalanced Missing Rates

About

Learning from multiple modalities often suffers from imbalance, where information-rich modalities dominate optimization while weaker or partially missing modalities contribute less. This imbalance becomes severe in realistic settings with imbalanced missing rates (IMR), where each modality is absent with a different probability, distorting representation learning and gradient dynamics. We revisit this issue from a training-process perspective and propose BALM, a model-agnostic plug-in framework for balanced multimodal learning under IMR. The framework comprises two complementary modules: the Feature Calibration Module (FCM), which recalibrates unimodal features using global context to establish a shared representation basis across heterogeneous missing patterns; and the Gradient Rebalancing Module (GRM), which balances learning dynamics across modalities by modulating gradient magnitudes and directions from both distributional and spatial perspectives. BALM can be seamlessly integrated into diverse backbones, including multimodal emotion recognition (MER) models, without altering their architectures. Experimental results across multiple MER benchmarks confirm that BALM consistently enhances robustness and improves performance under diverse missing and imbalance settings. Code is available at: https://github.com/np4s/BALM_CVPR2026.git
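The abstract describes gradient rebalancing only at a high level, so the following is a minimal sketch of the general idea, not the paper's actual GRM: rescale each modality's gradient toward a common magnitude so that no single modality dominates the update. All function and variable names here are hypothetical illustrations.

```python
import numpy as np

def rebalance_gradients(grads, eps=1e-8):
    """Hypothetical sketch: rescale each modality's gradient so that
    every modality contributes with the same magnitude (the mean norm
    across modalities). Directions are preserved; only magnitudes change."""
    norms = np.array([np.linalg.norm(g) for g in grads])
    target = norms.mean()  # common target magnitude
    return [g * (target / (n + eps)) for g, n in zip(grads, norms)]

# Example: the audio gradient (norm 5) dominates the text gradient (norm 1).
g_audio = np.array([3.0, 4.0])
g_text = np.array([0.6, 0.8])
balanced = rebalance_gradients([g_audio, g_text])
# After rebalancing, both gradients have norm ~3 (the mean of 5 and 1).
```

The actual GRM also accounts for gradient directions and distributional statistics; this sketch covers only the magnitude-balancing intuition.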

Phuong-Anh Nguyen, Tien Anh Pham, Duc-Trong Le, Cam-Van Thi Nguyen • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Emotion Recognition | IEMOCAP | Accuracy 70.02 | 115 |
| Multimodal Emotion Recognition | IEMOCAP 6-way | F1 (Avg) 68.25 | 106 |
| Multimodal Sentiment Analysis | CMU-MOSEI (0.3, 0.5, 0.7) (test) | Accuracy 84.54 | 24 |
| Multimodal Sentiment Analysis | CMU-MOSEI (0.3, 0.7, 0.5) (test) | Accuracy 78.81 | 12 |
| Multimodal Sentiment Analysis | CMU-MOSEI (0.5, 0.7, 0.3) (test) | Accuracy 79.83 | 12 |
| Multimodal Sentiment Analysis | CMU-MOSEI (0.7, 0.3, 0.5) (test) | Accuracy 84.84 | 12 |
| Multimodal Sentiment Analysis | CMU-MOSEI (0.7, 0.5, 0.3) (test) | Accuracy 82.53 | 12 |
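The CMU-MOSEI triples above appear to denote per-modality missing rates, i.e. each modality is dropped with its own probability. As a hedged illustration of how such imbalanced missing rates could be simulated, here is a minimal sketch (function names, modality keys, and the zero-fill choice are assumptions, not the paper's protocol):

```python
import numpy as np

def apply_imr(features, missing_rates, rng):
    """Hypothetical sketch: independently drop each modality with its own
    missing probability (imbalanced missing rates). A dropped modality is
    replaced by a zero vector of the same shape."""
    masked = {}
    for mod, x in features.items():
        keep = rng.random() >= missing_rates[mod]
        masked[mod] = x if keep else np.zeros_like(x)
    return masked

# One sample with three modalities and the (0.3, 0.5, 0.7) setting.
rng = np.random.default_rng(0)
feats = {"text": np.ones(4), "audio": np.ones(4), "vision": np.ones(4)}
rates = {"text": 0.3, "audio": 0.5, "vision": 0.7}
out = apply_imr(feats, rates, rng)
```

Under this scheme, "vision" is absent most often (70% of samples) and "text" least often (30%), which is exactly the imbalance the abstract's IMR setting refers to.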
