
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

About

Multimodal learning methods with targeted unimodal learning objectives have proven effective at alleviating the imbalanced multimodal learning problem. In this paper, however, we identify a previously ignored gradient conflict between the multimodal and unimodal learning objectives, which can mislead the optimization of the unimodal encoders. To diminish these conflicts, we examine the discrepancy between the multimodal loss and the unimodal loss: both the gradient magnitude and the gradient covariance of the easier-to-learn multimodal loss are smaller than those of the unimodal loss. Building on this property, we analyze Pareto integration in the multimodal scenario and propose the MMPareto algorithm, which can ensure a final gradient whose direction is common to all learning objectives and whose magnitude is enhanced to improve generalization, providing innocent unimodal assistance. Finally, experiments across multiple types of modalities and frameworks with dense cross-modal interaction demonstrate the superior and extendable performance of our method. The method is also expected to benefit multi-task cases with a clear discrepancy in task difficulty, demonstrating its scalability. The source code and dataset are available at https://github.com/GeWu-Lab/MMPareto_ICML2024.
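
The gradient integration described in the abstract can be illustrated with a small sketch. It is not the authors' implementation (see the linked repository for that). It assumes two gradients for one unimodal encoder, `g_m` from the multimodal loss and `g_u` from the unimodal loss, and uses the standard closed-form min-norm solution for two objectives to find a direction that does not conflict with either. Rescaling the result to the norm of `g_m + g_u` is an assumption of this sketch, standing in for the "enhanced magnitude" the abstract mentions. All function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(dot(a, a))


def min_norm_weight(g_m, g_u):
    """Return alpha in [0, 1] minimizing ||alpha * g_m + (1 - alpha) * g_u||."""
    diff = [x - y for x, y in zip(g_m, g_u)]
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: any weight gives the same vector
    alpha = -dot(g_u, diff) / denom  # unconstrained minimizer
    return min(1.0, max(0.0, alpha))  # clip to the valid weight range


def integrate_gradients(g_m, g_u):
    """Combine multimodal and unimodal gradients (illustrative sketch only)."""
    total = [x + y for x, y in zip(g_m, g_u)]
    if dot(g_m, g_u) >= 0.0:
        return total  # no conflict: a plain sum already agrees with both
    alpha = min_norm_weight(g_m, g_u)
    h = [alpha * x + (1.0 - alpha) * y for x, y in zip(g_m, g_u)]
    h_norm = norm(h)
    if h_norm == 0.0:
        return h  # gradients cancel exactly; no common direction exists
    # Assumption of this sketch: restore the magnitude of the plain sum.
    scale = norm(total) / h_norm
    return [scale * x for x in h]


# Conflicting pair: the inner product of g_m and g_u is negative.
g_m, g_u = [1.0, 0.0], [-1.0, 1.0]
h = integrate_gradients(g_m, g_u)
```

For the conflicting pair above, the combined vector `h` has a positive inner product with both `g_m` and `g_u`, so a step along it does not increase either loss to first order.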

Yake Wei, Di Hu • 2024

Related benchmarks

Task | Dataset | Result | Rank
Audio-Video Classification | Kinetics-Sound | Accuracy: 69.83 | 35
Multimodal Classification | Kinetics-Sounds (test) | Multimodal Accuracy: 69.13 | 30
Multimodal Classification | CREMA-D | Accuracy: 70.19 | 28
Emotion Recognition | MOSI | Accuracy (7-Class): 34.88 | 26
Emotion Recognition | CH-SIMS | Accuracy (5-Class): 53.73 | 26
Emotion Recognition | CH-SIMS 2 | Accuracy (5-Class): 43.54 | 26
Emotion Recognition | MOSEI | Accuracy (7-Class): 42.09 | 26
Multimodal Classification | AVE (test) | Multi Acc: 68.22 | 25
Multimodal Classification | CREMA-D (test) | Multi Accuracy: 70.3 | 25
Multimodal Classification | Symile | AUROC: 0.6319 | 24
Showing 10 of 28 rows
