
AOEPT: Breaking the Implicit Modality-Reduction Bottleneck in Modality-Missing Prompt Tuning

About

Deploying multimodal systems in real-world environments often entails handling modality-missing scenarios, where one or more modalities are unavailable. While recent studies address this challenge for the general Multimodal Transformer (MT) architecture via prompt tuning, we identify a fundamental limitation in these methods: the Implicit Modality-Reduction bottleneck. By conditioning prompts solely on the observed modalities, they inadvertently restrict the reasoning scope of MTs to the modality-reduced subspace, cutting off access to the latent information sources of the missing modalities. To overcome this limitation, we propose AOEPT, which pioneers a novel modal-contextualized prompting fashion. Specifically, we introduce lightweight Modal-Contextualized Prompts (MCPs) that distill global modality-wise priors from training data, serving as latent repositories of the information sources for missing modalities. Conditioned on the remaining modalities, these MCPs are instantiated into instance-aware prompts that selectively augment missing-modality information for each sample, thereby restoring the reasoning scope of MTs beyond the observed-modality-only subspace. Experiments across various multimodal benchmarks and backbones confirm the strong performance of AOEPT, with minimal computational overhead.

Jian Lang, Rongpei Hong, Ting Zhong, Fan Zhou • 2026
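The abstract says that global Modal-Contextualized Prompts are "instantiated into instance-aware prompts" conditioned on the remaining modalities, but it does not say how. The sketch below illustrates one plausible reading only: an attention-style weighting over a small bank of prompt vectors, driven by a feature of the observed modality. The function name `instantiate_prompt`, the bank size, the scaled dot-product scoring, and the toy numbers are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def instantiate_prompt(prompt_bank, observed_feature):
    """Mix global prompt vectors into one instance-aware prompt.

    prompt_bank: list of K vectors standing in for learned priors of the
        missing modality (hypothetical; in practice these would be trained).
    observed_feature: one vector summarizing the modality that is present.
    Returns the mixed prompt and the mixing weights.
    """
    dim = len(observed_feature)
    # Assumed scoring rule: scaled dot product between each prompt and the feature.
    scores = [dot(p, observed_feature) / math.sqrt(dim) for p in prompt_bank]
    weights = softmax(scores)
    # Weighted sum of the bank, one coordinate at a time.
    prompt = [sum(w * p[i] for w, p in zip(weights, prompt_bank)) for i in range(dim)]
    return prompt, weights


# Toy usage: two prompt vectors, one observed feature aligned with the first.
prompt_bank = [[1.0, 0.0], [0.0, 1.0]]
observed_feature = [2.0, 0.0]
prompt, weights = instantiate_prompt(prompt_bank, observed_feature)
```

In this toy case the first prompt vector receives the larger weight because it aligns with the observed feature, so the resulting prompt differs from sample to sample while the bank itself stays shared across the dataset.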

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Multimodal Multilabel Classification | MM-IMDB (test) | Macro F1 | 39.86 | 94
Multi-label Multimodal Classification | MM-IMDb 70% missing rate (test) | Text F1 (Macro) | 51.5 | 7
Multi-label Multimodal Classification | MM-IMDb 90% missing rate (test) | Text F1-M | 50.54 | 7
Multimodal Food Classification | Food101 70% missing rate (test) | Text Accuracy | 80.77 | 7
Multimodal Food Classification | Food101 90% missing rate (test) | Text Accuracy | 77.47 | 7
Multimodal Hateful Meme Detection | HateMemes 70% missing rate (test) | Text AUC | 71.12 | 7
Multimodal Hateful Meme Detection | HateMemes 90% missing rate (test) | Text AUC | 70.53 | 7
