Generalizable Multimodal Large Language Model Editing via Invariant Trajectory Learning
About
Knowledge editing has emerged as a crucial technique for efficiently correcting incorrect or outdated knowledge in large language models (LLMs). Existing editing methods rely on a rigid mapping from parameter or module modifications to outputs, which limits generalization in multimodal LLMs (MLLMs). In this paper, we reformulate MLLM editing as an out-of-distribution (OOD) generalization problem, where the goal is to distinguish semantic shift from factual shift and thus achieve robust editing across diverse cross-modal prompts. The key challenge of this OOD problem lies in identifying invariant causal trajectories that generalize accurately while suppressing spurious correlations. To address it, we propose ODEdit, a plug-and-play, invariant-learning-based framework that optimizes a tripartite OOD risk objective to simultaneously enhance editing reliability, locality, and generality. We further introduce an edit-trajectory invariant learning method, which integrates a total variation penalty into the risk minimization objective to stabilize edit trajectories against environmental variations. Theoretical analysis and extensive experiments demonstrate the effectiveness of ODEdit.
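The abstract describes combining a multi-environment (tripartite) risk with a total variation penalty on edit trajectories. A minimal sketch of that combination is shown below; the function names, the averaging of per-environment risks, and the penalty weight are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def total_variation_penalty(trajectory):
    """Sum of L2 norms of consecutive differences along an edit trajectory.

    `trajectory` has shape (T, d): T intermediate states of dimension d.
    A perfectly stable trajectory (no step-to-step change) has penalty 0.
    """
    diffs = np.diff(trajectory, axis=0)  # (T-1, d) step-to-step changes
    return float(np.sum(np.linalg.norm(diffs, axis=1)))

def tripartite_ood_risk(env_risks, trajectories, tv_weight=0.1):
    """Hypothetical objective: average the per-environment risks (standing in
    for the reliability, locality, and generality terms) and add a weighted
    total variation penalty that discourages trajectory drift across
    environments."""
    risk = float(np.mean(env_risks))
    tv = sum(total_variation_penalty(t) for t in trajectories)
    return risk + tv_weight * tv

# Toy usage: three per-environment risks and two short trajectories.
risks = [0.2, 0.3, 0.25]
trajs = [
    np.zeros((4, 3)),                                      # stable, TV = 0
    np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]),        # TV = 1 + 1 = 2
]
print(tripartite_ood_risk(risks, trajs))  # 0.25 + 0.1 * 2.0 = 0.45
```

In practice the trajectories would be the model's hidden-state sequences under different prompting environments, and the penalty would be minimized jointly with the editing loss.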
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Knowledge Editing | MMEdit 1.0 E-VQA (test) | Reliability | 100 | 24 |
| Knowledge Editing | MMEdit 1.0 E-IC (test) | Reliability | 100 | 24 |
| Sequential Knowledge Editing | Editing VQA (E-VQA), T=5 steps | Rel. Score | 92.63 | 4 |
| Sequential Knowledge Editing | Editing VQA (E-VQA), T=10 steps | Rel. Score | 89.79 | 4 |
| Sequential Knowledge Editing | Editing Image Caption (E-IC), T=5 steps | Rel. Score | 86.53 | 4 |
| Sequential Knowledge Editing | Editing Image Caption (E-IC), T=10 steps | Rel. Score | 84.64 | 4 |