DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing
About
Model editing aims to update knowledge, adding new concepts and changing relevant information, without retraining. Lifelong editing is a challenging task that is prone to disrupting previously learned concepts, especially for Vision-Language Models (VLMs), because sequential edits can degrade reasoning and cause cross-modal misalignment. Existing VLM knowledge-editing methods based on gated adapters, activation edits, and parameter-merging techniques address the catastrophic forgetting seen in full fine-tuning; however, they still operate in the shared representation space of the VLM, where concepts are entangled, so edits interfere with unrelated concepts. We hypothesize that this instability persists because current methods control edits algorithmically via optimization rather than structurally separating knowledge. We introduce Dynamic Subspace Concept Alignment (DSCA), which mitigates this limitation by design: it decomposes the representation space into a set of orthogonal semantic subspaces and proposes edits only within those transformed spaces. These subspaces are obtained through incremental clustering and PCA on joint vision-language representations. This process structurally isolates concepts, enabling precise, non-interfering edits by turning isolation from a soft training objective into an architectural property. The surgical edits are guided by a multi-term loss function that maintains task fidelity, edit locality, and cross-modal alignment. With the base model frozen, our method achieves 98% single-edit success, remains above 95% after 1,000 sequential edits, lowers hallucination by 3-5%, and achieves the best backward transfer (BWT) scores on continual instruction-tuning benchmarks. Extensive experiments demonstrate DSCA's state-of-the-art stability and knowledge retention in continual lifelong editing across various datasets and benchmarks.
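The core mechanism described above, clustering joint representations, fitting a low-rank basis per concept, and confining each edit to its concept's subspace, can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation: the feature matrix, cluster count, and subspace rank are all hypothetical, and true mutual orthogonality across subspaces would require an extra orthogonalization step that is omitted here.

```python
# Sketch of subspace-isolated editing (hypothetical, not the DSCA code):
# 1) cluster joint vision-language features into concept groups,
# 2) fit a low-rank PCA basis per group,
# 3) project an edit direction onto its concept's subspace only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 32))  # stand-in joint representations

# Step 1: partition the representation space into concept clusters.
n_concepts, rank = 4, 8
labels = KMeans(n_clusters=n_concepts, n_init=10,
                random_state=0).fit_predict(feats)

# Step 2: one low-rank orthonormal basis per concept cluster.
bases = []
for c in range(n_concepts):
    pca = PCA(n_components=rank).fit(feats[labels == c])
    bases.append(pca.components_)  # shape (rank, 32), orthonormal rows

def project_edit(delta, concept):
    """Restrict an edit direction to its concept's subspace."""
    B = bases[concept]
    return B.T @ (B @ delta)  # orthogonal projection onto span(B)

delta = rng.normal(size=32)          # raw proposed edit direction
edit = project_edit(delta, concept=0)  # isolated, in-subspace edit
```

Because `project_edit` is an orthogonal projection, applying it twice is a no-op and the projected edit can never be larger than the raw one; components of `delta` outside the concept's subspace are simply discarded, which is the structural-isolation property the abstract describes.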
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Question Answering | VQA v2 | Accuracy | 84.9 | 1362 |
| Vision-Language Capability Evaluation | MME | Score | 76.3 | 26 |
| Knowledge Editing | MMEdit E-IC | Reliability | 98 | 22 |
| Continual Learning | COIN | Backward Transfer (BWT) | -9.37 | 20 |
| Model Editing | E-VQA 5 | Reliability | 98.12 | 11 |
| Model Editing | E-IC 5 | Reliability | 98 | 11 |
| Lifelong Editing | E-VQA Lifelong Editing 5 | Relational Score | 96.85 | 10 |
| Lifelong Editing | VLKEB Lifelong Editing 11 | Relational Score | 98.1 | 10 |
| Knowledge Editing | E-VQA | Reliability | 98.12 | 6 |
| Knowledge Editing | E-VQA 1,000 sequential edits | Reliability | 96.85 | 5 |