DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing
About
Model editing aims to update knowledge, adding new concepts and changing relevant information, without retraining. Lifelong editing is a challenging task that is prone to disrupting previously learned concepts, especially for Vision-Language Models (VLMs), because sequential edits can degrade reasoning and cause cross-modal misalignment. Existing VLM knowledge-editing methods based on gated adapters, activation edits, and parameter-merging techniques address the catastrophic forgetting seen in full fine-tuning; however, they still operate in the shared representation space of the VLM, where concepts are entangled, so edits interfere with unrelated concepts. We hypothesize that this instability persists because current methods control edits algorithmically via optimization rather than structurally separating knowledge. We introduce Dynamic Subspace Concept Alignment (DSCA), which mitigates this limitation by design: it decomposes the representation space into a set of orthogonal semantic subspaces and proposes edits only within those transformed spaces. These subspaces are obtained through incremental clustering and PCA on joint vision-language representations. This process structurally isolates concepts, enabling precise, non-interfering edits by turning isolation from a soft training objective into an architectural property. The surgical edits are guided by a multi-term loss function that maintains task fidelity, edit locality, and cross-modal alignment. With the base model frozen, our method achieves 98% single-edit success, remains above 95% after 1,000 sequential edits, lowers hallucination by 3-5%, and achieves the best backward transfer (BWT) scores on continual instruction-tuning benchmarks. Extensive experiments demonstrate DSCA's state-of-the-art stability and knowledge retention in continual lifelong editing across various datasets and benchmarks.
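The core mechanism described above, clustering joint representations, fitting a low-rank basis per concept, and confining each edit to its concept's subspace, can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation: the feature matrix, cluster count, and subspace rank are all hypothetical, and true mutual orthogonality across subspaces would require an extra orthogonalization step that is omitted here.

```python
# Sketch of subspace-isolated editing (hypothetical, not the DSCA code):
# 1) cluster joint vision-language features into concept groups,
# 2) fit a low-rank PCA basis per group,
# 3) project an edit direction onto its concept's subspace only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 32))  # stand-in joint representations

# Step 1: partition the representation space into concept clusters.
n_concepts, rank = 4, 8
labels = KMeans(n_clusters=n_concepts, n_init=10,
                random_state=0).fit_predict(feats)

# Step 2: one low-rank orthonormal basis per concept cluster.
bases = []
for c in range(n_concepts):
    pca = PCA(n_components=rank).fit(feats[labels == c])
    bases.append(pca.components_)  # shape (rank, 32), orthonormal rows

def project_edit(delta, concept):
    """Restrict an edit direction to its concept's subspace."""
    B = bases[concept]
    return B.T @ (B @ delta)  # orthogonal projection onto span(B)

delta = rng.normal(size=32)          # raw proposed edit direction
edit = project_edit(delta, concept=0)  # isolated, in-subspace edit
```

Because `project_edit` is an orthogonal projection, applying it twice is a no-op and the projected edit can never be larger than the raw one; components of `delta` outside the concept's subspace are simply discarded, which is the structural-isolation property the abstract describes.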
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Question Answering | VQA v2 | Accuracy | 84.9 | 1362 |
| Vision-Language Capability Evaluation | MME | Score | 76.3 | 26 |
| Knowledge Editing | MMEdit E-IC | Reliability | 98 | 22 |
| Continual Learning | COIN | Backward Transfer (BWT) | -9.37 | 20 |
| Model Editing | E-VQA 5 | Reliability | 98.12 | 11 |
| Model Editing | E-IC 5 | Reliability | 98 | 11 |
| Lifelong Editing | E-VQA Lifelong Editing 5 | Relational Score | 96.85 | 10 |
| Lifelong Editing | VLKEB Lifelong Editing 11 | Relational Score | 98.1 | 10 |
| Knowledge Editing | E-VQA | Reliability | 98.12 | 6 |
| Knowledge Editing | E-VQA 1,000 sequential edits | Reliability | 96.85 | 5 |