
SAKE: Steering Activations for Knowledge Editing

About

As Large Language Models (LLMs) have been shown to memorize real-world facts, the need arises to update this knowledge in a controlled and efficient manner. Designed with these constraints in mind, Knowledge Editing (KE) approaches propose to alter specific facts in pretrained models. However, they have been shown to suffer from several limitations, including a lack of contextual robustness and a failure to generalize to logical implications of the edited fact. To overcome these issues, we propose SAKE, an activation steering method that models a fact to be edited as a distribution rather than a single prompt. Leveraging Optimal Transport, SAKE alters the LLM's behavior over a whole fact-related distribution, defined by paraphrases and logical implications of the fact. Several numerical experiments demonstrate the effectiveness of this method: SAKE performs more robust edits than its existing counterparts.

Marco Scialanga, Thibault Laugel, Vincent Grari, Marcin Detyniecki • 2025
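
The abstract does not say how the transport map is computed, so the following is only an illustrative sketch of the general idea, not SAKE's actual implementation. It assumes that hidden activations for prompts about the original fact and for prompts about the edited fact are each summarized by a Gaussian with diagonal covariance, in which case the optimal transport map (for quadratic cost) is a coordinate-wise affine rescaling. The function names `fit_diagonal_gaussian` and `make_transport_map` are hypothetical, and plain Python lists stand in for real model activations.

```python
import statistics


def fit_diagonal_gaussian(activations):
    """Return per-dimension means and standard deviations of activation vectors."""
    dims = list(zip(*activations))  # transpose: one tuple per dimension
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return means, stds


def make_transport_map(source_acts, target_acts, eps=1e-8):
    """Build T(x) = m_t + (s_t / s_s) * (x - m_s), applied per coordinate.

    Assumption: both activation sets are modeled as diagonal Gaussians, for
    which this affine map is the closed-form optimal transport map.
    """
    m_s, s_s = fit_diagonal_gaussian(source_acts)
    m_t, s_t = fit_diagonal_gaussian(target_acts)
    scale = [t / (s + eps) for s, t in zip(s_s, s_t)]  # eps avoids division by zero

    def transport(x):
        return [mt + a * (xi - ms) for xi, ms, mt, a in zip(x, m_s, m_t, scale)]

    return transport


# Toy 2-D "activations": rows stand for paraphrases of the same fact.
source_acts = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]        # original fact
target_acts = [[10.0, -1.0], [11.0, -3.0], [12.0, -5.0]]  # edited fact

steer = make_transport_map(source_acts, target_acts)
steered = [steer(x) for x in source_acts]
```

Because the map is fitted on a set of activations rather than a single prompt, the steered activations match the target set's per-dimension mean and spread, which is the distribution-level behavior the abstract describes.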

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Knowledge Editing | CounterFact | Efficacy | 99.71 | 362 |
| Knowledge Editing | zsRE | Generality | 96.31 | 268 |
| Knowledge Editing | RippleEdits POPULAR (full requested-edit set) | Rel. | 99.2 | 30 |
| Knowledge Editing | Counterfact (first 2000 edits) | Accuracy | 97.7 | 17 |
| Knowledge Editing | Popular | Accuracy | 45 | 12 |
| Knowledge Editing | Popular dataset | CI | 50 | 10 |
| Knowledge Editing | Counterfact (first 150 edits) | DI Score | 98.67 | 8 |
| Knowledge Editing | CounterFact GPT2-XL | Accuracy | 99 | 6 |
| Knowledge Editing | CounterFact LLaMA2-7B | Accuracy (Acc) | 98 | 6 |
