
Locating and Editing Factual Associations in GPT

About

We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at https://rome.baulab.info/
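The core idea behind ROME can be sketched as a rank-one update to a feed-forward weight matrix treated as a linear key-value store. The toy NumPy example below is only an illustrative sketch (dimensions, random keys, and variable names are invented, not from the paper): it applies the closed-form rank-one update that inserts a new association (k*, v*) while solving the constrained least-squares problem min ||W'K - V|| subject to W'k* = v*.

```python
import numpy as np

# Illustrative sketch of a rank-one model edit in the spirit of ROME
# (toy dimensions and random data; not the authors' implementation).
# A feed-forward weight matrix W is treated as a linear key->value
# store, and a new association (k*, v*) is inserted via one
# rank-one update using the uncentered key covariance C = K K^T.

rng = np.random.default_rng(0)
d = 16                          # hidden dimension (toy size)
W = rng.normal(size=(d, d))     # existing feed-forward weights
K = rng.normal(size=(d, 100))   # previously stored keys (stand-in)
C = K @ K.T                     # uncentered covariance of the keys

k_star = rng.normal(size=d)     # key for the new fact (subject repr.)
v_star = rng.normal(size=d)     # value encoding the new association

u = np.linalg.solve(C, k_star)              # C^{-1} k*
Lam = (v_star - W @ k_star) / (u @ k_star)  # residual, rescaled
W_new = W + np.outer(Lam, u)                # rank-one update

# The edited weights now recall the inserted fact exactly.
assert np.allclose(W_new @ k_star, v_star)
```

Because the change to W is an outer product, the edit has rank one, which is what keeps it localized and (per the abstract) lets it maintain specificity while still generalizing.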

Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Lifelong Free-text Knowledge Editing | MRLF-Bench | BLEU | 7.89 | 140 |
| Knowledge Editing | zsRE | Generality | 84.92 | 110 |
| Knowledge Editing | CounterFact | Efficacy | 5.10e+3 | 91 |
| Knowledge Insertion | WikiData recent | Edit Success Rate | 99.24 | 43 |
| Model Editing | RIPE | Reliability | 48.3 | 30 |
| Model Editing | CounterFact | Reliability | 41.1 | 30 |
| Knowledge Editing | Counterfact 10,000 facts | Relational Score | 2.57e+3 | 27 |
| Knowledge Editing | ZsRE 10,000 facts | Reliability | 15.73 | 27 |
| Personalization Editing | UPQA balanced 100-sample | Explicit Accuracy | 100 | 24 |
| Sequential Model Editing | zsRE | Efficacy | 56.42 | 24 |
Showing 10 of 96 rows

Other info

Code
