ContextCite: Attributing Model Generation to Context

About

How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements, (2) improving response quality by pruning the context, and (3) detecting poisoning attacks. We provide code for ContextCite at https://github.com/MadryLab/context-cite.

Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry • 2024
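
To make the context attribution problem concrete, the sketch below scores each context source by how much the log-probability of a fixed response drops when that source is removed. This is a simple leave-one-out illustration of the problem statement, not ContextCite's algorithm and not the API of the context-cite package. The function toy_log_prob is a hypothetical stand-in for a language model, and the example sources are invented for illustration.

```python
import math
from typing import Callable, List

# Hypothetical scorer type: given the sources kept in the context, return
# the log-probability of a fixed response. A real system would query a
# language model here; nothing below calls one.
LogProbFn = Callable[[List[str]], float]

def leave_one_out_attribution(sources: List[str], log_prob: LogProbFn) -> List[float]:
    """Score each source by the log-probability drop caused by removing it."""
    full = log_prob(sources)
    scores = []
    for i in range(len(sources)):
        ablated = sources[:i] + sources[i + 1:]  # context without source i
        scores.append(full - log_prob(ablated))
    return scores

def toy_log_prob(kept: List[str]) -> float:
    # Toy assumption: the response "The Eiffel Tower is in Paris" is likely
    # only when some kept source mentions Paris.
    supported = any("Paris" in s for s in kept)
    return math.log(0.9) if supported else math.log(0.1)

sources = [
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
    "The tower opened in 1889.",
]
scores = leave_one_out_attribution(sources, toy_log_prob)
top = max(range(len(sources)), key=lambda i: scores[i])
print(sources[top], scores)
```

Under this toy scorer, only the first source receives a nonzero score, since removing either of the others leaves the response equally likely. Leave-one-out needs one extra model call per source and can miss sources that only matter in combination, which is part of why dedicated attribution methods exist.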

Related benchmarks

Task                                       Dataset                        Metric                      Result  Rank
Citation-augmented Question Answering      bar-GT, PK 1.0 (test)          Accuracy                    63.07   42
Citation-augmented Question Answering      GT, PK 1.0 (test)              Accuracy                    72.49   21
Attribution                                KV Retrieval (test)            Accuracy                    1       9
Context Attribution                        CNN Dailymail (1000 examples)  Log Probability Drop        1.48    9
Retriever-Generator Attribution Agreement  FSS                            WARG-0.9                    165     4
Retriever-Generator Attribution Agreement  TC                             WARG-0.9                    0.67    4
Prompt injection detection                 NeuralExec (test)              Detection Accuracy (Top-1)  98.8    3
