ContextCite: Attributing Model Generation to Context

About

How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements, (2) improving response quality by pruning the context, and (3) detecting poisoning attacks. We provide code for ContextCite at https://github.com/MadryLab/context-cite.
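To make the idea of context attribution concrete, here is a toy sketch in Python. It is not the ContextCite implementation and does not use the context-cite package. It uses plain leave-one-out ablation as a simple stand-in: remove one context source at a time and measure how much the support for a statement drops. The function names and the word-overlap `toy_score` are hypothetical. A real setup would instead score the statement with the language model's probability given the ablated context.

```python
from typing import Callable, List


def toy_score(sources: List[str], statement: str) -> float:
    """Hypothetical stand-in for a model's probability of `statement` given
    `sources`: the fraction of statement words that appear in the context."""
    context_words = set(" ".join(sources).lower().split())
    statement_words = statement.lower().split()
    if not statement_words:
        return 0.0
    hits = sum(word in context_words for word in statement_words)
    return hits / len(statement_words)


def leave_one_out_attribution(
    sources: List[str],
    statement: str,
    score: Callable[[List[str], str], float] = toy_score,
) -> List[float]:
    """Attribute `statement` to each source as the drop in score when that
    source alone is removed from the context."""
    full = score(sources, statement)
    attributions = []
    for i in range(len(sources)):
        ablated = sources[:i] + sources[i + 1:]  # context without source i
        attributions.append(full - score(ablated, statement))
    return attributions


# Made-up example context, split into sentence-level sources.
sources = [
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
    "The tower was completed in 1889.",
]
statement = "completed in 1889."

scores = leave_one_out_attribution(sources, statement)
top_source = sources[max(range(len(sources)), key=scores.__getitem__)]
```

In this toy example only the third source receives a nonzero attribution, because removing it is the only ablation that reduces the score. Leave-one-out needs one scoring call per source and can miss sources that matter only in combination, which is one reason to prefer a more scalable method when contexts are long.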

Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry • 2024

Related benchmarks

Task	Dataset	Metric	Value
Faithfulness Evaluation	TellMeWhy	AUC π-Soft-NS	0.56	67
Faithfulness Evaluation	WikiBio	AUC π-Soft-NS	0.79	67
Citation-augmented Question Answering	bar-GT, PK 1.0 (test)	Accuracy	63.07	42
Attribution Faithfulness	LongRA	Soft-NC Score	1.9	40
Attribution Alignment	Curated Attribution Dataset (NarrativeQA + SciQ)	DSA (Dependent Sentence Attribution)	-0.09	40
Knowledge corruption traceback	NQ	Precision	83	30
Knowledge corruption traceback	MS Marco	Precision	82	26
Traceback (Prompt Injection Attacks)	MuSiQue	Precision (MuSiQue Traceback)	72	23
Citation-augmented Question Answering	GT, PK 1.0 (test)	Accuracy	72.49	21
Knowledge corruption traceback	HotpotQA	Precision	0.74	16

Showing 10 of 36 rows
