ContextCite: Attributing Model Generation to Context
About
How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements, (2) improving response quality by pruning the context, and (3) detecting poisoning attacks. We provide code for ContextCite at https://github.com/MadryLab/context-cite.
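At its core, ContextCite estimates each context source's contribution by randomly ablating sources, scoring the model's (fixed) response under each ablation, and fitting a sparse linear surrogate that maps ablation masks to scores. The sketch below illustrates that idea in miniature: it is not the library's API, `score_fn` is a toy stand-in for the model's log-probability of the response, and it uses ordinary least squares rather than the Lasso regression used in practice.

```python
import numpy as np

# Toy stand-in for "score of the fixed response given an ablated context":
# here only context source 2 meaningfully influences the score.
TRUE_WEIGHTS = np.array([0.1, 0.0, 2.0, 0.0, 0.05])

def score_fn(mask):
    # mask[i] == 1 keeps context source i; 0 ablates it.
    return float(TRUE_WEIGHTS @ mask)

def context_attributions(score_fn, num_sources, num_ablations=64, seed=0):
    """Estimate per-source attribution weights from random ablations
    via a linear surrogate (least squares; the actual method fits a
    sparse Lasso surrogate for scalability)."""
    rng = np.random.default_rng(seed)
    masks = rng.integers(0, 2, size=(num_ablations, num_sources))
    scores = np.array([score_fn(m) for m in masks])
    # Fit scores ≈ masks @ w + b; the learned w are the attributions.
    X = np.hstack([masks, np.ones((num_ablations, 1))])
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return coef[:-1]  # drop the intercept

weights = context_attributions(score_fn, num_sources=5)
print(int(np.argmax(weights)))  # source 2 receives the largest weight
```

In the real setting, each "source" is a sentence or chunk of the provided context, and the surrogate's weights indicate which sources the generated statement depends on.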
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Faithfulness Evaluation | TellMeWhy | AUC π-Soft-NS | 0.56 | 67 |
| Faithfulness Evaluation | WikiBio | AUC π-Soft-NS | 0.79 | 67 |
| Citation-augmented Question Answering | bar-GT, PK 1.0 (test) | Accuracy | 63.07 | 42 |
| Attribution Faithfulness | LongRA | Soft-NC Score | 1.9 | 40 |
| Attribution Alignment | Curated Attribution Dataset (NarrativeQA + SciQ) | DSA (Dependent Sentence Attribution) | -0.09 | 40 |
| Knowledge corruption traceback | NQ | Precision | 83 | 30 |
| Knowledge corruption traceback | MS Marco | Precision | 82 | 26 |
| Traceback (Prompt Injection Attacks) | MuSiQue | Precision (MuSiQue Traceback) | 72 | 23 |
| Citation-augmented Question Answering | GT, PK 1.0 (test) | Accuracy | 72.49 | 21 |
| Knowledge corruption traceback | HotpotQA | Precision | 0.74 | 16 |