RARR: Researching and Revising What Language Models Say, Using Language Models

About

Language models (LMs) now excel at many tasks such as few-shot learning, question answering, reasoning, and dialog. However, they sometimes generate unsupported or misleading content. A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence. To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible. When applied to the output of several state-of-the-art LMs on a diverse set of generation tasks, we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models. Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.
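The abstract describes two stages. The research stage finds evidence for what the generated text says. The revision stage edits only the parts that the evidence contradicts. The following is a minimal Python sketch of that control flow under stated assumptions, not the authors' implementation. The question generator, retriever, agreement check, and editor are hypothetical callables standing in for the few-shot-prompted large LM and web search the abstract mentions. The `max_evidence=5` cap on the attribution report is an arbitrary choice. The character-level preservation score, max(1 - Lev(x, y) / len(x), 0), is one assumed way to quantify "preserving the original output as much as possible".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical component types. RARR fills these roles with a few-shot
# prompted large LM and web search; here they are ordinary callables.
QuestionGen = Callable[[str], List[str]]            # passage -> research queries
Retriever = Callable[[str], List[str]]              # query -> evidence snippets
AgreementCheck = Callable[[str, str, str], bool]    # (passage, query, evidence) -> agrees?
Editor = Callable[[str, str, str], str]             # (passage, query, evidence) -> new passage


@dataclass
class RevisionResult:
    revised: str
    attribution_report: List[str] = field(default_factory=list)


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def preservation(original: str, revised: str) -> float:
    """Assumed preservation score: max(1 - Lev(x, y) / len(x), 0)."""
    if not original:
        return 1.0
    return max(1.0 - levenshtein(original, revised) / len(original), 0.0)


def research_and_revise(passage: str,
                        gen_questions: QuestionGen,
                        retrieve: Retriever,
                        agrees: AgreementCheck,
                        edit: Editor,
                        max_evidence: int = 5) -> RevisionResult:
    # Research stage: turn the passage into queries and collect evidence per query.
    evidence: List[Tuple[str, str]] = [
        (q, e) for q in gen_questions(passage) for e in retrieve(q)
    ]
    # Revision stage: edit only where the evidence disagrees with the current text.
    revised = passage
    for query, snippet in evidence:
        if not agrees(revised, query, snippet):
            revised = edit(revised, query, snippet)
    # Attribution report: simplified here to the first few snippets gathered.
    report = [snippet for _, snippet in evidence][:max_evidence]
    return RevisionResult(revised, report)


if __name__ == "__main__":
    # Toy stand-ins (hypothetical) so the sketch runs without an LM or search API.
    passage = "The Eiffel Tower was completed in 1899."
    corpus = {"When was the Eiffel Tower completed?":
              ["The Eiffel Tower was completed in 1889."]}
    result = research_and_revise(
        passage,
        gen_questions=lambda p: list(corpus),
        retrieve=lambda q: corpus[q],
        agrees=lambda p, q, e: "1889" in p,
        edit=lambda p, q, e: p.replace("1899", "1889"),
    )
    print(result.revised)  # The Eiffel Tower was completed in 1889.
    print(round(preservation(passage, result.revised), 3))  # 0.974
```

Gating the editor behind the agreement check is what keeps changes small in this sketch. Text that the evidence already supports passes through untouched, so the toy run changes a single character and still scores a preservation of about 0.97.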

Luyu Gao, Zhuyun Dai, Panupong Pasupat, Anthony Chen, Arun Tejasvi Chaganty, Yicheng Fan, Vincent Y. Zhao, Ni Lao, Hongrae Lee, Da-Cheng Juan, Kelvin Guu • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Multi-document summarization | MDS | Length | 843.6 | 14 |
| Long-form question answering | ALCE LFQA | ROUGE-L | 35.2 | 7 |
| Scientific fact-checking | BIONLI 300 | Balanced accuracy | 66.4 | 7 |
| Scientific fact-checking | CLIMATE-FEVER (2-way, Supported/Refuted subsets) | Balanced accuracy | 70.4 | 7 |
| Cell-level attribution | FetaQA (gold set) | Precision | 0.2005 | 6 |
| Cell-level attribution | AITQA | Precision | 31.96 | 6 |
| Column-level attribution | ToTTo | Precision | 90.51 | 6 |
| Row-level attribution | AITQA | Precision | 66.82 | 6 |
| Scientific fact-checking | PubMedFact1k (3-way) | Macro F1 | 72.3 | 6 |
| Cell-level attribution | ToTTo (gold set) | Precision | 20.51 | 6 |

Only 10 of the 23 benchmark rows are listed here.
