KILT: a Benchmark for Knowledge Intensive Language Tasks

About

Challenging problems such as open-domain question answering, fact checking, slot filling and entity linking require access to large, external knowledge sources. While some models do well on individual tasks, developing general models is difficult as each task might require computationally expensive indexing of custom knowledge sources, in addition to dedicated infrastructure. To catalyze research on models that condition on specific information in large textual resources, we present a benchmark for knowledge-intensive language tasks (KILT). All tasks in KILT are grounded in the same snapshot of Wikipedia, reducing engineering turnaround through the re-use of components, as well as accelerating research into task-agnostic memory architectures. We test both task-specific and general baselines, evaluating downstream performance in addition to the ability of the models to provide provenance. We find that a shared dense vector index coupled with a seq2seq model is a strong baseline, outperforming more tailor-made approaches for fact checking, open-domain question answering and dialogue, and yielding competitive results on entity linking and slot filling, by generating disambiguated text. KILT data and code are available at https://github.com/facebookresearch/KILT.

Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rocktäschel, Sebastian Riedel • 2020

Related benchmarks

Task | Dataset | Metric | Result | Rank
Long-form Question Answering | ELI5 (test) | ROUGE-L | 17.41 | 54
Knowledge-Intensive Language Tasks | KILT (test) | WoW F1 Score | 0.152 | 29
Page-level retrieval | KILT (test) | WoW Score | 8.8 | 28
Slot Filling | zsRE KILT (test) | KILT Accuracy | 36.83 | 12
Long-form Question Answering | ELI5 (val) | F1 | 18.8 | 11
Retrieval | ELI5 KILT (test) | Retrieval Precision | 10.7 | 8
Open-domain dialogue | Wizard-of-Wikipedia KILT (test) | F1 Score | 15.19 | 8
Long-form Question Answering | ELI5 KILT (test) | F1 | 17.9 | 8
Relation Extraction | RE Zero-Shot | Accuracy | 44.74 | 8
Question Answering | NQ | EM | 48.8 | 5
Showing 10 of 13 rows
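
The abstract says models are evaluated on downstream performance and on their ability to provide provenance, and the results above are reported as EM and F1. A minimal Python sketch of how such scores might be computed follows. It assumes the common SQuAD-style answer normalization, and the rule that a downstream point counts only when R-precision over Wikipedia pages equals 1 is an assumption made here for illustration. All function names are hypothetical and are not taken from the KILT repository.

```python
import re
import string
from collections import Counter
from typing import Sequence, Set


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def r_precision(predicted_pages: Sequence[str], gold_pages: Set[str]) -> float:
    """Fraction of the top-R predicted pages that are gold, with R = len(gold_pages)."""
    if not gold_pages:
        return 0.0
    top_r = predicted_pages[: len(gold_pages)]
    return len(set(top_r) & gold_pages) / len(gold_pages)


def provenance_aware_score(
    downstream: float, predicted_pages: Sequence[str], gold_pages: Set[str]
) -> float:
    """Assumed rule: keep the downstream score only if provenance is fully retrieved."""
    return downstream if r_precision(predicted_pages, gold_pages) == 1.0 else 0.0
```

For example, `provenance_aware_score(exact_match(pred, gold), pages, gold_pages)` would give credit for a correct answer only when the top-ranked pages cover the gold provenance, which separates answers backed by the right evidence from answers that happen to be right.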
