
KILT

Benchmarks

Task Name | Dataset Name | Metric | SOTA Result | Trend
--- | --- | --- | --- | ---
Knowledge-Intensive Language Tasks | KILT (test) | WoW F1 Score | 21.6 | 29
Page-level Retrieval | KILT (test) | WoW Score | 62.9 | 28
Question Answering | KILT benchmark: Natural Questions, TriviaQA, HotpotQA (test) | EM Score (TriviaQA) | 85.6 | 10
Passage-level Retrieval | KILT (dev) | FEV Score | 67.8 | 8
Question Answering | KILT TriviaQA | Match Score | 91.9 | 5
Question Answering | KILT HotpotQA | Match | 51.9 | 5
Slot Filling | KILT T-REx (test) | Retrieval Score | 75.5 | 4
Slot Filling | KILT zsRE (test) | Retrieval | 80.5 | 4
Long-form Question Answering | KILT ELI5 (test) | Retrieval Score | 36.3 | 4
Knowledge-grounded Dialogue | KILT WoW (test) | Retrieval | 49.8 | 4
Open Domain Question Answering | KILT NQ* (test) | Retrieval Rate | 65.1 | 4
Paragraph-level Retrieval | KILT benchmark | FEVER Score | 62.8 | 4
Long-form Question Answering | KILT ELI5 (dev test) | RL Score | 26.3 | 3
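Several rows above report exact match (EM) or F1 for answer-generation tasks. As a hedged sketch of what those two metrics compute, the code below implements the SQuAD-style versions, assuming the usual answer normalization (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace) and the common convention of taking the best score over all gold answers. The function names are hypothetical, and the leaderboard entries were not necessarily scored with this exact code.

```python
import re
import string
from collections import Counter

# Assumption: SQuAD-style normalization, not necessarily the exact scorer behind these results.
PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 over the bag of normalized tokens."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction: str, golds: list[str]) -> float:
    """Score against every acceptable gold answer and keep the maximum."""
    return max(metric(prediction, g) for g in golds)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
    print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))  # 0.666...
    print(best_over_golds(exact_match, "Paris", ["paris, france", "Paris"]))  # 1.0
```

EM rewards only a verbatim (post-normalization) answer, while F1 gives partial credit for token overlap. This is why F1 is typically the more forgiving of the two for free-form outputs.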