
NarrativeQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | NarrativeQA | F1 Score 30.41 | 87 |
| Question Answering | NarrativeQA (test) | ROUGE-L 76.2 | 61 |
| Question Answering | NarrativeQA | Score 21.96 | 40 |
| Long-context Question Answering | NarrativeQA | F1 Score 53.56 | 38 |
| Question Answering | NarrativeQA | F1 33.61 | 36 |
| Single-hop Question Answering | NarrativeQA | Score 23.89 | 22 |
| Question Answering | NarrativeQA | EM 34.79 | 18 |
| Question Answering | NarrativeQA No Trun. latest (test) | Average Score 23.96 | 18 |
| Question Answering | NarrativeQA | F1 Score 28.94 | 16 |
| Long narrative understanding QA | NarrativeQA | Accuracy 55 | 14 |
| Multi-session Retrieval-Augmented Generation | NarrativeQA (test) | F1 Score 38.4 | 12 |
| Document Retrieval | NarrativeQA (test) | nDCG@10 61.7 | 12 |
| Long-context Question Answering | NarrativeQA | Exact Match 61.7 | 11 |
| Question Answering | NarrativeQA Helmet benchmark | F1 Score 49.5 | 9 |
| Question Answering | NarrativeQA Trun. latest (test) | Average Score 21.34 | 9 |
| Retrieval | NarrativeQA | Recall@3 29.11 | 8 |
| Reading Comprehension | NarrativeQA (test) | BLEU-1 54.11 | 8 |
| Reading Comprehension | NarrativeQA summaries | BLEU-1 36.55 | 8 |
| Question Answering | NarrativeQA | Prefill Throughput (tok/s) 24,686.84 | 6 |
| Latency Evaluation | NarrativeQA | End-to-End Latency 2.1 | 6 |
| Question Answering | NarrativeQA summaries (test) | BLEU-1 43.63 | 6 |
| Reading Comprehension | NarrativeQA Story Summaries (val) | BLEU-1 52.78 | 6 |
| Question Answering | NarrativeQA | ROUGE-L 0.32 | 5 |
| Question Answering | NarrativeQA (dev) | ROUGE-L 31.6 | 4 |
| Multi-mention reading comprehension | NarrativeQA (test) | ROUGE-L 58.8 | 4 |
Showing 25 of 29 rows