NarrativeQA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | NarrativeQA | F1 Score | 32.6 | 92 |
| Question Answering | NarrativeQA (test) | ROUGE-L | 76.2 | 68 |
| Question Answering | NarrativeQA | Score | 21.96 | 40 |
| Long-context Question Answering | NarrativeQA | F1 Score | 53.56 | 38 |
| Long-context Question Answering | NarrativeQA | SubEM | 22 | 36 |
| Question Answering | NarrativeQA | F1 | 33.61 | 36 |
| Question Answering | NarrativeQA LongBench | F1 Score | 14.76 | 24 |
| Single-hop Question Answering | NarrativeQA | Score | 23.89 | 22 |
| Long-context Question Answering | NarrativeQA Passage Split | Score | 32.64 | 18 |
| Long-context Question Answering | NarrativeQA Fixed Chunk 2048 | Score | 32.64 | 18 |
| Question Answering | NarrativeQA | EM | 34.79 | 18 |
| Question Answering | NarrativeQA No Trun. latest (test) | Average Score | 23.96 | 18 |
| Question Answering | NarrativeQA | ROUGE-L | 28.1 | 17 |
| Question Answering | NarrativeQA | F1 Score | 28.94 | 16 |
| Long narrative understanding QA | NarrativeQA | Accuracy | 55 | 14 |
| Traceback (Prompt Injection Attacks) | NarrativeQA | Precision | 98 | 13 |
| Multi-session Retrieval-Augmented Generation | NarrativeQA (test) | F1 Score | 38.4 | 12 |
| Document Retrieval | NarrativeQA (test) | nDCG@10 | 61.7 | 12 |
| Long-context Question Answering | NarrativeQA | Exact Match | 61.7 | 11 |
| Context Traceback | NarrativeQA LongBench | Precision | 96 | 10 |
| Question Answering | NarrativeQA | Ragas Answer Relevance | 58.08 | 9 |
| Question Answering | NarrativeQA Helmet benchmark | F1 Score | 49.5 | 9 |
| Question Answering | NarrativeQA Trun. latest (test) | Average Score | 21.34 | 9 |
| Retrieval | NarrativeQA | Recall@3 | 29.11 | 8 |
| Reading Comprehension | NarrativeQA (test) | BLEU-1 | 54.11 | 8 |

Showing 25 of 44 rows