
NarrativeQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | NarrativeQA | F1 Score 38.12 | 124 |
| Question Answering | NarrativeQA (test) | ROUGE-L 76.2 | 88 |
| Question Answering | NarrativeQA | Score 21.96 | 40 |
| Question Answering | NarrativeQA | EM 15 | 38 |
| Long-context Question Answering | NarrativeQA | F1 Score 53.56 | 38 |
| Long-context Question Answering | NarrativeQA | SubEM 22 | 36 |
| Question Answering | NarrativeQA | F1 33.61 | 36 |
| Question Answering | NarrativeQA | BLEU-1 21.5 | 28 |
| Question Answering | NarrativeQA LongBench | F1 Score 14.76 | 24 |
| Single-hop Question Answering | NarrativeQA | Score 23.89 | 22 |
| Long-context Question Answering | NarrativeQA Passage Split | Score 32.64 | 18 |
| Long-context Question Answering | NarrativeQA Fixed Chunk 2048 | Score 32.64 | 18 |
| Question Answering | NarrativeQA | EM 34.79 | 18 |
| Question Answering | NarrativeQA No Trun. latest (test) | Average Score 23.96 | 18 |
| Question Answering | NarrativeQA | F1 Score 28.94 | 16 |
| Question Answering | NarrativeQA | Rouge-L 45 | 15 |
| Long narrative understanding QA | NarrativeQA | Accuracy 55 | 14 |
| Traceback (Prompt Injection Attacks) | NarrativeQA | Precision 98 | 13 |
| Question Answering | NarrativeQA | TTFT (ms) 355.12 | 12 |
| Question Answering | NarrativeQA | Peak GPU Memory (GB) 0.58 | 12 |
| Question Answering | NarrativeQA LongBench 32K context | F1 Score 17.2 | 12 |
| Multi-session Retrieval-Augmented Generation | NarrativeQA (test) | F1 Score 38.4 | 12 |
| Document Retrieval | NarrativeQA (test) | nDCG@10 61.7 | 12 |
| Prompt Injection Attack | NarrativeQA | ASR 86 | 11 |
| Long-context Question Answering | NarrativeQA | Exact Match 61.7 | 11 |
Showing 25 of 56 rows