QASPER

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | QASPER (test) | F1 Score (Match) 61.5 | 132 |
| Question Answering | qasper | F1 Score 36.9 | 61 |
| Document Question Answering | Qasper | Accuracy 40.8 | 44 |
| Single-document retrieval | Qasper | F1 Score 50.3 | 44 |
| Text Question Answering | Qasper | Accuracy 61.3 | 37 |
| Language Generation | QASPER | Accuracy 15.35 | 35 |
| Retrieval | QASPER (test) | F1 Score 30.27 | 30 |
| Single-hop Question Answering | Qasper | Score 44.79 | 22 |
| Question Answering | QASPER 1200:251 (test) | Answerable EM 28.92 | 20 |
| Question Answering | Qasper | Precision 84 | 18 |
| Question Answering | Qasper | EM Score 65 | 18 |
| Long-context Question Answering | Qasper 128K context | F1 Score 39 | 18 |
| Long-context Question Answering | Qasper | F1 83.09 | 17 |
| Question Answering | QASPER Long-doc | R@1 61.91 | 16 |
| Uncertainty Estimation | QASPER | AUROC 0.722 | 16 |
| Question Answering | Qasper | F1 Score 0.3677 | 16 |
| Question Answering | QASPER | Rouge-L 38.8 | 15 |
| Question Answering | Qasper | Recall 67.3 | 15 |
| Question Answering | Qasper | ASR Score 30 | 14 |
| Question Answering | QASPER | TTFT (ms) 233.2 | 12 |
| Question Answering | QASPER | Peak GPU Memory (GB) 0.53 | 12 |
| Multi-session Retrieval-Augmented Generation | QASPER (test) | F1 Score 36 | 12 |
| Speculative Decoding | Qasper | SR 1.66 | 12 |
| Single-document retrieval | Qasper | Latency (s) 0.0054 | 11 |
| Long document retrieval | Qasper (test) | F1 Score 46.18 | 11 |
Showing 25 of 48 rows