
QASPER

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | qasper | F1 Score 36.9 | 61 |
| Document Question Answering | Qasper | Accuracy 40.8 | 44 |
| Single-document retrieval | Qasper | F1 Score 50.3 | 44 |
| Language Generation | QASPER | Accuracy 15.35 | 35 |
| Question Answering | QASPER (test) | F1 Score (Match) 58.5 | 27 |
| Single-hop Question Answering | Qasper | Score 44.79 | 22 |
| Question Answering | QASPER 1200:251 (test) | Answerable EM 28.92 | 20 |
| Long-context Question Answering | Qasper | F1 83.09 | 17 |
| Question Answering | Qasper | F1 Score 0.3677 | 16 |
| Question Answering | Qasper | Recall 67.3 | 15 |
| Question Answering | Qasper | ASR Score 30 | 14 |
| Multi-session Retrieval-Augmented Generation | QASPER (test) | F1 Score 36 | 12 |
| Speculative Decoding | Qasper | SR 1.66 | 12 |
| Single-document retrieval | Qasper | Latency (s) 0.0054 | 11 |
| Long document retrieval | Qasper (test) | F1 Score 46.18 | 11 |
| Completeness | QASPER | Kendall's Tau 0.44 | 11 |
| Long-context Question Answering | Qasper | Extract F1 54.57 | 10 |
| Faithfulness Evaluation | Qasper yes/no question answering | AOPC@10 0.102 | 10 |
| Question Answering | Qasper (val) | F1 28.8 | 10 |
| Question Answering | QASPER Multi-Document 4 | Accuracy 76.2 | 9 |
| Question Answering | QASPER Extractive (test) | F1 53.3 | 8 |
| Question Answering | QASPER Extractive (dev) | F1 29.6 | 8 |
| Multi-hop Reasoning | QASPER | EM 15 | 7 |
| RAG-Completeness | QASPER (test) | Mean Kendall Tau Correlation 0.44 | 6 |
| Question Answering Generation | QASPER NLP (test) | ROUGE-L 0.2553 | 4 |
Showing 25 of 30 rows