
QA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | QA Zero-shot Average | QA Zero-shot Average 73.45 | 57 |
| Question Answering | QA | Speedup Factor 3.66 | 47 |
| Question Answering | QA | Average Rank 1.73 | 40 |
| Question Answering | QA OOD StrQA SciQA | StrQA Accuracy 98.3 | 28 |
| Question Answering | QA (OBQA, ARC-E, ARC-C, CQA) | OBQA Accuracy 53.6 | 20 |
| Legal Text Classification | QA | Accuracy 85.72 | 18 |
| Question Answering | QA ExAnte (test) | 1d Leakage Rate 1.6 | 15 |
| Question Answering | QA | Performance Score 63.31 | 12 |
| Question Answering | QA | ASR Score (Before) 70 | 12 |
| Question Answering | QA | Accuracy 59.5 | 12 |
| Question Answering | QA 8-objective | EM 37.6 | 11 |
| Steering | QA | Steering Success 62.5 | 11 |
| Text Generation | QA | Throughput (tokens/s) 117.17 | 10 |
| Question Answering | QA benchmarks | ReCoRD Score 80.86 | 9 |
| Question Answering | QA domain average | Best Accuracy 85.2 | 8 |
| Critique Quality Evaluation | QA | Win Rate 75 | 6 |
| Question Answering | QA 12 languages | Score 72.9 | 5 |
| Question Answering | QA Qwen2-7B-Instruct v1 (test) | Acceptance Length (τ) 2.57 | 4 |
| Speculative Decoding | Qa | Speedup 2.23 | 3 |