QA Benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Open-domain Question Answering | 5 QA Benchmarks Average | Average Exact Match: 40.9 | 14
Question Answering | QA Benchmarks Zero-shot (BoolQ, Lambada, Piqa, OPQA, Winogrande, ARC-E, ARC-C, Hellaswag) | BoolQ Accuracy: 74.86 | 6