
QA Benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Question Answering | QA Benchmarks Zero-shot (BoolQ, Lambada, Piqa, OPQA, Winogrande, ARC-E, ARC-C, Hellaswag) | BoolQ Accuracy: 74.86 | 6