CosmosQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Commonsense Question Answering | CosmosQA | Accuracy: 94 | 54 |
| Commonsense Question Answering | CosmosQA (test) | EM: 92.25 | 24 |
| Binary Classification | CosmosQA | Accuracy: 90 | 18 |
| Reading Comprehension | CosmosQA (test) | Accuracy: 91.8 | 5 |
| Reasoning trace quality evaluation | CosmosQA | Grammar Score: 2.1 | 2 |