
General QA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Question Answering | General QA: NQ, TriviaQA, PopQA (test) | Overall Average Score: 51.3 | 49
General Question Answering | General QA: NQ, TriviaQA, PopQA | NQ Accuracy: 51.8 | 34
Complexity Prediction | General QA: MMLU + MMLU-Pro + GSM8K | ROC-AUC: 89.1 | 3