WQ

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | WQ (test) | AUROC: 76.6 | 90 |
| Question Answering | WQ | Absolute Execution Time Overhead (s): 0.039 | 90 |
| Question Answering | WQ | PRR: 62.8 | 90 |
| Open-Domain Question Answering | WQ (test) | EM: 33.71 | 37 |
| Reward Modeling | WQ Arena | Accuracy: 65.29 | 22 |
| Inference Efficiency | WQ | Relative Execution Time Overhead: 0.014 | 12 |
| Open-domain retrieval | WQ | Recall@20: 73.2 | 9 |
| Question Answering | WQ | Accuracy: 45.5 | 8 |