StrQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | StrQA | Accuracy: 84.45 | 24 |
| Step-level correctness assessment | StrQA (test) | PR-AUC: 39.5 | 22 |
| Step-level reasoning verification | StrQA | PR-AUC: 52.7 | 19 |
| Common Sense Reasoning | StrQA | Accuracy: 79.65 | 6 |