
DetectiveQA

Benchmarks

Task Name                        Dataset Name     Metric     SOTA Result  Trend
Long-context Question Answering  DetectiveQA-En   Accuracy   75.5         38
Long-context Question Answering  DetectiveQA-Zh   Accuracy   80           38
Logical Reasoning                DetectiveQA      Accuracy   88.31        24
Story Question Answering         DetectiveQA      Accuracy   82.3         12
Retrieval                        DetectiveQA      Recall@3   32.22        8
Retrieval                        DetectiveQA-ZH   Recall@3   46.8         6
Question Answering               DetectiveQA      Accuracy   67.25        6