
DeepSearch

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Retrieval-Augmented Question Answering | DeepSearch Average | Success Rate (SR): 57.1 | 23 |
| Retrieval-Augmented Question Answering | DeepSearch Bamboogle | Success Rate (SR): 72 | 23 |
| Retrieval-Augmented Question Answering | DeepSearch Musique | Success Rate (SR): 46 | 23 |
| Retrieval-Augmented Question Answering | DeepSearch 2wiki | Success Rate (SR): 68 | 23 |
| Retrieval-Augmented Question Answering | DeepSearch HotpotQA | Success Rate (SR): 56 | 23 |
| Retrieval-Augmented Question Answering | DeepSearch PopQA | Success Rate (SR): 64 | 23 |
| Retrieval-Augmented Question Answering | DeepSearch TriviaQA | Success Rate (SR): 78 | 23 |
| Retrieval-Augmented Question Answering | DeepSearch NQ | Success Rate (SR): 86 | 23 |
| Deep Research Task | DeepSearch | Accuracy (%): 47 | 11 |