
Deep Research on xbench-DS

Best result: 71 Pass@1 (DeepSeek-V3.1)


Evaluation Results

Method	Date	Pass@1	Second metric (unlabeled)
DeepSeek-V3.1	2025.10	71	-
OpenAI-o3	2025.10	66.7	-
Claude-4-Sonnet	2025.10	64.6	-
WebSailor	2025.10	53.3	-
Kimi-K2	2025.10	50	-
Web-30B-E-GRPO	2025.10	46.7	66
Web-30B-GRPO	2025.10	45.3	65
Web-30B-SFT	2025.10	43.7	63
Web-7B-E-GRPO	2025.10	42	59
Web-7B-GRPO	2025.10	40.7	56
WebDancer-QwQ	2025.10	39	-
Web-7B-SFT	2025.10	37.3	55
WebSailor	2025.10	34.3	-
WebThinker-RL	2025.10	24	-
R1-Searcher	2025.10	4	-