Agentic Web Interaction on xbench DeepSearch 2510 (test)

66 Pass@1

GPT-5

Updated 3mo ago

Evaluation Results

Method	Links	Pass@1	(unlabeled)
GPT-5 2026.04		66	-
DeepSeek-V3.2 2026.04		51	-
GLM-4.6 2026.04		47	-
DeepSeek-V3.1 2026.04		44	-
LThinker++ 2026.04		44	60
Vanilla-Agent 2026.04		38.3	53
Claude-4-Sonnet 2026.04		35	-
Kimi-K2-Instruct 2026.04		30	-
Qwen3-235B-A22B-Instruct 2026.04		27	-
Qwen3-30B-A3B-Thinking 2026.04		8.7	16