Search on XBench
Score: 74 (OpenSeeker-v1-Data-11.7k)
[Figure: Score over time, Jan 30, 2026 to Mar 16, 2026]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Score |
|---|---|---|---|
| OpenSeeker-v1-Data-11.7k | # Samples=11.7 k, # OS... | 2026.03 | 74 |
| Openresearcher | # Samples=96k, # OS Sa... | 2026.03 | 65 |
| SYNTHAGENT-8B | Reasoning protocol=non... | 2026.01 | 45 |
| Qwen3-235B | Reasoning protocol=non... | 2026.01 | 43 |
| SYNTHAGENT-14B | Reasoning protocol=non... | 2026.01 | 43 |
| ToolStar-14B | Reasoning protocol=non... | 2026.01 | 40 |
| ToolStar-8B | Reasoning protocol=non... | 2026.01 | 33 |
| Qwen3-32B | Reasoning protocol=non... | 2026.01 | 25 |
| Qwen3-14B | Reasoning protocol=non... | 2026.01 | 21 |