Information Extraction on xbench-deepsearch Out-of-Distribution
[Chart: Accuracy by date. Plotted point: Claude Sonnet 4, Accuracy 47 (Jul 22, 2025). Available metrics: Accuracy, Relevance, F1-Coverage.]
Updated 1mo ago
Evaluation Results
| Method | Model Category | Date | Accuracy | Relevance | F1-Coverage |
|---|---|---|---|---|---|
| Claude Sonnet 4 | Closed-... | 2025.07 | 47 | 71 | 29 |
| GPT-4.1 | Closed-... | 2025.07 | 38 | 43 | 57 |
| Deliberative Searcher-72B | 70B Mod... | 2025.07 | 35 | 77 | 9 |
| GPT-4o | Closed-... | 2025.07 | 32 | 69 | 31 |
| InternVL3-78B | 70B Mod... | 2025.07 | 24 | 53 | 47 |
| Qwen2.5-VL-72B | 70B Mod... | 2025.07 | 23 | 60 | 39 |
| Deliberative Searcher-DeepSeek-70B | 70B Mod... | 2025.07 | 18 | 80 | 8 |
| R1-Searcher-7B | 7B Mode... | 2025.07 | 17 | 36 | 63 |
| ReSearch-7B | 7B Models | 2025.07 | 17 | 23 | 77 |
| Deliberative Searcher-7B | 7B Mode... | 2025.07 | 16 | 86 | 1 |
| Deliberative Searcher-7B | 7B Mode... | 2025.07 | 15 | 86 | 1 |
| Search-R1-7B | 7B Mode... | 2025.07 | 14 | 48 | 52 |
| DeepSeek-R1-Distill-70B | 70B Mod... | 2025.07 | 14 | 40 | 60 |
| Qwen2.5-VL-7B | 7B Mode... | 2025.07 | 12 | 58 | 42 |