Question Answering on 2Wiki (In-Distribution)
[Chart: top result GPT-4.1, Accuracy 77, Jul 22, 2025. Metrics shown: Accuracy, Relatedness, Factual Consistency.]
Evaluation Results
Method                              Model Category  Date     Accuracy  Relatedness  Factual Consistency
GPT-4.1                             Closed-...      2025.07  77        81           18
Deliberative Searcher-DeepSeek-70B  70B Mod...      2025.07  65        69           5
Deliberative Searcher-72B           70B Mod...      2025.07  64        73           4
Claude Sonnet 4                     Closed-...      2025.07  61        89           8
Deliberative Searcher-7B            7B Mode...      2025.07  55        59           3
Deliberative Searcher-7B            7B Mode...      2025.07  52        54           6
GPT-4o                              Closed-...      2025.07  51        86           10
R1-Searcher-7B                      7B Mode...      2025.07  48        59           40
DeepSeek-R1-Distill-70B             70B Mod...      2025.07  46        55           44
InternVL3-78B                       70B Mod...      2025.07  43        62           36
Qwen2.5-VL-72B                      70B Mod...      2025.07  41        73           23
Search-R1-7B                        7B Mode...      2025.07  36        51           43
Qwen2.5-VL-7B                       7B Mode...      2025.07  33        51           48
ReSearch-7B                         7B Models       2025.07  33        35           65