Question Answering on Overall Average (Acc., Rel., FC)
[Chart: results by date. Series: Accuracy, Relevance, Full Coverage (FC). Best Accuracy: 55 (Claude Sonnet 4), Jul 22, 2025.]
Updated 1mo ago
Evaluation Results
| Method                             | Model Category | Date    | Accuracy | Relevance | Full Coverage (FC) |
|------------------------------------|----------------|---------|----------|-----------|--------------------|
| Claude Sonnet 4                    | Closed-...     | 2025.07 | 55       | 75        | 24                 |
| GPT-4.1                            | Closed-...     | 2025.07 | 54       | 58        | 42                 |
| Deliberative Searcher-72B          | 70B Mod...     | 2025.07 | 48       | 75        | 9                  |
| GPT-4o                             | Closed-...     | 2025.07 | 41       | 73        | 26                 |
| Deliberative Searcher-DeepSeek-70B | 70B Mod...     | 2025.07 | 41       | 75        | 9                  |
| Deliberative Searcher-7B           | 7B Models      | 2025.07 | 35       | 75        | 2                  |
| R1-Searcher-7B                     | 7B Models      | 2025.07 | 34       | 46        | 54                 |
| Deliberative Searcher-7B           | 7B Models      | 2025.07 | 33       | 74        | 3                  |
| Qwen2.5-VL-72B                     | 70B Mod...     | 2025.07 | 31       | 58        | 41                 |
| InternVL3-78B                      | 70B Mod...     | 2025.07 | 30       | 53        | 46                 |
| DeepSeek-R1-Distill-70B            | 70B Mod...     | 2025.07 | 30       | 41        | 59                 |
| ReSearch-7B                        | 7B Models      | 2025.07 | 26       | 30        | 70                 |
| Search-R1-7B                       | 7B Models      | 2025.07 | 24       | 49        | 49                 |
| Qwen2.5-VL-7B                      | 7B Models      | 2025.07 | 21       | 52        | 48                 |
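The leaderboard is ordered by Accuracy alone, yet the three metrics often pull in different directions: the Deliberative Searcher variants score 74 to 75 on Relevance but only 2 to 9 on FC, while ReSearch-7B has the lowest Relevance (30) and the highest FC (70). As a minimal sketch, the Python below transcribes the table and re-ranks it. The unweighted mean of Accuracy, Relevance, and FC is an illustrative assumption, not a score reported by the benchmark, and the names `ROWS`, `unweighted_mean`, and `rank_by` are hypothetical.

```python
# Rows transcribed from the leaderboard: (method, accuracy, relevance, full_coverage).
# "Deliberative Searcher-7B" appears twice with different scores, so a list
# of tuples is used instead of a dict keyed by method name.
ROWS = [
    ("Claude Sonnet 4", 55, 75, 24),
    ("GPT-4.1", 54, 58, 42),
    ("Deliberative Searcher-72B", 48, 75, 9),
    ("GPT-4o", 41, 73, 26),
    ("Deliberative Searcher-DeepSeek-70B", 41, 75, 9),
    ("Deliberative Searcher-7B", 35, 75, 2),
    ("R1-Searcher-7B", 34, 46, 54),
    ("Deliberative Searcher-7B", 33, 74, 3),
    ("Qwen2.5-VL-72B", 31, 58, 41),
    ("InternVL3-78B", 30, 53, 46),
    ("DeepSeek-R1-Distill-70B", 30, 41, 59),
    ("ReSearch-7B", 26, 30, 70),
    ("Search-R1-7B", 24, 49, 49),
    ("Qwen2.5-VL-7B", 21, 52, 48),
]


def unweighted_mean(row):
    """Hypothetical summary score: plain mean of Accuracy, Relevance and FC.

    This is an illustrative assumption, not the leaderboard's ranking metric.
    """
    _, acc, rel, fc = row
    return (acc + rel + fc) / 3


def rank_by(rows, key):
    """Return rows sorted best-first by `key`.

    Python's sort is stable, so tied rows keep their original order.
    """
    return sorted(rows, key=key, reverse=True)


if __name__ == "__main__":
    # Ranking by Accuracy (column 1) reproduces the leaderboard order.
    print("By Accuracy:")
    for row in rank_by(ROWS, key=lambda r: r[1]):
        print(f"  {row[0]:<36} acc={row[1]:>3}")

    # Ranking by the assumed unweighted mean reshuffles the middle of the table.
    print("By unweighted mean (assumption):")
    for row in rank_by(ROWS, key=unweighted_mean):
        print(f"  {row[0]:<36} mean={unweighted_mean(row):6.2f}")
```

Because the sort is stable, ties keep their leaderboard order: Claude Sonnet 4 and GPT-4.1 both total 154 across the three metrics (mean about 51.33), so Claude Sonnet 4 stays first. Under this assumed mean, R1-Searcher-7B (about 44.67) moves ahead of Deliberative Searcher-72B (44.00).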