Question Answering on MuSiQue (In-Distribution)
[Chart: Accuracy. Top result: Claude Sonnet 4, 44 (Jul 22, 2025). Metrics shown: Accuracy, Relevance, Full Coverage.]
Updated 1mo ago
Evaluation Results
| Method | Model Category | Date | Accuracy | Relevance | Full Coverage |
|---|---|---|---|---|---|
| Claude Sonnet 4 | Closed-... | 2025.07 | 44 | 57 | 42 |
| GPT-4.1 | Closed-... | 2025.07 | 43 | 45 | 55 |
| Deliberative Searcher-72B | 70B Mod... | 2025.07 | 37 | 71 | 14 |
| Deliberative Searcher-DeepSeek-70B | 70B Mod... | 2025.07 | 34 | 72 | 11 |
| GPT-4o | Closed-... | 2025.07 | 33 | 52 | 47 |
| Deliberative Searcher-7B | 7B Mode... | 2025.07 | 29 | 74 | 2 |
| Deliberative Searcher-7B | 7B Mode... | 2025.07 | 27 | 71 | 2 |
| R1-Searcher-7B | 7B Mode... | 2025.07 | 26 | 35 | 65 |
| Qwen2.5-VL-72B | 70B Mod... | 2025.07 | 25 | 50 | 49 |
| InternVL3-78B | 70B Mod... | 2025.07 | 24 | 41 | 58 |
| DeepSeek-R1-Distill-70B | 70B Mod... | 2025.07 | 23 | 32 | 68 |
| ReSearch-7B | 7B Models | 2025.07 | 18 | 22 | 78 |
| Search-R1-7B | 7B Mode... | 2025.07 | 16 | 45 | 53 |
| Qwen2.5-VL-7B | 7B Mode... | 2025.07 | 12 | 48 | 52 |