Question Answering on HotpotQA (EM, ROUGE, F1)
[Chart: EM over time. Labeled point: GRIP, EM 32, Apr 13, 2026. Selectable metrics: EM, ROUGE, F1.]
Evaluation Results
Method       Backbone             Date     EM    ROUGE  F1
GRIP         Qwen2.5-7B-In...     2026.04  32    37.6   44.6
GRIP         Qwen-3-4B-Ins...     2026.04  31.9  39.7   46.2
Single-RAG   Qwen2.5-7B-In...     2026.04  30.6  34.9   40.9
Robust-RAG   Qwen-3-4B-Ins...     2026.04  27.3  30.8   36.7
Robust-RAG   Qwen2.5-7B-In...     2026.04  26.1  30.8   35.5
R1-Searcher  Qwen2.5-7B-In...     2026.04  20.2  24.3   28.8
Single-RAG   Qwen-3-4B-Ins...     2026.04  18.4  24.6   28.8
Instruct     Qwen2.5-7B-In...     2026.04  18.3  21.5   26
R1-Searcher  Qwen-3-4B-Ins...     2026.04  18.2  21.2   25.4
Instruct     Qwen-3-4B-Ins...     2026.04  8     13.6   16.5