Single-Hop Question Answering on PopQA out-of-domain
[Chart: Accuracy over time, Oct 30, 2025 to Feb 9, 2026. Best result: GEPO, 52.4 Accuracy. Updated 5d ago.]
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| GEPO | Type=RL Training, Mode... | 2025.10 | 52.4 |
| ZeroSearch | Backbone=Qwen2.5-7B-In... | 2026.02 | 51.5 |
| ZeroSearch | Type=RL Training, Mode... | 2025.10 | 51.5 |
| GEPO | Type=RL Training, Mode... | 2025.10 | 47.2 |
| GiGPO | Type=RL Training, Mode... | 2025.10 | 46.1 |
| SKILLRL | Backbone=Qwen2.5-7B-In... | 2026.02 | 45.9 |
| ZeroSearch | Type=RL Training, Mode... | 2025.10 | 44.8 |
| EvolveR | Backbone=Qwen2.5-7B-In... | 2026.02 | 44.6 |
| GiGPO | Type=RL Training, Mode... | 2025.10 | 42.4 |
| Search-R1 | Backbone=Qwen2.5-7B-In... | 2026.02 | 39.7 |
| Search-R1 | Type=RL Training, Mode... | 2025.10 | 39.7 |
| Search-R1 | Type=RL Training, Mode... | 2025.10 | 37.8 |
| R1-Instruct | Type=RL Training, Mode... | 2025.10 | 19.9 |
| RAG | Backbone=Qwen2.5-7B-In... | 2026.02 | 17.8 |
| R1-Instruct | Backbone=Qwen2.5-7B-In... | 2026.02 | 17.1 |
| R1-Instruct | Type=RL Training, Mode... | 2025.10 | 17.1 |
| Search-o1 | Backbone=Qwen2.5-7B-In... | 2026.02 | 11.4 |
| CoT | Backbone=Qwen2.5-7B-In... | 2026.02 | 3.8 |
| Qwen2.5 | Backbone=Qwen2.5-7B-In... | 2026.02 | 1.2 |