Single-Hop Question Answering on NQ (in-domain)
[Figure: Accuracy over time, Oct 30, 2025 to Feb 9, 2026. Top entry: GEPO, 48.2 Accuracy.]
Updated 5d ago
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| GEPO | Type=RL Training, Mode... | 2025.10 | 48.2 |
| GiGPO | Type=RL Training, Mode... | 2025.10 | 46.4 |
| SKILLRL | Backbone=Qwen2.5-7B-In... | 2026.02 | 45.9 |
| ZeroSearch | Backbone=Qwen2.5-7B-In... | 2026.02 | 43.6 |
| GEPO | Type=RL Training, Mode... | 2025.10 | 43.6 |
| ZeroSearch | Type=RL Training, Mode... | 2025.10 | 43.6 |
| EvolveR | Backbone=Qwen2.5-7B-In... | 2026.02 | 43.5 |
| GiGPO | Type=RL Training, Mode... | 2025.10 | 42 |
| ZeroSearch | Type=RL Training, Mode... | 2025.10 | 41.4 |
| Search-R1 | Backbone=Qwen2.5-7B-In... | 2026.02 | 39.3 |
| Search-R1 | Type=RL Training, Mode... | 2025.10 | 39.3 |
| Search-R1 | Type=RL Training, Mode... | 2025.10 | 34.1 |
| RAG | Backbone=Qwen2.5-7B-In... | 2026.02 | 27.4 |
| R1-Instruct | Type=RL Training, Mode... | 2025.10 | 27 |
| R1-Instruct | Backbone=Qwen2.5-7B-In... | 2026.02 | 21 |
| R1-Instruct | Type=RL Training, Mode... | 2025.10 | 21 |
| Search-o1 | Backbone=Qwen2.5-7B-In... | 2026.02 | 19.4 |
| CoT | Backbone=Qwen2.5-7B-In... | 2026.02 | 12.8 |
| Qwen2.5 | Backbone=Qwen2.5-7B-In... | 2026.02 | 11.6 |