Single-Objective Task Evaluation on 2WikiMultihopQA, HotpotQA, Bamboogle, Frames, and BrowseComp-Plus

77.5 Accuracy (2WikiMultihopQA)

Search-R1 (SFT + RL)

Oct 14, 2025
Updated 26d ago

Evaluation Results

Method  Links
2025.10  77.5  72.3  62.4  37.6  17.7  53.5  19.3
2025.10  76.7  71    61.8  38.5  20.7  53.7  8.2
2025.10  76.4  70.5  61.6  35.9  16    52.1  9.4
2025.10  60.3  69    50.5  43    26.8  50    16.7
2025.10  58    65.5  48.8  27.5  11.1  42.1  6.5
2025.10  56.5  66    47    28.5  14.1  42.4  5
2025.10  54    69.2  44.2  33.5  12    42.6  4.1
2025.10  53.5  56    47.2  21.5  8.5   37.3  3.8
2025.10  52.8  59.1  45.3  25    9.4   38.3  3.9