Question Answering on HELMET RAG subset
[Chart: HotpotQA Accuracy over time. Highlighted result: RAO(y), 81.1. Date shown: Jan 26, 2026. Selectable metrics: HotpotQA Accuracy, NQ Accuracy, TriviaQA Accuracy, Average Accuracy.]
Updated 4d ago
Evaluation Results

| Method | Details | Links | HotpotQA Accuracy | NQ Accuracy | TriviaQA Accuracy | Average Accuracy |
|---|---|---|---|---|---|---|
| RAO(y) | Context lengths=4k–128... | 2026.01 | 81.1 | 67.7 | 92.1 | 80.3 |
| RID(y) | Context lengths=4k–128... | 2026.01 | 75.2 | 65.5 | 91 | 77.2 |
| RID+Q(y) | Context lengths=4k–128... | 2026.01 | 74.1 | 59.1 | 85.5 | 72.9 |
| Base | Context lengths=4k–128... | 2026.01 | 72.2 | 70.7 | 85.8 | 76.2 |
| RR+Judge(y) | Context lengths=4k–128... | 2026.01 | 70.6 | 72.8 | 92.3 | 78.6 |
| RID+C(y) | Context lengths=4k–128... | 2026.01 | 67.9 | 56.9 | 85.8 | 70.2 |
| Base | Context lengths=4k–128... | 2026.01 | 61 | 61.1 | 86.6 | 69.6 |
| SFT AO | Context lengths=4k–128... | 2026.01 | 51.4 | 40.3 | 76.1 | 55.9 |
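
The page does not say how Average Accuracy is computed. The values listed above are consistent with an unweighted mean of the HotpotQA, NQ, and TriviaQA accuracies, rounded to one decimal place. The sketch below checks that assumption against the reported rows. The names `ROWS`, `mean_accuracy`, and `mismatches` are illustrative only and are not part of the benchmark.

```python
# Rows copied from the results above:
# (method, HotpotQA, NQ, TriviaQA, reported Average Accuracy).
# "Base" appears twice in the source, so rows are kept in a list, not a dict.
ROWS = [
    ("RAO(y)", 81.1, 67.7, 92.1, 80.3),
    ("RID(y)", 75.2, 65.5, 91.0, 77.2),
    ("RID+Q(y)", 74.1, 59.1, 85.5, 72.9),
    ("Base", 72.2, 70.7, 85.8, 76.2),
    ("RR+Judge(y)", 70.6, 72.8, 92.3, 78.6),
    ("RID+C(y)", 67.9, 56.9, 85.8, 70.2),
    ("Base", 61.0, 61.1, 86.6, 69.6),
    ("SFT AO", 51.4, 40.3, 76.1, 55.9),
]


def mean_accuracy(hotpot: float, nq: float, trivia: float) -> float:
    """Unweighted mean of the three task accuracies (an assumption, not documented)."""
    return (hotpot + nq + trivia) / 3


def mismatches(rows, tol: float = 0.05):
    """Return methods whose reported average differs from the mean by more than rounding."""
    bad = []
    for method, hotpot, nq, trivia, reported in rows:
        # A value rounded to one decimal can differ from the exact mean by up to 0.05.
        if abs(mean_accuracy(hotpot, nq, trivia) - reported) > tol + 1e-9:
            bad.append(method)
    return bad


if __name__ == "__main__":
    for method, hotpot, nq, trivia, reported in ROWS:
        print(f"{method}: mean={mean_accuracy(hotpot, nq, trivia):.2f}, reported={reported}")
    print("rows outside rounding tolerance:", mismatches(ROWS))
```

If the leaderboard used a different aggregation, such as weighting by dataset size, `mismatches` would return the rows that disagree.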