Question Answering on HotpotQA (HS and ARR)
[Chart: HS over time, with an HS/ARR metric selector. Top entry: AURA, 95.6 HS, Jan 1, 2026.]
Updated 3mo ago
Evaluation Results
Method  Model             Date     HS    ARR
AURA    GPT-4o            2026.01  95.6  100
AURA    Gemini-2.5-flash  2026.01  95.5  100
AURA    Qwen-2.5-7B       2026.01  95.4  100
AURA    Llama2-7B         2026.01  95.3  100