Multi-hop Retrieval on HotpotQA (Accuracy, F1, Precision, Recall)
Chart: Accuracy by date. Top result: CARE, 82.7 (Apr 20, 2026). Metrics shown: Accuracy, F1-Score, Recall, Precision.
Updated 1mo ago
Evaluation Results
Method    Configuration               Date     Accuracy  F1-Score  Recall  Precision
CARE      Underlying LLM=GPT-4.1...   2026.04  82.7      81.4      75.7    88
Direct    Underlying LLM=GPT-4.1...   2026.04  72        65.8      54      84.4
Indirect  Underlying LLM=GPT-4.1...   2026.04  64.2      47.4      32.2    89.8
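As a quick sanity check on the reported numbers, the F1-Score column can be compared with the harmonic mean of each row's Precision and Recall, F1 = 2PR / (P + R). The sketch below assumes that is how F1 is defined on this leaderboard (the page does not state the formula). Under that assumption, the derived values agree with the reported F1 to within 0.1 points, which is consistent with rounding. The dictionary `rows` and the helper `f1_score` are illustrative names introduced here. They are not part of any benchmark API.

```python
# Scores copied from the evaluation results above (percent).
# Assumption: F1 is the harmonic mean of precision and recall.
rows = {
    "CARE": {"f1": 81.4, "recall": 75.7, "precision": 88.0},
    "Direct": {"f1": 65.8, "recall": 54.0, "precision": 84.4},
    "Indirect": {"f1": 47.4, "recall": 32.2, "precision": 89.8},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for method, s in rows.items():
    derived = f1_score(s["precision"], s["recall"])
    # Print the reported F1 next to the value derived from P and R.
    print(f"{method}: reported F1={s['f1']}, derived F1={derived:.2f}")
```

The small gap for Direct (reported 65.8, derived about 65.86) most likely reflects rounding or truncation of the underlying, unrounded scores.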