Multi-hop Retrieval on MuSiQue
[Chart: Accuracy; top entry CARE at 75.5 (Apr 20, 2026). Selectable metrics: Accuracy, F1-Score, Recall, Precision.]
Updated 1mo ago
Evaluation Results
Method                                          Accuracy  F1-Score  Recall  Precision
CARE (Underlying LLM=GPT-4.1..., 2026.04)       75.5      67.8      51.7    98.7
Direct (Underlying LLM=GPT-4.1..., 2026.04)     70.2      57.8      41      98.4
Indirect (Underlying LLM=GPT-4.1..., 2026.04)   63.1      41.7      26.4    99.4
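The page does not say how the F1-Score column is computed. As a rough sanity check, the sketch below assumes the standard definition (harmonic mean of precision and recall) and recomputes it from the Precision and Recall values listed above. The function name `f1_from_precision_recall` and the `rows` dictionary are illustrative, not part of the benchmark. Small gaps between recomputed and reported values are expected, since the inputs are already rounded and the benchmark may average F1 differently.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results above,
# on the page's 0-100 scale (the scale is an assumption).
rows = {
    "CARE": (98.7, 51.7, 67.8),
    "Direct": (98.4, 41.0, 57.8),
    "Indirect": (99.4, 26.4, 41.7),
}

for name, (p, r, reported) in rows.items():
    recomputed = f1_from_precision_recall(p, r)
    print(f"{name}: recomputed F1 = {recomputed:.2f}, reported F1 = {reported}")
```

Under this assumption the recomputed values land within about 0.1 of the reported ones, which suggests the columns are mutually consistent. The very high precision paired with much lower recall is what pulls each F1 well below its precision.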