Multi-hop Reasoning on CRAG
Best F1 Score: 30.08 (TOTAL)
[Chart: F1 Score by date; data point at Oct 8, 2025]
Updated 1mo ago
Evaluation Results
Method       Method details              Date      F1 Score
TOTAL        Model=Claude, Promptin...   2025.10   30.08
TOTAL        Model=Gemini, Promptin...   2025.10   27.71
TOTAL        Model=GPT, Prompting M...   2025.10   26.31
CIC + COT    Model=Gemini, Promptin...   2025.10   25.77
NAÏVE        Model=GPT, Prompting M...   2025.10   25.73
CIC          Model=Gemini, Promptin...   2025.10   25.45
COT          Model=Gemini, Promptin...   2025.10   24.62
COT          Model=GPT, Prompting M...   2025.10   23.24
CIC          Model=GPT, Prompting M...   2025.10   22.12
NAÏVE        Model=Gemini, Promptin...   2025.10   22.03
CIC + COT    Model=GPT, Prompting M...   2025.10   21.72
NAÏVE        Model=Claude, Promptin...   2025.10   20.49
COT          Model=Claude, Promptin...   2025.10   20.32
CIC + COT    Model=Claude, Promptin...   2025.10   18.86
CIC          Model=Claude, Promptin...   2025.10   17.32