Multi-hop Reasoning on FanOutQA
Best F1 Score: 71.84 (TOTAL), Oct 8, 2025
Updated 1mo ago
Evaluation Results
| Method | Settings | Date | F1 Score |
|---|---|---|---|
| TOTAL | Model=Gemini, Promptin... | 2025.10 | 71.84 |
| TOTAL | Model=Claude, Promptin... | 2025.10 | 69.99 |
| TOTAL | Model=GPT, Prompting M... | 2025.10 | 69.07 |
| CIC + COT | Model=Gemini, Promptin... | 2025.10 | 66.97 |
| CIC | Model=Gemini, Promptin... | 2025.10 | 66.44 |
| CIC + COT | Model=GPT, Prompting M... | 2025.10 | 66.35 |
| CIC + COT | Model=Claude, Promptin... | 2025.10 | 66.29 |
| CIC | Model=Claude, Promptin... | 2025.10 | 63.74 |
| CIC | Model=GPT, Prompting M... | 2025.10 | 63.39 |
| COT | Model=GPT, Prompting M... | 2025.10 | 49.09 |
| NAÏVE | Model=GPT, Prompting M... | 2025.10 | 48.77 |
| NAÏVE | Model=Claude, Promptin... | 2025.10 | 46.72 |
| NAÏVE | Model=Gemini, Promptin... | 2025.10 | 46.54 |
| COT | Model=Claude, Promptin... | 2025.10 | 45.54 |
| COT | Model=Gemini, Promptin... | 2025.10 | 43.54 |