Multi-hop Reasoning on PrOntoQA (Hop Accuracy Breakdown)
[Chart: Accuracy (1 Hop), Accuracy (3 Hops) and Accuracy (5 Hops) by method; top 1-hop result: GPT-4o, 97.3 (Oct 21, 2025)]
Evaluation Results
| Method | Model Backbone | Date | Accuracy (1 Hop) | Accuracy (3 Hops) | Accuracy (5 Hops) |
|---|---|---|---|---|---|
| GPT-4o | | 2025.10 | 97.3 | 74.4 | 66.4 |
| Llama3.1 70B it | Llama3.... | 2025.10 | 96.2 | 66.7 | 62.1 |
| AR | Llama3.... | 2025.10 | 95 | 95.6 | 95.3 |
| AR | Gemma2 9B | 2025.10 | 93.5 | 93.5 | 93.5 |
| Gemma2 27B it | Gemma2... | 2025.10 | 91.3 | 77.6 | 73.9 |
| DeepSeek-R1-8B | DeepSee... | 2025.10 | 80 | 76 | 64.4 |
| Llama3.1 8B | Llama3.... | 2025.10 | 51 | 50.8 | 50.3 |
| Gemma2 9B | Gemma2 9B | 2025.10 | 48.5 | 47.5 | 47.9 |
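One way to read the hop breakdown is to compare each method's 1-hop and 5-hop accuracy. The sketch below copies the scores from the table and ranks methods by that gap. The gap itself (`hop_drop`) is an illustrative measure assumed here, not a metric reported by the leaderboard; the names `Row`, `ROWS`, `hop_drop` and `ranked_by_drop` are hypothetical, and the truncated backbone names are kept exactly as the page shows them.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    method: str
    backbone: Optional[str]  # truncated as on the leaderboard page; None if not listed
    hop1: float
    hop3: float
    hop5: float


# Scores copied from the Evaluation Results table.
ROWS = [
    Row("GPT-4o", None, 97.3, 74.4, 66.4),
    Row("Llama3.1 70B it", "Llama3....", 96.2, 66.7, 62.1),
    Row("AR", "Llama3....", 95.0, 95.6, 95.3),
    Row("AR", "Gemma2 9B", 93.5, 93.5, 93.5),
    Row("Gemma2 27B it", "Gemma2...", 91.3, 77.6, 73.9),
    Row("DeepSeek-R1-8B", "DeepSee...", 80.0, 76.0, 64.4),
    Row("Llama3.1 8B", "Llama3....", 51.0, 50.8, 50.3),
    Row("Gemma2 9B", "Gemma2 9B", 48.5, 47.5, 47.9),
]


def hop_drop(row: Row) -> float:
    """Hypothetical measure: points lost from 1 hop to 5 hops (negative = better at 5 hops)."""
    return round(row.hop1 - row.hop5, 1)


def ranked_by_drop(rows: list[Row]) -> list[Row]:
    """Sort methods from smallest to largest 1-hop to 5-hop gap."""
    return sorted(rows, key=hop_drop)


if __name__ == "__main__":
    for r in ranked_by_drop(ROWS):
        label = r.method if r.backbone is None else f"{r.method} [{r.backbone}]"
        print(f"{label:30s} 1-hop {r.hop1:5.1f}  5-hop {r.hop5:5.1f}  drop {hop_drop(r):5.1f}")
```

With these numbers both AR rows and the two small base models change by under one point, while GPT-4o and Llama3.1 70B it lose more than 30 points. Note that a small gap alone does not indicate strong reasoning: Llama3.1 8B and Gemma2 9B are flat because they sit near 50% at every depth.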