Multi-hop Reasoning on PrOntoQA (Hop Accuracy Breakdown)

Accuracy (1 Hop): 97.3

GPT-4o

Updated 2mo ago

Evaluation Results

Method	Links	Accuracy (1 Hop)
GPT-4o 2025.10		97.3	74.4	66.4
Llama3.1 70B it 2025.10		96.2	66.7	62.1
AR 2025.10		95	95.6	95.3
AR 2025.10		93.5	93.5	93.5
Gemma2 27B it 2025.10		91.3	77.6	73.9
DeepSeek-R1-8B 2025.10		80	76	64.4
Llama3.1 8B 2025.10		51	50.8	50.3
Gemma2 9B 2025.10		48.5	47.5	47.9