Multi-hop Reasoning on PrOntoQA (Hop Accuracy Breakdown)

97.3 Accuracy (1 Hop)

GPT-4o

Oct 21, 2025
Updated 22d ago

Evaluation Results

Method    Links
2025.10    97.3    74.4    66.4
2025.10    96.2    66.7    62.1
2025.10    95      95.6    95.3
2025.10    93.5    93.5    93.5
2025.10    91.3    77.6    73.9
2025.10    80      76      64.4
2025.10    51      50.8    50.3
2025.10    48.5    47.5    47.9