Multi-hop Reasoning on HotpotQA (Accuracy, Success Rate)

72.2 Accuracy

Qwen3-14B

[Chart: Accuracy over time, Apr 7, 2026 to May 30, 2026]

Evaluation Results

Date     Accuracy  Success Rate
2026.05  72.2      -             -   -
2026.05  71.4      -             -   -
2026.05  70.9      -             -   -
2026.05  70.3      -             -   -
2026.05  69.6      -             -   -
2026.05  68.7      -             -   -
2026.05  68.3      -             -   -
2026.05  67.1      -             -   -
2026.05  66.5      -             -   -
2026.05  60.9      -             -   -
2026.05  60.9      -             -   -
2026.05  60.6      -             -   -
2026.05  57.6      -             -   -
2026.05  56.5      -             -   -
2026.05  55.8      -             -   -
2026.04  50        0.63          -   -
2026.05  49.9      -             -   -
2026.05  48.5      -             -   -
2026.05  47.7      -             -   -
2026.05  41.8      -             -   -
2026.04  38        0.41          0   80
2026.05  34.8      -             -   -
2026.05  30.8      -             -   -