Multi-hop Reasoning on StrategyQA

95.6 Accuracy

OpenMath2-Llama3.1-70B*

[Chart: accuracy over time, Nov 15, 2023 to Dec 12, 2025]
Updated 3d ago

Evaluation Results

Date      Accuracy
2025.02   95.6
2025.02   94.3
2025.02   90.8
2025.02   88.8
2025.02   88.7
2025.02   88.2
2023.11   83.5
2023.11   82
2023.11   81.5
2025.02   81.2
2023.11   80.5
2025.12   80
2025.02   79
2025.12   79
2023.11   79
2023.11   77
2023.11   76.5
2023.11   75
2023.11   75
2025.12   74
2023.11   74
2023.11   74
2023.11   73.5
2023.11   73.5
2023.11   72
2023.11   72
2023.11   71
2023.11   64
2023.11   63
2025.02   61.1
2023.11   61
2023.11   61