Multi-hop Reasoning on StrategyQA

Best result: 95.6 Accuracy (OpenMath2-Llama3.1-70B*)

[Chart: accuracy over time, Nov 15, 2023 to Dec 12, 2025]

Evaluation Results

Date      Accuracy
2025.02   95.6
2025.02   94.3
2025.02   90.8
2025.02   88.8
2025.02   88.7
2025.02   88.2
2023.11   83.5
2023.11   82
2023.11   81.5
2025.02   81.2
2023.11   80.5
2025.12   80
2025.02   79
2025.12   79
2023.11   79
2023.11   77
2023.11   76.5
2023.11   75
2023.11   75
2025.12   74
2023.11   74
2023.11   74
2023.11   73.5
2023.11   73.5
2023.11   72
2023.11   72
2023.11   71
2023.11   64
2023.11   63
2025.03   62.3
2025.02   61.1
2023.11   61
2023.11   61
2025.03   53
2025.03   53
2025.03   43