
Multi-hop Reasoning on StrategyQA

Accuracy: 95.6

OpenMath2-Llama3.1-70B*

[Chart: accuracy over time, Nov 15, 2023 to May 28, 2026]
Updated 5d ago

Evaluation Results

Date     Accuracy
2025.02  95.6
2025.02  94.3
2026.05  92.5
2025.02  90.8
2025.02  88.8
2025.02  88.7
2025.02  88.2
2026.05  87.6
2026.05  86.2
2026.05  85.8
2026.05  84.5
2026.05  84
2023.11  83.5
2026.05  83.3
2023.11  82
2023.11  81.5
2025.02  81.2
2023.11  80.5
2025.12  80
2025.02  79
2025.12  79
2023.11  79
2026.05  78.4
2023.11  77
2023.11  76.5
2023.11  75
2023.11  75
2025.12  74
2023.11  74
2023.11  74
2023.11  73.5
2023.11  73.5
2023.11  72
2023.11  72
2023.11  71
2026.05  65.2
2023.11  64
2023.11  63
2025.03  62.3
2025.02  61.1
2023.11  61
2023.11  61
2025.03  53
2025.03  53
2025.03  43
2026.04  33.9
2026.04  31.5
2026.04  30.9
2026.04  30.6
2026.04  29.3