Math Word Problem Solving on GSM8K (200 examples test subset)
[Chart: Accuracy by date. Best result: ChatGPT (gpt-3.5-turbo), 81.2 (Jan 27, 2023)]
Evaluation Results
Method                   Configuration              Date     Accuracy
ChatGPT (gpt-3.5-turbo)  selection_strategy=Pro...  2023.01  81.2
ChatGPT (gpt-3.5-turbo)  selection_strategy=Pro...  2023.01  80.4
ChatGPT (gpt-3.5-turbo)  selection_strategy=Sim...  2023.01  78.1
ChatGPT (gpt-3.5-turbo)  selection_strategy=Uni...  2023.01  76.5
Llama 2 (70B)            selection_strategy=Pro...  2023.01  54.3
Llama 2 (70B)            selection_strategy=Sim...  2023.01  53.5
Llama 2 (70B)            selection_strategy=Pro...  2023.01  52.9
Llama 2 (70B)            selection_strategy=Uni...  2023.01  50.2
Llama 2 (13B)            selection_strategy=Pro...  2023.01  21.6
Llama 2 (13B)            selection_strategy=Pro...  2023.01  20.5
Llama 2 (7B)             selection_strategy=Pro...  2023.01  19.3
Llama 2 (13B)            selection_strategy=Sim...  2023.01  18.3
Llama 2 (13B)            selection_strategy=Uni...  2023.01  17
Llama 2 (7B)             selection_strategy=Pro...  2023.01  15.9
Prompt tuning            selection_strategy=Pro...  2023.01  15.2
Llama 2 (7B)             selection_strategy=Sim...  2023.01  13.1
Llama 2 (7B)             selection_strategy=Uni...  2023.01  11.4
Prompt tuning            selection_strategy=Pro...  2023.01  7.3