Math Word Problem Solving on GSM8K (200 examples test subset)

Accuracy: 81.2

ChatGPT (gpt-3.5-turbo)

Updated 4mo ago

Evaluation Results

Method	Links	Accuracy
ChatGPT (gpt-3.5-turbo) 2023.01		81.2
ChatGPT (gpt-3.5-turbo) 2023.01		80.4
ChatGPT (gpt-3.5-turbo) 2023.01		78.1
ChatGPT (gpt-3.5-turbo) 2023.01		76.5
Llama 2 (70B) 2023.01		54.3
Llama 2 (70B) 2023.01		53.5
Llama 2 (70B) 2023.01		52.9
Llama 2 (70B) 2023.01		50.2
Llama 2 (13B) 2023.01		21.6
Llama 2 (13B) 2023.01		20.5
Llama 2 (7B) 2023.01		19.3
Llama 2 (13B) 2023.01		18.3
Llama 2 (13B) 2023.01		17.0
Llama 2 (7B) 2023.01		15.9
Prompt tuning 2023.01		15.2
Llama 2 (7B) 2023.01		13.1
Llama 2 (7B) 2023.01		11.4
Prompt tuning 2023.01		7.3