Open-ended Reasoning on GSM8K Multi
[Chart: Accuracy over time. Best result: 64.3 (Self-Agreement), Nov 14, 2023.]
Evaluation Results
Method           Model           Date      Accuracy
Self-Agreement   GPT-3.5-turbo   2023.11   64.3
Zero-Shot CoT    GPT-3.5-turbo   2023.11   59.7