Average Performance across 10 Task Types on 13 Datasets (test)
[Chart: Avg. Accuracy over time. Top entry: PROMPTED, 75.8 Avg. Accuracy (Oct 3, 2023).]
Evaluation Results
| Method | Model | Date | Avg. Accuracy |
|---|---|---|---|
| PROMPTED | GPT-4 | 2023.10 | 75.8 |
| PROMPTED | GPT-3.5 | 2023.10 | 68.8 |
| Output Refinement | GPT-4 | 2023.10 | 68.6 |
| Zero-Shot CoT | GPT-4 | 2023.10 | 67.3 |
| Zero-Shot | GPT-4 | 2023.10 | 65.7 |
| Output Refinement | GPT-3.5 | 2023.10 | 64.1 |
| Zero-Shot CoT | GPT-3.5 | 2023.10 | 63.4 |
| Zero-Shot | GPT-3.5 | 2023.10 | 62.2 |