Average Performance across 10 Task Types on 13 Datasets (test)
Best result: PROMPTED, 75.8 Avg. Accuracy
[Chart: Avg. Accuracy by method over time; x-axis: date (Oct 3, 2023), y-axis: Avg. Accuracy]
Evaluation Results
Method              Model     Date     Avg. Accuracy
PROMPTED            GPT-4     2023.10  75.8
PROMPTED            GPT-3.5   2023.10  68.8
Output Refinement   GPT-4     2023.10  68.6
Zero-Shot CoT       GPT-4     2023.10  67.3
Zero-Shot           GPT-4     2023.10  65.7
Output Refinement   GPT-3.5   2023.10  64.1
Zero-Shot CoT       GPT-3.5   2023.10  63.4
Zero-Shot           GPT-3.5   2023.10  62.2
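As a reading aid, the following minimal sketch computes the absolute gain (in percentage points) of PROMPTED over each baseline for the same model. It uses only the Avg. Accuracy numbers from the Evaluation Results table. The names RESULTS, BASELINES, and prompted_gains are hypothetical and not part of any benchmark tooling, and the sketch assumes that a plain subtraction of average accuracies is a meaningful comparison.

```python
# Avg. Accuracy values copied from the Evaluation Results table.
# RESULTS, BASELINES and prompted_gains are hypothetical names for illustration.
RESULTS = {
    ("PROMPTED", "GPT-4"): 75.8,
    ("PROMPTED", "GPT-3.5"): 68.8,
    ("Output Refinement", "GPT-4"): 68.6,
    ("Zero-Shot CoT", "GPT-4"): 67.3,
    ("Zero-Shot", "GPT-4"): 65.7,
    ("Output Refinement", "GPT-3.5"): 64.1,
    ("Zero-Shot CoT", "GPT-3.5"): 63.4,
    ("Zero-Shot", "GPT-3.5"): 62.2,
}

BASELINES = ("Zero-Shot", "Zero-Shot CoT", "Output Refinement")


def prompted_gains(model: str) -> dict[str, float]:
    """Gain of PROMPTED over each baseline for one model, in percentage points."""
    prompted = RESULTS[("PROMPTED", model)]
    # Round to one decimal to match the table's precision and hide float noise.
    return {b: round(prompted - RESULTS[(b, model)], 1) for b in BASELINES}


for model in ("GPT-4", "GPT-3.5"):
    print(model, prompted_gains(model))

# Compare PROMPTED on GPT-3.5 with the strongest GPT-4 baseline.
best_gpt4_baseline = max(RESULTS[(b, "GPT-4")] for b in BASELINES)
print(RESULTS[("PROMPTED", "GPT-3.5")] > best_gpt4_baseline)
```

Run as-is, the gains range from 7.2 to 10.1 points for GPT-4 and from 4.7 to 6.6 points for GPT-3.5. The last line prints True, because PROMPTED with GPT-3.5 (68.8) slightly exceeds the best GPT-4 baseline, Output Refinement (68.6).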