Counterfactual Reasoning on CRASS
Chart: Exact Match Performance over time (best result: GPT-4, 94.53, Nov 18, 2023)
Evaluation Results

Method            Setting            Date     Exact Match Performance
GPT-4             Zero-shot          2023.11  94.53
Orca-1-13B        Zero-shot          2023.11  90.15
Orca 2-7B         Zero-shot, Sys...  2023.11  88.32
Orca 2-13B        Zero-shot, Sys...  2023.11  87.59
Orca 2-13B        Zero-shot, Sys...  2023.11  86.86
WizardLM-70B      Zero-shot          2023.11  86.13
ChatGPT           Zero-shot          2023.11  85.77
Orca 2-7B         Zero-shot, Sys...  2023.11  84.31
LLaMA-2-Chat-70B  Zero-shot          2023.11  74.82
WizardLM-13B      Zero-shot          2023.11  67.88
LLaMA-2-Chat-13B  Zero-shot          2023.11  61.31