
Counterfactual Reasoning on CRASS

Best Exact Match performance: 94.53 (GPT-4)


Evaluation Results

Model               Date      Exact Match
GPT-4               2023.11   94.53
Orca-1-13B          2023.11   90.15
Orca 2-7B           2023.11   88.32
Orca 2-13B          2023.11   87.59
Orca 2-13B          2023.11   86.86
WizardLM-70B        2023.11   86.13
ChatGPT             2023.11   85.77
Orca 2-7B           2023.11   84.31
LLaMA-2-Chat-70B    2023.11   74.82
WizardLM-13B        2023.11   67.88
LLaMA-2-Chat-13B    2023.11   61.31