Multi-hop Reasoning on CommaQA-E compositional
[Chart: Exact Match over time. Best entry: ChatGPT (SKiC), 80.8 Exact Match, Aug 1, 2023.]
Evaluation Results
| Method | Configuration | Date | Exact Match |
|---|---|---|---|
| ChatGPT (SKiC) | Model=ChatGPT, Prompti... | 2023.08 | 80.8 |
| text-davinci-003 (SKiC) | Model=text-davinci-003... | 2023.08 | 74.8 |
| ChatGPT (Decomp) | Model=ChatGPT, Prompti... | 2023.08 | 73.5 |
| text-davinci-003 (Decomp) | Model=text-davinci-003... | 2023.08 | 66.6 |
| LLAMA-65B (SKiC) | Model=LLAMA-65B, Promp... | 2023.08 | 52 |
| ChatGPT (CoT) | Model=ChatGPT, Prompti... | 2023.08 | 46.4 |
| LLAMA-65B (Decomp) | Model=LLAMA-65B, Promp... | 2023.08 | 40.4 |
| ChatGPT (4-shots) | Model=ChatGPT, Prompti... | 2023.08 | 40.3 |
| text-davinci-003 (CoT) | Model=text-davinci-003... | 2023.08 | 38.2 |
| text-davinci-003 (4-shots) | Model=text-davinci-003... | 2023.08 | 33.5 |
| LLAMA-65B (CoT) | Model=LLAMA-65B, Promp... | 2023.08 | 30.8 |
| ChatGPT (zero-shot) | Model=ChatGPT, Prompti... | 2023.08 | 30.6 |
| text-davinci-003 (zero-shot) | Model=text-davinci-003... | 2023.08 | 26.8 |
| LLAMA-65B (4-shots) | Model=LLAMA-65B, Promp... | 2023.08 | 24.6 |
| LLAMA-65B (zero-shot) | Model=LLAMA-65B, Promp... | 2023.08 | 16.3 |
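
For readers who want to compare prompting strategies within each model, the scores listed above can be regrouped with a few lines of Python. This is only an illustrative sketch: the `SCORES` dictionary is transcribed by hand from the results above, and the helper names `ranking` and `gap` are hypothetical, not part of any benchmark tooling.

```python
# Exact Match scores transcribed from the results above,
# grouped by model and then by prompting method.
SCORES = {
    "ChatGPT": {
        "SKiC": 80.8, "Decomp": 73.5, "CoT": 46.4,
        "4-shots": 40.3, "zero-shot": 30.6,
    },
    "text-davinci-003": {
        "SKiC": 74.8, "Decomp": 66.6, "CoT": 38.2,
        "4-shots": 33.5, "zero-shot": 26.8,
    },
    "LLAMA-65B": {
        "SKiC": 52.0, "Decomp": 40.4, "CoT": 30.8,
        "4-shots": 24.6, "zero-shot": 16.3,
    },
}


def ranking(scores):
    """Flatten to (model, method, exact_match) rows, best score first."""
    rows = [
        (model, method, em)
        for model, methods in scores.items()
        for method, em in methods.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def gap(scores, better, baseline):
    """Per-model difference in Exact Match points between two methods."""
    return {
        model: round(methods[better] - methods[baseline], 1)
        for model, methods in scores.items()
    }


if __name__ == "__main__":
    for model, method, em in ranking(SCORES):
        print(f"{model} ({method}): {em}")
    print(gap(SCORES, "SKiC", "Decomp"))
```

Calling `gap(SCORES, "SKiC", "Decomp")` gives the margin of SKiC over Decomp for each model, and `gap(SCORES, "SKiC", "zero-shot")` does the same against the zero-shot baseline.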