Question Answering on HotpotQA subset of 500 questions (dev)
[Chart: EM (ReAct) by method. FireAct, Oct 9, 2023: 31.4]
Evaluation Results
Method   Backbone        Date     EM (ReAct)  EM (FireAct)  Absolute Difference  Relative Difference (%)
FireAct  GPT-3.5         2023.10  31.4        39.2          7.8                  25
FireAct  CodeLlama-34B   2023.10  22.2        27.8          5.6                  25
FireAct  Llama-2-13B     2023.10  21.2        34.4          13.1                 62
FireAct  CodeLlama-13B   2023.10  20.8        29            8.2                  39
FireAct  CodeLlama-7B    2023.10  17.4        27.8          10.4                 60
FireAct  Llama-2-7B      2023.10  14.8        26.2          11.4                 77
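
The two difference columns appear to be derived from the EM scores. The page does not state the formulas, so the following is a minimal sketch under an assumption: absolute difference = EM (FireAct) minus EM (ReAct), and relative difference = absolute difference divided by EM (ReAct), expressed in percent and rounded to a whole number. The names `SCORES` and `differences` are hypothetical and exist only for this illustration.

```python
# EM scores (percent) as listed above: backbone -> (EM ReAct, EM FireAct).
SCORES = {
    "GPT-3.5": (31.4, 39.2),
    "CodeLlama-34B": (22.2, 27.8),
    "Llama-2-13B": (21.2, 34.4),
    "CodeLlama-13B": (20.8, 29.0),
    "CodeLlama-7B": (17.4, 27.8),
    "Llama-2-7B": (14.8, 26.2),
}


def differences(react_em, fireact_em):
    """Return (absolute, relative %) gain of FireAct over ReAct.

    Assumed formulas (not stated on the page):
      absolute = fireact_em - react_em, rounded to one decimal
      relative = 100 * (fireact_em - react_em) / react_em, rounded to an integer
    """
    gain = fireact_em - react_em
    absolute = round(gain, 1)
    relative = round(100 * gain / react_em)
    return absolute, relative


if __name__ == "__main__":
    for backbone, (react_em, fireact_em) in SCORES.items():
        absolute, relative = differences(react_em, fireact_em)
        print(f"{backbone}: absolute={absolute}, relative={relative}%")
```

Under these assumed formulas, the recomputed values agree with the listed ones for every backbone except the Llama-2-13B absolute difference, where the rounded EM scores give 13.2 rather than the listed 13.1. One possible explanation is that the listed figure was computed from unrounded scores, but the page does not say.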