Question Answering on HotpotQA subset of 500 questions (dev)
[Chart: EM (ReAct) by date. Plotted point: FireAct, 31.4, Oct 9, 2023. Legend: EM (ReAct), EM (FireAct), Absolute Difference, Relative Difference.]
Evaluation Results
| Method  | Backbone      | Date    | EM (ReAct) | EM (FireAct) | Absolute Difference | Relative Difference (%) |
|---------|---------------|---------|------------|--------------|---------------------|-------------------------|
| FireAct | GPT-3.5       | 2023.10 | 31.4       | 39.2         | 7.8                 | 25                      |
| FireAct | CodeLlama-34B | 2023.10 | 22.2       | 27.8         | 5.6                 | 25                      |
| FireAct | Llama-2-13B   | 2023.10 | 21.2       | 34.4         | 13.1                | 62                      |
| FireAct | CodeLlama-13B | 2023.10 | 20.8       | 29           | 8.2                 | 39                      |
| FireAct | CodeLlama-7B  | 2023.10 | 17.4       | 27.8         | 10.4                | 60                      |
| FireAct | Llama-2-7B    | 2023.10 | 14.8       | 26.2         | 11.4                | 77                      |
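
The two difference columns appear to follow directly from the two EM columns: the absolute difference is EM (FireAct) minus EM (ReAct), and the relative difference is that gain expressed as a percentage of EM (ReAct). The sketch below recomputes both from the scores listed above. The names `RESULTS` and `differences` are illustrative only, and the rounding (one decimal for the absolute gain, nearest integer for the percentage) is an assumption chosen to match how the listed values are displayed.

```python
# EM scores copied from the results listed above: backbone -> (EM ReAct, EM FireAct).
RESULTS = {
    "GPT-3.5": (31.4, 39.2),
    "CodeLlama-34B": (22.2, 27.8),
    "Llama-2-13B": (21.2, 34.4),
    "CodeLlama-13B": (20.8, 29.0),
    "CodeLlama-7B": (17.4, 27.8),
    "Llama-2-7B": (14.8, 26.2),
}


def differences(em_react, em_fireact):
    """Return (absolute gain in EM points, relative gain in percent).

    Rounding is an assumption: one decimal place for the absolute gain,
    nearest integer for the relative gain.
    """
    absolute = em_fireact - em_react
    relative = 100.0 * absolute / em_react
    return round(absolute, 1), round(relative)


if __name__ == "__main__":
    for backbone, (react, fireact) in RESULTS.items():
        abs_diff, rel_diff = differences(react, fireact)
        print(f"{backbone:<14} +{abs_diff:.1f} EM  (+{rel_diff}%)")
```

Recomputing from the rounded scores gives 13.2 for Llama-2-13B where the page lists 13.1; every other row matches, so the listed value may have been derived from unrounded scores.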