Multi-hop Question Answering on MuSiQue (F1, ROUGE-L, EM)
[Chart: F1 Score over time. Highlighted point: DIRECT QUERY, F1 Score 72, Dec 30, 2023. Selectable metrics: F1 Score, ROUGE-L Score, Exact Match (EM).]
Updated 9d ago
Evaluation Results
| Method | Configuration | Date | F1 Score | ROUGE-L Score | Exact Match (EM) |
|---|---|---|---|---|---|
| DIRECT QUERY | PRIVATE=NO, Backbone=G... | 2023.12 | 72 | 72 | 56 |
| CONFUSIONPROMPT | PRIVATE=YES, Backbone=... | 2023.12 | 68 | 68 | 54 |
| DIRECT QUERY | PRIVATE=NO, Backbone=G... | 2023.12 | 66 | 66 | 50 |
| CONFUSIONPROMPT | PRIVATE=YES, Backbone=... | 2023.12 | 63 | 63 | 50 |
| CONFUSIONPROMPT | PRIVATE=YES, Backbone=... | 2023.12 | 61 | 61 | 45 |
| DIRECT QUERY | PRIVATE=NO, Backbone=G... | 2023.12 | 60 | 61 | 44 |
| LLAMA2-7B | PRIVATE=YES | 2023.12 | 48 | 48 | 32 |
| VICUNA-13B | PRIVATE=YES | 2023.12 | 42 | 42 | 23 |
| PARAPHRASER | PRIVATE=YES, Backbone=... | 2023.12 | 8 | 8 | 3 |
| PARAPHRASER | PRIVATE=YES, Backbone=... | 2023.12 | 7 | 7 | 4 |
| PARAPHRASER | PRIVATE=YES, Backbone=... | 2023.12 | 6 | 6 | 3 |
| TEXT2TEXT | PRIVATE=YES, Backbone=... | 2023.12 | 3 | 3 | 1 |
| TEXT2TEXT | PRIVATE=YES, Backbone=... | 2023.12 | 2 | 2 | 1 |
| TEXT2TEXT | PRIVATE=YES, Backbone=... | 2023.12 | 2 | 2 | 1 |