Question Answering on HotpotQA (Helpfulness, Fluency, Accuracy)
[Chart: Helpfulness over time. Top result: Claude, 4.82 (Aug 24, 2024). Selectable metrics: Helpfulness, Fluency, Avg Queries, Accuracy.]
Evaluation Results
| Method | Details | Date | Helpfulness | Fluency | Avg Queries | Accuracy |
|---|---|---|---|---|---|---|
| Claude | | 2024.08 | 4.82 | 4.99 | 1.26 | 58 |
| GPT4 | | 2024.08 | 4.78 | 4.96 | 1.12 | 66 |
| TextDavinci | | 2024.08 | 4.72 | 4.87 | 1.22 | 45 |
| GPT3.5 | | 2024.08 | 4.72 | 4.95 | 1.49 | 63 |
| TextBabbage | | 2024.08 | 4.7 | 4.88 | 1.74 | 37 |
| Llama2 | checkpoint=Llama-2-7B | 2024.08 | 4.7 | 4.95 | 1.32 | 55 |
| Zephyr | checkpoint=Zephyr-alpha | 2024.08 | 4.64 | 4.88 | 1.01 | 40 |
| Davinci | | 2024.08 | 4.27 | 4.52 | 1.68 | 32 |