Question Answering on AmbigQA
[Chart: Helpfulness over time; top result Llama2 (4.96), Aug 24, 2024. Selectable metrics: Helpfulness, Fluency, Query Count, Accuracy.]
Evaluation Results
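| Method | Checkpoint   | Date    | Helpfulness | Fluency | Query Count | Accuracy |
|--------|--------------|---------|-------------|---------|-------------|----------|
| Llama2 | Llama-2-7B   | 2024.08 | 4.96        | 4.94    | 1.79        | 52       |
| GPT3.5 | -            | 2024.08 | 4.91        | 4.97    | 1.89        | 60       |
| GPT4   | -            | 2024.08 | 4.89        | 4.95    | 1.06        | 72       |
| Claude | -            | 2024.08 | 4.89        | 4.94    | 1.36        | 62       |
| Zephyr | Zephyr-alpha | 2024.08 | 4.38        | 4.66    | 1.03        | 45       |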
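The headline figure on this page ranks methods by Helpfulness, but the ordering changes depending on which metric you sort by. The following is a minimal sketch that transcribes the table rows and ranks methods per metric. It assumes higher is better for Helpfulness and Accuracy; the names `Entry`, `RESULTS`, and `rank` are hypothetical and not part of any benchmark tooling. Query Count is left out of the ranking because the page does not say which direction is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One row of the Evaluation Results table (hypothetical container)."""
    method: str
    helpfulness: float
    fluency: float
    query_count: float
    accuracy: float


# Values transcribed from the Evaluation Results table.
RESULTS = [
    Entry("Llama2", 4.96, 4.94, 1.79, 52),
    Entry("GPT3.5", 4.91, 4.97, 1.89, 60),
    Entry("GPT4", 4.89, 4.95, 1.06, 72),
    Entry("Claude", 4.89, 4.94, 1.36, 62),
    Entry("Zephyr", 4.38, 4.66, 1.03, 45),
]


def rank(entries, metric):
    """Return method names sorted by `metric`, highest first.

    Assumes higher is better. Python's sort is stable even with
    reverse=True, so tied methods keep their table order.
    """
    ordered = sorted(entries, key=lambda e: getattr(e, metric), reverse=True)
    return [e.method for e in ordered]


if __name__ == "__main__":
    for metric in ("helpfulness", "accuracy"):
        print(metric, rank(RESULTS, metric))
```

Sorting this way makes the divergence explicit: Llama2 leads on Helpfulness, while GPT4 leads on Accuracy (72 vs. 52 for Llama2).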