Interactive Question Answering on Natural Questions
[Chart: Helpfulness by date. Highlighted point: Llama2, 4.9, Aug 24, 2024. Selectable metrics: Helpfulness, Fluency, Query Count, Accuracy.]
Updated 1mo ago
Evaluation Results
| Method | Evaluator | Date | Helpfulness | Fluency | Query Count | Accuracy |
|--------|-----------|------|-------------|---------|-------------|----------|
| Llama2 | Evaluator Model=Claude... | 2024.08 | 4.9 | 4.84 | 3.18 | 34 |
| Claude | Evaluator Model=Claude... | 2024.08 | 4.88 | 4.9 | 3.02 | 38 |
| GPT3.5 | Evaluator Model=Claude... | 2024.08 | 4.86 | 4.88 | 2.82 | 42 |
| Zephyr | Evaluator Model=Claude... | 2024.08 | 4.84 | 4.9 | 3.02 | 28 |
| GPT3.5 | Evaluator=GPT-4 | 2024.08 | 4.12 | 5 | 2.76 | 44 |
| Claude | Evaluator=GPT-4 | 2024.08 | 4.02 | 5 | 2.76 | 40 |
| Zephyr | Evaluator=GPT-4 | 2024.08 | 3.3 | 4.86 | 2.92 | 36 |
| Llama2 | Evaluator=GPT-4 | 2024.08 | 3.2 | 4.84 | 3.08 | 32 |