Interactive Question Answering on Natural Questions
[Chart: Helpfulness over time; metrics selectable: Helpfulness, Fluency, Query Count, Accuracy. Top result: Llama2, 4.9 (Aug 24, 2024).]
Evaluation Results
Method  Setting                    Date     Helpfulness  Fluency  Query Count  Accuracy
Llama2  Evaluator Model=Claude...  2024.08  4.9          4.84     3.18         34
Claude  Evaluator Model=Claude...  2024.08  4.88         4.9      3.02         38
GPT3.5  Evaluator Model=Claude...  2024.08  4.86         4.88     2.82         42
Zephyr  Evaluator Model=Claude...  2024.08  4.84         4.9      3.02         28
GPT3.5  Evaluator=GPT-4            2024.08  4.12         5        2.76         44
Claude  Evaluator=GPT-4            2024.08  4.02         5        2.76         40
Zephyr  Evaluator=GPT-4            2024.08  3.3          4.86     2.92         36
Llama2  Evaluator=GPT-4            2024.08  3.2          4.84     3.08         32