Long-form Question Answering refinement on HQ^2A (test)
[Chart: Error Rate (%) by date. Plotted point: EIR, 0.0065, Jul 16, 2024. Selectable metrics: Error Rate (%), Error Score, Precision, Recall, F1 Score.]
Evaluation Results
| Method | Details | Links | Error Rate (%) | Error Score | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|---|
| EIR | Feedback=Fine-grained... | 2024.07 | 0.0065 | 0.03 | 0.97 | 1 | 0.98 |
| Improve | Feedback=Coarse-graine... | 2024.07 | 0.0131 | 0.05 | 1 | 0.83 | 0.97 |
| Generic | Feedback=Coarse-graine... | 2024.07 | 0.0131 | 0.05 | 0.97 | 0.97 | 0.97 |
| Human feedback | Type=Expert human feed... | 2024.07 | 0.0261 | 0.09 | 0.86 | 1 | 0.94 |
| Zero-shot | Model=LLaMA2-13B-chat,... | 2024.07 | 0.1569 | 0.34 | 0.56 | 0.9 | 0.69 |
| Baseline | Type=Original dataset... | 2024.07 | 0.1961 | 0.63 | - | - | - |