Long-form Question Answering refinement on HQ^2A (test)

Error Rate (%): 0.0065 (EIR)

Updated 3mo ago

Evaluation Results

Method	Links	Error Rate (%)
EIR 2024.07		0.0065	0.03	0.97	1	0.98
Improve 2024.07		0.0131	0.05	1	0.83	0.97
Generic 2024.07		0.0131	0.05	0.97	0.97	0.97
Human feedback 2024.07		0.0261	0.09	0.86	1	0.94
Zero-shot 2024.07		0.1569	0.34	0.56	0.9	0.69
Baseline 2024.07		0.1961	0.63	-	-	-