
Question Answering on HotpotQA (Helpfulness, Fluency, Accuracy)

Top Helpfulness score: 4.82 (Claude)


Evaluation Results

Method	Helpfulness	Score 2	Score 3	Score 4
Claude 2024.08	4.82	4.99	1.26	58
GPT4 2024.08	4.78	4.96	1.12	66
TextDavinci 2024.08	4.72	4.87	1.22	45
GPT3.5 2024.08	4.72	4.95	1.49	63
TextBabbage 2024.08	4.70	4.88	1.74	37
Llama2 2024.08	4.70	4.95	1.32	55
Zephyr 2024.08	4.64	4.88	1.01	40
Davinci 2024.08	4.27	4.52	1.68	32