HotpotQA

Benchmarks

Task Name	Dataset Name	SOTA Result
Multi-hop Question Answering	HotpotQA (test)	F1 80.79	334
Multi-hop Question Answering	HotpotQA	F1 Score 77.4	294
Hallucination Detection	HotpotQA	AUROC 0.928	294
Question Answering	HotpotQA	EM 77.2	173
Multi-Hop Question Answering	HotpotQA	Exact Match (EM) 50.2	167
Multi-Hop QA	HotPotQA	Exact Match 65.6	143
Question Answering	HotpotQA	F1 84.98	132
Open-domain Question Answering	HotpotQA	Accuracy 83.8	103
Question Answering	HotPotQA	F1 Score 51.72	93
RAG Performance Prediction	HotpotQA	QE 50.78	80
Multi-hop Question Answering	HotpotQA	F1 74.9	79
Multi-hop Question Answering	HotpotQA	LLM Judge Score 80	72
Long-context Question Answering	HotpotQA In-Distribution	Accuracy 85.2	72
Multi-hop Question Answering	HotpotQA (dev)	Answer F1 85.81	72
Uncertainty Quantification	HotpotQA 500 randomly sampled queries (test)	AUROC 83.25	70
End-to-End Defense in RAG	HotpotQA	Attack Success Rate (ASR) 0	69
Retrieval	HotpotQA	R@5 96.9	68
Multi-Hop Question Answering	HotpotQA	Exact Match (EM) 47.1	66
Multi-hop Question Answering	HotpotQA fullwiki setting (test)	Answer F1 75.9	64
Retrieval	HotPotQA	AR@5 79.71	62
Question Answering	HotpotQA PIA (test)	ASR 90.2	62
Open-domain Question Answering	HotpotQA in-domain	F1 Score 72.4	57
Error Detection	HotpotQA	AUROC 81	57
Multi-hop Question Answering	HotPotQA	CoT Match Rate 100	54
Multi-Hop Question Answering	HotpotQA	F1 58.9	54

Showing 25 of 661 rows

...