RAGTruth

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result
Hallucination Detection	RAGTruth (test)	AUROC	0.9096	99
Hallucination Detection	RAGTruth	AUROC	0.8535	79
Hallucination Detection	RAGTruth CNN/DM (subsample)	AUROC	0.69	45
Hallucination Detection	RAGTruth MS MARCO (subsample)	AUROC	0.77	45
Hallucination Detection	RAGTruth RT-QA 1.0 (test)	F1 Score	0.7885	33
Hallucination Detection	RAGTruth RT-Summ 1.0 (test)	F1 Score	0.6966	30
Hallucination Detection	RAGTruth RT-D2T 1.0 (test)	F1 Score	0.7383	30
Hallucination Detection	RAGTruth Llama2-13B (test)	Accuracy	83.33	21
Hallucination Detection	RAGTruth Llama2-7B (test)	Accuracy	75.76	21
Hallucination Detection	RAGTruth LLaMA3-8B	Recall	78.6	19
Hallucination Detection	RAGTruth LLaMA2-13B	Recall	80.68	19
Hallucination Detection	RAGTruth LLaMA2-7B	Recall	0.8328	19
Token-level Hallucination Detection	RAGTruth	AP (Token-level)	59.4	18
Answer-level Hallucination Detection	RAGTruth	AP	75.96	18
Summarization	RAGTruth summarization (test)	ROUGE-1	52	18
Question Answering	RAGTruth	F1 Score	45.89	17
Response-level Hallucination Detection	RAGTruth (test)	AUC	74.5	15
Hallucination Detection	RAGTruth Span-level, leakage-clean protocol	AUC	0.702	15
Hallucination Detection	RAGTruth summarization task	Precision	77	14
Response-level Hallucination Detection	RAGTruth QA	AUROC	91.89	13
Hallucination Mitigation	RAGTruth	Faithfulness	98.4	12
Span-level Hallucination Detection	RAGTruth-Avg (test)	F1 Score	76.63	12
Grounded Text Generation	RAGTruth	F1 Score	33.14	11
Groundedness	RAGTruth	Kendall's Tau	0.57	11
Faithfulness Detection	RAGTruth	Accuracy	90.3	10

The table lists 25 of the 43 RAGTruth benchmark entries.
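The table mixes threshold-free ranking metrics (AUROC, AP) with metrics computed after thresholding a detector's output (precision, recall, F1, accuracy). As a rough illustration only, the Python sketch below computes response-level AUROC and precision/recall/F1 for a binary hallucination detector, assuming each response carries a label of 1 (contains a hallucination) or 0 and a detector score in [0, 1]. The function names, the 0.5 threshold, and the toy data are hypothetical and are not taken from RAGTruth or from any of the listed papers; span- and token-level rows apply similar metrics over finer units, and each paper's exact protocol may differ.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs where the hallucinated response
    scores higher than the faithful one; ties count as half (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall_f1(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Binarize scores at a (hypothetical) threshold, then compute P/R/F1
    with the hallucinated class (label 1) as the positive class."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data for illustration only (not RAGTruth annotations).
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.2, 0.6, 0.4, 0.55, 0.1]
    print(f"AUROC: {auroc(labels, scores):.3f}")  # AUROC: 0.889
    p, r, f = precision_recall_f1(labels, scores)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.667 F1=0.667
```

The pairwise AUROC loop is quadratic and meant only to make the definition explicit; library routines such as scikit-learn's `roc_auc_score` scale better. Note also that the table reports some scores as fractions (e.g., Recall 0.8328, AUROC 0.9096) and others as percentages (e.g., Recall 80.68, AUROC 91.89), so rows are only comparable after putting them on the same scale.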