RAGTruth

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Detection | RAGTruth (test) | AUROC 0.9096 | 83 |
| Hallucination Detection | RAGTruth | AUROC 0.8535 | 36 |
| Hallucination Detection | RAGTruth RT-Summ 1.0 (test) | F1 Score 0.6966 | 30 |
| Hallucination Detection | RAGTruth RT-D2T 1.0 (test) | F1 Score 0.7383 | 30 |
| Hallucination Detection | RAGTruth RT-QA 1.0 (test) | F1 Score 0.7885 | 30 |
| Hallucination Detection | RAGTruth Llama2-13B (test) | Acc 83.33 | 21 |
| Hallucination Detection | RAGTruth Llama2-7B (test) | Accuracy 75.76 | 21 |
| Hallucination Detection | RAGTruth LLaMA3-8B | Recall 78.6 | 19 |
| Hallucination Detection | RAGTruth LLaMA2-13B | Recall 80.68 | 19 |
| Hallucination Detection | RAGTruth LLaMA2-7B | Recall 0.8328 | 19 |
| Summarization | RAGTruth summarization (test) | ROUGE-1 52 | 18 |
| Question Answering | RAGTruth | F1 Score 45.89 | 17 |
| Hallucination Detection | RAGTruth summarization task | Precision 77 | 14 |
| Span-level Hallucination Detection | RagTruth-Avg (test) | F1 Score 76.63 | 12 |
| Grounded Text Generation | RAGTruth | F1 Score 33.14 | 11 |
| Groundedness | RagTruth | Kendall's Tau 0.57 | 11 |
| Hallucination Detection | RAGTruth Llama-13B | Recall 89.47 | 10 |
| Hallucination Detection | RAGTruth Llama-7B | Recall 92.54 | 10 |
| Hallucination detection | RAGTruth Summarization Mistral-7b | AUCROC 74.45 | 4 |
| Hallucination detection | RAGTruth Summarization (Llama-2-13b) | AUCROC 72.9 | 4 |
| Hallucination detection | RAGTruth Summarization (Llama-2-7b) | AUCROC 73.37 | 4 |