Ragas: Automated Evaluation of Retrieval Augmented Generation

About

We introduce Ragas (Retrieval Augmented Generation Assessment), a framework for reference-free evaluation of Retrieval Augmented Generation (RAG) pipelines. RAG systems combine a retrieval module with an LLM-based generation module and supply the LLM with knowledge from a reference textual database. This lets the LLM act as a natural language layer between a user and textual databases, reducing the risk of hallucinations. Evaluating RAG architectures is, however, challenging because there are several dimensions to consider: the ability of the retrieval system to identify relevant and focused context passages, the ability of the LLM to exploit such passages in a faithful way, and the quality of the generation itself. With Ragas, we put forward a suite of metrics which can be used to evaluate these different dimensions without having to rely on ground-truth human annotations. We posit that such a framework can crucially contribute to faster evaluation cycles of RAG architectures, which is especially important given the fast adoption of LLMs.
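
To make the "faithful use of context" dimension concrete, the following is a minimal sketch of a reference-free score: the fraction of statements in a generated answer that the retrieved context supports. It is not the Ragas implementation. The function names (`split_statements`, `token_overlap_supported`, `faithfulness_score`), the sentence splitter, and the token-overlap check with its 0.8 threshold are all assumptions made for illustration; in practice an LLM-based judge would likely replace both the splitter and the support check.

```python
import re
from typing import Callable


def split_statements(answer: str) -> list[str]:
    """Naive sentence splitter (hypothetical stand-in for statement extraction)."""
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p for p in parts if p]


def token_overlap_supported(statement: str, context: str, threshold: float = 0.8) -> bool:
    """Toy support check: share of the statement's tokens that occur in the context.

    This is an illustrative assumption, not a real entailment test.
    """
    tokens = set(re.findall(r"\w+", statement.lower()))
    if not tokens:
        return False
    context_tokens = set(re.findall(r"\w+", context.lower()))
    return len(tokens & context_tokens) / len(tokens) >= threshold


def faithfulness_score(
    answer: str,
    context: str,
    is_supported: Callable[[str, str], bool] = token_overlap_supported,
) -> float:
    """Return the fraction of answer statements judged supported by the context."""
    statements = split_statements(answer)
    if not statements:
        return 0.0
    supported = sum(1 for s in statements if is_supported(s, context))
    return supported / len(statements)


context = "Paris is the capital of France. It lies on the Seine."
answer = "Paris is the capital of France. Paris has ten million dragons."
score = faithfulness_score(answer, context)
```

No human-written reference answer is needed here: the score depends only on the generated answer and the retrieved context. The `is_supported` parameter is left pluggable so that a stronger judge can be swapped in without changing the scoring logic.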

Shahul Es, Jithin James, Luis Espinosa-Anke, Steven Schockaert (2023)

Related benchmarks

Task                     Dataset                       Metric    Result  Rank
Hallucination Detection  RAGTruth (test)               AUROC     0.7541  83
Hallucination Detection  Dolly AC (test)               AUC       36.28   33
Hallucination Detection  RAGTruth Llama2-7B (test)     Accuracy  68.22   21
Hallucination Detection  Dolly Llama2-7B (test)        Accuracy  65.6    21
Hallucination Detection  RAGTruth Llama2-13B (test)    Accuracy  70.8    21
Hallucination Detection  Dolly Llama2-13B (test)       Accuracy  64.8    21
Hallucination Detection  Dolly AC LLaMA3-8B            Recall    80      19
Hallucination Detection  RAGTruth LLaMA2-7B            Recall    0.6327  19
Hallucination Detection  RAGTruth LLaMA2-13B           Recall    67.63   19
Hallucination Detection  Dolly AC LLaMA2-7B            Recall    53.45   19
Showing 10 of 13 rows
