AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline

About

Using LLMs (Large Language Models) in conjunction with external documents has made RAG (Retrieval-Augmented Generation) an essential technology. Numerous techniques and modules for RAG are being researched, but their performance can vary across different datasets. Finding RAG modules that perform well on specific datasets is challenging. In this paper, we propose the AutoRAG framework, which automatically identifies suitable RAG modules for a given dataset. AutoRAG explores and approximates the optimal combination of RAG modules for the dataset. Additionally, we share the results of optimizing a dataset using AutoRAG. All experimental results and data are publicly available and can be accessed through our GitHub repository https://github.com/Marker-Inc-Korea/AutoRAG_ARAGOG_Paper .
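As a rough illustration of what "exploring and approximating the optimal combination of RAG modules" could look like, the sketch below runs a greedy, stage-by-stage search. This is an assumption for illustration, not the procedure or API described in the abstract: the function `greedy_select`, the stage and module names, and the toy scores are all hypothetical, and the scores are invented rather than measured.

```python
from typing import Callable, List, Sequence, Tuple

Pipeline = List[str]


def greedy_select(
    stages: Sequence[Tuple[str, Sequence[str]]],
    score: Callable[[Pipeline], float],
) -> Pipeline:
    """Pick one module per stage, in order, keeping the best-scoring choice so far.

    This evaluates the sum of the candidate counts rather than their product,
    so it approximates the best combination instead of guaranteeing it.
    """
    chosen: Pipeline = []
    for _stage_name, candidates in stages:
        # Score each candidate appended to the modules already fixed.
        best = max(candidates, key=lambda module: score(chosen + [module]))
        chosen.append(best)
    return chosen


# Toy stand-in scores (invented for illustration, not measured results).
TOY_SCORES = {"bm25": 0.4, "dense": 0.5, "no_rerank": 0.0, "cross_encoder": 0.2}


def toy_score(pipeline: Pipeline) -> float:
    """Hypothetical evaluator; a real one would run the pipeline on a dataset."""
    return sum(TOY_SCORES[module] for module in pipeline)


stages = [
    ("retrieval", ["bm25", "dense"]),
    ("rerank", ["no_rerank", "cross_encoder"]),
]
best_pipeline = greedy_select(stages, toy_score)
print(best_pipeline)  # ['dense', 'cross_encoder']
```

In practice the scoring function would evaluate each partial pipeline against the target dataset with a task metric, which is why the best modules can differ from one dataset to another.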

Dongkyu Kim, Byoungwook Kim, Donggeon Han, Matouš Eibich • 2024

Related benchmarks

Task	Dataset	Result
Question Answering	ARC Challenge	Accuracy 69.9	906
Question Answering	OBQA	Accuracy 85.1	347
Multi-hop Question Answering	HotpotQA	F1 Score 62.7	294
Question Answering	2Wiki	--	260
Multi-hop Question Answering	2Wiki	--	215
Question Answering	PopQA	Accuracy 65.3	186
Question Answering	ARC-C	Accuracy 0.681	116
Question Answering	TQA	Accuracy 70.4	80
Question Answering	HotpotQA	F1 Score 67.6	15
Information Retrieval	QASPER NLP	R@10 50	14

Showing 10 of 17 rows
