Retrieval-Augmented Generation

Benchmarks

Dataset Name	SOTA Method	Metric	Score	Count	Updated
All Datasets Aggregated	GuarantRAG	Average Performance Score	76.6	55	3mo ago
HotpotQA	CDA-M	Reliability Score (RS)	51.8	52	1mo ago
MS MARCO		Accuracy (Clean)	90	45	2mo ago
Bio	RADAR	Accuracy	74.02	42	2mo ago
LOFT	MiniLM	NQ Score	100	42	4mo ago
ICR2	Phi-3-7B-128K	NQ Score	87	37	4mo ago
2WikiMultiHopQA	Graph-R1 (ours)	F1 Score	65.04	28	1mo ago
NQ	ReFeedL	Accuracy	77.1	23	4mo ago
Spec-Bench RAG	SpecBound	CR	5.48	21	3mo ago
Average	FastInsight	Win Rate	65	18	4mo ago
UltraDomain mix	FastInsight	Win Rate	76.2	18	4mo ago
UltraDomain agriculture	FastInsight	Win Rate	95	18	4mo ago
BSARD-G	FastInsight	Win Rate	85.6	18	4mo ago
LOFT and ICR2 Combined	GPT-4-turbo	Overall Score	74	18	4mo ago
Long-context benchmarks		RAG Score (8k Context)	53.7	16	4mo ago
ACL-OCL	FastInsight	Win Rate	58.2	16	4mo ago
Hotpot-ENKB5 In-Domain Average	TR-RAG	Language Consistency	86.51	15	18d ago
Hotpot-ENKB5 Out-of-Domain Average	Qwen3-30B-A3B-Instruct-2507	Language Consistency	96.45	13	18d ago
Retrieval-Augmented Generation		Performance at 8k Context Length	65.4	13	4mo ago
Legal Consultation (test)	Legal-DC	Recall	78.02	12	4mo ago
News Articles	DA-RAG	Comprehensiveness	96.7	12	4mo ago
Mix	DA-RAG	Comprehensiveness	95.9	12	4mo ago
Agriculture	DA-RAG	Comprehensiveness	97.6	12	4mo ago
TheoremQA	Ours	Accuracy	66.3	12	4mo ago
CHAMP	ReFeedL	Accuracy	45.2	12	4mo ago

Showing 25 of 116 rows