
CRAG

Benchmarks

| Task | Dataset | Metric | SOTA result | Trend |
|---|---|---|---|---|
| Error Detection | CRAG multi-hop subset (train) | Precision | 92 | 36 |
| Error Detection | CRAG | F1 Score | 91 | 36 |
| Gland Segmentation | CRAG (test) | Dice Score | 89.4 | 26 |
| Gland Segmentation | CRAG | F1 Score | 87.4 | 19 |
| Multimodal Retrieval-Augmented Generation | CRAG-MM (Overall) | Truthfulness | 20.5 | 18 |
| Multi-hop Reasoning | CRAG | F1 Score | 30.08 | 15 |
| Question Answering | CRAG | Finance Score | 20.1 | 12 |
| Gland Segmentation | CRAG (5-fold cross-validation) | Dice Score | 79.6 | 8 |
| Nuclei Instance Segmentation | CRAG Dpath (test) | Dice | 0.785 | 8 |
| Question Answering | CRAG (test) | P@1 | 63.3 | 6 |
| Semantic Segmentation | CRAG | Dice Score | 88.58 | 5 |
| Retrieval-Augmented Generation | CRAG | Finance Accuracy | 16.4 | 5 |
| Multi-source Answer Generation | CRAG Task 2 (test) | Accuracy (%) | 41 | 3 |
| Answer Generation (Unstructured Context) | CRAG Task 1 (test) | Accuracy | 34.23 | 3 |
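Several rows report Dice, precision, or F1. The snippet below is a minimal reference sketch of how these metrics are commonly defined on flat binary (0/1) pixel labels. It is an illustration, not the evaluation code behind any entry in the table. The function names `dice` and `precision_recall_f1` are hypothetical, and the gland-segmentation leaderboards may compute F1 and Dice at the object level rather than per pixel. Most rows use a 0 to 100 scale (multiply the result by 100), while the 0.785 entry appears to be on the 0 to 1 scale.

```python
from typing import Sequence, Tuple


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for flat binary masks (hypothetical helper)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2 * intersection / total


def precision_recall_f1(
    pred: Sequence[int], target: Sequence[int]
) -> Tuple[float, float, float]:
    """Pixel-level precision, recall, and F1 from binary labels (hypothetical helper)."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)          # true positives
    fp = sum(1 for p, t in zip(pred, target) if p and not t)      # false positives
    fn = sum(1 for p, t in zip(pred, target) if not p and t)      # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1]
    target = [1, 0, 0, 1, 1]
    print(round(100 * dice(pred, target), 2))  # Dice as a percentage
    print(precision_recall_f1(pred, target))
```

On pixel-level binary labels, F1 and Dice coincide (both equal 2TP / (2TP + FP + FN)). In the example above, both are 2/3, or about 66.67 on the percentage scale.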