CRAG

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Error Detection | CRAG multi-hop subset (train) | Precision 92 | 36 |
| Error Detection | CRAG | F1 Score 91 | 36 |
| Gland Segmentation | CRAG (test) | DICE Score 89.4 | 26 |
| Gland Segmentation | CRAG | F1 Score 87.4 | 19 |
| Multimodal Retrieval-Augmented Generation | CRAG-MM (Overall) | Truthfulness 20.5 | 18 |
| Question Answering | CRAG | Finance Score 20.1 | 12 |
| Nuclei instance segmentation | CRAG Dpath (test) | Dice 0.785 | 8 |
| Question Answering | CRAG (test) | P@1 63.3 | 6 |
| Semantic Segmentation | CRAG | Dice Score 88.58 | 5 |
| Retrieval-Augmented Generation | CRAG | Finance Accuracy 16.4 | 5 |
| Multi-source Answer Generation | CRAG Task 2 (test) | Accuracy (%) 41 | 3 |
| Answer Generation (Unstructured Context) | CRAG Task 1 (test) | Accuracy 34.23 | 3 |