Mintaka

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-hop Question Answering | Mintaka | Pass@1: 77.4 | 36 |
| Error Detection | Mintaka (val) | Precision: 100 | 36 |
| Error Detection | Mintaka | F1 Score: 88 | 36 |
| Hallucination prediction | Mintaka refined by question type and domain | AUROC: 75.51 | 20 |
| Hallucination prediction | Mintaka refined by question type | AUROC: 77.89 | 20 |
| Multi-Answer Question Answering | Mintaka | Precision: 64.2 | 16 |
| Hallucination prediction | Mintaka (original) | AUROC: 79.41 | 10 |
| Hallucination prediction | Mintaka unrefined (original) | AUROC: 79.41 | 10 |
| Retrieval | Mintaka | Recall: 82.7 | 7 |