
Mintaka

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Error Detection | Mintaka (val) | Precision: 100 | 36 |
| Error Detection | Mintaka | F1 Score: 88 | 36 |
| Hallucination prediction | Mintaka refined by question type and domain | AUROC: 75.51 | 20 |
| Hallucination prediction | Mintaka refined by question type | AUROC: 77.89 | 20 |
| Hallucination prediction | Mintaka (original) | AUROC: 79.41 | 10 |
| Hallucination prediction | Mintaka unrefined (original) | AUROC: 79.41 | 10 |
| Retrieval | Mintaka | Recall: 82.7 | 7 |