CHOCOLATE

Benchmarks

Task Name                        Dataset Name         Metric                             SOTA Result  Trend
Factual Inconsistency Detection  CHOCOLATE (FT)       Kendall's Tau                      0.291        9
Factual Inconsistency Detection  CHOCOLATE LLM        Kendall's Tau                      0.205        9
Factual Inconsistency Detection  CHOCOLATE LVLM       Kendall's Tau                      0.178        9
Factual Error Correction         CHOCOLATE FT 1.0     Bard (%)                           81.14        7
Factual Error Correction         CHOCOLATE LLM 1.0    GPT-4V Score                       52.35        7
Factual Error Correction         CHOCOLATE LVLM 1.0   CHARTVE                            34.34        7
Factual Correction               CHOCOLATE FT         Factual Correction Score (GPT-4V)  74.79        6
Factual Correction               CHOCOLATE LLM        GPT-4V Score                       52.35        6
Factual Correction               CHOCOLATE LVLM       Factual Correction Score (GPT-4V)  61.34        6
Factual Error Detection          CHOCOLATE 1.0 (FT)   ROC AUC                            74.3         5
Factual Error Detection          CHOCOLATE 1.0 (LLM)  ROC AUC                            73.8         5
Factual Error Detection          CHOCOLATE LVLM 1.0   ROC AUC                            0.646        5