| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Factual Inconsistency Detection | CHOCOLATE (FT) | Kendall's Tau | 0.291 | 9 |
| Factual Inconsistency Detection | CHOCOLATE LLM | Kendall's Tau | 0.205 | 9 |
| Factual Inconsistency Detection | CHOCOLATE LVLM | Kendall's Tau | 0.178 | 9 |
| Factual Error Correction | CHOCOLATE FT 1.0 | Bard (%) | 81.14 | 7 |
| Factual Error Correction | CHOCOLATE LLM 1.0 | GPT-4V Score | 52.35 | 7 |
| Factual Error Correction | CHOCOLATE LVLM 1.0 | CHARTVE | 34.34 | 7 |
| Factual Correction | CHOCOLATE FT | Factual Correction Score (GPT-4V) | 74.79 | 6 |
| Factual Correction | CHOCOLATE LLM | GPT-4V Score | 52.35 | 6 |
| Factual Correction | CHOCOLATE LVLM | Factual Correction Score (GPT-4V) | 61.34 | 6 |
| Factual Error Detection | CHOCOLATE 1.0 (FT) | ROC AUC | 74.3 | 5 |
| Factual Error Detection | CHOCOLATE 1.0 (LLM) | ROC AUC | 73.8 | 5 |
| Factual Error Detection | CHOCOLATE LVLM 1.0 | ROC AUC | 0.646 | 5 |