| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Hallucination Evaluation | HallusionBench | Average Score | 93.1 | 108 |
| Multimodal Reasoning | HallusionBench | Accuracy | 0.7293 | 42 |
| Hallucination Assessment | HallusionBench | Answer Accuracy (aAcc) | 71.6 | 39 |
| Visual Hallucination Evaluation | HallusionBench | Accuracy | 76.6 | 37 |
| Hallucination and Visual Reasoning Evaluation | HallusionBench | Score | 59.2 | 37 |
| Hallucination Robustness | HallusionBench | Score | 57.8 | 32 |
| Multimodal Hallucination Evaluation | HallusionBench | Hallucination Score | 70.7 | 22 |
| Hallucination | HallusionBench | Pass@1 | 74 | 16 |
| Visual Perception | HallusionBench | Accuracy | 71.08 | 15 |
| Visual Reasoning | HallusionBench | Accuracy | 68.19 | 15 |
| Perception | HallusionBench | Score | 59.5 | 15 |
| Hallucination Evaluation | HallusionBench 2024 | Score | 52.2 | 13 |
| Visual Illusion and Hallucination Evaluation | HallusionBench (HallB) | HallB Score | 41.7 | 13 |
| Hallucination Evaluation | HallusionBench GPT4-assisted (All) | Accuracy (All) | 49.94 | 11 |
| Discriminative Hallucination Detection | HallusionBench | Accuracy | 73 | 10 |
| Visual Hallucination Evaluation | HallusionBench visual questions | Accuracy | 65.8 | 10 |
| General VQA | HallusionBench | Accuracy | 73.48 | 9 |
| General VQA | HallusionBench avg | Score | 67 | 7 |
| Vision-Language Reasoning | HallusionBench (test) | Simple Accuracy | 53.31 | 7 |
| General visual question answering | HallusionBench | Pass@1 | 63.7 | 7 |
| Hallucination control | HallusionBench | General Score | 60.5 | 6 |
| Multimodal Hallucination Assessment | HallusionBench | Accuracy | 70 | 5 |
| Hallucination Analysis | HallusionBench | fACC | 18.7 | 4 |
| Hallucination Evaluation | HallusionBench (test) | Question Pair Accuracy | 17.8 | 4 |
| Visual Question Answering | HallusionBench HBI (all) | Score | 45.21 | 4 |