RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
About
Traditional feedback learning for hallucination reduction relies on labor-intensive manual labeling or expensive proprietary models. This leaves the community without foundational knowledge about how to build high-quality feedback with open-source MLLMs. In this work, we introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm. RLAIF-V maximally exploits open-source MLLMs from two perspectives: high-quality feedback data generation for preference learning, and self-feedback guidance for inference-time scaling. Extensive experiments on six benchmarks in both automatic and human evaluation show that RLAIF-V substantially enhances the trustworthiness of models at both preference learning and inference time. RLAIF-V 7B reduces object hallucination by 80.7% and overall hallucination by 33.7%. Remarkably, RLAIF-V 12B further reveals the self-alignment potential of open-source MLLMs, where the model can learn from its own feedback to achieve super GPT-4V trustworthiness.
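To make the inference-time scaling idea concrete, the sketch below shows a generic best-of-N selection loop: sample several candidate responses and keep the one the model's own feedback signal rates highest. This is a minimal illustration, not code from the RLAIF-V repository; the names `best_of_n`, `generate`, and `self_score`, and the toy stand-ins in the demo, are all hypothetical, and in the actual framework the score would come from the aligned MLLM's own feedback rather than a placeholder.

```python
from typing import Callable, List


def best_of_n(
    question: str,
    generate: Callable[[str], str],           # hypothetical: samples one candidate response
    self_score: Callable[[str, str], float],  # hypothetical: model scores its own response
    n: int = 8,
) -> str:
    """Return the candidate that the self-feedback score rates as most trustworthy."""
    candidates: List[str] = [generate(question) for _ in range(n)]
    return max(candidates, key=lambda resp: self_score(question, resp))


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; a real setup would wrap an MLLM.
    import random

    random.seed(0)
    pool = ["a short answer", "a detailed, grounded answer", "a speculative answer"]
    answer = best_of_n(
        "What is in the image?",
        generate=lambda q: random.choice(pool),
        self_score=lambda q, r: float(len(r)),  # placeholder reward, not the real signal
    )
    print(answer)
```

The key design point is that selection needs no external reward model or proprietary judge: the same open-source model that generates the candidates also supplies the feedback used to rank them.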
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Question Answering | VQA v2 | Accuracy | 75.2 | 1165 |
| Visual Question Answering | TextVQA | Accuracy | 55.1 | 1117 |
| Object Hallucination Evaluation | POPE | -- | -- | 935 |
| Hallucination Evaluation | MMHal-Bench | MMHal Score | 3.44 | 174 |
| Hallucination Evaluation | HallusionBench | -- | -- | 93 |
| Hallucination Evaluation | AMBER | F1 Score | 90.9 | 71 |
| Science Question Answering | ScienceQA | IMG Score | 68.2 | 49 |
| Object Hallucination Evaluation | CHAIR | CS Score | 18.1 | 49 |
| Vision-Language Understanding | MM-Vet | Total Score | 29.9 | 43 |
| Hallucination Evaluation | Object-HalBench | Mention Hallucination Rate | 2.6 | 39 |