RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

About

Traditional feedback learning for hallucination reduction relies on labor-intensive manual labeling or expensive proprietary models. This leaves the community without foundational knowledge about how to build high-quality feedback with open-source MLLMs. In this work, we introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm. RLAIF-V exploits open-source MLLMs from two perspectives: high-quality feedback data generation for preference learning, and self-feedback guidance for inference-time scaling. Extensive experiments on six benchmarks, with both automatic and human evaluation, show that RLAIF-V substantially enhances the trustworthiness of models at both preference-learning time and inference time. RLAIF-V 7B reduces object hallucination by 80.7% and overall hallucination by 33.7%. Remarkably, RLAIF-V 12B further reveals the self-alignment potential of open-source MLLMs: the model can learn from its own feedback to achieve trustworthiness surpassing that of GPT-4V.

Tianyu Yu, Haoye Zhang, Qiming Li, Qixin Xu, Yuan Yao, Da Chen, Xiaoman Lu, Ganqu Cui, Yunkai Dang, Taiwen He, Xiaocheng Feng, Jun Song, Bo Zheng, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun • 2024

Related benchmarks

Task	Dataset	Result
Object Hallucination Evaluation	POPE	--	2019
Visual Question Answering	TextVQA	Accuracy 55.1	1453
Visual Question Answering	VQA v2	Accuracy 75.2	1429
Multimodal Capability Evaluation	MM-Vet	Score 29.9	393
Object Hallucination	POPE Popular	F1 Score 82.92	372
Object Hallucination	POPE Adversarial	Accuracy 84.5	353
Hallucination Evaluation	MMHal-Bench	MMHal Score 3.44	306
Hallucination Evaluation	AMBER	CHAIR 2.8	222
Object Hallucination Evaluation	CHAIR	--	154
Hallucination Evaluation	HallusionBench	Accuracy 35.43	153

Showing 10 of 37 rows
