ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning

About

Direct Preference Optimization (DPO) has gained significant attention for its simplicity and computational efficiency in aligning large language models (LLMs). Recent advancements have extended DPO to multimodal scenarios, achieving strong performance. However, traditional DPO relies on binary preference optimization, rewarding or penalizing entire responses without considering fine-grained segment correctness, leading to suboptimal solutions. The root of this issue lies in the absence of fine-grained supervision during the optimization process. To address this, we propose Adaptive Sentence-level Preference Optimization (ASPO), which evaluates individual sentences for more precise preference optimization. By dynamically calculating adaptive rewards at the sentence level based on model predictions, ASPO enhances response content assessment without additional models or parameters. This significantly improves the alignment of multimodal features. Extensive experiments show that ASPO substantially enhances the overall performance of multimodal models.

Yeyuan Wang, Dehong Gao, Rujiao Long, Lei Yi, Linbo Jin, Libin Yang, Xiaoyan Cai• 2025

Related benchmarks

Task	Dataset	Result
Object Hallucination Evaluation	POPE	Accuracy86.6	2019
Multimodal Understanding	SEED-Bench Image	Accuracy68.5	143
Multi-modal Understanding	LLaVA-Bench Wild	LLaVA^W Score82	86
Multimodal Understanding	MMBench (test)	--	67
Multi-modal Vision-Language Understanding	GQA	Accuracy63.4	51
Multi-modal Vision-Language Understanding	MMVet	Score41.2	38
Multi-modal Vision-Language Understanding	MMBench (dev)	Score70.4	16
Multi-modal Vision-Language Understanding	MMBench CN	Overall Score64.7	14
Multi-modal Vision-Language Understanding	ScienceQA image	Accuracy71.8	14
Object Hallucination Evaluation	Simple Hallucination Rate (SHR)	SHR33.9	9

Showing 10 of 10 rows

Other info

Follow for update

@wizwand_team Discord