
Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning

About

Reinforcement Fine-Tuning (RFT) has established itself as a critical paradigm for aligning Multi-modal Large Language Models (MLLMs) with complex human values and domain-specific requirements. However, current research primarily focuses on mitigating exogenous distribution shifts arising from data-centric factors, while the non-stationarity inherent in endogenous reasoning remains largely unexplored. This work reveals a critical vulnerability in MLLMs: they are highly susceptible to endogenous reasoning drift in both thinking and perception. This drift manifests as unpredictable distribution changes that emerge spontaneously during autoregressive generation, independent of external environmental perturbations. To adapt to it, we first theoretically define endogenous reasoning drift within the RFT of MLLMs as multi-modal concept drift. We then propose Counterfactual Preference Optimization ++ (CPO++), a comprehensive and autonomous framework that adapts to multi-modal concept drift. It integrates counterfactual reasoning with domain knowledge to execute controlled perturbations across thinking and perception, and employs preference optimization to disentangle spurious correlations. Extensive empirical evaluations across two highly dynamic and safety-critical domains, medical diagnosis and autonomous driving, demonstrate that the proposed framework achieves superior performance in reasoning coherence, decision-making precision, and inherent robustness against extreme interference. The methodology also exhibits exceptional zero-shot cross-domain generalization, providing a principled foundation for reliable multi-modal reasoning in safety-critical applications.
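The abstract does not spell out the CPO++ objective, so the following is only a minimal sketch of what a preference-optimization step over counterfactual pairs could look like. It assumes the standard DPO loss (a swapped-in, well-known technique, not necessarily the authors' formulation), where the "chosen" response comes from the original input and the "rejected" response from a counterfactually perturbed one. The function name `dpo_pair_loss`, the `beta` value, and the example log-probabilities are all hypothetical.

```python
import math


def dpo_pair_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy being tuned
    (logp_*) and under a frozen reference model (ref_logp_*).
    ASSUMPTION: this is generic DPO, not the paper's exact objective.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical pair: the chosen response is grounded in the original input,
# the rejected one was produced under a counterfactual perturbation.
loss_good = dpo_pair_loss(-1.0, -5.0, -2.0, -2.0)  # policy already prefers chosen
loss_bad = dpo_pair_loss(-5.0, -1.0, -2.0, -2.0)   # policy prefers rejected
```

Under this assumed loss, a policy that already ranks the grounded response above the counterfactual one incurs a smaller penalty, which is the mechanism by which preference optimization can discourage reliance on spurious cues.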

Xiaoyu Yang, En Yu, Wei Duan, Jie Lu • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
-----|---------|--------|--------|-----
Diagnostic report generation | MIMIC-CXR 14 (test) | BLEU-4 | 16.5 | 27
Multi-label diagnostic classification | MS-CXR-T | Consolidation Score | 77 | 21
Autonomous Driving Robustness Evaluation | CODA-LM 94 (test) | General Perception Score | 55.6 | 10
Multi-label Disease Classification | Open-i | F1 Score | 85.1 | 9
Driving decision-making | BDD-X | BLEU-4 | 0.363 | 8
Multi-label Disease Classification | PadChest | Classification Performance Score | 82.3 | 7
Multi-label Disease Classification | ChestXray 14 | Classification Score | 82 | 7
Multi-label Disease Classification | CheXDet 10 | Classification Score | 81.4 | 7
Autonomous Driving | DriveLM | Description Score | 30 | 6
Driving Reasoning | BDD-X Easy | BLEU-4 | 0.211 | 6
Showing 10 of 13 rows
