One Model, Two Minds: Task-Conditioned Reasoning for Unified Image Quality and Aesthetic Assessment

About

Unifying Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) in a single multimodal large language model is appealing, yet existing methods adopt a task-agnostic recipe that applies the same reasoning strategy and reward to both tasks. We show this is fundamentally misaligned: IQA relies on low-level, objective perceptual cues and benefits from concise distortion-focused reasoning, whereas IAA requires deliberative semantic judgment and is poorly served by point-wise score regression. We identify these as a reasoning mismatch and an optimization mismatch, and provide empirical evidence for both through controlled probes. Motivated by these findings, we propose TATAR (Task-Aware Thinking with Asymmetric Rewards), a unified framework that shares the visual-language backbone while conditioning post-training on each task's nature. TATAR combines three components: fast-slow task-specific reasoning construction that pairs IQA with concise perceptual rationales and IAA with deliberative aesthetic narratives; two-stage SFT+GRPO learning that establishes task-aware behavioral priors before reward-driven refinement; and asymmetric rewards that apply Gaussian score shaping for IQA and Thurstone-style completion ranking for IAA. Extensive experiments across eight benchmarks demonstrate that TATAR consistently outperforms prior unified baselines on both tasks under in-domain and cross-domain settings, remains competitive with task-specific specialized models, and yields more stable training dynamics for aesthetic assessment. Our results establish task-conditioned post-training as a principled paradigm for unified perceptual scoring. Our code is publicly available at https://github.com/yinwen2019/TATAR.

Wen Yin, Cencen Liu, Dingrui Liu, Bing Su, Yuan-Fang Li, Tao He • 2026
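
The abstract names two reward shapes without giving formulas: Gaussian score shaping for IQA and Thurstone-style ranking for IAA. The sketch below illustrates one plausible reading of each and is not taken from the TATAR code. All function names, the `sigma` defaults, and the pairwise form of the ranking reward are assumptions made for illustration; the paper's actual rewards may differ.

```python
import math


def gaussian_score_reward(pred, target, sigma=0.5):
    """Hypothetical IQA reward: 1.0 at an exact match, decaying smoothly
    with the squared error between predicted and ground-truth score."""
    return math.exp(-((pred - target) ** 2) / (2.0 * sigma ** 2))


def normal_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def thurstone_pair_prob(score_a, score_b, sigma=1.0):
    """Thurstone Case V probability that item A is preferred over item B,
    assuming equal-variance, uncorrelated Gaussian noise on both scores."""
    return normal_cdf((score_a - score_b) / (math.sqrt(2.0) * sigma))


def thurstone_ranking_reward(pred_a, pred_b, mos_a, mos_b, sigma=1.0):
    """Hypothetical IAA reward: the probability mass the predictions assign
    to the ground-truth ordering of a pair (ties are scored as 0.5)."""
    p = thurstone_pair_prob(pred_a, pred_b, sigma)
    if mos_a > mos_b:
        return p
    if mos_a < mos_b:
        return 1.0 - p
    return 0.5


if __name__ == "__main__":
    # Point-wise shaping: closer predictions earn more reward.
    print(gaussian_score_reward(3.4, 3.5), gaussian_score_reward(2.5, 3.5))
    # Ranking: only the relative order of the two predictions matters.
    print(thurstone_ranking_reward(4.0, 2.0, mos_a=5.0, mos_b=1.0))
```

Under these assumptions the contrast is that the Gaussian reward penalizes absolute score error, while the ranking reward depends only on the difference between two predictions, so a model can be rewarded for correct ordering even when its absolute scores are miscalibrated.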

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Image Quality Assessment	SPAQ	SRCC	0.897	275
Image Quality Assessment	KADID	SRCC	73.1	164
Image Quality Assessment	PIPAL	SRCC	50.1	159
Image Quality Assessment	KonIQ-10k	SRCC	0.941	126
Image Aesthetic Assessment	AVA	SRCC	0.518	68
Visual Rating (Image Aesthetic Assessment)	TAD66K	SRCC	0.334	40
Visual Rating (Image Aesthetic Assessment)	ArtiMuse-10K	SRCC	0.586	34
Visual Rating (Image Aesthetic Assessment)	FLICKR-AES	SRCC	60.4	33
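
Every row above reports SRCC (Spearman rank correlation coefficient), which measures how well predicted scores preserve the ordering of the ground-truth opinion scores. The following is a minimal standard-library sketch of that metric, not the evaluation code used for these results; the function names are illustrative. SRCC lies in [-1, 1], so entries such as 73.1 are presumably reported on a 0-100 scale, though the source does not say so.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def srcc(predicted, ground_truth):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(ground_truth))


if __name__ == "__main__":
    # One adjacent swap among four items.
    print(srcc([1.0, 3.0, 2.0, 4.0], [1.0, 2.0, 3.0, 4.0]))
```

Because only ranks enter the computation, SRCC is unchanged by any monotonic rescaling of the predicted scores.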
