JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

About

Photo retouching has become integral to contemporary visual storytelling, enabling users to capture aesthetics and express creativity. While professional tools such as Adobe Lightroom offer powerful capabilities, they demand substantial expertise and manual effort. In contrast, existing AI-based solutions provide automation but often suffer from limited adjustability and poor generalization, failing to meet diverse and personalized editing needs. To bridge this gap, we introduce JarvisArt, a multi-modal large language model (MLLM)-driven agent that understands user intent, mimics the reasoning process of professional artists, and intelligently coordinates over 200 retouching tools within Lightroom. JarvisArt undergoes a two-stage training process: an initial Chain-of-Thought supervised fine-tuning to establish basic reasoning and tool-use skills, followed by Group Relative Policy Optimization for Retouching (GRPO-R) to further enhance its decision-making and tool proficiency. We also propose the Agent-to-Lightroom Protocol to facilitate seamless integration with Lightroom. To evaluate performance, we develop MMArt-Bench, a novel benchmark constructed from real-world user edits. JarvisArt demonstrates user-friendly interaction, superior generalization, and fine-grained control over both global and local adjustments, paving a new avenue for intelligent photo retouching. Notably, it outperforms GPT-4o with a 60% improvement in average pixel-level metrics on MMArt-Bench for content fidelity, while maintaining comparable instruction-following capabilities. Project Page: https://jarvisart.vercel.app/.

Yunlong Lin, Zixu Lin, Kunjie Lin, Jinbin Bai, Panwang Pan, Chenxin Li, Haoyu Chen, Zhongdao Wang, Xinghao Ding, Wenbo Li, Shuicheng Yan • 2025

Related benchmarks

Task	Dataset	Result
Photo Retouching	MIT Adobe FiveK	PSNR 21.03	25
Quality Improving	RETOUCHEVAL 1.0 (test)	L1 32.17	16
Instruction-based Image Editing	50 User-study samples	L Score 0.22	11
Photo Retouching	PPR10K-Bench	PSNR 21.79	10
Photorealistic Preset Transfer	Synthetic dataset	PSNR 14.48	9
Photorealistic Preset Transfer	Realistic Dataset (test)	GPT-4o Score 1.5321	9
Instruction Following Assessment	GIER 50 user study samples	Rank (A) 6.13	8
Instruction-based Image Editing	GIER 50 user study samples	L Metric 0.225	8
Style Changing	RETOUCHE val 1.0 (test)	L1 Loss 33.9	8
Image Quality Assessment	GIER 50 user study samples	Rank (B) 5.36	8

Showing 10 of 17 rows
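
Several rows above report pixel-level metrics (PSNR and L1). As a rough illustration of what these numbers measure, here is a minimal sketch of both using their standard definitions. The function names `l1_distance` and `psnr`, the flat-list image representation, and the 0 to 255 value range are assumptions made for this example; the page does not state the colour space, value range, or resolution used by each benchmark, so these functions will not necessarily reproduce the listed values.

```python
import math


def l1_distance(pred, target):
    """Mean absolute difference between two equal-length lists of pixel values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value is the assumed peak pixel value."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: a retouched "image" that is uniformly 10 levels off the reference.
retouched = [0.0, 0.0, 0.0, 0.0]
reference = [10.0, 10.0, 10.0, 10.0]
print(l1_distance(retouched, reference))  # 10.0
print(psnr(retouched, reference))         # about 28.13 dB
```

Higher PSNR and lower L1 both indicate a result closer to the reference edit, so the two kinds of rows read in opposite directions.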
