CharaConsist: Fine-Grained Consistent Character Generation

About

In text-to-image generation, producing a series of consistent contents that preserve the same identity is highly valuable for real-world applications. Although a few works have explored training-free methods to enhance the consistency of generated subjects, we observe that they suffer from the following problems. First, they fail to maintain consistent background details, which limits their applicability. Second, when the foreground character undergoes large motion variations, inconsistencies in identity and clothing details become evident. To address these problems, we propose CharaConsist, which employs point-tracking attention and adaptive token merge along with decoupled control of the foreground and background. CharaConsist enables fine-grained consistency for both foreground and background, supporting the generation of one character in continuous shots within a fixed scene or in discrete shots across different scenes. Moreover, CharaConsist is the first consistent generation method tailored for text-to-image DiT models. Its ability to maintain fine-grained consistency, combined with the larger capacity of the latest base models, enables it to produce high-quality visual outputs, broadening its applicability to a wider range of real-world scenarios. The source code has been released at https://github.com/Murray-Wang/CharaConsist

Mengyu Wang, Henghui Ding, Jianing Peng, Yao Zhao, Yunpeng Chen, Yunchao Wei • 2025

Related benchmarks

Task	Dataset	Result
Cinematic Story Generation	ViStoryBench	CSD (Cross) 0.282	24
Visual Storytelling	ViStoryBench Lite 2025	CSD (Cross) 0.333	21
Single-character story generation	Pororo	D-I 56.61	13
Single-character story generation	Frozen	D-I 43.92	13
Single-character story generation	User Study	C-A Score 2.02	13
Multi-character story generation	Multi-character story generation (test)	CLIP-T 31.67	8
Non-rigid image editing	PIE-Bench ChangePose	GPT-4o Score 6.3183	6
Story Customization	MSB 1.0 (test)	Inter-Consistency (CLIP-I-fg) 90.4	6
Non-rigid image editing	Non-Rigid Editing Benchmark	GPT-4o Score 7.2683	6
Identity-consistent generation	Synthesized identity-consistent generation benchmark 1.0 (test)	CQShar 0.443	6

Showing 10 of 12 rows
