
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

About

Text-to-Image (T2I) generation methods based on diffusion models have garnered significant attention in the last few years. Although these image synthesis methods produce visually appealing results, they frequently exhibit spelling errors when rendering text within the generated images. Such errors manifest as missing, incorrect, or extraneous characters, thereby severely constraining the performance of text image generation based on diffusion models. To address this issue, this paper proposes a novel approach for text image generation that utilizes a pre-trained diffusion model (i.e., Stable Diffusion [27]). Our approach involves the design and training of a lightweight character-level text encoder, which replaces the original CLIP encoder and provides more robust text embeddings as conditional guidance. Then, we fine-tune the diffusion model using a large-scale dataset, incorporating local attention control under the supervision of character-level segmentation maps. Finally, by employing an inference-stage refinement process, we achieve a notably high sequence accuracy when synthesizing text in arbitrarily given images. Both qualitative and quantitative results demonstrate the superiority of our method over the state of the art. Furthermore, we showcase several potential applications of the proposed UDiffText, including text-centric image synthesis, scene text editing, etc. Code and model will be available at https://github.com/ZYM-PKU/UDiffText.

Yiming Zhao, Zhouhui Lian • 2023

Related benchmarks

Task                       Dataset                                   Metric                Result  Rank
Scene Text Editing         ICDAR 8 characters 2013 (test)            Sequence Accuracy     84      7
Scene Text Editing         ICDAR 2013 (test)                         SeqAcc                83      7
Scene Text Editing         TextSeg (test)                            SeqAcc                84      7
Scene Text Editing         LAION-OCR (test)                          SeqAcc                78      7
Scene Text Editing         Scene Text Editing Evaluation Set (test)  FID                   15.79   7
Scene Text Reconstruction  ICDAR 8 characters 2013 (test)            SeqAcc                94      7
Scene Text Reconstruction  ICDAR 2013 (test)                         SeqAcc                91      7
Scene Text Reconstruction  TextSeg (test)                            Sequence Accuracy     93      7
Scene Text Reconstruction  LAION-OCR (test)                          SeqAcc                90      7
Text editing               UDiffText                                 SeqAcc (ICDAR13 8ch)  84      4
Showing 10 of 11 rows
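
Most rows above report sequence accuracy (SeqAcc). This page does not spell out the evaluation protocol, so the following is only a minimal sketch, assuming SeqAcc is the percentage of images whose recognized text matches the target string exactly. The function name `sequence_accuracy`, the case-insensitive default, and the sample strings are all illustrative assumptions, not taken from the paper or its code.

```python
def sequence_accuracy(predictions, targets, case_sensitive=False):
    """Return the percentage of predictions that match their target exactly.

    Assumption: a sample counts as correct only if the whole string matches;
    a single wrong, missing, or extra character makes it a miss.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")

    def normalize(text):
        # Strip surrounding whitespace; fold case unless asked not to.
        text = text.strip()
        return text if case_sensitive else text.lower()

    hits = sum(normalize(p) == normalize(t) for p, t in zip(predictions, targets))
    return 100.0 * hits / len(targets)


# Hypothetical recognizer outputs for four edited images (not data from the paper).
recognized = ["coffee", "OPEN", "exit", "hotle"]
expected = ["coffee", "open", "exit", "hotel"]

score = sequence_accuracy(recognized, expected)
print(f"SeqAcc: {score:.1f}")  # prints "SeqAcc: 75.0"
```

Under this definition the transposed letters in "hotle" cost the whole sample, which is why spelling errors of the kind described in the abstract weigh so heavily on the metric.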
