AnyText2: Visual Text Generation and Editing With Customizable Attributes

About

As the text-to-image (T2I) domain progresses, generating text that seamlessly integrates with visual content has garnered significant attention. However, even with accurate text generation, the inability to control font and color can greatly limit certain applications, and this issue remains insufficiently addressed. This paper introduces AnyText2, a novel method that enables precise control over multilingual text attributes in natural scene image generation and editing. Our approach consists of two main components. First, we propose a WriteNet+AttnX architecture that injects text rendering capabilities into a pre-trained T2I model. Compared to its predecessor, AnyText, our new approach not only enhances image realism but also achieves a 19.8% increase in inference speed. Second, we explore techniques for extracting fonts and colors from scene images and develop a Text Embedding Module that encodes these text attributes separately as conditions. As an extension of AnyText, this method allows the attributes of each line of text to be customized, leading to improvements of 3.3% and 9.3% in text accuracy for Chinese and English, respectively. Through comprehensive experiments, we demonstrate the state-of-the-art performance of our method. The code and model will be open-sourced at https://github.com/tyxsspa/AnyText2.

Yuxiang Tuo, Yifeng Geng, Liefeng Bo • 2024

Related benchmarks

Task	Dataset	Result
Product poster generation	InnoComposer-Bench 1.0 (test)	IR-Score 0.701	14
Scene Text Editing	AnyText-benchmark Chinese	Sentence Accuracy 70.22	13
Graphic design generation	Graphic Design Generation Benchmark 1,000 samples	CLIP-I 74.68	13
Poster Generation	PosterDNA (test)	CR 32.57	12
Visual Text Generation	AnyText benchmark English 1.0 (test)	Sentence Accuracy 81.22	11
Visual Text Generation	AnyText benchmark Chinese 1.0 (test)	Sentence Accuracy 71.71	10
Image Text Editing	AnyText Chinese (test)	Sen. Acc 70.22	10
Image Text Editing	AnyText English (test)	Sentence Accuracy 79.15	10
Multi-line Text Reconstruction	AnyWord CH	Sequence Accuracy 28.1	10
Full Design Image Generation	UTDesign-Bench Gen 1.0 (test)	FID 90.42	10

Showing 10 of 58 rows
