DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion

About

We introduce a novel method to automatically generate artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable. To address the challenges of this task, including conflicting goals (artistic stylization vs. legibility), the lack of ground truth, and an immense search space, our approach uses large language models to bridge text and visual images for stylization and builds an unsupervised generative model with a diffusion model backbone. Specifically, we employ the denoising generator of the Latent Diffusion Model (LDM), with the key addition of a CNN-based discriminator to adapt the input style onto the input text. The discriminator uses rasterized images of a given letter/word font as real samples and the output of the denoising generator as fake samples. Our model is coined DS-Fusion, for discriminated and stylized diffusion. We showcase the quality and versatility of our method through numerous examples, qualitative and quantitative evaluation, and ablation studies. User studies comparing against strong baselines, including CLIPDraw and DALL-E 2, as well as artist-crafted typographies, demonstrate the strong performance of DS-Fusion.
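The real/fake setup described above can be illustrated with a minimal sketch. It assumes a standard binary cross-entropy GAN objective added to a diffusion denoising loss. The abstract does not state the exact loss form or weighting, so the function names, the `adv_weight` parameter, and the choice of loss are hypothetical and not the authors' implementation. Logits stand in for the CNN discriminator's outputs on rasterized glyph images (real) and on denoised generator outputs (fake).

```python
import math

def bce_with_logits(logit, target):
    # Numerically stable binary cross-entropy for a single logit.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def mean(values):
    return sum(values) / len(values)

def discriminator_loss(real_logits, fake_logits):
    # Assumed objective: score rasterized font glyphs as real (1)
    # and denoising-generator outputs as fake (0).
    real_term = mean([bce_with_logits(x, 1.0) for x in real_logits])
    fake_term = mean([bce_with_logits(x, 0.0) for x in fake_logits])
    return real_term + fake_term

def generator_loss(diffusion_loss, fake_logits, adv_weight=0.1):
    # Assumed objective: the usual denoising loss plus a weighted adversarial
    # term that rewards outputs the discriminator scores as glyph-like.
    # adv_weight is a hypothetical hyperparameter, not a value from the paper.
    adv_term = mean([bce_with_logits(x, 1.0) for x in fake_logits])
    return diffusion_loss + adv_weight * adv_term

if __name__ == "__main__":
    # Toy logits: a discriminator that is fairly confident on both batches.
    print(discriminator_loss([2.0, 3.0], [-2.0, -1.5]))
    print(generator_loss(0.5, [-2.0, -1.5]))
```

In this sketch the adversarial term pulls the stylized output toward the letter shape (legibility), while the diffusion loss carries the style, which mirrors the trade-off between stylization and legibility named in the abstract.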

Maham Tanveer, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Text Generation | MuST-Bench English | OCR Accuracy | 48.08 | 4 |
| Stylized Text Generation | MuST-Bench Chinese | Style Fidelity | 0.1851 | 4 |
| Visual Text Generation | MuST-Bench Chinese | OCR Accuracy | 24.71 | 4 |
| Visual Text Generation | MuST-Bench Korean | OCR Accuracy | 19.06 | 4 |
| Style Translation Fidelity Evaluation | MuST-Bench | GPT-4V Fidelity (EN) | 1.89 | 4 |
| Stylized Text Generation | MuST-Bench English | Style Fidelity | 0.1809 | 4 |
| Stylized Text Generation | MuST-Bench Korean | Style Fidelity | 17.97 | 4 |