Calligrapher: Freestyle Text Image Customization
About
We introduce Calligrapher, a diffusion-based framework that integrates text customization with artistic typography for digital calligraphy and design applications. To address the challenges of precise style control and data dependency in typographic customization, our framework makes three key technical contributions. First, we develop a self-distillation mechanism that uses the pre-trained text-to-image generative model itself, together with a large language model, to automatically construct a style-centric typography benchmark. Second, we introduce a localized style injection framework based on a trainable style encoder, comprising both Qformer and linear layers, that extracts robust style features from reference images. Third, we employ an in-context generation mechanism that embeds reference images directly into the denoising process, further improving alignment with the target style. Extensive quantitative and qualitative evaluations across diverse fonts and design contexts confirm that Calligrapher accurately reproduces intricate stylistic details and positions glyphs precisely. By automating high-quality, visually consistent typography, Calligrapher surpasses traditional models and supports creative practitioners in digital art, branding, and contextual typographic design.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Text Editing | AnyText English (test) | Sentence Accuracy | 77.61 | 10 |
| Image Text Editing | AnyText Chinese (test) | Sentence Accuracy | 43.69 | 10 |
| Scene Text Generation | StyleText-CE English v1 (test) | Sentence Accuracy | 64 | 8 |
| Style-conditioned scene text generation | StyleText-CE cn | Sentence Accuracy | 51 | 8 |
| Infographic Editing | Crello Edit (test) | FID | 10.15 | 7 |
| Infographic Editing | InfoEdit (test) | FID | 13.37 | 7 |
| Text Synthesis | SkyReels-Text | Sentence Accuracy | 64.04 | 7 |
| Scene Text Generation | StyleText-CE Chinese v1 (test) | Sentence Accuracy | 51.53 | 4 |
| Style-conditioned scene text generation | StyleText-CE cn→en | Sentence Accuracy | 57 | 4 |
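
The table reports Sentence Accuracy without defining it. As a rough illustration only, the sketch below assumes the common convention in text-rendering benchmarks: a generated text line counts as correct only if the string recognized from it matches the requested string exactly, and the score is the percentage of correct lines. The function name `sentence_accuracy`, the whitespace stripping, the case-sensitive comparison, and the sample strings are all assumptions for illustration, not the evaluation code behind these numbers.

```python
def sentence_accuracy(predictions, references):
    """Fraction of text lines whose recognized string exactly matches the target.

    Assumption: comparison is case-sensitive after stripping outer whitespace.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)


# Hypothetical OCR readouts of generated images vs. the strings that were requested.
ocr_outputs = ["Calligrapher", "Hello", "W0rld", "Typography"]
targets = ["Calligrapher", "Hello", "World", "Typography"]

score = sentence_accuracy(ocr_outputs, targets)
print(f"Sentence accuracy: {100 * score:.2f}")
```

Under this reading, a higher Sentence Accuracy is better, whereas for FID a lower value is better, so the two kinds of rows are not directly comparable.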