
Beyond Patches: Global-aware Autoregressive Model for Multimodal Few-Shot Font Generation

About

Manual font design is an intricate process that transforms a stylistic visual concept into a coherent glyph set. This challenge persists in automated Few-shot Font Generation (FFG), where models often struggle to preserve both the structural integrity and stylistic fidelity from limited references. While autoregressive (AR) models have demonstrated impressive generative capabilities, their application to FFG is constrained by conventional patch-level tokenization, which neglects global dependencies crucial for coherent font synthesis. Moreover, existing FFG methods remain within the image-to-image paradigm, relying solely on visual references and overlooking the role of language in conveying stylistic intent during font design. To address these limitations, we propose GAR-Font, a novel AR framework for multimodal few-shot font generation. GAR-Font introduces a global-aware tokenizer that effectively captures both local structures and global stylistic patterns, a multimodal style encoder offering flexible style control through a lightweight language-style adapter without requiring intensive multimodal pretraining, and a post-refinement pipeline that further enhances structural fidelity and style coherence. Extensive experiments show that GAR-Font outperforms existing FFG methods, excelling in maintaining global style faithfulness and achieving higher-quality results with textual stylistic guidance.
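The limitation of patch-level tokenization can be seen in a minimal sketch of a conventional vector-quantization tokenizer. This is the baseline the abstract argues against, not GAR-Font's global-aware tokenizer, whose internals are not described here. It assumes a grayscale glyph stored as nested lists, non-overlapping p×p patches, a fixed codebook, and nearest-neighbour code assignment. All function names are hypothetical. Each token is computed from its own patch alone, so no token carries information about the rest of the glyph.

```python
from typing import List

Image = List[List[float]]


def split_into_patches(img: Image, p: int) -> List[List[float]]:
    """Cut an H x W image into non-overlapping p x p patches, each flattened row-major."""
    h, w = len(img), len(img[0])
    assert h % p == 0 and w % p == 0, "image size must be a multiple of the patch size"
    patches = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            patches.append(
                [img[r][c] for r in range(top, top + p) for c in range(left, left + p)]
            )
    return patches


def nearest_code(patch: List[float], codebook: List[List[float]]) -> int:
    """Index of the codebook vector closest to the patch (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(patch, codebook[k])),
    )


def tokenize(img: Image, p: int, codebook: List[List[float]]) -> List[int]:
    """Hypothetical patch-level tokenizer: one discrete token per patch, in raster order.

    Each token depends only on the pixels of its own patch, which is the
    locality the abstract says ignores global dependencies across the glyph.
    """
    return [nearest_code(patch, codebook) for patch in split_into_patches(img, p)]


if __name__ == "__main__":
    codebook = [[0.0] * 4, [1.0] * 4]  # code 0: blank patch, code 1: filled patch
    glyph = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    print(tokenize(glyph, 2, codebook))  # [1, 0, 0, 1]
```

Editing only the bottom-right patch changes only the last token. The other three tokens are unaffected, which is why a purely patch-local representation has no built-in view of stroke consistency or overall style across the whole character.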

Haonan Cai, Yuxuan Luo, Zhouhui Lian • 2026

Related benchmarks

| Task | Dataset | Result (RMSE, lower is better) | Rank |
|---|---|---|---|
| Vision-only Few-shot Font Generation | Chinese font dataset, Small (UFSC) | 0.2671 | 10 |
| Vision-only Few-shot Font Generation | Chinese font dataset, Large (UFSC) | 0.2398 | 10 |
| Vision-only Few-shot Font Generation | Chinese font dataset, Large (UFUC) | 0.2496 | 9 |
| Vision-only Few-shot Font Generation | Chinese font dataset, Small (UFUC) | 0.2788 | 9 |
| Few-shot Font Generation | Large dataset (UFSC: Unseen Fonts, Seen Characters) | 0.2358 | 5 |
| Few-shot Font Generation | Large dataset (UFUC: Unseen Fonts, Unseen Characters) | 0.2524 | 5 |
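The Result column reports RMSE between generated and ground-truth glyph images. The sketch below shows how such a score is typically computed. It assumes pixel intensities normalized to [0, 1] and a per-glyph RMSE averaged over the test pairs. The benchmark's exact protocol (image resolution, normalization, averaging scheme) is not stated on this page, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]


def glyph_rmse(generated: Image, target: Image) -> float:
    """Root-mean-square pixel error between two equally sized grayscale images."""
    if len(generated) != len(target) or any(
        len(g) != len(t) for g, t in zip(generated, target)
    ):
        raise ValueError("images must have the same shape")
    sq_sum, n = 0.0, 0
    for row_g, row_t in zip(generated, target):
        for a, b in zip(row_g, row_t):
            sq_sum += (a - b) ** 2
            n += 1
    return math.sqrt(sq_sum / n)


def mean_rmse(pairs: Sequence[Tuple[Image, Image]]) -> float:
    """Average the per-glyph RMSE over (generated, target) pairs (assumed protocol)."""
    return sum(glyph_rmse(g, t) for g, t in pairs) / len(pairs)


if __name__ == "__main__":
    gen = [[0.0, 1.0], [1.0, 1.0]]
    ref = [[0.0, 1.0], [1.0, 0.0]]
    print(glyph_rmse(gen, ref))  # 0.5 (one wrong pixel out of four)
    print(mean_rmse([(gen, ref), (ref, ref)]))  # 0.25
```

Because RMSE penalizes large pixel deviations quadratically, a lower value means the generated glyph is closer to the reference rendering. That is why the ranks above improve as the RMSE decreases.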
