
TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering

About

Visual Text Rendering (VTR) remains a critical challenge in text-to-image generation, where even advanced models frequently produce text with structural anomalies such as distortion, blurriness, and misalignment. However, we find that leading MLLMs and specialist OCR models largely fail to perceive these structural anomalies, creating a critical bottleneck for both VTR evaluation and RL-based optimization. As a result, even state-of-the-art generators (e.g., Seedream4.0, Qwen-Image) still struggle to render structurally faithful text. To address this, we propose TextPecker, a plug-and-play, structural-anomaly-perceptive RL strategy that mitigates noisy reward signals and works with any text-to-image generator. To enable this capability, we construct a recognition dataset with character-level structural-anomaly annotations and develop a stroke-editing synthesis engine to expand structural-error coverage. Experiments show that TextPecker consistently improves diverse text-to-image models; even on the well-optimized Qwen-Image, it yields significant average gains of 4% in structural fidelity and 8.7% in semantic alignment for Chinese text rendering, establishing a new state-of-the-art in high-fidelity VTR. Our work fills a gap in VTR optimization, providing a foundational step towards reliable and structurally faithful visual text generation.
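To make the idea of a structure-aware reward concrete, here is a minimal sketch of how a per-image reward could combine semantic alignment with structural fidelity, given character-level anomaly flags from a recognizer. It is not the TextPecker reward: the abstract does not specify the formula, so `RecognizedChar`, `semantic_score`, `structural_score`, `combined_reward`, and the `structure_weight` parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedChar:
    """One recognized character (hypothetical structure, not from the paper)."""
    char: str
    anomalous: bool  # assumed flag: True if the glyph is judged structurally broken


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def semantic_score(target: str, chars: List[RecognizedChar]) -> float:
    """1 minus the normalized edit distance between target and recognized text."""
    recognized = "".join(c.char for c in chars)
    longest = max(len(target), len(recognized))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(target, recognized) / longest


def structural_score(chars: List[RecognizedChar]) -> float:
    """Fraction of recognized characters that are not flagged as anomalous."""
    if not chars:
        return 0.0
    clean = sum(1 for c in chars if not c.anomalous)
    return clean / len(chars)


def combined_reward(target: str, chars: List[RecognizedChar],
                    structure_weight: float = 0.5) -> float:
    """Weighted sum of the two scores; the weighting scheme is an assumption."""
    structure = structural_score(chars)
    semantics = semantic_score(target, chars)
    return structure_weight * structure + (1.0 - structure_weight) * semantics
```

Under this sketch, an image whose text reads correctly but contains distorted glyphs scores lower than one that is both correct and clean, which is the kind of signal a structure-blind OCR reward would miss.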

Hanshen Zhu, Yuliang Liu, Xuecheng Wu, An-Lan Wang, Hao Feng, Dingkang Yang, Chao Feng, Can Huang, Jingqun Tang, Xiang Bai • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Canonical Text Recognition | English recognition | R | 95.3 | 19
Canonical Text Recognition | Chinese recognition | R | 99.1 | 19
Text Structural Anomaly Perception | English recognition | Precision | 79.5 | 19
Text Structural Anomaly Perception | Chinese recognition | Precision | 91.2 | 19
Visual Text Rendering | OneIG-Bench English Rendering | Avg. Score | 99 | 9
Visual Text Rendering | LongText-Bench English Rendering | Average Score | 94.9 | 9
Visual Text Rendering | CVTG-2K English Rendering | Avg Score | 89.9 | 9
Visual Text Rendering | GenTextEval-Bench English Rendering | Quality Score | 99.2 | 9
Visual Text Rendering | OneIG-Bench Chinese Rendering | Avg. Score | 0.988 | 3
Visual Text Rendering | LongText-Bench Chinese Rendering | Average Score | 97.4 | 3
Showing 10 of 11 rows

Other info

GitHub
