
Text-Aware Image Restoration with Diffusion Models

About

Image restoration aims to recover high-quality images from degraded inputs. However, existing diffusion-based restoration methods, despite great success on natural images, often struggle to faithfully reconstruct textual regions in degraded images. These methods frequently generate plausible but incorrect text-like patterns, a phenomenon we refer to as text-image hallucination. In this paper, we introduce Text-Aware Image Restoration (TAIR), a novel restoration task that requires the simultaneous recovery of visual contents and textual fidelity. To tackle this task, we present SA-Text, a large-scale benchmark of 100K high-quality scene images densely annotated with diverse and complex text instances. Furthermore, we propose a multi-task diffusion framework, called TeReDiff, that integrates internal features from diffusion models into a text-spotting module, enabling both components to benefit from joint training. This allows for the extraction of rich text representations, which are utilized as prompts in subsequent denoising steps. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art restoration methods, achieving significant gains in text recognition accuracy. See our project page: https://cvlab-kaist.github.io/TAIR/
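The abstract's recurrent prompting loop can be sketched in minimal pure Python: at each denoising step, internal diffusion features feed a text-spotting head, and the recognized text becomes the prompt for the next step. All function names and tensor stand-ins below are illustrative assumptions, not the authors' released implementation.

```python
def denoise_step(latent, prompt, t):
    """Placeholder one-step denoiser: returns the updated latent and the
    internal features that a text-spotting head would consume.
    (Stands in for a U-Net/DiT forward pass conditioned on `prompt`.)"""
    features = [x + t for x in latent]      # stand-in for internal features
    latent = [x * 0.9 for x in latent]      # stand-in for the x_t update
    return latent, features

def spot_text(features):
    """Placeholder text-spotting module: maps features to recognized text."""
    return "OPEN" if sum(features) > 0 else ""

def restore(latent, num_steps=4):
    """Joint loop: text spotted at step t is injected as the prompt
    for the denoising step at t-1, as described in the abstract."""
    prompt = ""                             # no text recognized yet at step 0
    for t in range(num_steps, 0, -1):
        latent, features = denoise_step(latent, prompt, t)
        prompt = spot_text(features)        # recognized text -> next prompt
    return latent, prompt

latent, final_text = restore([1.0, 2.0, 3.0])
```

In the actual framework both components are trained jointly, so the spotting head also shapes the diffusion features it reads from; the sketch only shows the inference-time data flow.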

Jaewon Min, Jin Hyeon Kim, Paul Hyunbin Cho, Jaeeun Lee, Jihye Park, Minkyu Park, Sangpil Kim, Hyunhee Park, Seungryong Kim • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| End-to-End Text Spotting | Real-Text | Accuracy (None config) | 49.39 | 26 |
| Text Detection | Real-Text | Precision | 84.3 | 26 |
| Image Restoration | SA-Text 1K (test) | PSNR | 19.71 | 4 |
| Image Restoration | Real-Text real-world LQ-HQ pairs (test) | PSNR | 23 | 4 |
