
Prefix-Adaptive Block Diffusion for Efficient Document Recognition

About

Block Diffusion Models (BDMs) support parallel generation, flexible-length output, and KV caching, making them promising for efficient document parsing. However, existing BDMs bind denoising and cache commitment to fixed block boundaries: parallelism shrinks during intra-block denoising, while generated tokens cannot be cached until the whole block is completed. Moreover, intra-block bidirectional denoising conflicts with inter-block autoregression, creating inconsistent information flow that can challenge structure-sensitive recognition. We propose the Prefix-Adaptive Block Diffusion Model (PA-BDM), which replaces intra-block bidirectional denoising with causal denoising from prefix to suffix and treats the block size as a maximum candidate range rather than a fixed commitment unit. PA-BDM uses Confidence-gated Structural Loss (CSL) to build low-entropy prefixes before extending training to longer continuations. During inference, Progressive Prefix Commitment (PPC) dynamically commits the longest reliable prefix into the KV cache and resets the next candidate range from the updated prefix, restoring a large parallel decoding space at each step. Experiments show that the 3B PA-BDM achieves higher recognition scores on several benchmarks and improves inference throughput by 71.6% over the 2.5B MinerU-Diffusion.
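The commit-and-reset loop of Progressive Prefix Commitment can be illustrated with a minimal sketch. The abstract does not say how a prefix is judged "reliable", so this code assumes a fixed per-token confidence threshold `tau` and a rule that at least one token is committed per step to guarantee progress. The names `ppc_decode`, `longest_reliable_prefix`, `make_toy_proposer`, `tau`, and `max_block` are hypothetical, and the toy proposer simply replays a target sequence instead of running a model, so only the commitment logic is shown, not the causal denoising itself.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: given the committed prefix and a candidate range size k,
# return k (token, confidence) proposals for the positions after the prefix.
Proposer = Callable[[Sequence[int], int], List[Tuple[int, float]]]


def longest_reliable_prefix(confidences: Sequence[float], tau: float) -> int:
    """Length of the longest leading run of positions with confidence >= tau."""
    n = 0
    for c in confidences:
        if c < tau:
            break
        n += 1
    return n


def ppc_decode(propose: Proposer, max_block: int, tau: float,
               max_len: int, eos: int) -> Tuple[List[int], int]:
    """Sketch of prefix commitment; returns (committed tokens, decoding steps)."""
    prefix: List[int] = []  # stands in for the tokens already committed to the KV cache
    steps = 0
    while len(prefix) < max_len:
        # The block size is only the maximum candidate range, re-anchored at the prefix end.
        k = min(max_block, max_len - len(prefix))
        proposals = propose(prefix, k)
        tokens = [t for t, _ in proposals]
        confs = [c for _, c in proposals]
        # Assumption: commit at least one token so decoding always advances.
        n = max(1, longest_reliable_prefix(confs, tau))
        committed = tokens[:n]
        steps += 1
        if eos in committed:
            prefix.extend(committed[: committed.index(eos) + 1])
            break
        prefix.extend(committed)
    return prefix, steps


def make_toy_proposer(target: Sequence[int], eos: int, horizon: int,
                      hi: float = 0.95, lo: float = 0.5) -> Proposer:
    """Toy stand-in for a model: confident on the first `horizon` candidates only."""
    def propose(prefix: Sequence[int], k: int) -> List[Tuple[int, float]]:
        out = []
        for i in range(k):
            pos = len(prefix) + i
            tok = target[pos] if pos < len(target) else eos
            out.append((tok, hi if i < horizon else lo))
        return out
    return propose
```

With an 11-token target (including EOS), a candidate range of 4, and a toy model confident on 3 positions per step, the loop commits three tokens per step and finishes in 4 steps instead of the 11 that one-token-per-step decoding would need. Raising `tau` above every confidence degrades it to exactly that one-token-per-step behavior.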

Mingxu Chai, Ziyu Shen, Chenyu Liu, Kaidi Zhang, Jiazheng Zhang, Dingwei Zhu, Zhiheng Xi, Ruoyu Chen, Jun Long, Jihua Kang, Tao Gui, Qi Zhang• 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Page-level OCR | OmniDocBench English (DODO evaluation setting) | Normalized Edit Distance (lower is better) | 6.1 | 17 |
| Formula Recognition | UniMERNet CPE | CDM | 94.7 | 17 |
| Formula Recognition | UniMER SPE (test) | CDM | 98.7 | 8 |
| Formula Recognition | UniMER HWE (test) | CDM | 93.8 | 8 |
| Table Recognition | FinTabNet | TEDS | 88.3 | 8 |
| Table Recognition | PubTabNet | TEDS | 89.6 | 8 |
| Text Recognition | DocLayNet | Edit Distance (lower is better) | 8.7 | 8 |
| Formula Recognition | UniMER SCE (test) | CDM | 94.3 | 8 |
| Text Recognition | OmniDoc | Edit Distance (lower is better) | 0.093 | 8 |
| Layout Detection | DocLayNet | Precision | 91.8 | 5 |
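Several rows above report edit-distance metrics, where lower is better. As a rough illustration of what such a score measures, here is a minimal sketch assuming a common definition: character-level Levenshtein distance divided by the length of the longer string. The exact normalization, text preprocessing, and reporting scale used by each benchmark are not stated on this page and may differ; `levenshtein` and `normalized_edit_distance` are illustrative helpers, not any benchmark's official evaluation code.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Assumed normalization: divide by the longer length, giving a value in [0, 1]."""
    if not pred and not ref:
        return 0.0
    return levenshtein(pred, ref) / max(len(pred), len(ref))
```

For example, `normalized_edit_distance("kitten", "sitting")` is 3/7, since three edits are needed and the longer string has seven characters.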
