HSD: Training-Free Acceleration for Document Parsing Vision-Language Models with Hierarchical Speculative Decoding

About

Document parsing is a fundamental task in multimodal understanding, supporting a wide range of downstream applications such as information extraction and intelligent document analysis. Benefiting from strong semantic modeling and robust generalization, VLM-based end-to-end approaches have emerged as the mainstream paradigm in recent years. However, these models often suffer from substantial inference latency, as they must autoregressively generate long, full-page sequences when processing long-form documents. While recent hybrid methods mitigate this issue via region-level parallel decoding with VLMs, independent region decoding loses full-page context and might weaken global coherence. To address this issue, we propose Hierarchical Speculative Decoding (HSD), a two-stage local-to-global framework for document parsing. HSD first employs a lightweight pipeline drafter to predict region partitions and generate coarse drafts for each region. The first stage verifies the generated region-level drafts in parallel for efficiency, while the second stage further performs page-level verification on these refined outputs to preserve full-page coherence. Experimental results show that HSD achieves a near-lossless 2.7x speedup with HunyuanOCR on OmniDocBench v1.5 and up to 7.04x speedup on long-document parsing tasks, demonstrating the effectiveness of the proposed method. The code is available at https://github.com/whlscut/HSD.
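The local-to-global verification described above can be illustrated with a minimal toy sketch. It is not the authors' implementation: it assumes plain greedy draft verification (accept the longest draft prefix that matches the target model's own next-token choice, then emit one corrected token), and it replaces the VLM with a hypothetical `make_toy_target` that replays a fixed reference sequence. All function names here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Token = str
NextToken = Callable[[Sequence[Token]], Token]


def make_toy_target(reference: Sequence[Token]) -> NextToken:
    """Hypothetical stand-in for a VLM: replays a fixed reference sequence."""
    def next_token(prefix: Sequence[Token]) -> Token:
        # Emit an end marker once the reference is exhausted.
        return reference[len(prefix)] if len(prefix) < len(reference) else "<eos>"
    return next_token


def verify_draft(next_token: NextToken, draft: Sequence[Token]) -> Tuple[List[Token], int]:
    """Greedy verification: keep draft tokens while they match the target.

    Returns the verified tokens (including one corrected token on a mismatch)
    and the number of draft tokens that were accepted.
    """
    out: List[Token] = []
    for tok in draft:
        expected = next_token(out)
        if tok != expected:
            out.append(expected)  # target's correction replaces the bad draft token
            return out, len(out) - 1
        out.append(tok)
    return out, len(out)


def hierarchical_verify(
    region_targets: Sequence[NextToken],
    page_target: NextToken,
    region_drafts: Sequence[Sequence[Token]],
) -> Tuple[List[Token], List[int], int]:
    """Stage 1: verify each region draft in parallel. Stage 2: verify the page."""
    with ThreadPoolExecutor() as pool:
        stage1 = list(pool.map(verify_draft, region_targets, region_drafts))
    region_accepts = [accepted for _, accepted in stage1]
    # The stage-1 outputs, concatenated in reading order, form the page-level draft.
    page_draft = [tok for tokens, _ in stage1 for tok in tokens]
    page_tokens, page_accept = verify_draft(page_target, page_draft)
    return page_tokens, region_accepts, page_accept


# Toy page with two regions; the drafter misspells one token in region 2.
region_refs = [["# ", "Title"], ["Hello", "world", "."]]
page_ref = [tok for ref in region_refs for tok in ref]
region_drafts = [["# ", "Title"], ["Hello", "word", "."]]

page_tokens, region_accepts, page_accept = hierarchical_verify(
    [make_toy_target(r) for r in region_refs],
    make_toy_target(page_ref),
    region_drafts,
)
print(page_tokens, region_accepts, page_accept)
```

In this sketch a rejected draft simply stops at the corrected token; a real decoder would continue generating from that point, and a real verifier would score all draft positions in a single forward pass rather than one call per token.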

Wenhui Liao, Hongliang Li, Pengyu Xie, Xinyu Cai, Yufan Shen, Yi Xin, Qi Qin, Shenglong Ye, Tianbin Li, Ming Hu, Junjun He, Yihao Liu, Wenhai Wang, Min Dou, Bin Fu, Botian Shi, Yu Qiao, Lianwen Jin • 2026

Related benchmarks

Task	Dataset	Result
Document Parsing	OmniDocBench v1.5	AAL 1.96	6
Document Parsing	Ocean-OCR-Bench v1 (test)	AAL (Overall) 7.88	3
Document Processing Acceleration	olmOCR-Bench long documents filtered > 8192 tokens	AAL 4.08	2
Speculative Decoding Acceleration	OmniDocBench > 8192 tokens v1.5	AAL (Overall) 4.98	2
