dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model

About

Document Layout Parsing serves as a critical gateway for Artificial Intelligence (AI) to access and interpret the world's vast stores of structured knowledge. This process, which encompasses layout detection, text recognition, and relational understanding, is particularly crucial for empowering next-generation Vision-Language Models. Current methods, however, rely on fragmented, multi-stage pipelines that suffer from error propagation and fail to leverage the synergies of joint training. In this paper, we introduce dots.ocr, a single Vision-Language Model that, for the first time, demonstrates the advantages of jointly learning three core tasks within a unified, end-to-end framework. This is made possible by a highly scalable data engine that synthesizes a vast multilingual corpus, empowering the model to deliver robust performance across a wide array of tasks, encompassing diverse languages, layouts, and domains. The efficacy of our unified paradigm is validated by state-of-the-art performance on the comprehensive OmniDocBench. Furthermore, to catalyze research in global document intelligence, we introduce XDocParse, a challenging new benchmark spanning 126 languages. On this benchmark, dots.ocr achieves state-of-the-art performance, delivering an approximately 10% relative improvement and demonstrating strong multilingual capability.

Yumeng Li, Guang Yang, Hao Liu, Bowen Wang, Colin Zhang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Document Parsing | OmniDocBench v1.5 | Overall Score | 88.41 | 126 |
| Document Parsing | olmOCR-bench | ArXiv Processing Accuracy | 82.1 | 36 |
| Reading Order Detection | OmniDocBench EN v1.0 | Edit Distance | 0.04 | 28 |
| Reading Order Detection | OmniDocBench ZH v1.0 | Edit Distance | 0.067 | 28 |
| Document Parsing | OmniDocBench 1.5 (test) | Overall Score | 88.41 | 27 |
| OCR-related Parsing Tasks | OmniDocBench English | Edit Distance | 0.032 | 23 |
| Reading Order Detection | OmniDocBench v1.5 | Edit Distance | 0.053 | 21 |
| Document Parsing | Real5-OmniDocBench (screen-photography) | Overall Score | 87.18 | 19 |
| Document Parsing | Real5-OmniDocBench 5-distortion types (test) | Overall Accuracy | 86.38 | 19 |
| Document Parsing | OmniDocBench Real5 skewing variation | Overall Score | 84.27 | 19 |
Showing 10 of 30 rows
