GLM-OCR Technical Report

About

GLM-OCR is a compact, efficient 0.9B-parameter multimodal model designed for real-world document understanding. It combines a 0.4B-parameter CogViT visual encoder with a 0.5B-parameter GLM language decoder, balancing computational efficiency against recognition performance. To address the inefficiency of standard autoregressive decoding in deterministic OCR tasks, GLM-OCR introduces a Multi-Token Prediction (MTP) mechanism that predicts multiple tokens per step, significantly improving decoding throughput while keeping memory overhead low through shared parameters. At the system level, it adopts a two-stage pipeline: PP-DocLayout-V3 first performs layout analysis, and the detected regions are then recognized in parallel. Extensive evaluations on public benchmarks and industrial scenarios show that GLM-OCR achieves competitive or state-of-the-art performance in document parsing, text and formula transcription, table structure recovery, and key information extraction. Its compact architecture and structured generation make it suitable for both resource-constrained edge deployment and large-scale production systems.

Shuaiqi Duan, Yadong Xue, Weihan Wang, Zhe Su, Huan Liu, Sheng Yang, Guobing Gan, Guo Wang, Zihan Wang, Shengdong Yan, Dexin Jin, Yuxuan Zhang, Guohong Wen, Yanfeng Wang, Yutao Zhang, Xiaohan Zhang, Wenyi Hong, Yukuo Cen, Da Yin, Bin Chen, Wenmeng Yu, Xiaotao Gu, Jie Tang • 2026
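
The two-stage pipeline described in the abstract (layout analysis first, then parallel region-level recognition) might be pictured roughly as follows. This is only a minimal sketch of the control flow, not the authors' implementation: `Region`, `parse_page`, `fake_layout`, and `fake_recognize` are hypothetical names introduced here for illustration, and the two stand-in functions merely take the place of PP-DocLayout-V3 and the GLM-OCR recognizer, whose real interfaces are not given on this page. A thread pool is assumed as the parallelism mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass(frozen=True)
class Region:
    """Hypothetical layout region: a bounding box plus a content type."""
    box: Box
    kind: str  # e.g. "text", "table", "formula"


def parse_page(
    page: object,
    detect_layout: Callable[[object], List[Region]],
    recognize_region: Callable[[object, Region], str],
    max_workers: int = 4,
) -> List[Tuple[Region, str]]:
    """Stage 1: detect regions. Stage 2: recognize each region in parallel."""
    regions = detect_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so the order
        # produced by the layout stage is preserved in the output.
        outputs = list(pool.map(lambda r: recognize_region(page, r), regions))
    return list(zip(regions, outputs))


# Stand-ins for the real models (assumptions, for demonstration only).
def fake_layout(page: object) -> List[Region]:
    return [Region((0, 0, 100, 20), "text"), Region((0, 30, 100, 80), "table")]


def fake_recognize(page: object, region: Region) -> str:
    return f"<{region.kind}>"


result = parse_page("page-1", fake_layout, fake_recognize)
for region, text in result:
    print(region.kind, region.box, text)
```

Because each region is recognized independently, the second stage can be spread across workers, while the first stage alone determines the order in which results are assembled.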

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Document Parsing | OmniDocBench v1.5 | Overall Score | 94.62 | 195 |
| Document Parsing | OmniDocBench 1.5 (test) | Text Edit Error | 0.04 | 111 |
| Document Parsing | OmniDocBench Real5 warping | Overall Score | 90.68 | 32 |
| Document Parsing | Real5-OmniDocBench (screen-photography) | Overall Score | 91.75 | 32 |
| Document Parsing | OmniDocBench Real5 skewing variation | Overall Score | 85.39 | 32 |
| Document Parsing | OmniDocBench Real5 | Score | 91.12 | 26 |
| Document Parsing | OmniDocBench Full v1.6 | Overall Accuracy | 95.15 | 21 |
| Table Recognition | PubTabNet | Overall Score | 85.2 | 14 |
| Document Parsing | OmniDocBench Scanning Real5 | Overall Score | 92.67 | 13 |
| Formula Recognition | UniMERNet SCE | CDM | 97.77 | 9 |
Showing 10 of 35 rows
