GLM-OCR Technical Report

About

GLM-OCR is a compact, efficient 0.9B-parameter multimodal model designed for real-world document understanding. It pairs a 0.4B-parameter CogViT visual encoder with a 0.5B-parameter GLM language decoder, striking a strong balance between computational cost and recognition quality. Standard autoregressive decoding, which emits one token per step, is inefficient for largely deterministic OCR output, so GLM-OCR introduces a Multi-Token Prediction (MTP) mechanism that predicts multiple tokens per step. This substantially raises decoding throughput, while shared parameters keep the extra memory overhead low. At the system level, GLM-OCR uses a two-stage pipeline: PP-DocLayout-V3 first performs layout analysis, and the detected regions are then recognized in parallel. Extensive evaluations on public benchmarks and industrial scenarios show that GLM-OCR achieves competitive or state-of-the-art performance in document parsing, text and formula transcription, table structure recovery, and key information extraction. Its compact architecture and structured generation make it suitable for both resource-constrained edge deployment and large-scale production systems.
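To make the two efficiency ideas above concrete (several tokens per decoding step, and independent recognition of layout regions), here is a toy Python sketch under loudly stated assumptions. `Region`, `detect_layout`, `mtp_decode`, and `recognize_page` are hypothetical names, not GLM-OCR's API. The layout stage is a stub standing in for PP-DocLayout-V3, and the "decoder" simply copies a known target string up to k characters at a time, so only the number of decoding steps is modeled. It does not reproduce GLM-OCR's MTP heads, its parameter sharing, or any token acceptance rule, none of which the abstract specifies.

```python
# Toy sketch, NOT GLM-OCR's implementation: all names here are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    order: int   # reading-order index assigned by the layout stage
    label: str   # region type, e.g. "title", "text", "table"
    text: str    # assumed ground truth that the toy recognizer will emit


def detect_layout(page: List[Region]) -> List[Region]:
    # Stage 1 stand-in (the report uses PP-DocLayout-V3): regions are
    # pre-annotated in this toy, so they are returned unchanged.
    return list(page)


def mtp_decode(target: str, k: int) -> Tuple[str, int]:
    """Emit up to k characters ("tokens") per step; return (text, steps).

    k = 1 is ordinary one-token-per-step autoregressive decoding. The toy
    always accepts all k tokens; a real MTP decoder may accept fewer.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    out: List[str] = []
    steps = 0
    while len(out) < len(target):
        out.extend(target[len(out):len(out) + k])  # one decoding step
        steps += 1
    return "".join(out), steps


def recognize_page(page: List[Region], k: int = 1, workers: int = 4) -> Tuple[str, int]:
    """Stage 2: recognize every detected region in parallel, then reassemble."""
    regions = detect_layout(page)

    def run(region: Region) -> Tuple[Region, str, int]:
        text, steps = mtp_decode(region.text, k)
        return region, text, steps

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run, regions))
    results.sort(key=lambda item: item[0].order)  # restore reading order
    page_text = "\n".join(text for _, text, _ in results)
    total_steps = sum(steps for _, _, steps in results)
    return page_text, total_steps


if __name__ == "__main__":
    page = [Region(1, "text", "Total: 42"), Region(0, "title", "Invoice")]
    for k in (1, 4):
        text, steps = recognize_page(page, k=k)
        print(f"k={k}: {steps} steps -> {text!r}")
```

For the two example regions ("Invoice", 7 characters; "Total: 42", 9 characters), one token per step takes 16 decoding steps, while k = 4 takes ceil(7/4) + ceil(9/4) = 5, and both yield the same reading-order text. Parallel recognition is possible because, once layout analysis has split the page, each region can be decoded independently; the final sort restores reading order.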

Shuaiqi Duan, Yadong Xue, Weihan Wang, Zhe Su, Huan Liu, Sheng Yang, Guobing Gan, Guo Wang, Zihan Wang, Shengdong Yan, Dexin Jin, Yuxuan Zhang, Guohong Wen, Yanfeng Wang, Yutao Zhang, Xiaohan Zhang, Wenyi Hong, Yukuo Cen, Da Yin, Bin Chen, Wenmeng Yu, Xiaotao Gu, Jie Tang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Document Parsing	OmniDocBench v1.5	Overall Score	94.62	195
Document Parsing	OmniDocBench 1.5 (test)	Text Edit Error	0.04	132
Document Parsing	OmniDocBench Full v1.6	Overall Accuracy	95.15	44
Document Parsing	OmniDocBench Real5 warping	Overall Score	90.68	32
Document Parsing	Real5-OmniDocBench (screen-photography)	Overall Score	91.75	32
Document Parsing	OmniDocBench Real5 skewing variation	Overall Score	85.39	32
Document Parsing	OmniDocBench Real5	Score	91.12	26
Formula Recognition	UniMERNet CPE	CDM	96.74	17
Table Recognition	PubTabNet	Overall Score	85.2	14
Document Parsing	OmniDocBench Scanning Real5	Overall Score	92.67	13

