LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR

About

We present LightOnOCR-2-1B, a 1B-parameter end-to-end multilingual vision-language model that converts document images (e.g., PDFs) into clean, naturally ordered text without brittle OCR pipelines. Trained on a large-scale, high-quality distillation mix with strong coverage of scans, French documents, and scientific PDFs, LightOnOCR-2 achieves state-of-the-art results on OlmOCR-Bench while being 9× smaller and substantially faster than prior best-performing models. We further extend the output format to predict normalized bounding boxes for embedded images, introducing localization during pretraining via a resume strategy and refining it with reinforcement learning with verifiable rewards (RLVR) using IoU-based rewards. Finally, we improve robustness with checkpoint averaging and task-arithmetic merging. We release model checkpoints under Apache 2.0, and publicly release the dataset and the LightOnOCR-bbox-bench evaluation under their respective licenses.
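The abstract names three concrete techniques: IoU-based rewards for bounding-box prediction, checkpoint averaging, and task-arithmetic merging. The following is a minimal plain-Python sketch of generic versions of each, not the authors' code. The box format (x0, y0, x1, y1) normalized to [0, 1], the greedy one-to-one matching, the reward normalization by max(#pred, #gt), and the flat `name -> list of floats` parameter dictionaries are all illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: boxes are (x0, y0, x1, y1) with coordinates normalized to [0, 1].
Box = Tuple[float, float, float, float]
# Hypothetical flat parameter dict standing in for a model checkpoint.
Params = Dict[str, List[float]]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_reward(pred: Sequence[Box], gt: Sequence[Box]) -> float:
    """Hypothetical IoU-based reward: greedily match boxes one-to-one by IoU,
    sum the matched IoUs, and divide by max(#pred, #gt) so that missing or
    extra boxes lower the reward."""
    if not pred and not gt:
        return 1.0  # nothing to localize and nothing predicted
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gt)),
        reverse=True,
    )
    used_p, used_g, total = set(), set(), 0.0
    for score, i, j in pairs:
        if score > 0 and i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
            total += score
    return total / max(len(pred), len(gt))


def average_checkpoints(ckpts: Sequence[Params]) -> Params:
    """Uniform checkpoint averaging: element-wise mean of every parameter."""
    n = len(ckpts)
    return {
        k: [sum(vals) / n for vals in zip(*(c[k] for c in ckpts))]
        for k in ckpts[0]
    }


def task_arithmetic(base: Params, finetuned: Params, lam: float) -> Params:
    """Task-arithmetic merge: base + lam * (finetuned - base), per parameter."""
    return {
        k: [b + lam * (f - b) for b, f in zip(base[k], finetuned[k])]
        for k in base
    }


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
    print(iou_reward([(0.0, 0.0, 0.5, 0.5)], gt))  # one box missed
    print(average_checkpoints([{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}]))
    print(task_arithmetic({"w": [1.0]}, {"w": [3.0]}, lam=0.5))
```

With `lam = 0.0` the merge returns the base weights and with `lam = 1.0` the fine-tuned ones; values in between interpolate along the task vector. The abstract does not specify which checkpoints are averaged or how merge coefficients are chosen.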

Said Taghadouini, Adrien Cavaillès, Baptiste Aubertin • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Document Parsing | olmOCR-bench | ArXiv Processing Accuracy | 89.6 | 45
Table Extraction | 100 pages (451 tables) synthetic (test) | LLM Score (Overall) | 9.08 | 21
Document Parsing | OmniDocBench EN v1.0 | Overall Edit Distance | 0.146 | 15
Document Parsing | OmniDocBench ZH v1.0 | Overall Edit Distance | 0.255 | 15
Bounding box detection | LightOnOCR-bbox-bench OlmOCR (290) | F1@0.5 | 78 | 3
Bounding box detection | LightOnOCR-bbox-bench arXiv | F1@0.5 | 83 | 3
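Two of the metrics in the benchmark table deserve unpacking: F1@0.5 on the bounding-box rows, and edit distance on the OmniDocBench rows, where lower is better. The following Python sketch gives common textbook definitions. The (x0, y0, x1, y1) box format, the greedy one-to-one matching at IoU ≥ 0.5, and normalizing the Levenshtein distance by the longer string's length are assumptions, not necessarily the exact protocols used by LightOnOCR-bbox-bench or OmniDocBench.

```python
from typing import Sequence, Tuple

# Assumption: boxes are (x0, y0, x1, y1) in normalized coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter = max(0.0, min(a[2], b[2]) - max(a[0], b[0])) * \
        max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(pred: Sequence[Box], gt: Sequence[Box], thr: float = 0.5) -> float:
    """F1 where a prediction counts as a true positive if it is matched
    one-to-one (greedily, highest IoU first) to a ground-truth box with IoU >= thr."""
    if not pred and not gt:
        return 1.0
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gt)),
        reverse=True,
    )
    used_p, used_g, tp = set(), set(), 0
    for score, i, j in pairs:
        if score < thr:
            break  # remaining pairs all fall below the threshold
        if i in used_p or j in used_g:
            continue
        used_p.add(i)
        used_g.add(j)
        tp += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gt)
    return 2 * precision * recall / (precision + recall)


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Levenshtein distance divided by the longer string's length (0 = identical)."""
    if not pred and not ref:
        return 0.0
    prev = list(range(len(ref) + 1))  # distances from the empty prefix of pred
    for i, pc in enumerate(pred, 1):
        cur = [i]
        for j, rc in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pc != rc)))   # substitution
        prev = cur
    return prev[-1] / max(len(pred), len(ref))


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.5, 0.5)]
    pred = [(0.0, 0.0, 0.5, 0.5), (0.6, 0.6, 1.0, 1.0)]  # one hit, one false positive
    print(f1_at_iou(pred, gt))
    print(normalized_edit_distance("kitten", "sitting"))
```

For example, one correct box plus one false positive against a single ground-truth box gives precision 0.5 and recall 1.0, so F1 is 2/3; `normalized_edit_distance("kitten", "sitting")` is 3/7, about 0.43.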
