
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR

About

We present LightOnOCR-2-1B, a 1B-parameter end-to-end multilingual vision-language model that converts document images (e.g., PDFs) into clean, naturally ordered text without brittle OCR pipelines. Trained on a large-scale, high-quality distillation mix with strong coverage of scans, French documents, and scientific PDFs, LightOnOCR-2 achieves state-of-the-art results on OlmOCR-Bench while being 9× smaller and substantially faster than prior best-performing models. We further extend the output format to predict normalized bounding boxes for embedded images, introducing localization during pretraining via a resume strategy and refining it with RLVR using IoU-based rewards. Finally, we improve robustness with checkpoint averaging and task-arithmetic merging. We release model checkpoints under Apache 2.0, and publicly release the dataset and LightOnOCR-bbox-bench evaluation under their respective licenses.
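The abstract names checkpoint averaging and task-arithmetic merging without giving the recipe. The following is a minimal sketch of the two generic techniques, not the authors' procedure: it assumes each checkpoint is a mapping from parameter name to a flat list of floats (the hypothetical `Weights` type stands in for a real state dict), and the scaling factor `alpha` is an illustrative parameter that the abstract does not specify.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat values.
Weights = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Weights]) -> Weights:
    """Element-wise mean of checkpoints that share the same parameter names."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(ckpt[name] for ckpt in checkpoints))]
        for name in checkpoints[0]
    }


def task_arithmetic_merge(
    base: Weights, finetuned: List[Weights], alpha: float = 1.0
) -> Weights:
    """Add the scaled sum of task vectors (finetuned - base) onto the base weights."""
    merged: Weights = {}
    for name, base_vals in base.items():
        merged[name] = [
            b + alpha * sum(ft[name][i] - b for ft in finetuned)
            for i, b in enumerate(base_vals)
        ]
    return merged
```

Averaging smooths out noise between nearby checkpoints of one run, while task arithmetic combines the changes learned by separately fine-tuned variants of a shared base. Which checkpoints and which value of `alpha` the paper uses is not stated here.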

Said Taghadouini, Adrien Cavaillès, Baptiste Aubertin • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Document Parsing | olmOCR-bench | ArXiv Processing Accuracy: 89.6 | 36 |
| Document Parsing | OmniDocBench EN v1.0 | Overall Edit Distance: 0.146 | 15 |
| Document Parsing | OmniDocBench ZH v1.0 | Overall Edit: 0.255 | 15 |
| Bounding box detection | LightOnOCR-bbox-bench OlmOCR (290) | F1@0.5: 78 | 3 |
| Bounding box detection | LightOnOCR-bbox-bench arXiv | F1@0.5: 83 | 3 |