Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR

About

Optical Character Recognition (OCR) is fundamental to Vision-Language Models (VLMs) and high-quality data generation for LLM training. Yet, despite progress in average OCR accuracy, state-of-the-art VLMs still struggle with detecting sample-level errors and lack effective unsupervised quality control. We introduce Consensus Entropy (CE), a training-free, model-agnostic metric that estimates output reliability by measuring inter-model agreement entropy. The core insight is that correct predictions converge in output space, while errors diverge. Based on CE, we develop CE-OCR, a lightweight multi-model framework that verifies outputs by ensemble agreement, selects the best outputs, and further improves efficiency through adaptive routing. Experiments demonstrate that CE is robust for quality verification, improving F1 scores by 42.1% over VLM-as-Judge. CE-OCR achieves consistent OCR gains, outperforming self-consistency and single-model baselines at the same cost. Notably, CE requires no training or supervision, enabling plug-and-play integration.
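
The agreement idea in the abstract (correct outputs converge, errors diverge) can be illustrated with a rough sketch. This is not the paper's CE formula, which the abstract does not give. As a stand-in, it assumes mean pairwise string dissimilarity as the disagreement score, computed with `difflib.SequenceMatcher` from the Python standard library. The function names and the `threshold` value are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_distance(a: str, b: str) -> float:
    """Dissimilarity in [0, 1]; 0 means the two strings are identical."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def consensus_scores(outputs: list[str]) -> list[float]:
    """Mean distance from each output to all the others (lower = closer to consensus)."""
    n = len(outputs)
    if n < 2:
        raise ValueError("need at least two model outputs")
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        d = pairwise_distance(outputs[i], outputs[j])
        totals[i] += d
        totals[j] += d
    return [t / (n - 1) for t in totals]


def verify_and_select(outputs: list[str], threshold: float = 0.1):
    """Return (selected output, overall disagreement, is_reliable).

    `threshold` is an illustrative assumption, not a value from the paper.
    """
    scores = consensus_scores(outputs)
    # Select the output that agrees most with the rest of the ensemble.
    best = min(range(len(outputs)), key=scores.__getitem__)
    # Overall disagreement: the average of the per-output scores.
    disagreement = sum(scores) / len(scores)
    return outputs[best], disagreement, disagreement <= threshold


# Hypothetical OCR outputs from three models for the same image.
candidates = [
    "Invoice total: 42.10",
    "Invoice total: 42.10",
    "lnvoice tota1: 42.l0",
]
selected, disagreement, reliable = verify_and_select(candidates)
```

In this sketch, a sample whose disagreement exceeds the threshold would be flagged for review or passed to a stronger model. That loosely mirrors the verification and routing roles the abstract describes, although the actual CE-OCR procedure may differ.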

Yulong Zhang, Tianyi Liang, Xinyue Huang, Erfei Cui, Guoqing Wang, Xu Guo, Chenhui Li, Gongshen Liu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Optical Character Recognition | OCRBench | Score 922 | 232 |
| Visual Question Answering | ScienceVQA | -- | 36 |
| OCR Verification | Curated dataset of 1,000 PDF pages 1.0 (test) | F1 Score 70.88 | 30 |
| Multimodal Optical Character Recognition | OCRBench v2 | En Accuracy 71.6 | 5 |
| Visual Question Answering | Scene-VQA easy | Accuracy 98 | 2 |
| Visual Question Answering | Doc-VQA easy | Accuracy 90.5 | 2 |
| Visual Question Answering | Formula easy | Accuracy 88 | 2 |
| Visual Question Answering | Math-VQA | Accuracy 45.6 | 2 |
| Visual Question Answering | Knowledge-Reasoning | Accuracy 66.3 | 2 |
| Visual Question Answering | Visual-Understanding | Accuracy 82.4 | 2 |