
H2OVL-Mississippi Vision Language Models Technical Report

About

Smaller vision-language models (VLMs) are becoming increasingly important for privacy-focused, on-device applications because they can run efficiently on consumer hardware to process enterprise commercial documents and images. These models require strong language understanding and visual capabilities to enhance human-machine interaction. To address this need, we present H2OVL-Mississippi, a pair of small VLMs trained on 37 million image-text pairs using 240 hours of compute on 8 x H100 GPUs. H2OVL-Mississippi-0.8B is a tiny model with 0.8 billion parameters that specializes in text recognition, achieving state-of-the-art performance on the Text Recognition portion of OCRBench and surpassing much larger models in this area. Additionally, we are releasing H2OVL-Mississippi-2B, a 2-billion-parameter model for general use cases that exhibits highly competitive metrics across various academic benchmarks. Both models build upon our prior work with H2O-Danube language models, extending their capabilities into the visual domain. We release them under the Apache 2.0 license, making VLMs accessible to everyone and democratizing document AI and visual LLMs.

Shaikat Galib, Shanshan Wang, Guanshuo Xu, Pascal Pfeiffer, Ryan Chesler, Mark Landry, Sri Satish Ambati • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| Mathematical Reasoning | MathVista | Score 56.8 | 322 |
| OCR Evaluation | OCRBench | Score 782 | 296 |
| Multimodal Capability Evaluation | MM-Vet | Score 44.7 | 282 |
| Document Visual Question Answering | DocVQA (test) | -- | 192 |
| Multi-discipline Multimodal Understanding | MMMU (val) | -- | 167 |
| Text-based Visual Question Answering | TextVQA (val) | Accuracy 75.1 | 146 |
| Diagram Understanding | AI2D (test) | Accuracy 69.9 | 107 |
| Information Visual Question Answering | InfoVQA (test) | -- | 92 |
| Multimodal Reasoning | MMStar | -- | 81 |
| Hallucination and Visual Reasoning Evaluation | HallusionBench | Score 36.4 | 37 |

Showing 10 of 16 rows
