
AuroraEdge-V-2B: A Faster And Stronger Edge Visual Large Language Model

About

Recently, advances in multimodal technology have led practitioners to try visual large language models (VLLMs) in industrial production, and many deep learning models (DLMs) deployed in production environments are gradually being replaced by VLLMs. Compared with DLMs, VLLMs offer two advantages in industrial applications: (1) their strong generalization ability lets them perform well across a wide range of tasks, and (2) they are flexible and can quickly handle unfamiliar samples through in-context learning. However, VLLMs also have clear drawbacks: (1) they do not perform as well as custom-developed DLMs in specific domains; (2) they generally have a very large number of parameters, so deploying them requires substantial computational resources; and (3) they generally run much slower than DLMs, which makes real-time response hard to achieve.

To make VLLMs more practical for industrial applications, this work introduces AuroraEdge-V-2B, a compact, robust, and high-speed VLLM designed for edge deployment. To make the model run faster, we also propose a compression-fusion method that improves inference efficiency. AuroraEdge-V-2B has the following notable features: (1) Easy to deploy and fast: with only 2B parameters, it is well suited to edge deployment and offers better real-time performance. (2) Fewer visual tokens and lower cost: it significantly reduces the number of visual tokens in the decoding process, cutting the floating-point operations during inference by half and making the model cheaper to use. (3) Strong performance: it scores higher on 9 benchmarks than models of comparable size (e.g., Qwen2-VL-2B, Qwen2.5-VL-3B, InternVL-2.5-2B).
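The abstract does not describe how the compression-fusion method works. As a rough illustration of why fewer visual tokens means less compute, the sketch below swaps in a generic technique, averaging each 2x2 block of neighbouring visual tokens (a 4x reduction), and pairs it with the common back-of-the-envelope estimate of about 2 FLOPs per parameter per token for a dense transformer's prefill pass. The merge rule, the 4x factor, the function names `merge_2x2` and `prefill_flops`, and all token counts are hypothetical assumptions for illustration, not details of AuroraEdge-V-2B.

```python
# Hypothetical sketch: this is NOT the AuroraEdge-V-2B compression-fusion method.
# It averages each non-overlapping 2x2 block of visual tokens (a generic 4x
# token reduction) and estimates prefill compute with the ~2*N*T rule of thumb.
from typing import List

Vector = List[float]


def merge_2x2(grid: List[List[Vector]]) -> List[List[Vector]]:
    """Average each non-overlapping 2x2 block of token vectors.

    `grid` is an H x W grid of token embeddings; H and W must be even.
    """
    h, w = len(grid), len(grid[0])
    if h % 2 or w % 2:
        raise ValueError("grid height and width must be even")
    merged: List[List[Vector]] = []
    for i in range(0, h, 2):
        row: List[Vector] = []
        for j in range(0, w, 2):
            block = [grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1]]
            # zip(*block) groups the four vectors dimension by dimension.
            row.append([sum(vals) / 4.0 for vals in zip(*block)])
        merged.append(row)
    return merged


def prefill_flops(num_params: float, visual_tokens: int, text_tokens: int) -> float:
    """Rough dense-transformer estimate: ~2 FLOPs per parameter per token.

    Attention-score costs are ignored, so this is only an approximation.
    """
    return 2.0 * num_params * (visual_tokens + text_tokens)


if __name__ == "__main__":
    # Hypothetical 32x32 grid of visual tokens, each a 4-dim embedding.
    grid = [[[float(i), float(j), 0.0, 1.0] for j in range(32)] for i in range(32)]
    merged = merge_2x2(grid)
    n_before = len(grid) * len(grid[0])     # 1024 tokens
    n_after = len(merged) * len(merged[0])  # 256 tokens
    text_tokens = 64                        # hypothetical prompt length
    before = prefill_flops(2e9, n_before, text_tokens)
    after = prefill_flops(2e9, n_after, text_tokens)
    print(f"tokens {n_before} -> {n_after}; "
          f"FLOPs {before:.3e} -> {after:.3e} (ratio {after / before:.2f})")
```

With these made-up numbers (2B parameters, 1024 visual plus 64 text tokens before merging, 256 visual tokens after), the estimate falls from 4.352e12 to 1.28e12 FLOPs, about 29% of the original. The real saving depends on the model's actual token counts and on attention costs that this estimate ignores, so the toy ratio need not match the halving reported above.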

Xiang Chen • 2026

Related benchmarks

Task                        Dataset          Metric          Result  Rank
Visual Question Answering   VQA v2           Accuracy        83.21   1165
Visual Question Answering   TextVQA          --              --      1117
Visual Question Answering   VizWiz           Accuracy        71.05   1043
Visual Question Answering   OKVQA            Top-1 Accuracy  73.95   283
Science Question Answering  ScienceQA        --              --      229
Diagram Question Answering  AI2D             --              --      196
Multimodal Benchmarking     MMBench CN       Score           92.39   73
Multimodal Benchmarking     MMBench English  --              --      61
Visual Question Answering   GQA              Score           65.75   47
Visual Question Answering   OCRVQA           Accuracy        68.15   47

(10 of 12 rows shown.)
