AuroraEdge-V-2B: A Faster And Stronger Edge Visual Large Language Model

About

With recent advances in multimodal technology, visual large language models (VLLMs) are increasingly being applied in industrial production, and many deep learning models (DLMs) deployed in production environments are gradually being replaced by VLLMs. Compared with DLMs, VLLMs offer two advantages in industrial applications: (1) Their strong generalization ability lets them perform well across a wide range of tasks. (2) They are flexible and can adapt quickly to unfamiliar samples through in-context learning. However, VLLMs also have clear drawbacks: (1) They do not perform as well as custom-developed DLMs in specific domains. (2) They generally have a large number of parameters, so deployment requires substantial computational resources. (3) They generally run much slower than DLMs, which makes real-time response hard to achieve. To make VLLMs more practical in industrial applications, we introduce AuroraEdge-V-2B, a compact, robust, and high-speed VLLM designed for edge deployment. To make the model run faster, we also propose a compression-fusion method that improves inference efficiency. AuroraEdge-V-2B has the following notable features: (1) Easier deployment and faster inference: With only 2B parameters, it is well suited to edge deployment and offers better real-time performance. (2) Fewer visual tokens and lower cost: It significantly reduces the number of visual tokens in the decoding process, halving the floating-point operations required during inference and making it cheaper to use. (3) Strong performance: It scores higher on 9 benchmarks than models with a comparable number of parameters (e.g., Qwen2-VL-2B, Qwen2.5-VL-3B, InternVL-2.5-2B).
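The link between visual token count and inference cost can be illustrated with a back-of-the-envelope estimate. The sketch below is not the paper's compression-fusion method; it only assumes the common rule of thumb that a decoder forward pass costs roughly 2 FLOPs per parameter per token, and it ignores the attention term that grows quadratically with sequence length. The token counts and the helper name `forward_flops` are hypothetical values chosen for illustration; only the 2B parameter count comes from the abstract.

```python
def forward_flops(n_params: float, n_visual: int, n_text: int) -> float:
    """Rough forward-pass cost, assuming ~2 FLOPs per parameter per token.

    This is a rule-of-thumb estimate, not a measurement; it ignores the
    quadratic attention term and any vision-encoder cost.
    """
    return 2.0 * n_params * (n_visual + n_text)


N_PARAMS = 2e9           # 2B parameters, as stated in the abstract
N_TEXT = 64              # hypothetical text prompt length
BASELINE_VISUAL = 1024   # hypothetical visual token count without compression
COMPRESSED_VISUAL = 480  # hypothetical visual token count after compression

baseline = forward_flops(N_PARAMS, BASELINE_VISUAL, N_TEXT)
compressed = forward_flops(N_PARAMS, COMPRESSED_VISUAL, N_TEXT)
ratio = compressed / baseline

print(f"baseline:   {baseline:.3e} FLOPs")
print(f"compressed: {compressed:.3e} FLOPs")
print(f"ratio:      {ratio:.2f}")
```

Under these assumed numbers the total sequence shrinks from 1088 to 544 tokens, so the estimated cost ratio is 0.50. Because visual tokens usually dominate the sequence, cutting them has a nearly proportional effect on this estimate.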

Xiang Chen • 2026

Related benchmarks

Task	Dataset	Result
Visual Question Answering	VizWiz	Accuracy 71.05	1863
Visual Question Answering	TextVQA	--	1455
Visual Question Answering	VQA v2	Accuracy 83.21	1429
Science Question Answering	ScienceQA	--	916
Diagram Question Answering	AI2D	--	509
Visual Question Answering	OKVQA	Top-1 Accuracy 73.95	283
Visual Question Answering	GQA	Score 65.75	193
Multimodal Benchmarking	MMBench CN	Score 92.39	151
Multimodal Benchmarking	MMBench English	--	125
Visual Question Answering	OCRVQA	Accuracy 68.15	62

Showing 10 of 12 rows
