Evo: Autoregressive-Diffusion Large Language Models with Evolving Balance
About
We introduce **Evo**, a dual latent-trajectory model that bridges autoregressive (AR) and diffusion-based language generation within a continuous evolutionary generative framework. Rather than treating AR decoding and diffusion generation as separate paradigms, Evo reconceptualizes text generation as a latent flow: each token is associated with a vector-valued embedding that evolves over a progression variable $t_i \in [0, 1]$ indicating its semantic maturity. Low $t_i$ values correspond to confident, AR-like refinement, while high values invoke diffusion-style planning, allowing the model to adaptively balance AR and diffusion generation based on uncertainty. Theoretically, we show that both AR and diffusion models emerge as discretizations of a shared probability flow, and we derive Evo's training objective from a unified variational ELBO. The model is implemented as a time-conditioned Transformer governed by a shared vector field, trained end-to-end to jointly infer latent codes and their progression times. During decoding, Evo performs efficient, semantics-aware refinement, producing high-quality outputs without sacrificing speed. Empirically, Evo 8B achieves state-of-the-art or highly competitive results on 15 diverse benchmarks, including reasoning (GSM8K, ARC-C), code generation (HumanEval, MBPP), and general language understanding, while maintaining fast inference. Our results demonstrate that Evo offers a new paradigm for LLM design with strong generation quality, robust symbolic reasoning, and decoding efficiency.
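The latent-flow view above can be illustrated with a minimal numerical sketch. Everything below is an assumption for illustration only, not the paper's actual architecture: the vector field is a tiny fixed random map standing in for the time-conditioned Transformer, and scaling the update step by $t_i$ is a hypothetical way to realize "small AR-like corrections at low $t$, larger diffusion-style updates at high $t$".

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # embedding dimension (illustrative)

# Toy stand-in for the shared vector field f(z, t): a fixed random map
# conditioned on the progression time t (the paper uses a Transformer).
W = rng.normal(scale=0.1, size=(D + 1, D))

def vector_field(z, t):
    """Velocity of a token embedding at progression time t."""
    zt = np.concatenate([z, [t]])
    return np.tanh(zt @ W)

def refine(z, t, dt=0.1):
    """One Euler step of the latent flow.

    The step is scaled by t (an assumption): low-t tokens get small,
    AR-like corrections; high-t tokens get larger, diffusion-style updates.
    """
    z_new = z + dt * t * vector_field(z, t)
    return z_new, min(t + dt, 1.0)

# Two tokens with different "semantic maturity":
# t = 0.1 (confident) vs. t = 0.9 (uncertain, still being planned).
tokens = [(rng.normal(size=D), 0.1), (rng.normal(size=D), 0.9)]
moved = []
for z, t in tokens:
    z_new, _ = refine(z, t)
    moved.append(np.linalg.norm(z_new - z))

# The uncertain (high-t) token receives the larger update.
print(moved[0] < moved[1])
```

Under these assumptions, the update magnitude grows with $t_i$, so uncertain tokens are revised aggressively while confident tokens are only lightly refined, which is the adaptive AR/diffusion balance the abstract describes.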
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 82.1 | 1891 |
| Commonsense Reasoning | WinoGrande | Accuracy | 76.3 | 1085 |
| Code Generation | HumanEval | -- | -- | 1036 |
| Question Answering | ARC Challenge | Accuracy | 65.6 | 906 |
| Language Understanding | MMLU | Accuracy | 78.6 | 825 |
| Reasoning | BBH | Accuracy | 68.4 | 672 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 81.2 | 572 |
| Commonsense Reasoning | HellaSwag | Accuracy | 86.4 | 213 |
| Scientific Reasoning | GPQA | Accuracy | 39.1 | 75 |
| Question Answering | MMLU | Accuracy | 76.8 | 46 |