ZAYA1-8B Technical Report

About

We present ZAYA1-8B, a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, built on Zyphra's MoE++ architecture. ZAYA1-8B's core pretraining, midtraining, and supervised fine-tuning (SFT) were performed on a full-stack AMD compute, networking, and software platform. With under 1B active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks, and remains competitive with substantially larger open-weight reasoning models. ZAYA1-8B was trained from scratch for reasoning, with reasoning data included from pretraining onward using an answer-preserving trimming scheme. Post-training uses a four-stage RL cascade: reasoning warmup on math and puzzles; a 400-task RLVE-Gym curriculum; math and code RL with test-time compute (TTC) traces and synthetic code environments built from competitive-programming references; and behavioral RL for chat and instruction following. We also introduce Markovian RSA, a test-time compute method that recursively aggregates parallel reasoning traces while carrying forward only bounded-length reasoning tails between rounds. In TTC evaluation, Markovian RSA raises ZAYA1-8B to 91.9% on AIME'25 and 89.6% on HMMT'25 while carrying forward only a 4K-token tail, narrowing the gap to much larger reasoning models including Gemini-2.5 Pro, DeepSeek-V3.2, and GPT-5-High.

Robert Washbourne, Rishi Iyer, Tomas Figliolia, Henry Zheng, Ryan Lorig-Roach, Sungyeon Yang, Pritish Yuvraj, Quentin Anthony, Yury Tokpanov, Xiao Yang, Ganesh Nanduru, Stephen Ebert, Praneeth Medepalli, Skyler Szot, Srivatsan Rajagopal, Alex Ong, Bhavana Mehta, Beren Millidge • 2026
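
The abstract describes Markovian RSA only at a high level: parallel reasoning traces are aggregated recursively, and only a bounded-length tail of each trace is carried into the next round. The following is a minimal sketch of what such a loop could look like, not the authors' implementation. The function names (`markovian_rsa`, `build_prompt`, `tail`, `toy_generate`), the whitespace "tokens", the random grouping of candidates, the prompt wording, and every default except the 4K tail length are assumptions made for illustration.

```python
import random
from typing import Callable, List


def tail(tokens: List[str], max_tail: int) -> List[str]:
    """Keep only the last max_tail tokens of a reasoning trace."""
    return tokens[-max_tail:] if max_tail > 0 else []


def build_prompt(question: str, tails: List[List[str]]) -> str:
    """Hypothetical aggregation prompt: the question plus truncated candidates."""
    parts = [f"Question: {question}"]
    for i, t in enumerate(tails, start=1):
        parts.append(f"Candidate {i} (tail only): {' '.join(t)}")
    parts.append("Combine the candidates above into one improved solution.")
    return "\n".join(parts)


def markovian_rsa(
    question: str,
    generate: Callable[[str], List[str]],  # assumed model call: prompt -> trace tokens
    population: int = 4,
    group_size: int = 2,
    rounds: int = 3,
    max_tail: int = 4096,  # the abstract reports a 4K-token tail
    seed: int = 0,
) -> List[List[str]]:
    rng = random.Random(seed)
    # Round 0: independent parallel traces for the same question.
    traces = [generate(question) for _ in range(population)]
    for _ in range(rounds):
        # The only state carried between rounds is a bounded tail per trace,
        # so the aggregation prompt size does not grow with the round count.
        tails = [tail(t, max_tail) for t in traces]
        new_traces = []
        for _ in range(population):
            group = rng.sample(tails, group_size)  # assumed: random subsets
            new_traces.append(generate(build_prompt(question, group)))
        traces = new_traces
    return traces


def toy_generate(prompt: str) -> List[str]:
    """Stand-in for a model: 50 filler tokens followed by a final answer."""
    return ["step"] * 50 + ["answer:", "42"]


if __name__ == "__main__":
    final = markovian_rsa("What is 6 * 7?", toy_generate, max_tail=8)
    print(len(final), final[0][-2:])
```

In this sketch each aggregation prompt contains at most `group_size * max_tail` carried-over tokens regardless of how long the earlier traces were, which is the bounded-state property the abstract attributes to the method.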

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Instruction Following	IFEval	--	--	854
Mathematical Reasoning	AIME 2025	Accuracy	91.9	353
Knowledge	MMLU-Pro	Score	74.2	98
Instruction Following	IFBench	IFBench Score	52.6	68
Mathematical Reasoning	HMMT Feb 2025	Accuracy	89.6	54
Coding	LiveCodeBench v6	Score (%)	64.8	51
Knowledge	GPQA Diamond	Accuracy (GPQA Knowledge)	71	49
Code Generation	LiveCodeBench v6 (2025-02 to 2025-05)	Accuracy	71.1	31
Emotional Intelligence Evaluation	EQ-Bench	Overall Score	73	19
Math	IMO-AnswerBench	Score	59.3	18

