
ZAYA1-8B Technical Report

About

We present ZAYA1-8B, a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, built on Zyphra's MoE++ architecture. ZAYA1-8B's core pretraining, midtraining, and supervised fine-tuning (SFT) were performed on a full-stack AMD compute, networking, and software platform. With under 1B active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks, and remains competitive with substantially larger open-weight reasoning models. ZAYA1-8B was trained from scratch for reasoning, with reasoning data included from pretraining onward using an answer-preserving trimming scheme. Post-training uses a four-stage RL cascade: reasoning warmup on math and puzzles; a 400-task RLVE-Gym curriculum; math and code RL with test-time compute traces and synthetic code environments built from competitive-programming references; and behavioral RL for chat and instruction following. We also introduce Markovian RSA, a test-time compute method that recursively aggregates parallel reasoning traces while carrying forward only bounded-length reasoning tails between rounds. In TTC evaluation, Markovian RSA raises ZAYA1-8B to 91.9% on AIME'25 and 89.6% on HMMT'25 while carrying forward only a 4K-token tail, narrowing the gap to much larger reasoning models including Gemini-2.5 Pro, DeepSeek-V3.2, and GPT-5-High.
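The abstract describes Markovian RSA in a single sentence, so the following is only a minimal sketch of the general idea (parallel traces, recursive aggregation, bounded-length tails carried between rounds), not the authors' implementation. The function names, the `generate` callback signature, the population and group sizes, and the random choice of which tails to aggregate are all assumptions made for illustration; `toy_generate` is a stand-in for a real model call.

```python
import random
from typing import Callable, List

Trace = List[str]  # a reasoning trace as a list of tokens

def tail(trace: Trace, max_tokens: int) -> Trace:
    """Keep only the last max_tokens tokens of a trace (the bounded tail)."""
    return trace[-max_tokens:] if max_tokens > 0 else []

def markovian_rsa(
    prompt: str,
    generate: Callable[[str, List[Trace]], Trace],  # hypothetical model call
    population: int = 4,    # assumed number of parallel traces per round
    group_size: int = 2,    # assumed number of tails aggregated per new trace
    rounds: int = 3,        # assumed number of aggregation rounds
    tail_tokens: int = 8,   # bound on what is carried between rounds
    seed: int = 0,
) -> List[Trace]:
    rng = random.Random(seed)
    # Round 0: independent parallel traces with no carried context.
    traces = [generate(prompt, []) for _ in range(population)]
    for _ in range(rounds):
        # Only the bounded tails survive into the next round, so each
        # round depends on the previous one through a fixed-size state.
        tails = [tail(t, tail_tokens) for t in traces]
        traces = [
            generate(prompt, rng.sample(tails, group_size))
            for _ in range(population)
        ]
    return traces

def toy_generate(prompt: str, context_tails: List[Trace]) -> Trace:
    """Stand-in for a model: filler steps, then a token reporting carried size."""
    carried = sum(len(t) for t in context_tails)
    return [f"step{i}" for i in range(20)] + [f"answer(carried={carried})"]

final_traces = markovian_rsa("2+2?", toy_generate)
```

In this sketch the context passed to each generation call is capped at `group_size * tail_tokens` tokens regardless of how long the earlier traces were, which is the property the abstract highlights with its 4K-token tail. How a final answer is selected from the last round's traces is not stated in the abstract and is left out here.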

Robert Washbourne, Rishi Iyer, Tomas Figliolia, Henry Zheng, Ryan Lorig-Roach, Sungyeon Yang, Pritish Yuvraj, Quentin Anthony, Yury Tokpanov, Xiao Yang, Ganesh Nanduru, Stephen Ebert, Praneeth Medepalli, Skyler Szot, Srivatsan Rajagopal, Alex Ong, Bhavana Mehta, Beren Millidge • 2026

Related benchmarks

Task | Dataset | Result | Rank
Instruction Following | IFEval | -- | 836
Mathematical Reasoning | AIME 2025 | Accuracy 91.9 | 311
Knowledge | MMLU-Pro | Score 74.2 | 63
Instruction Following | IFBench | IFBench Score 52.6 | 56
Mathematical Reasoning | HMMT Feb 2025 | Accuracy 89.6 | 45
Knowledge | GPQA Diamond | Accuracy (GPQA Knowledge) 71 | 37
Code Generation | LiveCodeBench v6 (2025-02 to 2025-05) | Accuracy 71.1 | 31
Coding | LiveCodeBench v6 | Score (%) 64.8 | 31
Emotional Intelligence Evaluation | EQ-Bench | Overall Score 73 | 19
Instruction | IFEval | Score 85.6 | 17
Showing 10 of 20 rows
