Arcee Trinity Large Technical Report

About

We present the technical report for Arcee Trinity Large, a sparse Mixture-of-Experts model with 400B total parameters and 13B activated per token. Additionally, we report on Trinity Nano and Trinity Mini, with Trinity Nano having 6B total parameters with 1B activated per token, Trinity Mini having 26B total parameters with 3B activated per token. The models' modern architecture includes interleaved local and global attention, gated attention, depth-scaled sandwich norm, and sigmoid routing for Mixture-of-Experts. For Trinity Large, we also introduce a new MoE load balancing strategy titled Soft-clamped Momentum Expert Bias Updates (SMEBU). We train the models using the Muon optimizer. All three models completed training with zero loss spikes. Trinity Nano and Trinity Mini were pre-trained on 10 trillion tokens, and Trinity Large was pre-trained on 17 trillion tokens. The model checkpoints are available at https://huggingface.co/arcee-ai.

Varun Singh, Lucas Krauss, Sami Jaghouar, Matej Sirovatka, Charles Goddard, Fares Obied, Jack Min Ong, Jannik Straube, Fern, Aria Harley, Conner Stewart, Colin Kealty, Maziyar Panahi, Simon Kirsten, Anushka Deshpande, Anneketh Vij, Arthur Bresnu, Pranav Veldurthi, Raghav Ravishankar, Hardik Bishnoi, DatologyAI Team, Arcee AI Team, Prime Intellect Team, Mark McQuade, Johannes Hagemann, Lucas Atkins• 2026

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	WinoGrande	--	1581
Language Understanding	MMLU	Accuracy82.58	844
Reasoning	BBH	--	770
Science Question Answering	ARC Challenge	Accuracy65.44	354
Multitask Language Understanding	MMLU-Pro	Accuracy75.25	303
Mathematical Problem Solving	AIME 25	--	84
Commonsense Reasoning	HellaSwag	HellaSwag Score90.11	55
Language Understanding	MMLU-Pro	MMLU-Pro Score66.02	44
Code Generation	MBPP+	Score88.62	43
Mathematical Reasoning	Minerva MATH500	Score65.2	36

Showing 10 of 21 rows

Other info

Follow for update

@wizwand_team Discord