Mixtral of Experts

About

We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, except that each layer is composed of 8 feedforward blocks (i.e., experts). For every token, at each layer, a router network selects two experts to process the current state and combines their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens, and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and the Llama 2 70B chat model on human evaluation benchmarks. Both the base and instruct models are released under the Apache 2.0 license.
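The top-2 routing described in the abstract can be illustrated with a small sketch. This is not the released implementation: it assumes the router is a single linear map producing one logit per expert, that the two largest logits are kept, and that a softmax over only those two logits gives the mixing weights. The names `top2_moe_layer`, `gate_weights`, and `make_toy_expert` are hypothetical, and the toy experts simply scale their input instead of applying a real feedforward block.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x dim) matrix by a dim-sized vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def top2_moe_layer(x, gate_weights, experts):
    """Route one token state `x` through its two highest-scoring experts.

    x            : list of floats, the token's hidden state
    gate_weights : one row of router weights per expert (assumed linear router)
    experts      : list of callables, each mapping a state to a state
    Returns the mixed output and the indices of the chosen experts.
    """
    logits = matvec(gate_weights, x)
    # Keep the indices of the two largest router logits.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    # Assumption: mixing weights come from a softmax over the selected logits only.
    weights = softmax([logits[i] for i in chosen])
    output = [0.0] * len(x)
    for weight, index in zip(weights, chosen):
        expert_out = experts[index](x)  # only the selected experts are evaluated
        output = [acc + weight * val for acc, val in zip(output, expert_out)]
    return output, chosen


def make_toy_expert(scale):
    """Hypothetical stand-in for a feedforward block: scales the input."""
    return lambda state: [scale * v for v in state]


# Three toy experts and a two-dimensional token state.
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
experts = [make_toy_expert(1.0), make_toy_expert(2.0), make_toy_expert(3.0)]
token = [2.0, 1.0]
output, chosen = top2_moe_layer(token, gate_weights, experts)
print(chosen, output)
```

Because only the two selected experts are evaluated per token, the compute per token depends on the active parameters rather than on the total parameter count, which is the distinction the abstract draws between 13B active and 47B total parameters.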

Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed • 2024

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	HellaSwag	Accuracy 86.7	1896
Commonsense Reasoning	WinoGrande	Accuracy 81.2	1581
Mathematical Reasoning	GSM8K	Accuracy 58.4	1424
Code Generation	HumanEval	Pass@1 40.2	1048
Question Answering	ARC Challenge	Accuracy 85.8	906
Multi-task Language Understanding	MMLU	Accuracy 70.6	881
Instruction Following	IFEval	--	854
Commonsense Reasoning	PIQA	Accuracy 83.6	757
Instruction Following	AlpacaEval 2.0	Win Rate 23.7	752
Question Answering	ARC Challenge	Accuracy (ARC) 48.98	631

Showing 10 of 175 rows
