The Zamba2 Suite: Technical Report

About

In this technical report, we present the Zamba2 series: a suite of 1.2B, 2.7B, and 7.4B parameter hybrid Mamba2-transformer models that achieve state-of-the-art performance against the leading open-weights models of their class while delivering substantial gains in inference latency, throughput, and memory efficiency. The Zamba2 series builds upon our initial work with Zamba1-7B by optimizing its architecture and its training and annealing datasets, and by training for up to three trillion tokens. We provide open-source weights for all models of the Zamba2 series, as well as instruction-tuned variants that are strongly competitive with comparable instruction-tuned models of their class. We also open-source Zyda-2, the pretraining dataset used to train the Zamba2 series. The models and datasets used in this work are openly available at https://huggingface.co/Zyphra

Paolo Glorioso, Quentin Anthony, Yury Tokpanov, Anna Golubeva, Vasudev Shyam, James Whittington, Jonathan Pilault, Beren Millidge (2024)

Related benchmarks

Task	Dataset	Metric	Result
Commonsense Reasoning	WinoGrande	Accuracy	73.88
Commonsense Reasoning	HellaSwag	Accuracy	57.57
Physical Commonsense Reasoning	PIQA	Accuracy	79.27
Multi-task Language Understanding	MMLU	Accuracy	56.54
Question Answering	OpenBookQA	Accuracy	32.2
Reasoning	ARC Easy	--	--
Reasoning	ARC Challenge	Accuracy	48.98
Zero/Few-shot Language Modeling	Standard Downstream Tasks (ARC-C, ARC-E, BoolQ, HellaSwag, PIQA, SIQA, WinoGrande)	ARC-C accuracy	68.34
Truthfulness	TruthfulQA	Truthfulness accuracy	45.77
Recall-intensive Retrieval	Recall-intensive retrieval tasks (SWDE, SQuAD, FDA, TriviaQA, NQ, DROP)	SWDE performance	64.48

