
The Zamba2 Suite: Technical Report

About

In this technical report, we present the Zamba2 series -- a suite of 1.2B, 2.7B, and 7.4B parameter hybrid Mamba2-transformer models that achieve state-of-the-art performance against the leading open-weights models of their class, while delivering substantial gains in inference latency, throughput, and memory efficiency. The Zamba2 series builds upon our initial work with Zamba1-7B, refining its architecture, improving its training and annealing datasets, and training for up to three trillion tokens. We provide open-source weights for all models of the Zamba2 series, as well as instruction-tuned variants that are strongly competitive against comparable instruct-tuned models of their class. We additionally open-source the pretraining dataset, which we call Zyda-2, used to train the Zamba2 series of models. The models and datasets used in this work are openly available at https://huggingface.co/Zyphra
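To make the hybrid layout concrete, below is a minimal toy sketch (NumPy) of the general Zamba-style pattern: a stack of state-space mixer blocks with a single attention block whose weights are shared across every point it is invoked in the stack. The block internals here are deliberately simplified stand-ins -- the recurrence omits Mamba2's gating and selectivity, and all weights are random -- so this illustrates the interleaving pattern only, not the actual Zamba2 architecture or parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def toy_ssm_block(x, A, B, C):
    # Simplified linear state-space scan (a stand-in for a Mamba2 block):
    # h_t = A * h_{t-1} + B * x_t ;  y_t = C * h_t.
    # Real Mamba2 adds input-dependent (selective) parameters and gating.
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(T):
        h = A * h + B * x[t]
        out[t] = C * h
    return x + out  # residual connection

def shared_attention_block(x, Wq, Wk, Wv):
    # Single-head self-attention. In the shared-attention hybrid, this one
    # block's weights are reused at every depth where it is applied.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(x.shape[-1]))
    return x + scores @ v  # residual connection

rng = np.random.default_rng(0)
T, d = 8, 16                      # toy sequence length and model width
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
A, B, C = 0.9 * np.ones(d), np.ones(d), np.ones(d)

# Six SSM-style blocks; the *same* attention block is invoked every third layer.
for layer in range(6):
    x = toy_ssm_block(x, A, B, C)
    if layer % 3 == 2:
        x = shared_attention_block(x, Wq, Wk, Wv)

print(x.shape)  # (8, 16): output keeps the (sequence, width) shape
```

Sharing one attention block rather than giving each layer its own is what lets this family of hybrids keep attention's retrieval ability at a small fraction of the parameter and KV-cache cost of a full transformer.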

Paolo Glorioso, Quentin Anthony, Yury Tokpanov, Anna Golubeva, Vasudev Shyam, James Whittington, Jonathan Pilault, Beren Millidge • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Zero/Few-shot Language Modeling | Standard Downstream Tasks (ARC-C, ARC-E, BoolQ, HellaSwag, PIQA, SIQA, WinoGrande) | ARC-C Accuracy: 68.34 | 55 |
| Recall-Intensive Retrieval | Recall-intensive retrieval tasks (SWDE, SQuAD, FDA, TriviaQA, NQ, DROP) | SWDE Performance: 64.48 | 24 |
| Commonsense Reasoning and Knowledge Understanding | Commonsense Reasoning and Knowledge Suite (ARC, HellaSwag, LAMBADA, PIQA, WinoGrande, MMLU) | ARC-E Accuracy: 80.13 | 13 |
