
The Zamba2 Suite: Technical Report

About

In this technical report, we present the Zamba2 series, a suite of 1.2B-, 2.7B-, and 7.4B-parameter hybrid Mamba2-transformer models that achieve state-of-the-art performance against the leading open-weights models of their class while delivering substantial gains in inference latency, throughput, and memory efficiency. The Zamba2 series builds upon our initial work with Zamba1-7B, optimizing its architecture and its training and annealing datasets, and training for up to three trillion tokens. We provide open-source weights for all models of the Zamba2 series, as well as instruction-tuned variants that are strongly competitive against comparable instruction-tuned models of their class. We additionally open-source Zyda-2, the pretraining dataset used to train the Zamba2 series. The models and datasets used in this work are openly available at https://huggingface.co/Zyphra

Paolo Glorioso, Quentin Anthony, Yury Tokpanov, Anna Golubeva, Vasudev Shyam, James Whittington, Jonathan Pilault, Beren Millidge • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Commonsense Reasoning | WinoGrande | Accuracy: 73.88 | 1442 |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy: 57.57 | 711 |
| Physical Commonsense Reasoning | PIQA | Accuracy: 79.27 | 696 |
| Multi-task Language Understanding | MMLU | MMLU Accuracy: 56.54 | 442 |
| Question Answering | OpenBookQA | Accuracy: 32.2 | 305 |
| Reasoning | ARC Easy | -- | 233 |
| Reasoning | ARC Challenge | Accuracy: 48.98 | 81 |
| Zero/Few-shot Language Modeling | Standard Downstream Tasks (arc-c, arc-e, boolq, hellaswag, piqa, siqa, winogrande) | ARC-C Accuracy: 68.34 | 55 |
| Truthfulness | TruthfulQA | Truthfulness Accuracy: 45.77 | 51 |
| Recall-intensive retrieval | Recall-intensive retrieval tasks SWDE, SQUADE, FDA, Trivial QA, NQ, Drop | Performance on SWDE: 64.48 | 31 |
