
Zamba: A Compact 7B SSM Hybrid Model

About

In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which achieves competitive performance against leading open-weight models at a comparable scale. Zamba is trained on 1T tokens from openly available datasets and is the best non-transformer model at this scale. Zamba pioneers a unique architecture combining a Mamba backbone with a single shared attention module, thus obtaining the benefits of attention at minimal parameter cost. Due to its architecture, Zamba is significantly faster at inference than comparable transformer models and requires substantially less memory for generation of long sequences. Zamba is pretrained in two phases: the first phase is based on existing web datasets, while the second one consists of annealing the model over high-quality instruct and synthetic datasets, and is characterized by a rapid learning rate decay. We open-source the weights and all checkpoints for Zamba, through both phase 1 and annealing phases.
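To make the "minimal parameter cost" claim concrete, the sketch below compares attention parameter counts when one attention block is reused at several depths and when each depth gets its own block. The model width, the number of call sites, and the helper names are assumptions chosen for illustration. They are not Zamba's actual configuration, and the count covers only the four projection matrices of a standard attention block.

```python
def attention_params(d_model: int) -> int:
    """Parameters in one attention block's Q, K, V and output projections (no biases)."""
    return 4 * d_model * d_model


def attention_cost(d_model: int, n_sites: int, shared: bool) -> int:
    """Total attention parameters when the block is applied at n_sites positions."""
    # A shared block is stored once no matter how many times it is applied.
    n_copies = 1 if shared else n_sites
    return n_copies * attention_params(d_model)


# Hypothetical sizes for illustration only; not Zamba's actual configuration.
D_MODEL = 1024
N_SITES = 6

shared_total = attention_cost(D_MODEL, N_SITES, shared=True)
separate_total = attention_cost(D_MODEL, N_SITES, shared=False)
print(shared_total, separate_total)
```

Under these assumptions the shared design stores one set of attention weights however many times the block is applied, so its attention parameter count stays flat as call sites are added, while the unshared design grows linearly with the number of sites.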

Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Commonsense Reasoning | WinoGrande | -- | 1442 |
| Multi-task Language Understanding | MMLU | Accuracy 57.7 | 881 |
| Question Answering | ARC-E | Accuracy 74.5 | 523 |
| Multitask Language Understanding | MMLU | Accuracy 58.19 | 263 |
| Question Answering | ARC-C | -- | 116 |
| Reasoning | ARC-C | Accuracy 37.18 | 112 |
| Common Sense Reasoning | PIQA | Accuracy 81.4 | 100 |
| Chinese Language Understanding | C-Eval | Accuracy 36.4 | 68 |
| Chinese Multitask Language Understanding | CMMLU | Accuracy 38.42 | 67 |
| Common Sense Reasoning | HellaSwag | Accuracy (acc_n) 76.4 | 47 |
Showing 10 of 13 rows
