Falcon Mamba: The First Competitive Attention-free 7B Language Model

About

In this technical report, we present Falcon Mamba 7B, a new base large language model based on the novel Mamba architecture. Falcon Mamba 7B is trained on 5.8 trillion tokens with carefully selected data mixtures. As a pure Mamba-based model, Falcon Mamba 7B surpasses leading open-weight models based on Transformers, such as Mistral 7B, Llama3.1 8B, and Falcon2 11B. It is on par with Gemma 7B and outperforms models with different architecture designs, such as RecurrentGemma 9B and RWKV-v6 Finch 7B/14B. According to the Open LLM Leaderboard, Falcon Mamba 7B is currently the best-performing Mamba model in the literature at this scale, surpassing both existing Mamba and hybrid Mamba-Transformer models. Due to its architecture, Falcon Mamba 7B is significantly faster at inference and requires substantially less memory for long sequence generation. Although recent studies suggest that hybrid Mamba-Transformer models outperform pure architecture designs, we demonstrate that even the pure Mamba design can achieve similar or even superior results compared to the Transformer and hybrid designs. We make the weights of our implementation of Falcon Mamba 7B publicly available at https://huggingface.co/tiiuae/falcon-mamba-7b, under a permissive license.
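The memory claim for long sequence generation can be illustrated with some back-of-the-envelope arithmetic. The sketch below compares a Transformer key-value cache, which grows with every generated token, against a Mamba-style recurrent state, whose size is fixed. All dimensions (layer counts, head sizes, state sizes) and the function names `kv_cache_bytes` and `ssm_state_bytes` are hypothetical placeholders chosen for illustration; they are not taken from the report and should not be read as the actual configuration of Falcon Mamba 7B or of any Transformer baseline.

```python
# Illustrative arithmetic only: every dimension below is a hypothetical
# placeholder, not the published configuration of any model.


def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Bytes held by a Transformer KV cache for one sequence of seq_len tokens."""
    # Two tensors (keys and values) per layer, one vector per cached token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def ssm_state_bytes(n_layers=64, d_inner=8192, d_state=16, d_conv=4, bytes_per_elem=2):
    """Bytes held by a Mamba-style recurrent state; independent of sequence length."""
    ssm_state = d_inner * d_state   # hidden state of the selective SSM
    conv_state = d_inner * d_conv   # rolling window for the short causal convolution
    return n_layers * (ssm_state + conv_state) * bytes_per_elem


if __name__ == "__main__":
    for seq_len in (2_048, 32_768, 131_072):
        kv_mib = kv_cache_bytes(seq_len) / 2**20
        ssm_mib = ssm_state_bytes() / 2**20
        print(f"{seq_len:>7} tokens: KV cache {kv_mib:8.1f} MiB, SSM state {ssm_mib:6.1f} MiB")
```

Under these assumed dimensions the cache doubles whenever the sequence length doubles, while the recurrent state stays the same size. Real memory use also depends on weights, activations, batch size, and implementation details that this sketch ignores.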

Jingwei Zuo, Maksim Velikanov, Dhia Eddine Rhaiem, Ilyas Chahed, Younes Belkada, Guillaume Kunsch, Hakim Hacid • 2024

Related benchmarks

Task	Dataset	Metric	Result
Multitask Language Understanding	MMLU	Accuracy	63.24	263
Reasoning	ARC-C	Accuracy	47.53	112
Chinese Language Understanding	C-Eval	Accuracy	41.93	68
Chinese Multitask Language Understanding	CMMLU	Accuracy	42.5	67
Long-context retrieval	RULER	Retrieval Accuracy (8K)	49	44
Generative Question Answering	Bolmo Evaluation Suite GenQA 7B	GenQA Average	68.5	39
Mathematical Reasoning	OlmoBaseEval Math (GSM8k, GSM Symbolic, MATH)	Math Aggregate Score	33.7	34
Multiple Choice Non-STEM Question Answering	OlmoBaseEval MC Non-STEM (MMLU Humanities/Social Sci, CSQA, PiQA, SocialIQA, CoQA, DROP, Jeopardy, NaturalQs, SQuAD)	Aggregate Score	74.2	34
Code Generation	OlmoBaseEval Code (BigCodeBench, HumanEval, DeepSeek LeetCode, DS 1000, MBPP, MultiPL)	OlmoBaseEval Code Score	14.6	34
Multiple Choice STEM Question Answering	OlmoBaseEval MCSTEM	MCSTEM Score	64.2	22

