RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
About
We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas • 2024
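The fixed-size state mentioned in the abstract is the key efficiency property: a linear recurrence carries one state vector forward per layer, so memory does not grow with sequence length the way a transformer's KV cache does. The sketch below is a deliberately simplified illustration of a diagonal linear recurrence with that property; the gate values and function names are hypothetical, and Griffin's actual recurrence (the RG-LRU) computes its gates from the input and interleaves recurrent blocks with local attention, which this toy omits.

```python
import numpy as np

def linear_recurrence_step(state, x_t, a, b):
    """One step of a diagonal linear recurrence: h_t = a * h_{t-1} + b * x_t.

    Here `a` (decay gate) and `b` (input gate) are fixed per-channel
    constants; in Griffin they are computed from the input. This is a
    simplified illustration, not the paper's actual recurrence.
    """
    return a * state + b * x_t

def run_sequence(xs, a, b):
    # The state is a single fixed-size vector, so memory use is O(d)
    # regardless of sequence length, unlike a transformer's KV cache,
    # which grows linearly with the number of tokens processed.
    state = np.zeros_like(xs[0])
    outputs = []
    for x_t in xs:
        state = linear_recurrence_step(state, x_t, a, b)
        outputs.append(state)
    return np.stack(outputs)

# Toy usage: 4 channels, 6 time steps of constant input.
d = 4
xs = [np.ones(d) for _ in range(6)]
a = np.full(d, 0.5)   # hypothetical decay gate
b = np.full(d, 0.5)   # hypothetical input gate
ys = run_sequence(xs, a, b)
print(ys[-1])
```

Because each step only reads and overwrites the same state vector, inference cost per token is constant in sequence length, which is what enables the efficient long-sequence inference the abstract describes.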
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Generative Question Answering | Bolmo Evaluation Suite GenQA 7B | GenQA Average: 68.5 | 39 |
| Code Generation | OlmoBaseEval Code (BigCodeBench, HumanEval, DeepSeek LeetCode, DS 1000, MBPP, MultiPL) | OlmoBaseEval Code Score: 23.7 | 34 |
| Mathematical Reasoning | OlmoBaseEval Math (GSM8k, GSM Symbolic, MATH) | Math Aggregate Score: 32.1 | 34 |
| Multiple Choice Non-STEM Question Answering | OlmoBaseEval MC Non-STEM (MMLU Humanities/Social Sci, CSQA, PiQA, SocialIQA, CoQA, DROP, Jeopardy, NaturalQs, SQuAD) | Aggregate Score: 71.1 | 34 |
| Long-context retrieval | RULER | -- | 34 |
| Multiple Choice STEM Question Answering | OlmoBaseEval MCSTEM | MCSTEM Score: 61.6 | 22 |
| General Language Model Evaluation | OlmoBaseEval HeldOut (LBPP, BBH, MMLU Pro, etc.) | LBPP Score: 5.8 | 10 |
| General Language Understanding | Open LLM Leaderboard (test) | ARC: 52 | 9 |
| Mathematical and Code Reasoning | ZeroEval (test) | GSM8K Accuracy: 38.51 | 8 |