RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
About
We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas • 2024
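The fixed-size state mentioned in the abstract is the key efficiency property: a linear recurrence carries one state vector forward per layer, so memory does not grow with sequence length the way a transformer's KV cache does. The sketch below is a deliberately simplified illustration of a diagonal linear recurrence with that property; the gate values and function names are hypothetical, and Griffin's actual recurrence (the RG-LRU) computes its gates from the input and interleaves recurrent blocks with local attention, which this toy omits.

```python
import numpy as np

def linear_recurrence_step(state, x_t, a, b):
    """One step of a diagonal linear recurrence: h_t = a * h_{t-1} + b * x_t.

    Here `a` (decay gate) and `b` (input gate) are fixed per-channel
    constants; in Griffin they are computed from the input. This is a
    simplified illustration, not the paper's actual recurrence.
    """
    return a * state + b * x_t

def run_sequence(xs, a, b):
    # The state is a single fixed-size vector, so memory use is O(d)
    # regardless of sequence length, unlike a transformer's KV cache,
    # which grows linearly with the number of tokens processed.
    state = np.zeros_like(xs[0])
    outputs = []
    for x_t in xs:
        state = linear_recurrence_step(state, x_t, a, b)
        outputs.append(state)
    return np.stack(outputs)

# Toy usage: 4 channels, 6 time steps of constant input.
d = 4
xs = [np.ones(d) for _ in range(6)]
a = np.full(d, 0.5)   # hypothetical decay gate
b = np.full(d, 0.5)   # hypothetical input gate
ys = run_sequence(xs, a, b)
print(ys[-1])
```

Because each step only reads and overwrites the same state vector, inference cost per token is constant in sequence length, which is what enables the efficient long-sequence inference the abstract describes.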
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Generative Question Answering | Bolmo Evaluation Suite GenQA 7B | GenQA Average: 68.5 | 39 |
| Code Generation | OlmoBaseEval Code (BigCodeBench, HumanEval, DeepSeek LeetCode, DS 1000, MBPP, MultiPL) | OlmoBaseEval Code Score: 23.7 | 34 |
| Mathematical Reasoning | OlmoBaseEval Math (GSM8k, GSM Symbolic, MATH) | Math Aggregate Score: 32.1 | 34 |
| Multiple Choice Non-STEM Question Answering | OlmoBaseEval MC Non-STEM (MMLU Humanities/Social Sci, CSQA, PiQA, SocialIQA, CoQA, DROP, Jeopardy, NaturalQs, SQuAD) | Aggregate Score: 71.1 | 34 |
| Long-context retrieval | RULER | -- | 34 |
| Multiple Choice STEM Question Answering | OlmoBaseEval MCSTEM | MCSTEM Score: 61.6 | 22 |
| General Language Model Evaluation | OlmoBaseEval HeldOut (LBPP, BBH, MMLU Pro, etc.) | LBPP Score: 5.8 | 10 |
| General Language Understanding | Open LLM Leaderboard (test) | ARC: 52 | 9 |
| Mathematical and Code Reasoning | ZeroEval (test) | GSM8K Accuracy: 38.51 | 8 |