RWKV-7 "Goose" with Expressive Dynamic State Evolution

About

We present RWKV-7 "Goose", a new sequence modeling architecture with constant memory usage and constant inference time per token. Despite being trained on dramatically fewer tokens than other top models, our 2.9 billion parameter language model achieves a new 3B SoTA on multilingual tasks and matches the current 3B SoTA on English language downstream performance. RWKV-7 introduces a newly generalized formulation of the delta rule with vector-valued gating and in-context learning rates, as well as a relaxed value replacement rule. We show that RWKV-7 can perform state tracking and recognize all regular languages, while retaining parallelizability of training. This exceeds the capabilities of Transformers under standard complexity conjectures, which are limited to $\mathsf{TC}^0$. To demonstrate RWKV-7's language modeling capability, we also present an extended open source 3.1 trillion token multilingual corpus, and train four RWKV-7 models ranging from 0.19 billion to 2.9 billion parameters on this dataset. To foster openness, reproduction, and adoption, we release our models and dataset component listing at https://huggingface.co/RWKV, and our training and inference code at https://github.com/RWKV/RWKV-LM all under the Apache 2.0 License.
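The abstract's two central ideas, a fixed-size recurrent state and a delta-rule update with vector-valued gating and in-context learning rates, can be illustrated with a small NumPy sketch. This is an assumption-laden toy, not the released RWKV-7 kernel: the function name `delta_rule_step`, the argument names `w` (per-channel decay) and `a` (per-channel learning rate), and the exact form of the update are chosen here for illustration, and the relaxed value replacement rule is not modeled.

```python
import numpy as np

def delta_rule_step(S, k, v, w, a):
    """One recurrent update of a fixed-size state matrix (illustrative only).

    S : (d_v, d_k) state, the only memory carried between tokens
    k : (d_k,) key          v : (d_v,) value
    w : (d_k,) per-channel decay in (0, 1]         (assumed name)
    a : (d_k,) per-channel learning rate in [0, 1] (assumed name)
    """
    # Unit-norm copy of the key, used to select what gets erased.
    k_hat = k / (np.linalg.norm(k) + 1e-8)
    # Decay every key channel, then remove a fraction `a` of whatever
    # is currently stored along the direction k_hat.
    S = S * w - np.outer(S @ k_hat, a * k_hat)
    # Write the new value under the key.
    return S + np.outer(v, k)

d_k, d_v = 4, 3
S = np.zeros((d_v, d_k))
k = np.array([1.0, 0.0, 0.0, 0.0])   # unit-norm key
v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([-1.0, 0.5, 4.0])
w = np.ones(d_k)                      # no decay
a = np.ones(d_k)                      # full replacement

S = delta_rule_step(S, k, v1, w, a)   # store v1 under k
S = delta_rule_step(S, k, v2, w, a)   # overwrite with v2 under the same key
readout = S @ k                       # query the state with the key
```

With no decay and a learning rate of one, writing a second value under the same key replaces the first rather than adding to it, which is the behaviour that distinguishes a delta rule from a plain additive update. The state keeps the shape `(d_v, d_k)` however many tokens are processed, which is where the constant memory and constant per-token inference cost come from.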

Bo Peng, Ruichong Zhang, Daniel Goldstein, Eric Alcaide, Xingjian Du, Haowen Hou, Jiaju Lin, Jiaxing Liu, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, Christian Zhou-Zheng • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Commonsense Reasoning | WinoGrande | Accuracy 71.67 | 1442 |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy 57.23 | 711 |
| Physical Commonsense Reasoning | PIQA | Accuracy 80.7 | 696 |
| Question Answering | ARC Challenge | Accuracy (ARC) 25.7 | 598 |
| Question Answering | ARC Easy | -- | 597 |
| Multi-task Language Understanding | MMLU | MMLU Accuracy 53.25 | 442 |
| Language Modeling | LAMBADA | Accuracy 27.9 | 412 |
| Question Answering | OpenBookQA | Accuracy 34 | 305 |
| Reasoning | ARC Easy | -- | 233 |
| Graduate-level Question Answering | GPQA | Accuracy 30.8 | 215 |
Showing 10 of 19 rows
