RWKV-7 "Goose" with Expressive Dynamic State Evolution

About

We present RWKV-7 "Goose", a new sequence modeling architecture with constant memory usage and constant inference time per token. Despite being trained on dramatically fewer tokens than other top models, our 2.9 billion parameter language model achieves a new 3B SoTA on multilingual tasks and matches the current 3B SoTA on English language downstream performance. RWKV-7 introduces a newly generalized formulation of the delta rule with vector-valued gating and in-context learning rates, as well as a relaxed value replacement rule. We show that RWKV-7 can perform state tracking and recognize all regular languages, while retaining parallelizability of training. This exceeds the capabilities of Transformers, which under standard complexity conjectures are limited to $\mathsf{TC}^0$. To demonstrate RWKV-7's language modeling capability, we also present an extended open-source 3.1 trillion token multilingual corpus, and train four RWKV-7 models ranging from 0.19 billion to 2.9 billion parameters on this dataset. To foster openness, reproduction, and adoption, we release our models and dataset component listing at https://huggingface.co/RWKV, and our training and inference code at https://github.com/RWKV/RWKV-LM, all under the Apache 2.0 License.
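To make the "delta rule with vector-valued gating and in-context learning rates" concrete, the following is a minimal sketch of a generic delta-rule state update in plain Python. It is an illustration under stated assumptions, not the paper's exact equations: the function names (`normalize`, `read`, `delta_step`), the per-channel decay `w`, and the per-channel learning rate `a` are hypothetical names chosen here. The state is a fixed-size matrix, which is why memory and per-token cost stay constant regardless of sequence length.

```python
import math

def normalize(k):
    """Scale a key vector to unit length."""
    norm = math.sqrt(sum(x * x for x in k))
    return [x / norm for x in k]

def read(state, query):
    """Read from the state: matrix-vector product state @ query."""
    return [sum(s * q for s, q in zip(row, query)) for row in state]

def delta_step(state, k, v, w, a):
    """One generic delta-rule update (illustrative, not the paper's formula).

    state: d_v x d_k matrix (list of rows), fixed size for every token
    k, v:  key (length d_k) and value (length d_v) for the current token
    w:     assumed per-channel decay in [0, 1] (vector-valued gate)
    a:     assumed per-channel learning rate in [0, 1]
    """
    k_hat = normalize(k)
    old = read(state, k_hat)  # value currently stored under this key
    return [
        [row[j] * w[j]                 # decay each channel
         - old[i] * a[j] * k_hat[j]    # remove (part of) the old value
         + v[i] * k_hat[j]             # write the new value
         for j in range(len(k_hat))]
        for i, row in enumerate(state)
    ]

# Usage: with w = a = 1, writing twice under the same key replaces the value.
state = [[0.0] * 3 for _ in range(2)]
ones = [1.0] * 3
state = delta_step(state, [3.0, 0.0, 4.0], [1.0, 2.0], ones, ones)
state = delta_step(state, [3.0, 0.0, 4.0], [5.0, -1.0], ones, ones)
latest = read(state, normalize([3.0, 0.0, 4.0]))
```

With `w` and `a` set to all ones, this reduces to the classic delta rule, where a second write under the same key fully replaces the first value. Letting `w` and `a` vary per channel and per token is the kind of generalization the abstract describes.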

Bo Peng, Ruichong Zhang, Daniel Goldstein, Eric Alcaide, Xingjian Du, Haowen Hou, Jiaju Lin, Jiaxing Liu, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, Christian Zhou-Zheng • 2025

Related benchmarks

Task	Dataset	Metric	Result
Commonsense Reasoning	WinoGrande	Accuracy	71.67	1442
Commonsense Reasoning	HellaSwag	HellaSwag Accuracy	57.23	711
Physical Commonsense Reasoning	PIQA	Accuracy	80.7	696
Question Answering	ARC Challenge	Accuracy (ARC)	25.7	598
Question Answering	ARC Easy	--	--	597
Multi-task Language Understanding	MMLU	MMLU Accuracy	53.25	442
Language Modeling	LAMBADA	Accuracy	27.9	412
Question Answering	OpenBookQA	Accuracy	34	305
Reasoning	ARC Easy	--	--	233
Graduate-level Question Answering	GPQA	Accuracy	30.8	215

Showing 10 of 19 rows
