KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices

About

The success of Hyper-Connections (HC) in neural networks (NN) has also highlighted issues related to training instability and restricted scalability. Manifold-Constrained Hyper-Connections (mHC) mitigate these challenges by projecting the residual connection space onto a Birkhoff polytope. However, mHC faces two issues: 1) its iterative Sinkhorn-Knopp (SK) algorithm does not always yield exactly doubly stochastic residual matrices; 2) it incurs a prohibitive $O(n^3C)$ parameter complexity, where $n$ is the width of the residual stream and $C$ is the feature dimension. The recently proposed mHC-lite reparametrizes the residual matrix via the Birkhoff-von-Neumann theorem to guarantee double stochasticity, but it faces a factorial explosion in its parameter complexity, $O \left( nC \cdot n! \right)$. To address both challenges, we propose KromHC, which parametrizes the residual matrix in mHC as a Kronecker product of smaller doubly stochastic matrices. By enforcing manifold constraints on the factor residual matrices along each mode of the tensorized residual stream, KromHC guarantees exact double stochasticity of the residual matrices while reducing parameter complexity to only $O(n^2C)$. Experiments show that KromHC matches or even outperforms other state-of-the-art (SOTA) mHC variants while requiring significantly fewer trainable parameters. The code is at https://github.com/wz1119/KromHC.
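
The property the abstract relies on is that a Kronecker product of doubly stochastic matrices is itself doubly stochastic: entries stay non-negative, and $(A \otimes B)(\mathbf{1} \otimes \mathbf{1}) = (A\mathbf{1}) \otimes (B\mathbf{1}) = \mathbf{1}$, with the same argument for column sums. The NumPy sketch below illustrates this numerically. It is not the authors' implementation: the 2x2 factor parametrization (a sigmoid-weighted mix of the two 2x2 permutation matrices) and the helper names `two_by_two_doubly_stochastic`, `kron_chain`, and `is_doubly_stochastic` are assumptions chosen for illustration only.

```python
import numpy as np


def two_by_two_doubly_stochastic(theta: float) -> np.ndarray:
    """Hypothetical factor: convex mix of the two 2x2 permutation matrices."""
    a = 1.0 / (1.0 + np.exp(-theta))  # sigmoid keeps the weight in (0, 1)
    return np.array([[a, 1.0 - a], [1.0 - a, a]])


def kron_chain(factors) -> np.ndarray:
    """Kronecker product of a list of square matrices, left to right."""
    out = np.array([[1.0]])
    for f in factors:
        out = np.kron(out, f)
    return out


def is_doubly_stochastic(m: np.ndarray, tol: float = 1e-12) -> bool:
    """Check non-negativity and that every row and column sums to 1."""
    return bool(
        np.all(m >= -tol)
        and np.allclose(m.sum(axis=0), 1.0, atol=tol)
        and np.allclose(m.sum(axis=1), 1.0, atol=tol)
    )


# Three unconstrained scalars give an 8x8 residual matrix (n = 2**3).
thetas = [0.3, -1.2, 2.0]
factors = [two_by_two_doubly_stochastic(t) for t in thetas]
H = kron_chain(factors)

print(H.shape)
print(is_doubly_stochastic(H))
```

In this toy setting, three scalars determine an 8x8 matrix that is doubly stochastic by construction, with no iterative normalization step. How KromHC actually generates and constrains its factors is described in the paper and the linked repository.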

Wuyang Zhou, Yuxuan Gu, Giorgos Iacovides, Danilo Mandic • 2026

Related benchmarks

Task	Dataset	Result
Language Modeling	OpenWebText (val)	Validation Loss: 3.2759	114
Commonsense Reasoning	Commonsense Reasoning Suite (test)	HellaSwag Accuracy: 0.364	62
Downstream Performance Evaluation	CORE	CORE Score: 16.872	53
Language Modeling and Reasoning	BigBench (Lamb, SQuAD, CoQA, BBH, LSAT, LangID)	Avg Score: 24	8
LLM Pretraining	FineWeb-Edu (train)	Training Loss: 2.966	8
LLM Pretraining	FineWeb-Edu (val)	BPB: 0.862	8
Language Modeling Evaluation	TinyStories	Grammar: 6.56	5
Story Generation	TinyStories	Grammar Score: 6.04	5
Story Generation Evaluation	TinyStories GPT-4.1 Nano	Grammar: 6.26	5
Language Modeling	Experiment 4 medium scale (train)	Loss: 3.2709	4

