
mHC: Manifold-Constrained Hyper-Connections

About

Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.
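The abstract does not say which manifold the residual connection space is projected onto, so the following is only an illustrative sketch under a stated assumption: the mixing matrix between residual streams is constrained to be (approximately) doubly stochastic, obtained here by Sinkhorn-style alternating row and column normalization. The function names `sinkhorn_project` and `mix_streams`, the stream count, and the iteration count are hypothetical and are not taken from the paper. With rows and columns that each sum to one, every mixed stream is a weighted average of the input streams and the sum across streams is preserved, which is one way to read "restoring the identity mapping property".

```python
import math
import random


def sinkhorn_project(logits, n_iters=200):
    """Map an n x n matrix of real scores to an approximately doubly
    stochastic matrix (assumed manifold; not confirmed by the abstract)."""
    n = len(logits)
    # Exponentiate so every entry is strictly positive.
    m = [[math.exp(x) for x in row] for row in logits]
    for _ in range(n_iters):
        # Normalize each row to sum to 1.
        m = [[x / sum(row) for x in row] for row in m]
        # Normalize each column to sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def mix_streams(mixing, streams):
    """Apply an n x n mixing matrix to n residual streams of width d."""
    n, d = len(streams), len(streams[0])
    return [
        [sum(mixing[i][j] * streams[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]


random.seed(0)
n, d = 4, 8  # hypothetical number of streams and feature width
logits = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
streams = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

mixing = sinkhorn_project(logits)
mixed = mix_streams(mixing, streams)

# Sum across streams for each feature, before and after mixing.
sums_before = [sum(streams[j][k] for j in range(n)) for k in range(d)]
sums_after = [sum(mixed[i][k] for i in range(n)) for k in range(d)]
```

Because the last normalization step is over columns, `sums_after` matches `sums_before` up to floating-point error, while the row sums of `mixing` are only approximately one and get closer as `n_iters` grows.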

Zhenda Xie, Yixuan Wei, Huanqi Cao, Chenggang Zhao, Chengqi Deng, Jiashi Li, Damai Dai, Huazuo Gao, Jiang Chang, Kuai Yu, Liang Zhao, Shangyan Zhou, Zhean Xu, Zhengyan Zhang, Wangding Zeng, Shengding Hu, Yuqing Wang, Jingyang Yuan, Lean Wang, Wenfeng Liang • 2025

Related benchmarks

Task                               Dataset                             Result                    Rank
Commonsense Reasoning              HellaSwag                           Accuracy 74.7             1891
Language Modeling                  C4                                  Perplexity 98.9           1071
Multi-task Language Understanding  MMLU                                Accuracy 63.4             876
Commonsense Reasoning              PIQA                                Accuracy 80.5             751
Logical Reasoning                  BBH                                 Accuracy 51               201
Mathematical Reasoning             GSM8K                               EM 53.8                   123
Language Modeling                  OpenWebText (val)                   Validation Loss 3.023     80
Reading Comprehension              DROP                                F1 Score 53.9             73
Commonsense Reasoning              Commonsense Reasoning Suite (test)  HellaSwag Accuracy 0.362  62
Language Modeling                  WikiText                            Wikitext PPL 58.8         45
Showing 10 of 20 rows
