mHC: Manifold-Constrained Hyper-Connections

About

Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.
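
The abstract does not say which manifold the residual mixing is projected onto, so the following is only an illustrative sketch under a stated assumption: the mixing matrix across residual streams is constrained to be (approximately) doubly stochastic, obtained here by Sinkhorn-style alternating row and column normalization. The function names `sinkhorn` and `mix_streams`, the stream count, and the toy values are all hypothetical. The sketch shows why such a constraint relates to the identity mapping property: when every column of the mixing matrix sums to 1, the sum of the residual streams is unchanged by the mixing step.

```python
import math

def sinkhorn(logits, iters=50):
    """Map a square matrix of logits to an approximately doubly
    stochastic matrix by alternating row and column normalization.
    (Assumed projection; the abstract does not specify one.)"""
    n = len(logits)
    m = [[math.exp(v) for v in row] for row in logits]  # strictly positive
    for _ in range(iters):
        # Normalize each row to sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Normalize each column to sum to 1.
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]
    return m

def mix_streams(h, streams):
    """Mix n residual streams: y_i = sum_j h[i][j] * x_j."""
    n, d = len(h), len(streams[0])
    return [
        [sum(h[i][j] * streams[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]

# Hypothetical unconstrained mixing logits for 4 residual streams.
logits = [
    [0.5, -0.3, 0.1, 0.0],
    [-0.2, 0.4, 0.0, 0.3],
    [0.1, 0.0, -0.5, 0.2],
    [0.0, 0.2, 0.3, -0.4],
]
# Toy residual streams, each of width 3.
streams = [
    [1.0, 2.0, 3.0],
    [0.5, -1.0, 0.0],
    [2.0, 0.0, 1.5],
    [-1.0, 4.0, 0.5],
]

h = sinkhorn(logits)
mixed = mix_streams(h, streams)

# Per-feature totals across streams, before and after mixing.
total_before = [sum(s[k] for s in streams) for k in range(3)]
total_after = [sum(s[k] for s in mixed) for k in range(3)]
```

Comparing `total_before` with `total_after` shows the stream total is preserved up to floating-point error, whereas an unconstrained mixing matrix could scale it arbitrarily from layer to layer.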

Zhenda Xie, Yixuan Wei, Huanqi Cao, Chenggang Zhao, Chengqi Deng, Jiashi Li, Damai Dai, Huazuo Gao, Jiang Chang, Kuai Yu, Liang Zhao, Shangyan Zhou, Zhean Xu, Zhengyan Zhang, Wangding Zeng, Shengding Hu, Yuqing Wang, Jingyang Yuan, Lean Wang, Wenfeng Liang • 2025

Related benchmarks

Task                                 Dataset                              Metric         Result   Rank
Commonsense Reasoning                HellaSwag                            Accuracy       74.7     1460
Multi-task Language Understanding    MMLU                                 Accuracy       63.4     842
Commonsense Reasoning                PIQA                                 Accuracy       80.5     647
Mathematical Reasoning               GSM8K                                EM             53.8     115
Logical Reasoning                    BBH                                  Accuracy       51       93
Reading Comprehension                DROP                                 F1 Score       53.9     55
Commonsense Reasoning                Commonsense Reasoning Suite (test)   Avg Accuracy   0.445    22
LLM Pretraining                      FineWeb-Edu (train)                  Training Loss  2.964    8
LLM Pretraining                      FineWeb-Edu (val)                    BPB            0.861    8
Downstream Performance Evaluation    CORE                                 CORE Score     16.023   8
Showing 10 of 11 rows