HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

About

Muon has recently shown promising results in LLM training. In this work, we study how to further improve Muon. We argue that Muon's orthogonalized update rule suppresses the emergence of heavy-tailed weight spectra and over-emphasizes training along noise-dominated directions. Motivated by Heavy-Tailed Self-Regularization (HT-SR) theory, we propose HTMuon. HTMuon preserves Muon's ability to capture parameter interdependencies while producing heavier-tailed updates and inducing heavier-tailed weight spectra. Experiments on LLM pretraining and image classification show that HTMuon consistently improves performance over state-of-the-art baselines and can also serve as a plug-in on top of existing Muon variants. For example, on LLaMA pretraining on the C4 dataset, HTMuon reduces perplexity by up to $0.98$ compared to Muon. We further show theoretically that HTMuon corresponds to steepest descent under a Schatten-$q$ norm constraint and provide a convergence analysis in the smooth non-convex setting. The implementation of HTMuon is available at https://github.com/TDCSZ327/HTmuon.
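
The abstract characterizes HTMuon as steepest descent under a Schatten-$q$ norm constraint. As a rough sketch of what that constraint implies for the update direction, the PyTorch snippet below computes the dual-norm maximizer with an explicit SVD. The function name, the default $q$, and the use of a full SVD (Muon itself approximates orthogonalization with Newton-Schulz iterations) are illustrative assumptions, not the paper's released implementation.

```python
import torch

def schatten_q_direction(G: torch.Tensor, q: float = 3.0) -> torch.Tensor:
    """Steepest-descent direction D maximizing <G, D> s.t. ||D||_{S_q} <= 1.

    Hypothetical sketch based on the abstract's Schatten-q claim; the
    exponent and normalization follow standard norm-duality algebra.
    """
    U, S, Vh = torch.linalg.svd(G, full_matrices=False)
    # Holder-conjugate exponent: 1/q + 1/q' = 1  =>  q' - 1 = 1/(q - 1).
    S_pow = S.clamp_min(1e-12) ** (1.0 / (q - 1.0))
    # Rescale so the returned matrix has unit Schatten-q norm.
    S_pow = S_pow / torch.linalg.vector_norm(S_pow, ord=q)
    return U @ torch.diag(S_pow) @ Vh

# Hypothetical Muon-style use on a weight matrix W with momentum buffer M:
#   M = beta * M + W.grad
#   W -= lr * schatten_q_direction(M, q=3.0)
```

As $q \to \infty$ the exponent $1/(q-1)$ goes to $0$, every singular value is flattened to $1$, and the update reduces to Muon's orthogonalization; a finite $q$ instead preserves a heavy-tailed update spectrum, which is the correction the abstract describes.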

Tianyu Pang, Yujie Fang, Zihang Liu, Shenyang Deng, Lei Hsiung, Shuhua Yu, Yaoqing Yang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | C4 | Perplexity | 14.17 | 1422 |
| Image Classification | ImageNet-1K | Top-1 Accuracy | 71.16 | 1239 |
| Commonsense Reasoning | PIQA | Accuracy | 66.59 | 751 |
| Commonsense Reasoning | BoolQ | Accuracy | 61.93 | 212 |
| Commonsense Reasoning | ARC Challenge | Accuracy | 22.18 | 190 |
| Image Classification | CIFAR-10 (test) | Test Accuracy | 96.13 | 154 |
| Commonsense Reasoning | OBQA | Accuracy | 17.2 | 117 |
| Image Classification | CIFAR-100 (test) | Accuracy | 80.22 | 110 |
| Commonsense Reasoning | ARC-E | Accuracy | 33.16 | 106 |
| Language Modeling | OpenWebText | Perplexity | 22.2 | 91 |
Showing 10 of 13 benchmark results.
