
Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning

About

A central challenge in continual learning for large language models (LLMs) is catastrophic forgetting, where adapting to new tasks can substantially degrade performance on previously learned ones. Existing projection-based methods mitigate such interference by restricting parameter updates to subspaces that are orthogonal to directions associated with past tasks. However, these methods are typically formulated under Euclidean parameter geometry, with update magnitudes and projections governed by the Frobenius norm. The recent empirical success of the Muon optimizer, which applies orthogonalized matrix updates and admits a spectral-norm interpretation, suggests that Frobenius geometry may not be the most effective choice for matrix-valued LLM parameters. Motivated by this observation, we propose Muon-OGD, a spectral-norm-aware continual learning framework that integrates Muon-style operator-norm geometry with orthogonal projection constraints. Our method formulates each update as a spectral-norm-constrained optimization problem with linear non-interference constraints, and solves it efficiently through dual iterations and Newton–Schulz matrix-sign approximations. By applying orthogonalized momentum updates that avoid protected directions associated with prior tasks, Muon-OGD aims to improve the stability–plasticity trade-off in sequential LLM adaptation. We evaluate the proposed method on standard continual learning benchmarks, TRACE, and domain-specific Coding–Math–Medical curricula using both encoder–decoder and decoder-only architectures. Empirically, Muon-OGD consistently improves over sequential fine-tuning and competitive orthogonal-gradient baselines, while remaining computationally scalable. These results suggest that spectral-norm-aware update geometry provides a practical and effective alternative to Frobenius-norm projection for continual learning in LLMs.
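
To make the two ingredients named in the abstract concrete, the following is a minimal illustrative sketch in Python with NumPy, not the authors' algorithm. It assumes the protected directions are given as an orthonormal basis `P` of a left (output-side) subspace, and it replaces the paper's dual-iteration solver with a simpler project-then-orthogonalize heuristic. The function names, the classic cubic Newton–Schulz iteration, and the step count are all assumptions chosen for clarity.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=30, eps=1e-12):
    """Approximate the polar factor U @ V.T of G = U @ diag(s) @ V.T.

    Uses the classic cubic Newton-Schulz iteration (an assumption here;
    it is not claimed to be the variant used in the paper).
    """
    # Frobenius scaling puts every singular value in [0, 1],
    # inside the iteration's convergence region.
    X = G / (np.linalg.norm(G) + eps)
    for _ in range(steps):
        # Each singular value s is mapped to 1.5*s - 0.5*s**3,
        # which drives nonzero values toward 1 and keeps zeros at 0.
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def project_out(G, P):
    """Remove the part of G whose columns lie in span(P).

    P is assumed to have orthonormal columns (hypothetical representation
    of the directions protected for earlier tasks).
    """
    return G - P @ (P.T @ G)

def protected_orthogonal_update(G, P, lr=0.1, steps=30):
    """Project the gradient away from P, then orthogonalize the remainder."""
    return -lr * newton_schulz_orthogonalize(project_out(G, P), steps)

if __name__ == "__main__":
    P = np.eye(4)[:, :2]                    # protect the first two output directions
    G = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [3.0, 0.0, 0.0],
                  [0.0, 4.0, 0.0]])
    dW = protected_orthogonal_update(G, P)
    print(np.round(dW, 6))
    print(np.abs(P.T @ dW).max())           # size of the leak into protected directions
```

The order of operations matters in this sketch: the iteration only multiplies `X` on the right by a polynomial in `X.T @ X`, so the column space fixed by the projection is preserved and the orthogonalized update stays out of `span(P)`. Practical Muon implementations typically use a tuned higher-order polynomial with far fewer steps, and the constrained spectral-norm problem described in the abstract is more general than this heuristic.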

Binghang Lu, Zheyuan Deng, Runyu Zhang, Bing Hu, Yunhan Zhao, Yuan Tian, Changhong Mou, Guang Lin, Xiaomin Li • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| Continual Learning | Standard CL Benchmark | Avg Final Acc: 0.789 | 71 |
| Continual Learning | Continual Learning Benchmark 15-Task | Average Accuracy: 72 | 28 |
| Continual Learning | Curriculum Coding -> Math -> Medical | Code Score: 32.9 | 24 |
| Continual Learning | Continual learning sequential three-stage curriculum Coding → Math → Medical | Accuracy (Coding 800 Q): 28.3 | 12 |
| Continual Learning | Sequential three-stage curriculum (Coding -> Math -> Medical) - Stage A (Coding) | Coding Stage Score: 40.8 | 8 |
| Instruction Following | TRACE | AA: 49.4 | 7 |
| Continual Learning | Sequential three-stage curriculum Coding -> Math -> Medical Stage C | Coding Accuracy (Stage C): 38.1 | 4 |
| Continual Learning | Sequential Curriculum Coding → Math → Medical Stage C LLaMA3.2-1B-instruct (test) | Coding Accuracy: 19.7 | 4 |
| Continual Learning | Sequential Curriculum Coding → Math → Medical Stage B LLaMA3.2-1B-instruct (test) | Coding Score: 21.7 | 4 |
| Continual Learning | Sequential three-stage curriculum (Coding -> Math -> Medical) Stage B (Math) | Coding Accuracy: 38.2 | 4 |