
COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression

About

Post-training compression of Transformer models commonly relies on truncated singular value decomposition (SVD). However, enforcing a single shared subspace can degrade accuracy even at moderate compression. Sparse dictionary learning provides a more flexible union-of-subspaces representation, but existing approaches often require iterative dictionary and coefficient updates. We propose COMPOT (Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers), a training-free compression framework that uses a small calibration dataset to estimate a sparse weight factorization. COMPOT employs orthogonal dictionaries that enable closed-form Procrustes updates for the dictionary and analytical single-step sparse coding for the coefficients, eliminating iterative optimization. To handle heterogeneous layer sensitivity under a global compression budget, COMPOT further introduces a one-shot dynamic allocation strategy that adaptively redistributes layer-wise compression rates. Extensive experiments across diverse architectures and tasks show that COMPOT consistently delivers a superior quality-compression trade-off over strong low-rank and sparse baselines, while remaining fully compatible with post-training quantization for extreme compression. Code is available at https://github.com/mts-ai/COMPOT.
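The two closed-form steps named in the abstract can be illustrated with a generic orthogonal dictionary factorization, W ≈ D S with DᵀD = I and at most k nonzeros per column of S. The sketch below is not the authors' implementation: it works on a plain weight matrix without any calibration data, and the function names, the SVD-based initialization, the per-column top-k sparsity pattern, and the number of alternations are all assumptions made for illustration. It shows only why an orthogonal dictionary makes both steps analytical: the dictionary update is an orthogonal Procrustes problem solved by one SVD, and the sparse code is a hard threshold of DᵀW.

```python
import numpy as np


def hard_threshold_columns(C, k):
    """Keep the k largest-magnitude entries in each column of C, zero the rest."""
    S = np.zeros_like(C)
    # Row indices of the top-k magnitudes per column, shape (k, n_cols).
    idx = np.argpartition(np.abs(C), -k, axis=0)[-k:, :]
    cols = np.arange(C.shape[1])
    S[idx, cols] = C[idx, cols]
    return S


def procrustes_update(W, S):
    """Solve min_D ||W - D S||_F subject to D^T D = I.

    With W S^T = U diag(s) V^T, the minimizer is D = U V^T.
    """
    U, _, Vt = np.linalg.svd(W @ S.T, full_matrices=False)
    return U @ Vt


def orthogonal_dictionary_factorize(W, k, n_iters=5):
    """Hypothetical helper: alternate the two closed-form steps n_iters times."""
    # Assumed initialization: left singular vectors of W (a square orthogonal matrix).
    D, _, _ = np.linalg.svd(W, full_matrices=True)
    for _ in range(n_iters):
        # For orthogonal D, ||W - D S||_F = ||D^T W - S||_F, so the best
        # k-sparse code per column is a hard threshold of D^T W.
        S = hard_threshold_columns(D.T @ W, k)
        D = procrustes_update(W, S)
    S = hard_threshold_columns(D.T @ W, k)
    return D, S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((16, 32))  # stand-in for a weight matrix
    D, S = orthogonal_dictionary_factorize(W, k=4)
    rel_err = np.linalg.norm(W - D @ S) / np.linalg.norm(W)
    print(f"relative reconstruction error: {rel_err:.3f}")
```

Each step can only lower or keep the Frobenius reconstruction error, so the alternation is monotone. How COMPOT weights the objective with calibration activations and how it allocates sparsity across layers are not specified in the abstract and are not modeled here.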

Denis Makhov, Dmitriy Shopkhoev, Magauiya Zhussip, Ammar Ali, Baher Mohammad, Stamatios Lefkimmiatis • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL 5 | 2333 |
| Language Modeling | WikiText-2 | Perplexity (PPL) 6.22 | 2320 |
| Language Modeling | C4 | Perplexity 9.34 | 1565 |
| Automatic Speech Recognition | LibriSpeech clean (test) | WER 2.46 | 1207 |
| Automatic Speech Recognition | LibriSpeech (test-other) | WER 4.51 | 1206 |
| Language Modeling | WikiText | PPL 12 | 740 |
| Physical Commonsense Reasoning | PIQA | Accuracy 62.2 | 696 |
| Question Answering | ARC-E | Accuracy 42.9 | 523 |
| Question Answering | PIQA | Accuracy 78 | 505 |
| Optical Character Recognition | OCRBench | Score 0.669 | 433 |
Showing 10 of 35 rows
