COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression
About
Post-training compression of Transformer models commonly relies on truncated singular value decomposition (SVD). However, enforcing a single shared subspace can degrade accuracy even at moderate compression. Sparse dictionary learning provides a more flexible union-of-subspaces representation, but existing approaches often depend on costly iterative dictionary and coefficient updates. We propose COMPOT (Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers), a training-free compression framework that uses a small calibration dataset to estimate a sparse weight factorization. COMPOT employs orthogonal dictionaries that enable closed-form Procrustes updates for the dictionary and analytical single-step sparse coding for the coefficients, eliminating iterative optimization. To handle heterogeneous layer sensitivity under a global compression budget, COMPOT further introduces a one-shot dynamic allocation strategy that adaptively redistributes layer-wise compression rates. Extensive experiments across diverse architectures and tasks show that COMPOT consistently delivers a superior quality-compression trade-off over strong low-rank and sparse baselines, while remaining fully compatible with post-training quantization for extreme compression. Code is available at https://github.com/mts-ai/COMPOT.
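The two closed-form steps mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the COMPOT repository and makes several simplifying assumptions: the dictionary is square and orthogonal, the objective is the plain Frobenius error $\|W - DS\|_F$ with no calibration weighting, and sparsity is enforced by keeping the `k` largest-magnitude coefficients per column. The function names `hard_threshold_columns` and `procrustes_dictionary` are hypothetical.

```python
import numpy as np


def hard_threshold_columns(C, k):
    """Keep the k largest-magnitude entries in each column of C, zero the rest."""
    S = np.zeros_like(C)
    # Row indices of the k largest |C| entries in every column, shape (k, n).
    idx = np.argpartition(np.abs(C), -k, axis=0)[-k:, :]
    cols = np.arange(C.shape[1])
    S[idx, cols] = C[idx, cols]
    return S


def procrustes_dictionary(W, S):
    """Orthogonal D minimizing ||W - D S||_F for fixed S (orthogonal Procrustes)."""
    # Minimizing the error is equivalent to maximizing trace(D^T W S^T),
    # which is solved by D = U V^T where W S^T = U diag(s) V^T.
    U, _, Vt = np.linalg.svd(W @ S.T, full_matrices=False)
    return U @ Vt


# Toy weight matrix and a random orthogonal starting dictionary (assumed sizes).
rng = np.random.default_rng(0)
m, n, k = 16, 64, 4
W = rng.standard_normal((m, n))
D0 = np.linalg.qr(rng.standard_normal((m, m)))[0]

# Sparse coding: because D0 is orthogonal, ||W - D0 S|| = ||D0^T W - S||,
# so the best k-sparse columns are obtained by hard thresholding D0^T W.
S0 = hard_threshold_columns(D0.T @ W, k)
err_initial = np.linalg.norm(W - D0 @ S0)

# Dictionary update in closed form, followed by one more sparse-coding step.
D1 = procrustes_dictionary(W, S0)
err_after_dictionary = np.linalg.norm(W - D1 @ S0)
S1 = hard_threshold_columns(D1.T @ W, k)
err_after_coding = np.linalg.norm(W - D1 @ S1)

print(err_initial, err_after_dictionary, err_after_coding)
```

Each step is the exact minimizer of the Frobenius error with the other factor held fixed, so the three printed errors are non-increasing. How COMPOT incorporates calibration data and allocates per-layer budgets is not covered by this sketch.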
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 5 | 2333 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 6.22 | 2320 |
| Language Modeling | C4 | Perplexity | 9.34 | 1565 |
| Automatic Speech Recognition | LibriSpeech clean (test) | WER | 2.46 | 1207 |
| Automatic Speech Recognition | LibriSpeech (test-other) | WER | 4.51 | 1206 |
| Language Modeling | WikiText | PPL | 12 | 740 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 62.2 | 696 |
| Question Answering | ARC-E | Accuracy | 42.9 | 523 |
| Question Answering | PIQA | Accuracy | 78 | 505 |
| Optical Character Recognition | OCRBench | Score | 0.669 | 433 |