On Disentangled Training for Nonlinear Transform in Learned Image Compression
About
Learned image compression (LIC) has demonstrated superior rate-distortion (R-D) performance compared to traditional codecs, but it suffers from training inefficiency: training a state-of-the-art model from scratch can take more than two weeks. Existing LIC methods overlook the slow convergence caused by compacting energy while learning nonlinear transforms. In this paper, we first reveal that such energy compaction consists of two components, i.e., feature decorrelation and uneven energy modulation. On this basis, we propose a linear auxiliary transform (AuxT) to disentangle energy compaction from the training of nonlinear transforms. The proposed AuxT provides a coarse approximation that achieves efficient energy compaction, so that distribution fitting with the nonlinear transforms can be simplified to fine details. We then develop wavelet-based linear shortcuts (WLSs) for AuxT that leverage wavelet-based downsampling and orthogonal linear projection for feature decorrelation, and subband-aware scaling for uneven energy modulation.
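To make the two components concrete, the sketch below illustrates the general idea of a wavelet-based linear shortcut with NumPy: an orthonormal 2x2 Haar transform decorrelates neighbouring pixels into four subbands, and a per-subband gain models uneven energy scaling. This is a minimal illustration, not the paper's implementation; the `gains` values and function names are hypothetical.

```python
import numpy as np

def haar_downsample(x):
    """Orthonormal 2x2 Haar transform: splits x (H, W) into four
    half-resolution subbands (LL, LH, HL, HH). Being orthogonal, it
    decorrelates neighbouring pixels while preserving total energy."""
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency average (most energy)
    lh = (a + b - c - d) / 2.0  # vertical detail
    hl = (a - b + c - d) / 2.0  # horizontal detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def wls_shortcut(x, gains=(1.0, 0.5, 0.5, 0.25)):
    """Sketch of a wavelet-based linear shortcut: Haar downsampling for
    feature decorrelation, then a per-subband gain (placeholder values
    here) standing in for learned subband-aware scaling."""
    subbands = haar_downsample(x)
    return np.stack([g * s for g, s in zip(gains, subbands)], axis=0)

# For smooth inputs, energy concentrates in the LL subband, which is the
# "energy compaction" the linear shortcut is meant to handle.
img = np.arange(16, dtype=float).reshape(4, 4)
out = wls_shortcut(img)  # shape (4, 2, 2): one channel per scaled subband
```

Because the Haar transform here is orthonormal, the sum of squared subband coefficients equals the input's squared norm before scaling, which is what lets the linear branch take over energy compaction from the nonlinear transform.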
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Compression | Kodak | -- | 50 |
| Image Compression | Tecnick | -- | 36 |
| Image Compression | CLIC 2022 | BD-Rate -1.66 | 6 |