EDPC: Accelerating Lossless Compression via Lightweight Probability Models and Decoupled Parallel Dataflow
About
The explosive growth of multi-source multimedia data has significantly increased the demand for transmission and storage, placing substantial pressure on bandwidth and storage infrastructure. While Autoregressive Compression Models (ACMs) have markedly improved compression efficiency through probabilistic prediction, current approaches remain constrained by two critical limitations: suboptimal compression ratios, caused by insufficient fine-grained feature extraction during probability modeling, and real-time processing bottlenecks, caused by high resource consumption and low compression speed. To address these challenges, we propose Efficient Dual-path Parallel Compression (EDPC), a hierarchically optimized compression framework that improves both modeling capability and execution efficiency through coordinated dual-path operations. At the modeling level, we introduce the Information Flow Refinement (IFR) metric, grounded in mutual information theory, and design a Multi-path Byte Refinement Block (MBRB) that strengthens cross-byte dependency modeling via heterogeneous feature propagation. At the system level, we develop a Latent Transformation Engine (LTE) for compact high-dimensional feature representation and a Decoupled Pipeline Compression Architecture (DPCA) that eliminates encoding-decoding latency through pipelined parallelization. Experimental results show that EDPC outperforms state-of-the-art methods, achieving 2.7x faster compression and a 3.2% higher compression ratio. These advances make EDPC an efficient solution for real-time processing of large-scale multimedia data in bandwidth-constrained scenarios. Our code is available at https://github.com/Magie0/EDPC.
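An autoregressive compressor predicts a probability for each byte from the bytes before it and hands that probability to an entropy coder; an arithmetic coder then spends close to -log2 p bits on the byte, so a better predictor directly yields a higher compression ratio. The minimal sketch below illustrates only that principle. It is not EDPC's model: it swaps in a simple adaptive order-k byte-count model with add-one smoothing in place of a neural network, and it reports the ideal code length instead of running a real arithmetic coder. The function names `ideal_code_length_bits` and `compression_ratio` are hypothetical.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(data: bytes, order: int = 2) -> float:
    """Sum of -log2 p(x_t | previous `order` bytes) under an adaptive count model.

    Illustrative stand-in for a learned probability model: an arithmetic coder
    driven by these probabilities would emit roughly this many bits.
    """
    # counts[context][symbol]: how often `symbol` has followed `context` so far
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for t, symbol in enumerate(data):
        context = data[max(0, t - order):t]
        # Add-one (Laplace) smoothing over the 256 possible byte values
        p = (counts[context][symbol] + 1) / (totals[context] + 256)
        bits += -math.log2(p)
        # Update only after the byte is "coded", so a decoder that has already
        # reconstructed x_<t can reproduce exactly the same probabilities.
        counts[context][symbol] += 1
        totals[context] += 1
    return bits


def compression_ratio(data: bytes, order: int = 2) -> float:
    """Original size in bits divided by the ideal compressed size in bits."""
    if not data:
        raise ValueError("data must be non-empty")
    return (8 * len(data)) / ideal_code_length_bits(data, order)


if __name__ == "__main__":
    print(compression_ratio(b"abracadabra " * 1000))  # repetitive input compresses well
    print(compression_ratio(bytes(range(256))))       # no repeated contexts: no gain
```

Because every context in `bytes(range(256))` is new, each byte costs exactly 8 bits and the ratio is 1.0, whereas the repetitive input is far more predictable. A stronger model (in EDPC's case, one with richer cross-byte features) lowers the summed -log2 p and therefore raises the ratio.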
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Lossless Data Compression | Enwik9 (text) | Compression Ratio | 6.176 | 11 |
| Lossless Data Compression | LJSpeech | Compression Ratio | 1.879 | 11 |
| Lossless Data Compression | TestImages (image) | Compression Ratio | 2.392 | 11 |
| Lossless Data Compression | UVG (video) | Compression Ratio | 2.52 | 11 |
| Lossless Data Compression | CESM (float) | Compression Ratio | 2.91 | 11 |
| Lossless Data Compression | DNACorpus (genome) | Compression Ratio | 4.472 | 11 |
| Lossless Data Compression | Silesia (heterogeneous) | Compression Ratio | 5.321 | 11 |
| Lossless Data Compression | Silesia | Compression Throughput | 4.39e+3 | 7 |
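To make the ratios concrete, the short sketch below converts a compression ratio into output size and bits per input byte, assuming the usual definition ratio = original size / compressed size. Enwik9 is the first 10^9 bytes of an English Wikipedia XML dump, so a ratio of 6.176 corresponds to roughly 161.9 MB of output, or about 1.295 bits per byte. The last helper shows why a 3.2% higher ratio, as stated in the abstract, means roughly 3.1% smaller output: size scales with 1/ratio. The helper names are illustrative, and the throughput row is not converted because its unit is not given.

```python
# Hypothetical helpers for interpreting the table; assumes
# compression ratio = original size / compressed size.
ENWIK9_BYTES = 10**9  # enwik9: first 10^9 bytes of an English Wikipedia XML dump


def compressed_bytes(original_bytes: int, ratio: float) -> float:
    """Output size implied by a given compression ratio."""
    return original_bytes / ratio


def bits_per_byte(ratio: float) -> float:
    """Average number of output bits spent per input byte."""
    return 8 / ratio


def size_reduction(ratio_gain: float) -> float:
    """Fractional drop in output size when the ratio rises by `ratio_gain` (0.032 means 3.2%)."""
    return 1 - 1 / (1 + ratio_gain)


if __name__ == "__main__":
    print(f"{compressed_bytes(ENWIK9_BYTES, 6.176) / 1e6:.1f} MB")  # 161.9 MB
    print(f"{bits_per_byte(6.176):.3f} bits/byte")                   # 1.295 bits/byte
    print(f"{size_reduction(0.032):.1%} smaller output")             # 3.1% smaller output
```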