
Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

About

Autoregressive "language" models (LMs) trained on raw waveforms can be repurposed for lossless audio compression, but prior work is limited to 8-bit audio, leaving open whether such approaches work in practical settings (16/24-bit) and can compete with existing codecs. We benchmark LM-based compression on full-fidelity audio across diverse domains (music, speech, bioacoustics), sampling rates (16 kHz to 48 kHz), and bit depths (8-, 16-, and 24-bit). Standard sample-level tokenization becomes intractable at higher bit depths due to vocabulary size (65K for 16-bit; 16.7M for 24-bit). We propose Trilobyte, a byte-level tokenization schema for full-resolution audio, improving vocabulary scaling from $O(2^{b})$ to $O(1)$ and enabling the first tractable 24-bit LM-based lossless compression. While LMs consistently outperform FLAC and yield state-of-the-art compression at 8-bit and 16-bit, we observe that compression gains become more modest as bit depth increases beyond 8-bit.
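The vocabulary-scaling argument can be illustrated with a minimal sketch: splitting each PCM sample into its constituent bytes keeps the token vocabulary fixed at 256 regardless of bit depth, at the cost of more tokens per sample. Note this is only an illustration of byte-level tokenization in general; the function below is hypothetical and not the paper's actual Trilobyte schema.

```python
# Illustrative byte-level tokenization of PCM samples (hypothetical helper,
# not the paper's Trilobyte implementation).
def sample_to_bytes(sample: int, bit_depth: int) -> list[int]:
    """Split one unsigned PCM sample into big-endian byte tokens in 0..255."""
    n_bytes = bit_depth // 8
    return [(sample >> (8 * (n_bytes - 1 - i))) & 0xFF for i in range(n_bytes)]

# Sample-level tokenization of 24-bit audio would need a 2**24 (~16.7M) entry
# vocabulary; at byte level the same sample is 3 tokens from a 256-entry one.
tokens = sample_to_bytes(0xABCDEF, bit_depth=24)  # → [0xAB, 0xCD, 0xEF]
```

Since each b-bit sample always yields b/8 tokens from the same 256-symbol alphabet, the vocabulary is O(1) in bit depth while sequence length grows only linearly.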

Phillip Long, Zachary Novack, Chris Donahue • 2026

Related benchmarks

| Task | Dataset | Compression Rate | Rank |
|---|---|---|---|
| Lossless Audio Compression | SC09 8-bit | 2.88 | 5 |
| Lossless Audio Compression | YouTube Mix 8-bit | 5.14 | 5 |
| Lossless Audio Compression | VCTK 16-bit | 2.68 | 5 |
| Lossless Audio Compression | LJSpeech 16-bit | 2.08 | 5 |
| Lossless Audio Compression | LibriSpeech 16-bit | 2.11 | 5 |
| Lossless Audio Compression | Birdvox 16-bit | 2.48 | 5 |
| Lossless Audio Compression | Epidemic Sound 16-bit | 3.4 | 5 |
| Lossless Audio Compression | MusDB18 All 16-bit | 2.82 | 5 |
| Lossless Audio Compression | MusDB18 Mixes 16-bit | 2.08 | 5 |
| Lossless Audio Compression | Commercial 16-bit | 1.86 | 5 |

Showing 10 of 12 rows.
