One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression

About

Current image tokenization methods require a large number of tokens to capture the information contained within images. Although the amount of information varies across images, most image tokenizers only support fixed-length tokenization, leading to inefficiency in token allocation. In this study, we introduce One-D-Piece, a discrete image tokenizer designed for variable-length tokenization, achieving a quality-controllable mechanism. To enable a variable compression rate, we introduce a simple but effective regularization mechanism named "Tail Token Drop" into discrete one-dimensional image tokenizers. This method encourages critical information to concentrate at the head of the token sequence, enabling variable-length tokenization while preserving state-of-the-art reconstruction quality. We evaluate our tokenizer across multiple reconstruction quality metrics and find that it delivers significantly better perceptual quality than existing quality-controllable compression methods, including JPEG and WebP, at smaller byte sizes. Furthermore, we assess our tokenizer on various downstream computer vision tasks, including image classification, object detection, semantic segmentation, and depth estimation, confirming its adaptability to numerous applications compared to other variable-rate methods. Our approach demonstrates the versatility of variable-length discrete image tokenization, establishing a new paradigm in both compression efficiency and reconstruction performance. Finally, we validate the effectiveness of Tail Token Drop via detailed analysis of tokenizers.
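The idea behind Tail Token Drop can be illustrated with a minimal sketch. It assumes the regularizer works by keeping a random-length prefix of the one-dimensional token sequence during training, so that a decoder must reconstruct from the head alone; the abstract does not spell out the exact procedure. The function names, the 256-token sequence length, and the 4096-entry codebook are hypothetical choices for illustration, not values taken from the text above.

```python
import random


def tail_token_drop(tokens, rng, min_keep=1):
    """Keep a random-length prefix of the sequence and drop the tail.

    Assumption: truncation length is sampled uniformly; the actual
    sampling scheme used by the authors is not given in the abstract.
    """
    keep = rng.randint(min_keep, len(tokens))  # inclusive on both ends
    return tokens[:keep]


def compressed_size_bytes(num_tokens, codebook_size):
    """Bytes needed to store num_tokens indices into a codebook."""
    bits_per_token = (codebook_size - 1).bit_length()
    return (num_tokens * bits_per_token + 7) // 8  # round up to whole bytes


rng = random.Random(0)
tokens = list(range(256))  # stand-in for discrete token ids (hypothetical length)
prefix = tail_token_drop(tokens, rng)

# Fewer tokens means a smaller payload; quality is traded against size.
full_size = compressed_size_bytes(256, 4096)
half_size = compressed_size_bytes(128, 4096)
```

Under these assumptions, the number of retained tokens acts as the quality knob: truncating the sequence at inference time shrinks the payload linearly, much like lowering the quality setting in JPEG or WebP.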

Keita Miwa, Kento Sasaki, Hidehisa Arai, Tsubasa Takahashi, Yu Yamaguchi • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | ADE20K (val) | mIoU 2.9 | 3069 |
| Semantic segmentation | ADE20K | mIoU 2.9 | 559 |
| Semantic segmentation | Cityscapes (val) | mIoU 19.9 | 527 |
| Semantic segmentation | Cityscapes | mIoU 19.9 | 494 |
| Class-conditional Image Generation | ImageNet 256x256 (train) | IS 231.7 | 367 |
| Semantic segmentation | Pascal VOC | mIoU 0.169 | 280 |
| Image Reconstruction | ImageNet 256x256 | rFID 1.08 | 202 |
| Semantic segmentation | PASCAL VOC 2012 (val) | mIoU 16.9 | 166 |
| Image Reconstruction | ImageNet1K (val) | FID 1.08 | 124 |
| Image Reconstruction | ImageNet-1k 256 x 256 (val) | rFID 1.08 | 112 |
Showing 10 of 18 rows
