
MaskBit: Embedding-free Image Generation via Bit Tokens

About

Masked transformer models for class-conditional image generation have become a compelling alternative to diffusion models. Typically comprising two stages - an initial VQGAN model for transitioning between latent space and image space, and a subsequent Transformer model for image generation within latent space - these frameworks offer promising avenues for image synthesis. In this study, we present two primary contributions. First, an empirical and systematic examination of VQGANs, leading to a modernized VQGAN. Second, a novel embedding-free generation network operating directly on bit tokens - a binary quantized representation of tokens with rich semantics. The first contribution furnishes a transparent, reproducible, and high-performing VQGAN model, enhancing accessibility and matching the performance of current state-of-the-art methods while revealing previously undisclosed details. The second contribution demonstrates that embedding-free image generation using bit tokens achieves a new state-of-the-art FID of 1.52 on the ImageNet 256x256 benchmark, with a compact generator model of a mere 305M parameters. The code for this project is available at https://github.com/markweberdev/maskbit.
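
To make the idea of a bit token concrete, here is a minimal sketch of one way a latent vector might be binarized and packed into an integer index. It is not the authors' implementation: the sign-based quantization rule, the 4-dimensional example latent, and the helper names `to_bit_token`, `bits_to_index`, and `index_to_bits` are all assumptions made for illustration.

```python
from typing import List


def to_bit_token(latent: List[float]) -> List[int]:
    """Binarize a latent vector channel by channel (assumed rule: sign)."""
    # Positive channels map to +1, everything else to -1.
    return [1 if x > 0 else -1 for x in latent]


def bits_to_index(bits: List[int]) -> int:
    """Pack a +1/-1 vector into an integer, first channel most significant."""
    index = 0
    for b in bits:
        index = (index << 1) | (1 if b > 0 else 0)
    return index


def index_to_bits(index: int, num_bits: int) -> List[int]:
    """Inverse of bits_to_index: recover the +1/-1 vector from the integer."""
    return [
        1 if (index >> shift) & 1 else -1
        for shift in range(num_bits - 1, -1, -1)
    ]


# Hypothetical 4-channel latent for a single spatial position.
latent = [0.7, -1.2, 0.1, -0.4]
bits = to_bit_token(latent)            # [1, -1, 1, -1]
index = bits_to_index(bits)            # binary 1010
recovered = index_to_bits(index, len(bits))
```

Under this assumption, a K-channel latent yields an implicit codebook of 2^K entries without storing any codebook vectors, and an "embedding-free" generator in the sense of the abstract would take the +1/-1 vector itself as input rather than looking up a learned embedding by `index`.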

Mark Weber, Lijun Yu, Qihang Yu, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Class-conditional Image Generation | ImageNet 256x256 | Inception Score (IS) 341.8 | 441 |
| Image Generation | ImageNet 256x256 (val) | FID 1.52 | 307 |
| Class-conditional Image Generation | ImageNet 256x256 (train) | IS 328.6 | 305 |
| Image Reconstruction | ImageNet 256x256 | rFID 1.51 | 93 |
| Image Generation | ImageNet-1K 256x256 (val) | Inception Score 328.6 | 85 |
| Image Generation | ImageNet | FID 1.52 | 68 |
| Image Generation | ImageNet 256x256 (test) | FID 1.52 | 46 |
| Image Generation | ImageNet 256x256 (test val) | FID 6.18 | 35 |
| Class-conditional Image Generation | ImageNet 256x256 2012 (train val) | -- | 30 |
| Image Reconstruction | COCO 2014 (val) | rFID 8.3 | 3 |
