
DiC: Rethinking Conv3x3 Designs in Diffusion Models

About

Diffusion models have shown exceptional performance in visual generation tasks. Recently, these models have shifted from traditional U-shaped CNN-attention hybrid structures to fully transformer-based isotropic architectures. While these transformers exhibit strong scalability and performance, their reliance on complicated self-attention operations results in slow inference. In contrast to these works, we rethink one of the simplest yet fastest modules in deep learning, the 3x3 convolution, to construct a scaled-up, purely convolutional diffusion model. We first find that an encoder-decoder hourglass design outperforms scalable isotropic architectures for Conv3x3, but it still underperforms our expectations. To further improve the architecture, we introduce sparse skip connections that reduce redundancy and improve scalability. Building on this architecture, we introduce conditioning improvements including stage-specific embeddings, mid-block condition injection, and conditional gating. These improvements lead to our proposed Diffusion CNN (DiC), which serves as a swift yet competitive diffusion architecture baseline. Experiments at various scales and settings show that DiC surpasses existing diffusion transformers by considerable margins in performance while retaining a clear speed advantage. Project page: https://github.com/YuchuanTian/DiC
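
As a rough illustration of the cost argument above, the following sketch compares multiply-accumulate (MAC) counts for one 3x3 convolution and one self-attention layer on the same feature map. It is a back-of-envelope estimate, not the authors' measurement: the function names and the channel width of 384 are hypothetical, and the attention count ignores softmax, normalization, and the MLP. MACs are only a proxy for latency; the point is that convolution cost grows linearly with the number of spatial positions while the token-mixing part of attention grows quadratically.

```python
def conv3x3_macs(height, width, c_in, c_out):
    """MACs for one stride-1, same-padded 3x3 convolution."""
    return height * width * c_in * c_out * 3 * 3


def self_attention_macs(height, width, channels):
    """MACs for one self-attention layer over N = height * width tokens.

    Counts the Q, K, V and output projections (4 * N * C^2) plus the
    QK^T and attention-times-V products (2 * N^2 * C).
    """
    n = height * width
    projections = 4 * n * channels * channels
    token_mixing = 2 * n * n * channels
    return projections + token_mixing


if __name__ == "__main__":
    channels = 384  # hypothetical width, chosen only for illustration
    for side in (16, 32, 64):
        conv = conv3x3_macs(side, side, channels, channels)
        attn = self_attention_macs(side, side, channels)
        print(f"{side}x{side}: conv3x3={conv:.3e} MACs, attention={attn:.3e} MACs")
```

Under these counting assumptions, attention overtakes the 3x3 convolution in MACs once the token count N exceeds 2.5 times the channel width, so the gap widens as resolution grows.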

Yuchuan Tian, Jing Han, Chengcheng Wang, Yuchen Liang, Chao Xu, Hanting Chen • 2024

Related benchmarks

Task | Dataset | Result | Rank
Class-conditional Image Generation | ImageNet 256x256 | Inception Score (IS) 124.3 | 815
Image Generation | ImageNet 256x256 | -- | 359
Image Generation | ImageNet 512x512 | IS 101.8 | 62
Class-conditional generation | ImageNet 512x512 (test) | FID 12.89 | 21
Conditional Image Generation | ImageNet 256x256 2012 400K iterations | FID 3.89 | 9
Image Generation | ImageNet 256x256 400K iterations (test) | FID 11.36 | 7
Image Synthesis | ImageNet 512x512 (train) | FID 2.96 | 3
