
DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

About

Current deep networks are very data-hungry and benefit from training on large-scale datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data can be generated in unlimited quantities with generative models such as DALL-E and diffusion models, at minimal effort and cost. In this paper, we present DatasetDM, a generic dataset generation model that produces diverse synthetic images together with the corresponding high-quality perception annotations (e.g., segmentation masks and depth maps). Our method builds on a pre-trained diffusion model and extends text-guided image synthesis to perception data generation. We show that the rich latent code of the diffusion model can be effectively decoded into accurate perception annotations by a decoder module. Training the decoder requires less than 1% of the data (around 100 manually labeled images), enabling the generation of an infinitely large annotated dataset. These synthetic data can then be used to train various perception models for downstream tasks. To showcase the power of the proposed approach, we generate datasets with rich, dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation. Notably, DatasetDM achieves 1) state-of-the-art results on semantic segmentation and instance segmentation; 2) significantly better robustness to domain shift than training on real data alone, as well as state-of-the-art results in the zero-shot segmentation setting; and 3) flexibility for efficient application and novel task composition (e.g., image editing). The project website and code are available at https://weijiawu.github.io/DatasetDM_page/ and https://github.com/showlab/DatasetDM, respectively.
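The abstract's core idea, decoding the diffusion model's latent features into dense per-pixel annotations with a lightweight trained decoder, can be sketched as follows. This is a minimal illustrative sketch only: the function name `decode_perception`, the feature shapes, and the linear (1x1-convolution-style) head are assumptions for illustration, not the paper's actual architecture or API.

```python
import numpy as np

def decode_perception(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Project per-pixel diffusion features (H, W, C) to class logits (H, W, K)
    with a linear head, then take the argmax as a segmentation mask.
    In DatasetDM, a decoder like this is the only component that needs
    training, on roughly 100 labeled images; the diffusion backbone is frozen.
    """
    logits = features @ weights           # (H, W, K) per-pixel class scores
    return logits.argmax(axis=-1)         # (H, W) predicted label map

# Toy stand-ins: in practice, `features` would come from the frozen
# diffusion model's intermediate activations for a synthesized image.
rng = np.random.default_rng(0)
H, W, C, K = 8, 8, 16, 3                  # spatial size, feature dim, classes
features = rng.standard_normal((H, W, C))
weights = rng.standard_normal((C, K))     # the trainable decoder parameters

mask = decode_perception(features, weights)
print(mask.shape)  # (8, 8)
```

Because the backbone stays frozen and only this small head is learned, the same generator can then emit image/annotation pairs indefinitely, which is what makes the "infinitely large annotated dataset" claim possible.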

Weijia Wu, Yuzhong Zhao, Hao Chen, Yuchao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen• 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Semantic segmentation | PASCAL VOC 2012 (val) | Mean IoU | 78.5 | 2142 |
| Semantic segmentation | Cityscapes (val) | mIoU | 80.45 | 374 |
| Semantic segmentation | VOC 2012 (val) | mIoU | 73.6 | 76 |
| Semantic segmentation | Cityscapes v1 (test) | mIoU | 44.9 | 74 |
| Dichotomous Image Segmentation | DIS5K (DIS-VD) | S_alpha | 0.814 | 30 |
| Semantic segmentation | Urban-scene Domain Generalization Suite: Cityscapes to ACDC, Dark Zurich, BDD100K, Mapillary Vistas (test/val) | mIoU (ACDC) | 58.11 | 21 |
| Dichotomous Image Segmentation | DIS5K (DIS-TE3) | S_alpha | 0.848 | 16 |
| Dichotomous Image Segmentation | DIS5K (DIS-TE4) | S_alpha | 0.846 | 16 |
| Dichotomous Image Segmentation | DIS5K (DIS-TE1) | S_alpha | 79.1 | 16 |
| Dichotomous Image Segmentation | DIS5K (DIS-TE2) | S_alpha | 83.3 | 16 |

Showing 10 of 27 rows.
