Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation

About

Preparing training data for deep vision models is a labor-intensive task. To address this, generative models have emerged as an effective solution for generating synthetic data. While current generative models produce image-level category labels, we propose a novel method for generating pixel-level semantic segmentation labels using the text-to-image generative model Stable Diffusion (SD). By utilizing the text prompts, cross-attention, and self-attention of SD, we introduce three new techniques: class-prompt appending, class-prompt cross-attention, and self-attention exponentiation. These techniques enable us to generate segmentation maps corresponding to synthetic images. These maps serve as pseudo-labels for training semantic segmenters, eliminating the need for labor-intensive pixel-wise annotation. To account for the imperfections in our pseudo-labels, we incorporate uncertainty regions into the segmentation, allowing us to disregard loss from those regions. We conduct evaluations on two datasets, PASCAL VOC and MSCOCO, and our approach significantly outperforms concurrent work. Our benchmarks and code will be released at https://github.com/VinAIResearch/Dataset-Diffusion
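
A minimal NumPy sketch of how the steps named in the abstract could fit together. The abstract gives no formulas, so everything here is an assumption for illustration: the function names, the "caption; classes" prompt format, reading self-attention exponentiation as a matrix power of the self-attention map, the min-max normalization, the two thresholds, and the ignore label 255. It is not the authors' released implementation.

```python
import numpy as np

IGNORE_INDEX = 255  # assumed label for uncertain pixels; the loss skips them


def append_class_prompt(caption, class_names):
    """Class-prompt appending: add the target class names to the caption.

    The "; " separator is a hypothetical format chosen for this sketch.
    """
    return caption + "; " + " ".join(class_names)


def refine_cross_attention(cross_attn, self_attn, power=4):
    """Spread class evidence across pixels using powered self-attention.

    cross_attn: (num_pixels, num_classes), one column per class token.
    self_attn:  (num_pixels, num_pixels), pixel-to-pixel attention.
    """
    refined = np.linalg.matrix_power(self_attn, power) @ cross_attn
    # Min-max normalize each class map to roughly [0, 1].
    lo = refined.min(axis=0, keepdims=True)
    hi = refined.max(axis=0, keepdims=True)
    return (refined - lo) / (hi - lo + 1e-8)


def to_pseudo_label(refined, low=0.3, high=0.6):
    """Turn refined maps into labels: 0 = background, 1..C = classes.

    Pixels whose best score lies in [low, high) are marked IGNORE_INDEX.
    """
    best = refined.max(axis=1)
    label = refined.argmax(axis=1) + 1
    label[best < high] = IGNORE_INDEX
    label[best < low] = 0
    return label


def masked_cross_entropy(logits, label):
    """Mean cross-entropy over pixels whose label is not IGNORE_INDEX."""
    keep = label != IGNORE_INDEX
    z = logits[keep] - logits[keep].max(axis=1, keepdims=True)
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(keep.sum()), label[keep]].mean()
```

In a real pipeline the attention maps would come from Stable Diffusion's attention layers and the loss from the segmenter's training framework; the sketch only shows how uncertain pixels drop out of the loss.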

Quang Nguyen, Truong Vu, Anh Tran, Khoi Nguyen • 2023

Related benchmarks

Task	Dataset	Result
Semantic segmentation	PASCAL VOC 2012 (val)	Mean IoU 79.9	2204
Semantic segmentation	PASCAL VOC 2012 (test)	mIoU 79.8	1477
Semantic segmentation	Cityscapes (test)	mIoU 64.4	1252
Semantic segmentation	PASCAL VOC (val)	mIoU 46.85	380
Semantic segmentation	COCO 2017 (val)	mIoU 34.2	66
Semantic segmentation	VOC	mIoU 82.4	55
Dichotomous Image Segmentation	DIS5K (DIS-VD)	F_beta (Weighted) 0.726	44
Face Parsing	CelebAMask-HQ	Nose Accuracy 0.972	28
Semantic segmentation	VOC (val)	mIoU 64.8	25
Semantic-level object discovery	VOC	mIoU 64.8	19

