DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

Wu, Weijia; Zhao, Yuzhong; Chen, Hao; Gu, Yuchao; Zhao, Rui; He, Yefei; Zhou, Hong; Shou, Mike Zheng; Shen, Chunhua

arXiv:2308.06160 (cs)

[Submitted on 11 Aug 2023 (v1), last revised 10 Oct 2023 (this version, v2)]

Abstract: Current deep networks are very data-hungry and benefit from training on large-scale datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data can be generated in unlimited quantities using generative models such as DALL-E and diffusion models, with minimal effort and cost. In this paper, we present DatasetDM, a generic dataset generation model that can produce diverse synthetic images and the corresponding high-quality perception annotations (e.g., segmentation masks and depth). Our method builds upon a pre-trained diffusion model and extends text-guided image synthesis to perception data generation. We show that the rich latent code of the diffusion model can be effectively decoded into accurate perception annotations using a decoder module. Training the decoder requires less than 1% (around 100 images) of manually labeled images, enabling the generation of an infinitely large annotated dataset. These synthetic data can then be used to train various perception models for downstream tasks. To showcase the power of the proposed approach, we generate datasets with rich, dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation. Notably, it achieves 1) state-of-the-art results on semantic segmentation and instance segmentation; 2) significantly more robust domain generalization than training on real data alone, along with state-of-the-art results in the zero-shot segmentation setting; and 3) flexibility for efficient application and novel task composition (e.g., image editing). The project website and code can be found at this https URL and this https URL, respectively.
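
The workflow described in the abstract (keep the generator frozen, fit a small decoder on a handful of labeled samples, then annotate freshly generated samples with it) can be illustrated with a toy sketch. This is not the DatasetDM implementation: `sample_latents` is a hypothetical stand-in for a diffusion model's latent features, the decoder is a plain per-pixel logistic regression, and every name and hyperparameter below is an assumption chosen for illustration.

```python
import math
import random


def sample_latents(rng, n_pixels=16):
    """Toy stand-in for a frozen generator (assumption, not a diffusion model).

    Returns per-pixel feature vectors plus the hidden class a human
    annotator would assign to each pixel.
    """
    labels = [rng.randint(0, 1) for _ in range(n_pixels)]
    feats = [[(2 * y - 1) + rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for y in labels]
    return feats, labels


def predict(w, x):
    """Per-pixel foreground probability from a linear decoder."""
    z = w[0] * x[0] + w[1] * x[1] + w[2]
    return 1.0 / (1.0 + math.exp(-z))


def train_decoder(labeled, epochs=20, lr=0.1):
    """Fit the decoder with SGD on the logistic loss; the generator is untouched."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for feats, labels in labeled:
            for x, y in zip(feats, labels):
                g = predict(w, x) - y  # gradient of the loss w.r.t. the logit
                w[0] -= lr * g * x[0]
                w[1] -= lr * g * x[1]
                w[2] -= lr * g
    return w


def synthesize(w, rng, n_samples):
    """Generate new samples and annotate them with the trained decoder."""
    dataset = []
    for _ in range(n_samples):
        feats, hidden = sample_latents(rng)
        mask = [1 if predict(w, x) > 0.5 else 0 for x in feats]
        dataset.append((feats, mask, hidden))  # hidden is kept only for evaluation
    return dataset


def mask_accuracy(dataset):
    """Fraction of pixels where the decoded mask matches the hidden label."""
    pairs = [(m, h) for _, mask, hidden in dataset for m, h in zip(mask, hidden)]
    return sum(m == h for m, h in pairs) / len(pairs)


rng = random.Random(0)
labeled = [sample_latents(rng) for _ in range(100)]  # the small labeled set
decoder = train_decoder(labeled)
synthetic = synthesize(decoder, rng, 500)  # can be made as large as needed
accuracy = mask_accuracy(synthetic)
```

The point of the sketch is the division of labor: labels are needed only for the 100 training samples, while the size of `synthetic` is limited only by how many samples are drawn. In the paper's setting the features would come from a pre-trained text-to-image diffusion model and the decoder would be a learned perception head, neither of which is modeled here.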

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.06160 [cs.CV]
	(or arXiv:2308.06160v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.06160
Journal reference:	Proc. Advances In Neural Information Processing Systems (NeurIPS 2023)

Submission history

From: Weijia Wu
[v1] Fri, 11 Aug 2023 14:38:11 UTC (22,135 KB)
[v2] Tue, 10 Oct 2023 03:59:41 UTC (22,138 KB)
