Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Guo, Lanqing; He, Yingqing; Chen, Haoxin; Xia, Menghan; Cun, Xiaodong; Wang, Yufei; Huang, Siyu; Zhang, Yong; Wang, Xintao; Chen, Qifeng; Shan, Ying; Wen, Bihan


arXiv:2402.10491 (cs)

[Submitted on 16 Feb 2024]



Abstract: Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data. Adapting large pre-trained diffusion models for higher resolution demands substantial computational and optimization resources, yet achieving a generation capability comparable to low-resolution models remains elusive. This paper proposes a novel self-cascade diffusion model that leverages the rich knowledge gained from a well-trained low-resolution model for rapid adaptation to higher-resolution image and video generation, employing either tuning-free or cheap upsampler tuning paradigms. Integrating a sequence of multi-scale upsampler modules, the self-cascade diffusion model can efficiently adapt to a higher resolution, preserving the original composition and generation capabilities. We further propose a pivot-guided noise re-schedule strategy to speed up the inference process and improve local structural details. Compared to full fine-tuning, our approach achieves a 5X training speed-up and requires only an additional 0.002M tuning parameters. Extensive experiments demonstrate that our approach can quickly adapt to higher resolution image and video synthesis by fine-tuning for just 10k steps, with virtually no additional inference time.
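The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of a tuning-free cascade stage: upsample a low-resolution sample, re-noise it to an intermediate "pivot" timestep with the standard DDPM forward process, and run only the remaining denoising steps at the higher resolution. Every name here (`cascade_stage`, `pivot_step`, `placeholder_denoiser`), the linear beta schedule, and the nearest-neighbour upsampling are assumptions made for illustration and are not taken from the paper; the denoiser is a stand-in that returns its input unchanged.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (standard DDPM)."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def upsample_nearest(image, factor):
    """Nearest-neighbour upsampling of a 2-D list of floats (assumed, not the paper's choice)."""
    return [[pixel for pixel in row for _ in range(factor)]
            for row in image for _ in range(factor)]


def renoise(image, alpha_bar_k, rng):
    """Forward diffusion q(z_K | z_0): scale the signal and add Gaussian noise."""
    signal = math.sqrt(alpha_bar_k)
    noise = math.sqrt(1.0 - alpha_bar_k)
    return [[signal * p + noise * rng.gauss(0.0, 1.0) for p in row] for row in image]


def cascade_stage(low_res, factor, pivot_step, alpha_bar, denoise_step, rng):
    """Hypothetical stage: upsample, re-noise to the pivot step, then denoise back to step 0."""
    z = renoise(upsample_nearest(low_res, factor), alpha_bar[pivot_step], rng)
    steps_run = 0
    for t in range(pivot_step, -1, -1):  # only pivot_step + 1 steps, not the full schedule
        z = denoise_step(z, t)
        steps_run += 1
    return z, steps_run


def placeholder_denoiser(z, t):
    """Stand-in for a pre-trained low-resolution denoiser; returns its input unchanged."""
    return z


rng = random.Random(0)
alpha_bar = linear_alpha_bar(1000)
low_res = [[0.0, 1.0], [1.0, 0.0]]
high_res, steps_run = cascade_stage(low_res, 2, 299, alpha_bar, placeholder_denoiser, rng)
print(len(high_res), len(high_res[0]), steps_run)
```

With a 1000-step schedule and an assumed pivot at step 299, the higher-resolution stage in this sketch runs 300 denoising steps instead of 1000, which is the kind of inference saving a shortened noise schedule would give.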

Comments:	Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2402.10491 [cs.CV]
	(or arXiv:2402.10491v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.10491

Submission history

From: Lanqing Guo
[v1] Fri, 16 Feb 2024 07:48:35 UTC (10,457 KB)
