Shallow Diffusion for Fast Speech Enhancement (Student Abstract)

Authors

  • Yue Lei, University of Electronic Science and Technology of China
  • Bin Chen, University of Electronic Science and Technology of China
  • Wenxin Tai, University of Electronic Science and Technology of China; Kash Institute of Electronics and Information Industry
  • Ting Zhong, University of Electronic Science and Technology of China; Kash Institute of Electronics and Information Industry
  • Fan Zhou, University of Electronic Science and Technology of China; Kash Institute of Electronics and Information Industry

DOI:

https://doi.org/10.1609/aaai.v38i21.30471

Keywords:

Deep Learning, Speech Enhancement, Diffusion Model

Abstract

Diffusion-based generative models have recently achieved notable success in speech enhancement. However, these methods typically require many iterations to generate high-quality samples, which makes inference computationally expensive. In this paper, we propose SDFEN (Shallow Diffusion for Fast spEech eNhancement), a novel approach that addresses this inefficiency while improving the quality of generated samples by reducing the number of iterative steps in the reverse process of the diffusion model. Specifically, we introduce a shallow diffusion strategy that initiates the reverse process at an adaptive time step to accelerate inference. In addition, we propose a dedicated noisy predictor to guide the adaptive selection of the time step. Experimental results demonstrate the superiority of the proposed SDFEN in both effectiveness and efficiency.
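
The idea of starting the reverse process at an adaptive time step can be illustrated with a minimal sketch. The abstract does not give SDFEN's formulation, so everything below is an assumption made for illustration: a standard DDPM-style linear noise schedule, a hypothetical rule (select_start_step) that maps an estimated noise level to a starting step, a placeholder denoiser (eps_model), and the choice to start the reverse process from the noisy input itself. It is not the authors' implementation.

```python
import numpy as np


def make_schedule(T=50, beta_start=1e-4, beta_end=0.05):
    """Linear DDPM-style noise schedule (an assumed choice, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def select_start_step(noise_level, alpha_bars):
    """Hypothetical selection rule: return the smallest number of reverse steps k
    (1 <= k <= T) whose diffusion noise std sqrt(1 - alpha_bar) reaches the
    estimated noise level. A learned predictor would supply noise_level."""
    sigmas = np.sqrt(1.0 - alpha_bars)          # increasing in t
    idx = np.searchsorted(sigmas, noise_level)  # first index with sigma >= level
    return int(min(idx, len(alpha_bars) - 1)) + 1


def shallow_reverse(noisy, k, eps_model, schedule, rng):
    """Run only k DDPM ancestral-sampling steps instead of all T.
    Starting from the noisy input (rather than pure Gaussian noise) is an
    assumption of this sketch."""
    betas, alphas, alpha_bars = schedule
    x = noisy.copy()
    for t in range(k - 1, -1, -1):
        eps = eps_model(x, t)  # predicted noise at step t
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x


# Usage with placeholder components (both are stand-ins, not trained models).
rng = np.random.default_rng(0)
schedule = make_schedule(T=50)
noisy_speech = rng.standard_normal(16000) * 0.1   # fake 1-second waveform
estimated_noise_level = 0.2                       # would come from a predictor
k = select_start_step(estimated_noise_level, schedule[2])
enhanced = shallow_reverse(noisy_speech, k, lambda x, t: np.zeros_like(x), schedule, rng)
print(f"ran {k} of {len(schedule[0])} reverse steps")
```

In this sketch a lower estimated noise level yields a smaller k, so cleaner inputs need fewer reverse steps; this is the source of the inference speed-up that a shallow diffusion strategy aims for.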

Published

2024-03-24

How to Cite

Lei, Y., Chen, B., Tai, W., Zhong, T., & Zhou, F. (2024). Shallow Diffusion for Fast Speech Enhancement (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23556-23558. https://doi.org/10.1609/aaai.v38i21.30471