Estimation of pitch period of speech signal using a new dyadic wavelet algorithm

https://doi.org/10.1016/S0020-0255(99)00055-9

Abstract

An algorithm based on the dyadic wavelet transform (DyWT) has been developed for detecting the pitch period. The pitch period is regarded as an important parameter in designing and developing automatic speaker recognition/identification systems. In this paper, we develop two methods for detecting the pitch period of synthetic signals. In the first method, we estimate the pitch period from the original signal by calculating reasonably accurate values for the dilation parameter L. In the second method, the pitch period is estimated from the power spectrum of the signal. Several experiments were performed, under noisy and ideal environmental conditions, to evaluate the accuracy and robustness of the proposed methodology. The experiments showed that the proposed techniques were successful in estimating pitch periods. The best-case accuracy of the estimation was found to be approximately 100%.

Introduction

The aim of a speaker recognition system is to determine the identity of the speaker. In a generic speaker recognition system, the desired features are first extracted from the speech signal. The extracted features are then used as input to another sub-system, which makes the decision regarding the verification or identification of the speaker. Feature extraction consists of extracting the characteristic parameters of a signal that are to be used for speaker recognition, and it is one of the most important steps in solving the problem. The goal is to extract features that remain invariant for a given speaker while staying distinct from the features of an impostor.

Traditional speaker identification techniques, such as the autocorrelation or cepstrum-based methods, fail to provide accurate results because of the wide range of variations present in real speech signals [1]. The pitch period is considered an important parameter that can be used reliably for the identification of the speaker [2], [3]. Therefore, good performance of an automatic speaker recognition system is strongly related to the accuracy and reliability with which pitch periods of the speech signals can be detected and estimated.

The detection of the pitch period is the most important task in any automatic speech signal analysis. Once the pitch period has been identified, a more detailed examination of the speech signal can be performed. The algorithms for pitch detection can generally be divided into two categories: (1) event detection based algorithms, and (2) non-event detection based algorithms. The event detection method is based on calculating the autocorrelation function of a signal that exhibits the same periodicity as that of the signal to be analyzed. The disadvantage of the autocorrelation method is that it is unsuitable for non-stationary signals and is computationally complex. The non-event based pitch detectors are computationally simple compared with the event based detectors, but are insensitive to pitch period variations during the measurement interval. They are also not suitable for a wide range of speakers.
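As a point of comparison, the autocorrelation approach mentioned above can be sketched in a few lines. This is a minimal illustration, not the implementation evaluated in the paper: the function name `autocorr_pitch_period`, the lag search range, and the synthetic test signal are all assumptions introduced here.

```python
import math


def autocorr_pitch_period(signal, min_lag, max_lag):
    """Return the lag (in samples) within [min_lag, max_lag] that maximizes
    the unnormalized autocorrelation of the signal."""
    n = len(signal)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Sum of products of the signal with a copy of itself shifted by `lag`.
        value = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Hypothetical synthetic signal: a decaying exponential restarted every 80 samples.
period = 80
signal = [math.exp(-0.05 * (i % period)) for i in range(800)]
estimated = autocorr_pitch_period(signal, 20, 200)
print(estimated)
```

On this strictly periodic input the search recovers the 80-sample period. The sketch also shows the cost noted above: every candidate lag requires a full pass over the analysis window, and a single lag is returned for the whole window regardless of how the period varies inside it.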

Since the glottal closure is marked by a sharp discontinuity in the speech signal, it can in some sense be related to the edge detection problem in image processing. A procedure for obtaining an optimal edge detector was provided by Canny [4]. In a subsequent work, Mallat [5] showed that multiscale Canny edge detection is equivalent to finding the local maxima of a wavelet transform. Kadambe and Boudreaux-Bartels [6] recognized the similarity between edge detection in image processing and event based pitch detection in speech recognition. They developed a wavelet based scheme for pitch detection in speech recognition, and showed that the wavelet based method is superior to the traditional pitch estimation techniques. Obaidat et al. [7] evaluated the performance of the Gaussian window (GW), first derivative Gaussian (DG), modulated Gaussian (MG), and one-sided exponential window (EW) wavelets. They observed that DG gives the best estimation accuracy for the pitch period of speech signals. The wavelet transform is a very promising technique for time-frequency analysis. It maps a signal from its domain (time or spatial) to another domain using a set of special signals called wavelets (little waves) [8]. Several different types of wavelets have been applied to solve problems in many areas of science and engineering [9].

In this paper, we present an algorithm based on the dyadic wavelet transform for detecting the pitch period of synthetic speech signals under ideal and noisy conditions (see Fig. 1). The dyadic wavelet transform (DyWT) is a scale-discretized version of the continuous wavelet transform (CWT), which was previously used to analyze phonocardiogram signals [10], [11]. The next section of this paper contains the theory underlying the algorithmic steps and the criterion for estimating and selecting wavelet dilation parameters. Results and the relevant discussion are given in Section 3 of the paper. Finally, conclusions are presented in Section 4.

Section snippets

Wavelet transform

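The wavelet transform of a signal f(n) is defined as

$$W_s f(n) = f(n) \otimes \Psi_s(n) = \frac{1}{s} \int_{-\infty}^{+\infty} f(t)\, \Psi\!\left(\frac{n-t}{s}\right) dt$$

where s is the scale factor and $\Psi_s(n) = \frac{1}{s}\,\Psi\!\left(\frac{n}{s}\right)$ is the dilation of a basic wavelet Ψ(n) by the scale factor s (where $s = 2^L$).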

We can write the function f(n) as

$$f(n) = \frac{1}{c} \iint W_s f(t)\, \frac{1}{s}\, \Psi\!\left(\frac{n-t}{s}\right) dt\, \frac{ds}{s^2}$$

The function Ψ(n) can be considered a wavelet if it satisfies the following admissibility condition:

$$\int \Psi(n)\, dn = 0.$$

In our work we consider wavelets with compact support and vanishing moments because of their many advantages.
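The convolution form of the transform and the zero-mean condition can be illustrated with a short sketch. The excerpt does not state which basic wavelet was used, so the first derivative of a Gaussian (the DG wavelet discussed in the introduction) is assumed here. The names `dg_wavelet` and `dywt`, the truncation of the kernel to four scale widths, and the zero padding at the signal edges are likewise illustrative choices rather than details taken from the paper.

```python
import math


def dg_wavelet(scale, half_width=4.0):
    """Sample (1/s) * psi(k/s) for integer k, where psi(x) = -x * exp(-x^2 / 2)
    is the first derivative of a Gaussian (assumed basic wavelet)."""
    radius = int(half_width * scale)
    return [(-(k / scale) * math.exp(-0.5 * (k / scale) ** 2)) / scale
            for k in range(-radius, radius + 1)]


def dywt(signal, level):
    """Discrete version of W_s f(n) = sum_t f(t) * psi_s(n - t) at the dyadic
    scale s = 2**level. Samples outside the signal are treated as zero."""
    scale = 2 ** level
    kernel = dg_wavelet(scale)
    radius = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            t = n - (j - radius)  # kernel offset k = j - radius equals n - t
            if 0 <= t < len(signal):
                acc += signal[t] * weight
        out.append(acc)
    return out


# Hypothetical test input: a unit step at sample 50 stands in for a sharp discontinuity.
step = [0.0] * 50 + [1.0] * 70
coeffs = dywt(step, level=2)
peak = max(range(len(coeffs)), key=lambda n: coeffs[n])
print(peak, abs(sum(dg_wavelet(4))))
```

Because the sampled kernel is antisymmetric, its sum is zero up to rounding, which is the discrete counterpart of the admissibility condition. The largest coefficient falls next to the step, which is the property the pitch detector relies on. The large negative coefficients near the end of the output are an artifact of the zero padding, not a feature of the signal.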

Relation between the characteristic points and WT

According to the

Results and discussion

The DyWT was computed at scales L+1, L+2 and L+3 for both schemes. The first scheme estimates the pitch period from the original signal by calculating the values of the dilation parameter L with reasonable accuracy. The second scheme estimates the pitch period from the power spectrum of the signal. We detected the instant at which the glottis closes by locating the local maxima of the DyWT, keeping only those local maxima that exceed a threshold equal to 0.3 × (global maximum). We then estimated the pitch period
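The thresholding step described above can be sketched as follows. The 0.3 ratio comes from the text; everything else is an assumption made for illustration. In particular, the excerpt is cut off before it says how the pitch period is derived, so taking the difference between consecutive detected instants is assumed here. The function names, the 8000 Hz sample rate, and the toy coefficient sequence are hypothetical, and a single scale is used instead of the three scales mentioned in the text.

```python
def glottal_closure_instants(coeffs, ratio=0.3):
    """Indices of local maxima of the wavelet coefficients that exceed
    ratio * (global maximum)."""
    threshold = ratio * max(coeffs)
    return [n for n in range(1, len(coeffs) - 1)
            if coeffs[n] > threshold
            and coeffs[n] > coeffs[n - 1]
            and coeffs[n] >= coeffs[n + 1]]


def pitch_periods(instants, sample_rate):
    """Intervals in seconds between consecutive detected instants
    (assumed definition of the pitch period)."""
    return [(b - a) / sample_rate for a, b in zip(instants, instants[1:])]


# Toy coefficients: a strong maximum every 80 samples plus a weaker one
# (height 0.2) that falls below the 0.3 threshold and is rejected.
coeffs = [1.0 if n % 80 == 40 else (0.2 if n % 80 == 10 else 0.0)
          for n in range(400)]
instants = glottal_closure_instants(coeffs)
periods = pitch_periods(instants, sample_rate=8000)
print(instants, periods)
```

A relative threshold keeps the detector independent of the overall signal level, at the cost of assuming that the strongest maximum in the analysed segment is a genuine glottal closure.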

Conclusion

In this paper, we presented algorithms based on the WT for the detection of pitch points of speech signals. The algorithms were evaluated on several synthetic speech signals. The performance of the algorithms was found to be excellent even in a noisy environment. It was observed that the DyWT of the power spectrum of the speech signal provided accurate estimates of the pitch period for signals corrupted by white Gaussian noise. Also, our approach was found to be successful in calculating the scale L

Acknowledgments

This work was supported by a grant from the Department of Defense (DoD) under contract No. NSA-0-96-5.

References (11)

  • S. Kadambe, The application of time-frequency and time-scale representations in speech analysis, Ph.D. Thesis,...
  • W. Hess, Pitch Determination of Speech Signals: Algorithms and Devices, Springer, Berlin,...
  • L.R. Rabiner, R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, NJ,...
  • J. Canny, A computational approach to edge detection, IEEE Trans. PAMI (1986)
  • S. Mallat et al., Characterization of signals from multiscale edges, IEEE Trans. PAMI (1992)
There are more references available in the full text version of this article.
