Estimation of pitch period of speech signal using a new dyadic wavelet algorithm
Introduction
The aim of a speaker recognition system is to determine the identity of the speaker. In a generic speaker recognition system, the desired features are first extracted from the speech signal. The extracted features are then used as input to another sub-system, which makes the decision regarding the verification or identification of the speaker. Feature extraction consists of deriving characteristic parameters of a signal to be used for speaker recognition, and the extraction of salient features is one of the most important steps in solving the problem. The goal is to extract features that remain invariant for a given speaker while staying distinct from the features of an impostor.
The traditional speaker identification techniques, such as the autocorrelation and cepstrum-based methods, fail to provide accurate results because of the wide range of variations present in real speech signals [1]. The pitch period is considered an important parameter that can be used reliably for the identification of the speaker [2], [3]. Therefore, the performance of an automatic speaker recognition system is strongly related to the accuracy and reliability with which the pitch periods of the speech signals can be detected and estimated.
The detection of the pitch period is the most important task in any automatic speech signal analysis. Once the pitch period has been identified, a more detailed examination of the speech signal can be performed. The algorithms for pitch detection can generally be divided into two categories: (1) event detection based algorithms, and (2) non-event detection based algorithms. The event detection method is based on calculating the autocorrelation function of a signal that exhibits the same periodicity as the signal to be analyzed. The disadvantage of the autocorrelation method is that it is unsuitable for non-stationary signals and is computationally complex. The non-event based pitch detectors are computationally simple compared to the event based detectors, but they are insensitive to pitch period variations during the measurement interval. They are also not suitable for a wide range of speakers.
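The autocorrelation approach mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `autocorr_pitch_period`, the two-harmonic test signal, the 8 kHz sampling rate, and the lag search range are all assumptions chosen for illustration.

```python
import math

def autocorr_pitch_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Average product of the signal with a copy of itself shifted by `lag`.
        r = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag

# Hypothetical test signal: fundamental plus second harmonic, period of 80 samples.
SAMPLE_RATE = 8000  # assumed sampling rate in Hz
TRUE_PERIOD = 80
signal = [
    math.sin(2 * math.pi * i / TRUE_PERIOD) + 0.5 * math.sin(4 * math.pi * i / TRUE_PERIOD)
    for i in range(800)
]

estimated_period = autocorr_pitch_period(signal, min_lag=20, max_lag=120)
estimated_pitch_hz = SAMPLE_RATE / estimated_period
```

The search range is kept below twice the true period so that the peak at the second multiple of the period cannot be selected. The estimate is a single value for the whole analysis window, which is why a method of this kind cannot follow period-to-period variations inside that window.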
Since the glottal closure is marked by a sharp discontinuity in the speech signal, it can in some sense be related to the edge detection problem in image processing. A procedure for obtaining an optimal edge detector was provided by Canny [4]. In subsequent work, Mallat [5] showed that multiscale Canny edge detection is equivalent to finding the local maxima of a wavelet transform. Kadambe and Boudreaux-Bartels [6] recognized the similarity between edge detection in image processing and event-based pitch detection in speech recognition. They developed a wavelet-based scheme for pitch detection in speech recognition and showed that the wavelet-based method is superior to the traditional pitch estimation techniques. Obaidat et al. [7] evaluated the performance of the Gaussian window (GW), first derivative Gaussian (DG), modulated Gaussian (MG), and one-sided exponential window (EW) wavelets. They observed that the DG wavelet gives the best estimation accuracy for the pitch period of speech signals. The wavelet transform is a very promising technique for time-frequency analysis. It maps a signal from its domain (time or spatial) to another domain using a set of special signals called wavelets (little waves) [8]. Several different types of wavelets have been applied to solve problems in many areas of science and engineering [9].
In this paper, we present an algorithm based on the dyadic wavelet transform for detecting the pitch period of synthetic speech signals under ideal and noisy conditions (see Fig. 1). The dyadic wavelet transform (DyWT) is a scale-discretized version of the continuous wavelet transform (CWT), which was previously used to analyze phonocardiogram signals [10], [11]. The next section of this paper contains the theory underlying the algorithmic steps and the criterion for estimating and selecting wavelet dilation parameters. Results and the relevant discussion are given in Section 3 of the paper. Finally, conclusions are presented in Section 4.
Wavelet transform
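The wavelet transform of a signal $f(x)$ is defined as:
$$W_s f(x) = f * \psi_s(x) = \frac{1}{s}\int_{-\infty}^{\infty} f(t)\,\psi\!\left(\frac{x-t}{s}\right)dt,$$
where $s$ is the scale factor for $\psi_s(x)$, which is the dilation of a basic wavelet $\psi(x)$ by the scale factor $s$ (where $s > 0$).
We can write the function $\psi_s(x)$ as
$$\psi_s(x) = \frac{1}{s}\,\psi\!\left(\frac{x}{s}\right).$$
The function $\psi(x)$ can be considered a wavelet if it satisfies the following admissibility condition:
$$\int_{-\infty}^{\infty} \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\,d\omega < \infty,$$
where $\hat{\psi}(\omega)$ is the Fourier transform of $\psi(x)$. In our work we consider wavelets with compact support and vanishing moments due to their many advantages.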
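As a worked illustration, the admissibility condition can be checked by hand for the first-derivative-of-Gaussian (DG) wavelet mentioned in the introduction. This example is not taken from the paper. It assumes the standard form of the condition, $\int |\hat{\psi}(\omega)|^2 / |\omega| \, d\omega < \infty$, the unnormalized wavelet $\psi(x) = -x\,e^{-x^2/2}$, and the Fourier convention $\hat{\psi}(\omega) = \int \psi(x)\,e^{-i\omega x}\,dx$.

```latex
\begin{align*}
\psi(x) &= \frac{d}{dx}\,e^{-x^2/2} = -x\,e^{-x^2/2}, \\
\hat{\psi}(\omega) &= i\omega\,\sqrt{2\pi}\,e^{-\omega^2/2}, \\
\int_{-\infty}^{\infty} \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\,d\omega
  &= 2\pi \int_{-\infty}^{\infty} |\omega|\,e^{-\omega^2}\,d\omega
   = 4\pi \int_{0}^{\infty} \omega\,e^{-\omega^2}\,d\omega
   = 2\pi < \infty .
\end{align*}
```

The integral is finite because $\hat{\psi}(0) = 0$, that is, the wavelet has zero mean (one vanishing moment). The Gaussian itself fails the same test, since its transform does not vanish at $\omega = 0$. Note that the DG wavelet decays rapidly but is not strictly compactly supported, so it serves here only to illustrate the condition.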
Relation between the characteristic points and WT
According to the
Results and discussion
The DyWT was computed at scales and for both schemes. The first scheme estimates the pitch period from the original signal by calculating the values of the dilation parameter L with reasonable accuracy. The second scheme estimates the pitch period from the power spectrum of the signal. We detected the instant at which the glottis closes by locating the local maxima of the DyWT (those local maxima that exceed the threshold, which is equal to ). We then estimated the pitch period
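The detection step described above (local maxima of the DyWT above a threshold, then the spacing between them) can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the scales (2 and 4), the threshold (80% of the global maximum), the first-derivative-of-Gaussian wavelet, the cross-scale tolerance, and the sawtooth test signal are all hypothetical choices, because the corresponding values are not given in this excerpt.

```python
import math

def dg_wavelet(t):
    """First derivative of a Gaussian: psi(t) = -t * exp(-t^2 / 2)."""
    return -t * math.exp(-t * t / 2.0)

def dyadic_wavelet_transform(x, scale, half_width=4):
    """Approximate W_s x(n) = sum_m x(n - m) * (1/s) * psi(m / s).

    Only positions where the kernel fits inside the signal are computed;
    border samples are left at 0.0.
    """
    radius = half_width * scale
    offsets = range(-radius, radius + 1)
    kernel = [dg_wavelet(m / scale) / scale for m in offsets]
    out = [0.0] * len(x)
    for n in range(radius, len(x) - radius):
        out[n] = sum(x[n - m] * w for m, w in zip(offsets, kernel))
    return out

def local_maxima_above(w, fraction):
    """Indices of local maxima of w that exceed fraction * max(w)."""
    threshold = fraction * max(w)
    return [n for n in range(1, len(w) - 1)
            if w[n] > threshold and w[n] > w[n - 1] and w[n] >= w[n + 1]]

def estimate_pitch_period(x, scales=(2, 4), fraction=0.8, tolerance=2):
    """Keep finest-scale maxima confirmed at the coarser scales; return the median spacing."""
    per_scale = [local_maxima_above(dyadic_wavelet_transform(x, s), fraction)
                 for s in scales]
    events = [n for n in per_scale[0]
              if all(any(abs(n - m) <= tolerance for m in other)
                     for other in per_scale[1:])]
    gaps = sorted(b - a for a, b in zip(events, events[1:]))
    return events, gaps[len(gaps) // 2]

# Hypothetical test signal: a falling sawtooth with a sharp upward jump
# every 80 samples, standing in for the discontinuity at glottal closure.
PERIOD = 80
signal = [1.0 - (n % PERIOD) / PERIOD for n in range(10 * PERIOD)]

events, pitch_period = estimate_pitch_period(signal)
```

Requiring a maximum to appear at more than one scale is one common way to reject maxima caused by noise rather than by a genuine discontinuity. Whether the paper combines scales in this manner is not stated in the excerpt.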
Conclusion
In this paper, we presented algorithms based on the WT for the detection of pitch points of speech signals. The algorithms were evaluated on several synthetic speech signals. The performance of the algorithms was found to be excellent even in a noisy environment. It was observed that the DyWT of the power spectrum of the speech signal provided accurate estimates of the pitch period for signals corrupted by white Gaussian noise. Our approach was also found to be successful in calculating the scale L
Acknowledgments
This work was supported by a grant from the Department of Defense (DoD) under contract No. NSA-0-96-5.
References (11)
- S. Kadambe, The application of time-frequency and time-scale representations in speech analysis, Ph.D. Thesis,...
- W. Hess, Pitch Determination of Speech Signals: Algorithms and Devices, Springer, Berlin,...
- L.R. Rabiner, R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, NJ,...
- J. Canny, A computational approach to edge detection, IEEE Trans. PAMI (1986)
- S. Mallat et al., Characterization of signals from multiscale edges, IEEE Trans. PAMI (1992)