Alternating minimisation for glottal inverse filtering

A new method is proposed for solving the glottal inverse filtering (GIF) problem. The goal of GIF is to separate an acoustical speech signal into two parts: the glottal airflow excitation and the vocal tract filter. To recover such information one has to deal with a blind deconvolution problem. This ill-posed inverse problem is solved under a deterministic setting, considering unknowns on both sides of the underlying operator equation. A stable reconstruction is obtained using a double regularization strategy, alternating between fixing either the glottal source signal or the vocal tract filter. This enables not only splitting the nonlinear and nonconvex problem into two linear and convex problems, but also allows the use of the best parameters and constraints to recover each variable at a time. This new technique, called alternating minimization glottal inverse filtering (AM-GIF), is compared with two other approaches: Markov chain Monte Carlo glottal inverse filtering (MCMC-GIF), and iterative adaptive inverse filtering (IAIF), using synthetic speech signals. The recent MCMC-GIF has good reconstruction quality but high computational cost. The state-of-the-art IAIF method is computationally fast but its accuracy deteriorates, particularly for speech signals of high fundamental frequency (F0). The results show the competitive performance of the new method: With high F0, the reconstruction quality is better than that of IAIF and close to MCMC-GIF while reducing the computational complexity by two orders of magnitude.


Introduction
We study the estimation and parametrization of the voice source, the excitation waveform of speech sounds produced by the vocal folds in the human larynx. This origin of speech is called the glottal flow, named after the orifice between the vibrating vocal folds, the glottis [16]. We rely on source-filter theory [31] as the mathematical model of human voice production. Briefly, a speech signal, denoted by m(t), can be represented as a convolution of three functions. The first corresponds to the glottal flow, here denoted by g(t). The second is a vocal tract filter function modeling the effects caused by the oral cavity between the vocal folds and the lips, denoted in the time domain by f(t). The third function corresponds to the lip radiation effect, the acoustical conversion of the air flow at the lips into a free-field pressure signal. Since the latter of the three functions can be estimated as a time derivative [31] and can be combined into the first function, the direct problem is formulated through the convolution equation where t represents the time variable.
Our study focuses on the estimation of the glottal flow using a computational inversion methodology called glottal inverse filtering (GIF). The aim of GIF is to remove the effects of the vocal tract and lip radiation from the speech signal, revealing the time-domain waveform of the glottal source. GIF can be seen as the inverse problem related to (1): Given the measured speech signal waveform m(t), find both f(t) and g (t) = p(t) so that for t ∈ [0, 1]. Here δ models additive noise arising from, for instance, the recording environment. Such blind deconvolution problems are known to be ill-posed, or highly sensitive to modeling errors and measurement noise. Therefore, regularization is needed for successful inversion. GIF has been used in several areas of speech science such as in studying vocal emotions and speech prosody as well as in analyzing pathological speech and singing voices (for a review of GIF, see [4]). In the past five years, GIF has also awakened increased interest in the speech technology community, particularly among scientists developing text-to-speech (TTS) synthesis technologies. Recent studies suggest that the naturalness of TTS systems can be improved by using sound excitations that have been computed from natural speech via GIF [33]. The improved naturalness of TTS in turn can help, for example, disabled people who are not capable of preserving natural vocalization due to a disease, accident, or speech disorder. An example is the world-famous physicist Stephen Hawking, who cannot speak without the help of speech synthesis software. The development of new TTS technology would benefit from computation of realistic glottal excitation signals via GIF.
GIF has been studied in the speech science community dating back to 1959; see overviews in [4,39,11], and further references in PhD theses [32,38]. Within the mathematical inverse problems community, the first GIF study [35] appeared in 1970. This study was based on using Webster's horn equation to investigate the shape and the cross-sectional area of the vocal tract. In 1986, a GIF technique was suggested in [29] to study the glottal source for both male and female voices. The topic was revived in the 2000's in [1,2,17,18,21,22], based on the Schrödinger equation, Klein-Gordon equation and various deterministic computational approaches. Uniqueness of GIF is studied in [25]. The first Bayesian inversion approach to GIF is described in [6].
State-of-the-art GIF technologies, such as iterative adaptive inverse filtering (IAIF) [3], are far too simple to yield reliable glottal flow estimates particularly for speech of high F0 (e.g. for utterances spoken by women or children) or for voices conveying extreme emotions such as anger and hate. Bayesian statistical inversion, however, was shown to improve the estimation accuracy of the glottal flow in a recent study proposing a new GIF method called Markov chain Monte Carlo glottal inverse filtering (MCMC-GIF) [6]. However, the computational cost of MCMC-GIF is high due to extensive Markov chain Monte Carlo sampling.
In the current study, we introduce a deterministic approach called alternating minimization glottal inverse filtering (AM-GIF), based on the general analysis published in [8,9]. It requires an initial estimate p (0) for the derivative of the glottal flow signal, and it solves the inverse problem iteratively as follows: (a) given m(t) and a fixed p (k) , find a regularized solution f (k+1) for (2); (b) given m(t) and a fixed f (k+1) , find a regularized solution p (k+1) for (2).
The proposed AM-GIF method matches closely the reconstruction quality of MCMC-GIF while reducing the computational cost significantly.
We study the new method under ideal simulated conditions. The signals considered have precise periodicity, and we deal with boundary conditions in a mathematically convenient way. Natural speech signals are not so regular, and additional steps are needed for applying our results in practice. However, we expect the core properties of the alternating minimization algorithm to carry over to real-world signals.
This paper is organized as follows. In section 2, we present an overview of the sourcefilter theory, including the main concepts and two mathematical models for the source signal. Section 2 is aimed at readers not familiar with the topic and can be omitted by other readers. The inverse problem is defined in section 3, with a description of the novel AM-GIF approach. Moreover, section 3 explains two baseline methods, IAIF and MCMC-GIF. Numerical experiments and comparisons are presented in section 4. Finally, the conclusions of the study are drawn in section 5.

Source-filter theory
Speech carries information about phonemes, the basic units of spoken language. Among phonemes, the current study, like almost all previous GIF investigations, focuses on non-nasalized vowels. This category of speech sounds has been prevalent in GIF studies for two reasons: non-nasalized vowels are always voiced (i.e. generated by the vibration of the vocal folds) and their vocal tract lacks coupling to the nasal tract, thereby justifying the use of all-pole type of models [27] for the vocal tract.

Source-filter theory in the frequency domain
According to source-filter theory [13,36], speech can be interpreted as a linear combination of three processes: where S(z), U(z), V(z), and L(z) denote the z-transforms of the measured speech signal, glottal flow, vocal tract, and the lip radiation effect, respectively.
This linear model has been widely used due its relative simplicity, for both speech synthesis and analysis. Source-filter theory describes, as its name suggests, speech as a two-stage process consisting of the sound source, which is filtered by a filter function.
The output speech waveform S is often described in the complex frequency domain, as stated in the previous equation. Furthermore, it is possible to combine two or more filters into a single one by, for example, representing the vocal tract filter as V(z) = P(z)O(z), where P and O are the transfer functions at the pharynx and at the oral cavity respectively. In this article, the vocal tract is treated simply as a single transfer function, V.
The speech signal is in practice recorded using a free-field microphone that measures the pressure signal, not the air flow. According to [16, p 259], the acoustical conversion from the flow at the lips into a pressure signal in the free field, the so-called lip radiation effect, can be approximated at low frequencies to correspond to a fixed first-order derivative in the time domain. In the frequency domain, this derivative can be modeled by an FIR (finite impulse response) filter with a single zero where 0.96 α < 1. Moreover, it is often useful to combine L and U into a single filter, even though the lip radiation effect occurs physically at the lips when the flow signal has already passed through the vocal tract. This is equivalent to applying the lip radiation effect to the glottal flow by differentiating the glottal flow U [16]. Let us denote the differentiation process as G (z) = L(z)U(z) and let us refer the process output as the glottal flow derivative.
Finally, we can display the two-stage mathematical model as our direct problem where G is the derivative of the glottal flow U. Note also the close relation to equation (1) introduced in the time domain.

Source models
Anatomically speaking, the lungs power an airstream outwards, pushing the air through the narrow opening between the vocal folds, called the glottis, producing puffs of air. The vocal folds are composed of twin infoldings of mucous membrane, stretched horizontally, which vibrate 3 , and therefore the puffs of air are produced pseudo-periodically in time.
Mathematically speaking, we are interested in a function g : [0, 1] → R + that models the volume of air traveling through the glottis at a specific time t, technically known as the glottal flow or the glottal source. Several parametric models of the glottal flow have been proposed in the literature, and this article discusses two of them: the Rosenberg-Klatt (RK) model and the Liljencrants-Fant (LF) model.
The Rosenberg-Klatt model is a straightforward glottal flow model. It models the shape of the glottal airflow signal within one fundamental period using a cubic polynomial function that is defined by one time-domain parameter, in addition to the length of the glottal cycle, and an amplitude-domain scaling factor [23]. This model was originally proposed by Rosenberg [34] and later modified by Klatt who also used the parametric waveform as an excitation in speech synthesis [23].
An example of the glottal flow and its derivative, generated by the RK model, is shown in figure 1. Note that the parametric flow pulse consists of two distinctive phases: the open phase (OP) and the closed phase (CP). The former represents the smooth increase of airflow as the membranes of the vocal folds open up (from bottom to top) and a faster decrease of the volume of air when the vocal folds close, completing one cycle. The latter represents a phase when the glottis is completely closed and there is no air passing through the vocal folds.
The Liljencrants-Fant model requires four parameters in addition to the length of the glottal cycle to create the shape of a glottal flow derivative [15]. It is one of the most widely used parametric time-domain models of the glottal source.
Compared to the RK model, the LF model describes different phases of the glottal cycle more in detail, introducing the so-called return phase, a phase between the instants of maximum closing discontinuity and glottal closure. Figure 2 describes the LF model in different segments: t p is the instant of the maximum airflow (zero derivative), t e is the instant of the maximum excitation (with amplitude E e ) or the instant when the vocal folds collide, t a is the length of the interval [t e , t e + t a ] that measures the effective duration of the return phase, and t c is the instant when the vocal folds reach the maximum closure and the airflow is reduced to its minimum. The interval before t e is the OP, between t e and t c is the return phase, and the section between t c and the end of the cycle is the CP.
A detailed definition of the LF model as a time-domain function can be found in [14]. Fitting a given glottal flow with the LF model is not straightforward since it requires solving nonlinear equations.

The vocal tract as a filter
The vocal tract consists of cavities that have a strong perceptual effect on the produced speech sound. The vocal tract comprises the oral and nasal cavities that, together with the articulators-such as the tongue, soft palate, and lips-filter the produced glottal flow into its final form as an acoustical speech signal that we recognize as a certain phoneme. Mathematically, sounds are usually represented through a waveform (time-pressure) or spectrogram (timefrequency). The latter enables studying the locations of the formants, the spectral peaks of the sound, caused by the vocal tract resonances [12].
The vocal tract can be modeled as a linear time-invariant (LTI) system, defined in the complex z-domain using a transfer function H(z): Note that the roots of the numerator polynomial B are also the roots of H and they correspond to the antiformants (zeros), whereas the roots of the denominator polynomial A correspond to the formants (poles). Figure 3 shows the relation between the poles of the transfer function inside the unit disc and the corresponding formant locations. In general, the pole-zero transfer function H is simplified in speech technology into an all-pole filter, that is, a filter for which M = 0 (i.e. the transfer function has only poles in the z-domain outside the origin). This is justified because (a) all-pole models give an accurate fit to most speech sounds, particularly for vowels and (b) all-pole models are typically easier to solve than pole-zero models.

The inverse problem: GIF
In this section we describe three approaches to GIF. The first one, the novel AM-GIF method, is based on time-frequency domain analysis using the wavelet transform and the inverse problem is solved under a deterministic setting. The second approach, the recently proposed MCMC-GIF algorithm, is based on frequency domain analysis using the z-transform and the inverse problem is solved under a stochastic setting. The third one, the state-of-the-art IAIF method, is based on linear prediction (LP) analysis and the problem is solved using a straightforward signal processing approach. Numerical experiments will be reported in section 4.  The poles closer to z = 1 (ω = 0) correspond to a low-frequency formant and the poles closer to z = −1 (ω = π) correspond to a high-frequency formant.

The AM-GIF approach
Tikhonov-type inversion is a common approach to solve an ill-posed inverse problem in the deterministic setting, whenever a stable solution is needed in the case of noisy data. If, however, there are uncertainties in both the operator and the measured data, the problem should be addressed with a more general approach [7].
The operator on the left-hand side of (1) will be denoted throughout this section as a linear convolution operator between Hilbert spaces U → H, as follows: where 0 t 1 and p o ∈ U is called the characterizing function for the convolution operator. Note that we are interested in recovering the unknown f, but at the same time the characterizing function p o is not precisely known for real data, only mathematical models are available, as discussed in section 2.2. Therefore, solving (1) for the operator (3) should be seen as a blind deconvolution problem. Moreover, parametric models such as the RK and LF models could be good approximations for p o , knowing a priori the glottal opening time. We assume the noise levels of the measurements (m δ , p ) to be known: and The framework of the AM-GIF proposed here is based on the core idea of the double regularized total least squares (dbl-RTLS) method. In short, dbl-RTLS is a deterministic approach introduced recently in [8] to solve inverse problems with noise in both data and operator, as stated in (4). Using this method, we aim at solving the following minimization problem where the term inside the parentheses measures the total discrepancy, τ is a fixed scaling parameter, L is a linear bounded operator, and α > 0 and β > 0 are regularization parameters. The amplitude of measurement noise determines the choice of α and β: Higher noise requires larger parameter values. For further properties and generalizations of the dbl-RTLS approach, see [8].
To overcome the drawback caused by the non-linearity and non-convexity in minimizing (5) with respect to the pair (p, f ) at the same time, we follow the alternating minimization strategy, which has been successfully adopted in solving optimization problems over two variables. This strategy has been implemented for blind-deconvolution [40], and also used as standard for solving the dbl-RTLS problem [9].
The AM-GIF algorithm for equation (5) is an iteration scheme. Going from a current iterate p (k) , f (k) to a new pair p (k+1) , f (k+1) involves two steps. First, keeping p (k) fixed, define f (k+1) as the solution of Second, keep f (k+1) fixed and define p (k+1) as the solution of The convergence of this method is proved in [9, section 3]. Splitting the minimization (5) into two steps transforms a seemingly formidable challenge into two standard problems. Namely, step (6a) is just Tikhonov regularization, and (6b) is an example of the regularized total least squares method, see [19,26].
We need a computational implementation of the convolution operator defined in (3). Assuming that p is periodic, one could follow the time-domain approach of [30] and rewrite equation (3) as matrix convolution. This yields a square circulant matrix with many desirable properties: full column rank, invertibility, and a circulant inverse [30]. One could also model (3) in the frequency domain using fast Fourier transform. However, this can be unstable if the operator is not precisely known.

Convolution and the wavelet transform.
Let us denote the wavelet family by {ϕ λ } λ∈Λ , which constitutes an orthonormal basis for a Hilbert space, where the index set Λ is defined by Furthermore, we denote Here φ is the scaling function and ψ 0,0 is the mother wavelet. Any function f ∈ U can be decomposed as We denote the coefficient sequence of f ∈ U by F( f ): In other words, we represent the signal f by the sequence x = F( f ) which belongs to 2 . For numerical reasons we have to truncate the summation of j at a certain fixed index J, called the maximal level.
The next goal is to express convolution equation (3) in the wavelet domain to obtain the desired data function m by applying the direct operator entirely in the wavelet context. We first compute an operator C that depends only on J and on the choice of the wavelet family.
Let y := F( p) and x := F( f ) be the coefficients from the characterizing and input function respectively. Then the coefficient of d := F(m) is determined via C(y, x) as The major work is to compute the operator C. Once accomplished, for a fixed interval, the maximal wavelet level J, and the number of samples N, one could vary the characterizing function or the input function. So the computation of the convolution operator remains straightforward via a few matrix-vector multiplications. More precisely, the operator C is a sequence of square matrices {C µ } µ , where 1 µ 2 J+1 , in such a way that the sequence coefficient {d µ } µ is computed by The detailed computation of the sequence of matrices {C µ } µ is explained in appendix.
For a fixed characterizing function p, the sequence of matrices can be combined into a unique square matrix A whose μth row A(µ, :) is given by and so By = d solves the forward convolution problem for a fixed function f.

3.1.2.
The AM-GIF algorithm. Now we can rewrite algorithm (6) by taking advantage of the wavelet decomposition. We now have convenient matrix machinery for the appropriate operators for fixed p and f: As seen in figure 4, both A and B are sparse matrices due to the compact support of the Haar wavelet basis. Note that the wavelet coefficients d δ of the measured data m δ may contain errors arising from measurement noise.
The first step of the AM algorithm (6a) in the time domain, for a fixed p, is equivalent to the minimization of the following functional: that is, recovering the wavelet coefficients. Whereas the second step of the alternating minimization algorithm (6b) in the time domain, for a fixed f, is equivalent to the minimization of the following functional: For the first step it is well-known that the solution of the above problem (10a) satisfies where the adjoint operator is the transpose matrix.
It is easy to see that the minimization of (10b) and

The MCMC-GIF approach
The MCMC-GIF method was introduced in [6]. In Bayesian inversion, robustness against modeling errors and measurement noise is achieved by complementing measurement data by a priori information. The measurement process is described probabilistically in the form of a likelihood model. Further, any a priori information about the unknown quantities is represented as prior probability distribution. The product of the likelihood and prior yields the posterior distribution. The solution of the ill-posed inverse problem takes the form of stable computation of the mean of the posterior distribution by Monte Carlo integration.
In the MCMC-GIF approach, the vocal tract model is assumed to be an all-pole filter. Each jth complex pole of the filter (see figure 3, left side) is parametrized by a pair (r j , θ j ) from its complex representation z j = r j exp(iθ j ) where r j and θ j are, respectively, the radius and angle of the pole in the z-domain. The glottal airflow is parametrized with the open quotient parameter of the RK model [34], denoted by Q in the following. Therefore, the inverse problem can be expressed as the recovery of the (combined) parameters vector v := [r 1 , θ 1 , r 2 , θ 2 . . . , r N , θ N , Q] T from a given measurement m, where N denotes the number of poles.
The computational procedure is based on an initial estimate of the vocal tract filter, here given by the IAIF method. Next, the algorithm refines the vocal tract model parameters within the MCMC-GIF method in order to obtain a more accurate glottal flow estimate. This GIF method is known for its good performance in the estimation of the glottal flow from high-pitched signals.

The IAIF Approach
The IAIF method was proposed more than 20 years ago [3] and it is still in use due its simplicity and fast computation.
The method relies on the assumption that the glottal flow is obtained by canceling the effects of the vocal tract and lip radiation from the speech measurement with an iterative structure. The theory of speech production via a chain of filters was introduced in section 2.1. According to [3] the vocal tract transfer function is modeled after eliminating the average glottal contribution. Then the glottal excitation is obtained by canceling the effects of the vocal tract and lip radiation by inverse filtering. This approach is based on LP analysis [27] in order to estimate the vocal tract filter function.

Numerical experiments
In this section, we will examine two vowels segments produced by adult speakers: the vowel [i], similar to the vowel sound in the English word meet, and the vowel [a], similar to the vowel sound in bath. These two vowels were chosen because they represent very different articulations: the former has a low first formant (F1) and high second formant (F2), whereas the latter has a high F1 and low F2.
All the computations were performed using Matlab version 2014b.

Simulation of the data
The forward problem (1) was computed with the following two functions, the synthetic glottal flow: created with sampling frequency f s = 16 kHz, computed using the LF model with the following parameters t p = 0.47, t e = 0.65, t a = 0.01, and t c = 1; and the vocal tract transfer These three F0 values correspond roughly to average pitch in the speech of men, women, and children, respectively. Finally, note that the forward problem is computed via the LF model, which has more degrees of freedom than the RK model. Therefore, the LF pulse is able to better capture the dynamics of the natural glottal airflow. In doing so, we also avoid the 'inverse crime', since the inversion does not utilize the LF model in any form. In this article, the computations were done using the noise level = 0.2873 for F0 = 100 Hz, = 0.3404 for F0 = 200 Hz and = 0.3753 for F0 = 250 Hz, and the noise level δ = 0.0001 for all frequencies, as white noise.

Inversion by AM-GIF
Algorithm [alg:amgif]1 is relative easy to implement and fast. Most of its CPU time is required to pre-compute the C matrix, but it can be stored and loaded for future experiments, with the same wavelet maximal level J. In the numerical tests, we set J = 8 for the Haar wavelet basis. Some additional sub-routines (available in the Matlab wavelet toolbox as cwtft and icwtft) are required to compute the wavelet coefficients and the inverse transformation.
Once matrix C is available, matrices A and B are computed for each step respectively through (8) and (9). Each step requires solving a linear system in order to recover the wavelet coefficients of both the filter and characterizing function. For the former, we need an operator L to enforce the shape of the filter function (fading out to zero in the time domain). For the latter reconstruction we use the classical regularization term with the identity operator. Moreover, for the minimization problem (10b), the required approximation y = F( p ) is indeed fundamental. In our experiments, the best reconstruction quality was obtained by a hybrid approximation. The hybrid approximation was obtained by splitting the signal into two parts: the CP and the OP. In the CP, we used the RK model. In the OP, we used the structure given by our first reconstruction of the filter function.
To solve both linear systems we used ldl Matlab's built-in factorization. Overall, the algorithm also required scaling and regularization parameters, which were heuristically chosen and fixed for all cycles.

Inversion by MCMC-GIF
The MCMC-GIF algorithm is based on a modern variant of the Metropolis-Hastings algorithm called DRAM [20]. The implementation used in the present study was taken from the MCMC Matlab package [24]. MCMC-GIF requires an initial guess for the vocal tract filter, which is provided by the IAIF method. MCMC-GIF has shown good performance, particularly for voices of high pitch. However, the computational cost is high: running the MCMC-GIF algorithm typically takes several hours using a single core processor. During each experiment the MCMC-GIF algorithm produced 10 5 samples. The CPU time could be reduced with parallel coding and by reducing the number of samples.

Inversion by IAIF
The IAIF algorithm [3] can be easily implemented, mainly with two build-in Matlab functions: the lpc and filter. IAIF is the fastest of the three GIF methods compared in the current study and it requires just a few parameter settings (for further details, see REFN). The estimation quality of IAIF is generally good for speech signals of low pitch but the method suffers from poor performance in the estimation of the glottal flow of high-pitched speech. In order to quantify the obtained glottal airflows, two special measurements that are widely used in the speech processing community were adopted in addition to the standard error measured according to the norm of the space (table 1(a)). The first one, H1-H2, measures the spectral tilt of the glottal flow. It is defined in the dB scale as the difference between the amplitude of the first and second harmonic of the glottal flow spectrum. The second one, the normalized amplitude quotient (NAQ), was introduced in [5]. It measures the relative time duration of the glottal closing phase from the ratio of the peak flow and the negative peak amplitude of the glottal flow derivative, normalized with respect to the length of the fundamental period. Estimation errors measured by H1-H2 and NAQ are displayed in tables 1(b) and (c).

Conclusion
A non-invasive inversion method, called AM-GIF, was proposed for the estimation of the glottal flow and vocal tract of a given speech waveform.
The complexity of solving the blind deconvolution problem involves recovering two variables through a nonlinear and nonconvex minimization, which is a nonlinear ill-posed inverse problem. A double regularization strategy was successfully applied, solving instead two linear and convex problems. Additionally, in order to reconstruct the functions with more desirable properties, the alternating minimization technique gave broader control in each step, tailoring constraints and specific regularization parameters. We also remark on the crucial role of the approximation p and the insight to create a hybrid signal, reflecting the reconstruction quality of the glottal flow.
One of the main benefits of the developed alternating minimization technique is to preserve the good quality of the MCMC-GIF method for high-pitched signals using a fraction of the computation time. With high F0, the new AM-GIF method was found to yield reconstruction quality close to MCMC-GIF and better than IAIF while reducing the computational burden of MCMC-GIF by two orders of magnitude. The new method clearly showed a much better fit to the reference in the closed phase of the glottal cycle. This happened particularly for the vowel [i] when the pitch was high, which, importantly, represents difficult material for all GIF methods (due to combination of low F1 and high F0). The objective measures, the NAQ and H1-H2, respectively focus mainly on the temporal and spectral properties of the glottal closing phase. Hence, they are unable to take into account the behavior of the flow waveform in the glottal open phase where GIF-AM shows, as demonstrated visually by figure 5, the best fit to the LF pulse.
The GIF algorithm proposed in the current study can be developed further in several directions. For instance, changing the wavelet family could improve the accuracy of the inverse problem solution. In addition, studying more complex regularization terms could help when searching for improvements in the overall quality.
Finally, we hope that the current study is capable of awakening the general interest of the mathematical inverse problems community in the highly interesting, yet difficult topic of human speech production.
Combining the previous decomposition with equation (3) We are interested to find the coefficients of the function m = Kf, denoted by d. Therefore we apply the operator F defined in (7) Let's give a close look at each component of d Notice that the summations above are finite, both on η and λ, namely 1 η 2 J+1 and 1 λ 2 J+1 . Identically 1 µ 2 J+1 .