Fast Computation of LSP Frequencies Using the Bairstow Method

: Linear prediction is the kernel technology in speech processing. It has been widely applied in speech recognition, synthesis, and coding, and can efficiently and correctly represent the speech frequency spectrum with only a few parameters. Line Spectrum Pairs (LSPs) frequencies, as an alternative representation of Linear Predictive Coding (LPC), have the advantages of good quantization accuracy and low spectral sensitivity. However, computing the LSPs frequencies takes a long time. To address this issue, a fast computation algorithm, based on the Bairstow method for computing LSPs frequencies from linear prediction coefficients, is proposed in this paper. The algorithm process first transforms the symmetric and antisymmetric polynomial to general polynomial, then extracts the polynomial roots. Associated with the short-term stationary property of speech signal, an adaptive initial method is applied to reduce the average iteration numbers by 26%, as compared to the statics in the initial method, with a Perceptual Evaluation of Speech Quality (PESQ) score reaching 3.46. Experimental results show that the proposed method can extract the polynomial roots efficiently and accurately with significantly reduced computation complexity. Compared to previous works, the proposed method is 17 times faster than Tschirnhus Transform, and has a 22% PESQ improvement on the Birge-Vieta method with an almost comparable computation time.


Introduction
Linear predictive analysis of the speech signal is one of the most powerful speech analysis techniques, which can extract the short-time spectral envelope information of speech signals efficiently, and is widely used in the fields of speech representing for low bit rate transmission or storage, automatic speech and speaker recognition [1][2][3][4][5]. The predominant linear predictive analysis method is Linear Prediction Coding (LPC), featuring better fault tolerance during transmitting spectral envelope information. As an alternative representation of LPC, Line Spectrum Pairs (LSPs) were first introduced by Itakura [6], which have the advantages of good quantization accuracy and low spectral sensitivity.
During the computation of LSPs, the LPC polynomial A(z) is decomposed into a pair of symmetric and anti-symmetric polynomials. These two polynomials are called LPC polynomials. It has been proven that the roots of the LPC polynomials, namely the LSPs frequencies, are interleaved on the unit circle [6]. Many methods have been proposed to solve these LPC polynomials. Soong and Juang [7,8] adopted the Discrete Cosine Transform (DCT) to evaluate the cosine functions, based on the bisection method with a fine grid. They employed a closed formula to extract the roots of LPC polynomial. Kang and Fransen [9] first proposed to transform the LPC polynomial into the sum of cosine functions to avoid the complex computation in later processing. They employed the autocorrelation method and the all-pass filter to extract the roots. Despite these methods solving the LPC polynomials correctly, they have the disadvantages of high computational complexity from trigonometric functions, and high memory usage from cosine tables. Kabal and Ramachandram [10] employed the bisection method and the interpolation property to locate the position of zero-crossings with a fine grid. While it requires no prior storages or the calculation of trigonometric functions, the number of bisections cannot be decreased without compromising the search of zero-crossings. Wu and Chen [11] first proposed to transform LPC polynomial into a pair of general-form polynomials, then used the closed-form formulas and the modified Newton-Raphson method to extract polynomial roots. It can avoid the computation of trigonometric functions with a fine grid. However, the mathematical operations with complex numbers are still time-consuming. Chen and Ruan [12] proposed the modified complex-free Ferrari formula to reduce the computation. While this algorithm avoids complex number operations, it still needs modulus operations to get the roots of the polynomials. Chen and Chang [13] employed the Tschirnhaus transform to reduce the degree of the polynomials, then used Descartes-Euler to extract the roots. While it can accelerate the computation processing, it still needs to compute trigonometric functions. Chung-Hsien Chang [14] proposed a method to solve the LPC polynomials, which is suitable for hardware implementation.
Hence, an efficient method to solve LPC polynomials should have the following characteristics: 1. Avoiding evaluation of trigonometric functions, 2. Avoiding using complex operations and a fine grid, 3. Extracting the roots rapidly and accurately.
Considering the above requirements, we proposed a fast LPC computation method, based on the Bairstow method, which can extract the roots of LPC polynomials without using complex operations, a fine grid and trigonometric functions. Luk [15] and Hsiao [16] used the Bairstow method to solve the polynomial, but without specific algorithm implementation and performance evaluation of the algorithm. Additionally, the initial method was stubborn, which limited the convergence speed. Thus, an adaptive initial method, associated with the short-term property of the speech signal, is proposed to guarantee its convergence in this paper. Then, we apply our method to a speech compression codec system. Experiments show that, compared with other methods, including those of Soong and Juang [8], Kabal and Ramachandram [10], the Chen and Wu method [11], the modified Ferrari's formula [12], Tschirnhaus transform [13], the Birge-Vieta method [14] and the previous Braistow-base LPC analysis methods [15,16], the proposed method can estimate the LSPs frequencies accurately with the lowest computational complexity, together with high speech quality.
The rest of this paper is organized as follows: A brief introduction of LSPs frequencies is reviewed in Section 2. In Section 3, a detail description of the Bairstow method is given, then the adaptive initial method is given to boost the estimation accuracy and the convergence speed. Section 4 provides a guideline to estimate the 10-order LSPs, and shows the experimental results of the proposed method, as compared to previous methods. Finally, Section 5 provides the conclusion.

LPC Polynomial
This section introduces the background of the LSPs frequencies and some of the relevant properties briefly. Given a p-order LPC, the minimum-phase LPC polynomial is expressed as follows, according to Reference [14]: where 1 a , 2 a ,…, p a are the direct form of LPC coefficients. The LPC polynomial can be decomposed into a symmetric ( ) P z polynomial and an anti-symmetric ( ) Q z polynomial. The following formulas are their representations: The roots of ( ) Furthermore, because the coefficients of both  [16] and they satisfy the ordering property as follows: The ordering property of LSF can be illustrated in Figure 1.

Z-plane
Re Im w Figure 1. Illustration of the root location 10 P = (even). Combing = jw z e , Equations (6) and (7) can be described as follows:

General-Form Polynomial Transformation for the Computation of LSPs Frequencies
The zero crossings of ( ) p R z and ( ) q R z are the values of LSPs frequencies. In order to find the LSPs frequencies, we need to solve polynomials This paper discusses the commonly used value of P =10, then Equations (12) and (13) can be expressed as a general polynomial: The above formula can be deployed to a general-form polynomial as follows: where ( ) f n are the polynomial coefficients, = 2 P M . We define ( ) = cos x w . As w is in the range ( ) π 0, , then x is in the range ( ) − + 1, 1 . Therefore, the above two polynomials can be written as According to Reference [17], the value of ( ) f n can be found by the recursive relations: x is a general-form polynomial with the same order of ( ) , R w . Comparing Equation (19) to (16), the coefficients relationship of ( ) , R w and ( ) f x can be written as In summary, after a series of steps, LPC polynomials are turned into a general-form polynomial and the relationship between LPC coefficients and general-form polynomial coefficients is found. Therefore, Section 3 introduces the method to solve the general-form polynomial.

Polynomial Solution Based on Bairstow Method
As mentioned in Section 2, the LPC polynomials can be transformed into a general-form polynomial. Furthermore, the relationship between the LPC coefficients and coefficients of the general-form polynomial is presented by Equation (20). To solve the general-form polynomial, this paper employs the Bairstow method.
The Bairstow method [18][19][20] is a rapid root-finding algorithm for general-form polynomial. Compared to other conventional methods, it can efficiently reduce the number of addition and multiplication operations, and avoids the evaluation of trigonometric and other complex computations. Thus, the proposed method can converge rapidly and reduce the computation time significantly.
The Bairstow method follows from the observation that the roots of a real quadratic polynomial: are the roots of a given N -order real polynomial only if ( ) f x can be divided by Equation (21) without the remainder. After deflating, the deflated polynomial can be represented as where the degree of ( ) The remainder is expressed as Ax B + . The coefficients of the remainder depend upon r and q , that is ( ) and the remainder vanishes when r and q satisfy the condition According to the scheme, if 0 r and 0 q are the initial approximations for rand q , then the iterative process can be written as where r A , q A , r B , q B are the partial derivatives with respect to rand q evaluated at i r and i q .
The values A and B can be found by means of a Horner-type [17] scheme. Comparing Equation (22) with (23), we can get the following recursive formula: and the value of dr and dq : The convergence criterion is defined to terminate the iterative procedure when the condition r ε Δ < and q ε is satisfied for a specified value ε . The predefined value ε determines the accuracy of the root.
Then, we can get the root of ( ) To obtain the root of 3 x and 4 x , we employ a further division of ( ) Then, we obtain the deflated polynomial: where the degree of ( ) To find all the roots, we divide  , until the deflation results in a quadratic polynomial. Table 1 shows the pseudocodes of the proposed method according to Equations (26)-(31). r and q are the initial value of r and q , respectively. According to Rererences [17,21], this algorithm is sensitive to the initial value of r and q . It will converge rapidly if the initial value of r and q are sufficiently close to their true values. In Section 4, we will discuss in detail how to choose the initial value of r and q . 1 set r and q as the initial value of rand q After calculating the LSPs with the proposed method, the LSPs should be sorted in descending order according to Equations (10) or (11) with the Insert-Sort [22] algorithm.
Denoting the roots of the general-form polynomial are i x , the LSPs frequencies are given by ( ) ... P w w w < < < < , P is even. (33)

Experimental Environment
As mentioned earlier in Section 2, the LSPs are the representative parameters of the LPC filter. To calculate the LSPs, we use the Aurora data [23] as the benchmark. The Aurora database is a professional speech corpus created by the Aurora project (www.eld.org). There are 1001 clean and 1001 noisy speech files from the Test-A set, spoken by 50 males and 50 females used in this paper.
The test speech signals are sampled at 8 kHz with a resolution of 16 bits per sample. According to Reference [1], the speech signal has a short-term stationary property, so we select every 80 samples (10 ms) as a frame. In order to evaluate the performance of the proposed method, such as computing time, convergence speed, and its impact on speech quality, we choose ITU-T G.729, which is a data compression algorithm using conjugate-structure algebraic-code-excited linear prediction, as a verification platform. The proposed method is implemented using C language and is inserted into the speech encoding module of the ITU-T G.729 platform. Finally, 1,754,330 sets LSPs of 10-order LPC are evaluated on a personal computer (Intel i5-9300H).

Selection and Update of Initial Values
As introduced in Section 3, the proposed method is sensitive to the selection of the initial values of r and q , which greatly influences the convergence speed of polynomial optimization.
The following experiment shows how to select appropriate initial values.
Three strategies are proposed to select the initial values of r and q . The first method, called the fixed initial method, which has been employed by a previous works [15], provides fixed initial values for the roots finding. The second method named statistic initial method calculates the statistical mean values of r and q respectively, then uses the statistical mean values as the initial values for the evaluation of the rest. The third method, called the adaptive initial method, which combines the short-term stationary property of the speech signal, then uses the final result of the current frame as the initial values of the next frame.
Aurora corpus is used as the benchmark set. For the fixed initial method, 1001 speech files from testa_ns_snr15 are evaluated with the initial values being 0, 4 or 8. For the statistic initial method, in order to guarantee the robustness, we use 200 speech files from testa_clean1 as the analytical set, and 1001 speech files from testa_n1_snr15 as the evaluating set, to guarantee the robustness. After the analytical set is processed to satisfy the convergence criterion, the statistic initial method is employed to calculate their statistical mean values ave r and ave q . Then, these values are used to initialize each frame for all speech samples from the evaluating set. For the adaptive initial method, without the necessity of precalculating initializations, 1001 speech files from testa_n1_snr15 are all adopted as the evaluating set. The initial values of the first frame are randomly generated, and the initial values of the subsequent frames are obtained directly from the previous frame.

Mathematic analysis Δr Δq
Mean 0.0419 0.0305 Standard Deviation 0.0078 0.0038 To investigate the convergence speed of the three initial selection strategies, we perform a statistical analysis of the experiment data. Table 4 and Figure 2 illustrate the average iteration numbers of the fixed, statics and adaptive initial method. Where n1 and n2 mean iteration numbers of the iterative loop for the first five-order polynomial, the iterative loop for the first three-order polynomial, respectively; n3 and n4 mean iteration numbers of the iterative loop for the second five-order polynomial, the iterative loop for the second third-order polynomial, respectively; Total means the iteration numbers for calculating all 10 roots for the two LPC polynomials.The second column of Table 4 means the different initial value for and , where the first row applies = 0 and = 0, the second row applies = 4 and = 4, and the third row applies = 8 and = 8 for all test speech frames. In Figure 2, the two vertical lines in each subplot represent the average convergence iterations of the adaptive initial method with the dash, and the static initial method with the dot.  As Table 4 and Figure 2 show, the setting of the initial values directly affect the convergence iterations. The fixed initial method performs worse than the statistic initial method, and the convergence speed of the adaptive initial method is the fastest, with the average iterations of n1, n2 and n4 at about 3.7, the average iterations of n3 at about 5.2. Further, compared with the fixed initial ( = 8 and = 8) method and statics initial method, the total average iteration numbers are reduced by 68% and 26%, respectively, with the same computation accuracy. This also verifies the conclusion in Table 2; the value of and fluctuate slightly in the adjacent frame due to the short term stationary property of speech signal, thus selecting the final result of and for the current frame as the initial value for the next frame can reduce the average iteration obviously. Therefore, the adaptive initial method is effective, and we apply the adaptive initial method as the selected initial strategy.

Solve the 10-order LSP Using the Proposed Method
We randomly select a speech frame to evaluate the computation accuracy of the proposed method. The selection of the initial method follows the strategy mentioned above. We solve the following ( ) x F polynomial and present the experimental results in Table 5, Table 6 and Table 7, where means the jth root of the polynomial, where | | means the threshold value of iteration termination.  r and 2i q .  After the first and the second stages, we obtain the resulting value of , then transform the into the LSF by using the formula = . Combing Table 5, Table 6 and Table 7, the results show that the optimization converges extremely fast, and the value of LSF is accurate.

Performance of the Proposed Method
This section evaluates the performance of the proposed method, including the computation time and the speech quality, by making a comparison between the proposed method and other methods, including the Birge-Vieta method [14], Tschirnhaus Transform [13], original/Modified Ferrari's method [12], Chen and Wu method [11], Soong and Juang [8], and the Full search.
Due to the lack of the algorithm performance parameters Reference [14], we established the Birge-Vieta method model strictly according to the reference paper. Table 8 shows the Perceptual Evaluation of Speech Quality (PESQ) [24] comparison between the proposed method and the Birge-Vieta method. Table 9 shows the comparison of the average computation time between the proposed method and the previous works, under the same computation accuracy (

Test speech PESQ
The proposed method testa_clean1 3.35 testa_n1_snr15 3.46 Birge-Vieta method testa_clean1 2.74 testa_n1_snr15 3.36 Table 9. Comparison of the average computation time with the previous works ( 0.00001

Time (ms) Normalized Clock numbers Environment
The proposed method 0.0016 5920 3. The second column of Table 9 is the average computation time of converting 1,754,330 sets of LPC coefficients to LSF frequencies. Associated Table 7 and Table 8, the results show that the PESQ of the proposed method has improved by 22%, in comparison to the Birge-Vieta method under the testa_clean1, while its computation time is comparable. After normalizing the CPU frequency of system environment to 3.7 GHz and with the same computation accuracy, the proposed method is 17 times faster than the Tschirnhaus transform method, 23 times faster than the Modified Ferrari's method, 47 times faster than the Original Ferrari's method, 47.4 times faster than the Chen and Wu method, 152 times faster than the Soong and Juang method, and is 201 times faster than the Full Search. It is obvious that the proposed method is the fastest, with the same computation accuracy.

Conclusions
This paper proposed an efficient method to calculate the roots of LPC polynomial, which can transform LPC coefficients to LSPs accurately and rapidly. Associated with the short-term stationary property of speech signal, an adaptive initial method is applied to reduce the average iteration numbers by 26%, as compared with the statics initial method, with the PESQ score reaching 3.35. Experiment results show that the proposed method is 17 times faster than the Tschirnhus Transform, and has a 22% PESQ improvement in comparison to the Birge-Vieta method, with almost the same computation time. Compared to other root-finding methods, the proposed method is the fastest one with comparable speech quality.