Adaptive Cancellation of Localised Environmental Noise

Noise cancellation systems are useful in applications such as speech and speaker recognition, where the effects of environmental noise have to be taken into consideration. A robust method for the cancellation of localised noise in noisy speech signals using subband decomposition and adaptive filtering is presented in this paper. The subband decomposition technique is based on low-complexity octave filters that split the noisy speech input into subbands. A thresholding technique is then applied to the subbands to determine the presence or absence of environmental noise. This is used to control an adaptive filter that responds only to the noisy parts of the speech spectrum, hence confining the adaptation process to these segments. The Normalised Least Mean Squares (NLMS) algorithm is used for the adaptation process. A comparison with a similar system that does not localise the environmental noise shows that the proposed system performs better in terms of computational cost and convergence rate, because it takes advantage of the information regarding the presence or absence of noise in a specific part of the speech spectrum. More than 35 dB of noise has been eliminated in fewer iterations than in the conventional approach, which needs a longer time to reach steady state.


INTRODUCTION
In many established and emerging digital speech applications, the effects of environmental noise have to be taken into consideration (Arowitz 2016; Matrouf et al. 2015; Jiang et al. 2017). This is because algorithms that perform well in a noise-free environment may degrade significantly in real-world environments where noise may be prevalent and unavoidable. In particular, environmental noise has a negative impact on the performance of feature extraction techniques and the systems that are based on them. For example, the popular Mel-frequency Cepstral Coefficients (MFCC) used in many speech recognition applications are highly susceptible to environmental noise (Hermus & Wambacq 2006; Sahidullah & Saha 2012; Bhattacharjee et al. 2016).
The problem becomes severe when the noise constantly changes with time. This is because the spectrum of the noise changes according to the noise type. For example, white noise has a flat wideband spectrum, while coloured noise may occupy a limited part of the spectrum in any frequency band. Environmental noise such as car noise, for example, occupies the lower part of the speech spectrum (Kozou et al. 2005; Zhao et al. 2014).
Adaptive processes such as adaptive filtering have been used to remove changing noise from speech signals. Adaptive filters use recursive filtering algorithms such as Least Mean Squares (LMS) and its variations to adjust the coefficients of the filter in response to the changing noise in the speech signals (Paolo 2008; Sayed 2011). In order to improve performance, adaptive filtering has been combined with subband decomposition using filter banks of various types (Lee et al. 2009; Noor et al. 2011). The advantage of using subband decomposition is that the overall coefficient update rate can be reduced, resulting in lower computational complexity. In addition, in multirate systems, subband signals are usually downsampled, which whitens the input signals and therefore improves convergence performance.
The choice of filter banks and adaptation algorithms is also important for further reducing complexity (Zheng et al. 2017; Cheer & Daley 2017; Yu et al. 2016; Lorente et al. 2014; Reddy et al. 2011; Milani et al. 2009; Wong et al. 2014). However, in these approaches the adaptive filtering process is assigned to all subbands. This is not necessary if the noise is located in certain bands and not in others, which is the case for environmental coloured noise. This paper presents a noise cancellation system that is capable of removing localised noise from speech signals based on subband processing, thus improving convergence and reducing complexity. Low-complexity octave filter banks are used for subband decomposition, and the outputs of the filter banks are analysed in order to localise the noisy segments of the input speech. These outputs are fed to the adaptive filter, which changes the filter coefficients only when noise is present. As low complexity is desired, the Normalised Least Mean Squares (NLMS) algorithm is used for the adaptation process, while the thresholding technique is based on the calculation of normalized power. The work presented in this paper aims to offer a robust method to reduce noise that is localised in certain parts of the voice spectrum. Simulation results presented in this paper show the capability of the proposed technique to overcome the localised noise problem with faster convergence and lower complexity than conventional fullband systems. The research offers a superior method for the reduction of environmental noise from speech signals.

PROPOSED NOISE CANCELLATION APPROACH
BAND SPLITTING PROCEDURE
Figure 1 shows a schematic of the splitting process using an octave filter bank (Vaidyanathan 1993). The octave implementation closely resembles human hearing perception. The spectrum of the noisy speech signal s(n) is split into subbands using analysis filters H(z). Four levels of splitting are created. Although further splitting is possible, this would add to the complexity of the system. In each level of the split band, a Quadrature Mirror Filter (QMF) is used. The representation of the prototype low pass filter in the QMF is described by the following equation:

$$H_0(z) = \sum_{n=0}^{N-1} h(n)\, z^{-n} \tag{1}$$

where h(n) are the coefficients of the N-tap prototype filter. The high pass version of the QMF bank is given by:

$$H_1(z) = H_0(-z) \tag{2}$$

Equations (1) and (2) represent the direct implementation of the filter bank. The computational cost of the QMF filter bank can be reduced by half using the polyphase implementation in each stage (Diniz 2012). This is shown in Figure 2(a). The polyphase representation of the filter of equation (1) is expressed by the following:

$$H_0(z) = \sum_{k=0}^{1} z^{-k} F_k(z^2)$$

where F_k(z) is the k-th polyphase component of the prototype filter. Polyphase representation is a method of reorganizing the coefficients of a given transfer function H(z). Referring to equation (1), this transfer function can be represented in terms of its even and odd coefficients as follows:

$$H_0(z) = \sum_{m=0}^{N/2-1} h(2m)\, z^{-2m} + z^{-1} \sum_{m=0}^{N/2-1} h(2m+1)\, z^{-2m}$$

Therefore,

$$H_0(z) = F_0(z^2) + z^{-1} F_1(z^2)$$

In general, for an N-tap filter, the polyphase component representation is given by:

$$F_k(z) = \sum_{m=0}^{N/2-1} h(2m+k)\, z^{-m}, \qquad k = 0, 1$$

More computational cost reduction can be achieved by exploiting the Noble identities of multirate systems with the shifting of the downsamplers as shown in Figure 2(b) (Vaidyanathan 1993).
In this work, noise cancellation in the individual subbands of the signal is performed using the noise cancellation model shown in Figure 3. An adaptive algorithm is used to control a Finite Impulse Response (FIR) filter in each subband of the split spectrum. The controlling algorithm is designed to adjust the taps of the FIR filter. The Normalised Least Mean Squares (NLMS) algorithm is used for this purpose. The NLMS is selected for its better noise cancellation performance for non-stationary signals compared to the Least Mean Squares (LMS) algorithm, in addition to providing better stability with comparable complexity (Dhiman et al. 2013). The adaptive process using LMS is described by the following set of equations:

$$y_k(n) = \mathbf{w}_k^{T}(n)\, \mathbf{x}_k(n)$$

$$e_k(n) = s_k(n) - y_k(n)$$

$$\mathbf{w}_k(n+1) = \mathbf{w}_k(n) + \mu\, e_k(n)\, \mathbf{x}_k(n)$$

Here, w is the filter weight coefficient vector, x is the input noise vector at time n, s is the noisy subband signal acting as the desired input, y represents the output of the adaptive filter, e is the error signal, which is also the cleaned output, k is an index representing the subbands and μ is the adaptation step size, which is normally bounded to ensure stable adaptation (Sayed 2011).
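The band-splitting and polyphase steps described above can be sketched numerically. The following is a minimal illustration and not the filter design used in this work: the windowed-sinc prototype, the function names and the choice of three two-channel stages (which yields four octave subbands) are all assumptions made for the example. It checks that the polyphase form gives the same decimated output as direct filtering followed by downsampling.

```python
import numpy as np

def prototype_lowpass(num_taps=32):
    """Windowed-sinc half-band low-pass prototype (illustrative design only)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = 0.5 * np.sinc(0.5 * n) * np.hamming(num_taps)  # cutoff at pi/2
    return h / np.sum(h)                                # unit gain at DC

def analysis_direct(x, h):
    """Filter at the full rate, then keep every second output sample."""
    return np.convolve(x, h)[: len(x)][::2]

def analysis_polyphase(x, h):
    """Same output, but the two half-length filters run at the halved rate."""
    f0, f1 = h[0::2], h[1::2]            # even- and odd-indexed coefficients
    x_even = x[0::2]                     # x(2m)
    # x(2m - 1), taking x(-1) = 0
    x_odd = np.concatenate(([0.0], x[1::2]))[: len(x_even)]
    m = len(x_even)
    return np.convolve(x_even, f0)[:m] + np.convolve(x_odd, f1)[:m]

def octave_split(x, h0, levels=3):
    """Octave analysis tree: only the low band is split again at each level."""
    h1 = h0 * (-1.0) ** np.arange(len(h0))   # H1(z) = H0(-z)
    bands = []
    low = np.asarray(x, dtype=float)
    for _ in range(levels):
        bands.append(analysis_polyphase(low, h1))  # keep the high band
        low = analysis_polyphase(low, h0)          # continue with the low band
    bands.append(low)
    return bands[::-1]                             # lowest-frequency band first
```

Because each polyphase branch operates on half of the input samples with half of the coefficients, the number of multiplications per input sample is halved relative to the direct form, which is the saving referred to in the text.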
The average power of the speech signal is measured when it is free of noise and is used as a threshold to decide whether a segment is noisy. If the input signal is corrupted with noise, then its average power would be greater than that of the noise-free speech, hence triggering adaptive filtering; otherwise no action is taken. Let the average power of the noise-free speech be P_s and define the normalized power of the input subband signals as:

$$P(k) = \frac{1}{N_k} \sum_{n=1}^{N_k} s_k^2(n)$$

where s_k is the subband signal for k = 1, 2, 3, …, M, with M being the total number of subbands and N_k the number of samples in the k-th band. The adaptation condition is set as follows: if P(k) > P_s, then perform adaptation.
The procedure described above would restrict the adaptation process to the noisy parts of the speech, hence speeding up the decision and reducing the total computational cost.
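The adaptation condition above can be sketched as a simple gate on the normalized subband power. This is a minimal illustration: the function names are hypothetical, and allowing the threshold to be given either as a single value or as one value per subband is an assumption of the example rather than something stated in the text.

```python
import numpy as np

def normalized_power(s_k):
    """Average power of one subband signal: sum of squares over sample count."""
    s_k = np.asarray(s_k, dtype=float)
    return np.sum(s_k ** 2) / len(s_k)

def bands_to_adapt(subbands, p_s):
    """Return one flag per subband: True where P(k) > P_s, so adaptation runs."""
    # p_s may be a scalar or a per-band sequence (assumption of this sketch)
    thresholds = np.broadcast_to(np.asarray(p_s, dtype=float), (len(subbands),))
    return [bool(normalized_power(s_k) > t)
            for s_k, t in zip(subbands, thresholds)]
```

Subbands whose flag is False are simply passed through, so no filter update is computed for them.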

EXPERIMENTAL RESULTS
The QMF filter bank structure is applied to noisy speech signals, splitting them into four subbands in the octave decomposition described in Figure 1. The prototype analysis filter is implemented using a 32-tap FIR filter. Adaptive filtering in each subband is carried out using the NLMS algorithm, adjusting a 32-tap FIR filter. Table 1 shows the subband parameters, where the step-size factor μ is tuned for each subband iteratively. The adaptive filter is initially tested using a speech signal subjected to white noise. Figure 4 shows the capability of the adaptive filter in filtering out the noise.
With NLMS, the step size μ is divided by the power of the input noise signal, hence producing a variable step size as follows:

$$\mu_k(n) = \frac{\mu}{\sigma + \lVert \mathbf{x}_k(n) \rVert^2}$$

where σ is a small constant (greater than zero) used to avoid possible division by zero and ‖x_k(n)‖² is the squared norm, or power, of the input vector x_k at time n.
The split process described in section 2.1 is applied to the noise signal x(n) in the same way as to the noisy speech s(n). Referring to Figure 3, the noisy signal in each subband is applied as the desired input of the adaptive filter, while the noise x(n) in each subband is applied as the reference input to the adaptive filter. The final output is the desired input minus the output of the adaptive filter, which forms the error signal.
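The noise cancellation structure described above, with the noisy subband signal as the desired input and the subband noise as the reference input, can be sketched as a sample-by-sample NLMS loop. This is only an illustrative sketch: the function name `nlms_cancel` and the default values of `mu` and `sigma` are assumptions, and the step sizes actually used are those tuned per subband in Table 1.

```python
import numpy as np

def nlms_cancel(desired, reference, num_taps=32, mu=0.5, sigma=1e-6):
    """NLMS noise canceller for one subband.

    desired   : noisy subband signal (speech plus noise)
    reference : subband noise reference
    Returns the error signal (cleaned output) and the final weights.
    """
    desired = np.asarray(desired, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(num_taps)          # filter weights
    x = np.zeros(num_taps)          # latest reference samples, newest first
    e = np.zeros(len(desired))
    for n in range(len(desired)):
        x = np.roll(x, 1)           # shift the delay line by one sample
        x[0] = reference[n]
        y = w @ x                   # adaptive filter output
        e[n] = desired[n] - y       # error signal = cleaned output
        # step size normalised by the power of the input vector
        w += (mu / (sigma + x @ x)) * e[n] * x
    return e, w
```

In the localised scheme, this loop would be called only for those subbands in which the power threshold indicates that noise is present.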

LOCALISED NOISE DETERMINATION
In many practical situations, the environmental noise is located in one or more subbands of the speech spectrum and not in all subbands simultaneously. Therefore, it is not necessary to perform adaptive filtering in all subbands. In this work, a condition to control the process of adaptation is introduced. If noise is detected in a specific subband, the adaptive process acts on that specific part of the signal, while the other subbands remain idle. For this purpose, a threshold based on the measured average power of the noise-free speech signal is used to control the process.
In order to test the capability of each subband to remove noise, coloured noise is generated from white noise to occupy each subband. Figure 5 shows the spectrum of the noise signal in each of the four bands.
The coloured noise is added to the speech signal to simulate localised noisy signals. The resulting corrupted spectra are shown in Figures 6 to 9, where red is the speech signal and blue is the localised noise. Each noisy signal is then used as the desired input to the adaptive filter. In each case, the adaptive filter is able to clean the noise by performing the adaptation only in the subband containing the noise, while the other subbands remain idle.
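One way to produce coloured noise that occupies a single subband, as in the test signals described above, is to remove the out-of-band components of white noise in the frequency domain. The method by which the noise was actually generated is not stated here, so the FFT-masking approach, the function name and the 8 kHz sampling rate in this sketch are all assumptions.

```python
import numpy as np

def band_limited_noise(num_samples, f_lo, f_hi, fs=8000.0, seed=0):
    """White Gaussian noise restricted to the band [f_lo, f_hi) in Hz."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(num_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    # zero every frequency bin that lies outside the requested band
    spectrum[(freqs < f_lo) | (freqs >= f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=num_samples)
```

For instance, `speech + band_limited_noise(len(speech), 500.0, 1000.0)` would give a signal corrupted only between 0.5 and 1 kHz, where `speech` is a hypothetical array holding the clean signal.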
With localised noise cancellation, the adaptive process runs only for specific subbands, so the computational cost can be reduced by a factor proportional to the number of subbands that the noise occupies. For example, the length of a fullband adaptive filter is four times that of the subband filter with localised noise when the noise is corrupting one subband. Figure 11 shows examples of the spectra before and after filtering, in this case for the combination of subbands 1 and 2. The similar spectra demonstrate that the system is able to filter the noise out. Figure 12 shows the Mean Square Error (MSE) comparison of the subband filter with a fullband NLMS adaptive filter when both are subjected to noise localised between 0.5 and 1 kHz. From this figure, it can be observed that while the fullband filter is still converging slowly in response to the localised noise, the subband system with subband 2 in operation has already converged within a few iterations. This result is verified by Figure 10. The fast convergence is also observed when experiments are conducted with other subband noise combinations, as shown in Figure 12 for the combination of subbands 1 and 3. Experiments are also carried out with different combinations of noisy signals, for example a signal with noise occupying subbands 1 and 2 at the same time. In all cases, the adaptive filter is able to perform the adaptation process as designed.
It should be noted here that a similar subband system with adaptive filters in all subbands would perform in the same manner, but the adaptive process would run in all subbands at the same time whether or not noise is present in each subband. This has the disadvantage of higher power consumption and requires high-speed processors.
To support the results of Figure 12 and to demonstrate the success of the proposed method compared to the conventional fullband approach, the voice signal processed by the fullband NLMS system is shown in Figure 13. Comparing Figure 13 with Figure 10, shown earlier for the same frequency band, noise persists for a relatively long time in the signal filtered by the fullband system. It is clear that the fullband system has poorer noise cancellation performance than the proposed subband method under localised noise conditions.

CONCLUSION
A method for adaptive cancellation of localised environmental noise has been presented. Low-complexity octave filter banks were used to decompose the noisy speech signals into subbands. The outputs of the filter banks were used to determine which parts of the speech were noisy, using a simple threshold to control the adaptation process. The subband adaptive filter was configured to process only the localised noisy parts using the NLMS adaptation algorithm. It has been shown to perform better in terms of computational cost and convergence rate when compared to a system that does not take advantage of the information regarding the presence or absence of noise in a specific part of the speech spectrum. Fast convergence and savings in computational cost suggest that this filter is suitable for implementation in speech applications deployed in situations where environmental noise is prevalent.