Time-Domain Convolutive Blind Source Separation Employing Selective-Tap Adaptive Algorithms

We investigate novel algorithms to improve the convergence and reduce the complexity of time-domain convolutive blind source separation (BSS) algorithms. First, we propose an MMax partial update time-domain convolutive BSS (MMax BSS) algorithm. We demonstrate that the partial update scheme applied in the MMax LMS algorithm for a single channel can be extended to multichannel time-domain convolutive BSS with little deterioration in performance and a possible saving in computational complexity. Next, we propose an exclusive maximum selective-tap time-domain convolutive BSS algorithm (XM BSS) that reduces the interchannel coherence of the tap-input vectors and improves the conditioning of the autocorrelation matrix, resulting in improved convergence rate and reduced misalignment. Moreover, the computational complexity is reduced since only half of the tap inputs are selected for updating. Simulation results show a significant improvement in convergence rate compared to existing techniques.


INTRODUCTION
Blind source separation (BSS) [1,2] is an established area of work concerned with estimating source signals from the mixed signals observed at the sensors, that is, the estimation is performed without exploiting information about either the source signals or the mixing system. Independent component analysis (ICA) [3] is the main statistical tool for dealing with the BSS problem under the assumption that the source signals are mutually independent. In the instantaneous BSS case, signals are mixed instantaneously and ICA algorithms can be directly employed to separate the mixtures. However, in a realistic environment, signals are always mixed in a convolutive manner because of propagation delay and reverberation effects. Therefore, much research deals with convolutive blind source separation by extending instantaneous blind source separation or independent component analysis to the convolutive case.
The straightforward choice in time-domain convolutive blind source separation is based on directly extending instantaneous BSS to the convolutive case [4,5]. This natural approach achieves good separation results once the algorithm converges. However, time-domain convolutive blind source separation suffers from high computational complexity and low convergence rate, especially for systems requiring long FIR filters for the separation.
Frequency domain convolutive BSS [6,7] was proposed to deal with the expensive computational complexity problem of time-domain BSS. In frequency domain BSS, complex-valued ICA for instantaneous BSS is employed in every frequency bin independently. The advantage of this approach is that any existing complex-valued instantaneous BSS algorithm can be used and the computational complexity is reduced by exploiting the FFT for the computation of convolution which is the basis of popularity of frequency domain approaches. However, the permutation and scaling ambiguity in the ICA algorithm, which is not a problem for instantaneous BSS, becomes a serious problem in frequency domain convolutive BSS. Since frequency domain convolutive BSS is performed by instantaneous BSS at each frequency bin separately, the order and the scale of the unmixed signals are random because of the inherent ambiguity of ICA algorithms. When we transform the separated signals back from frequency domain to time domain, the components at a given frequency bin may not come from the same source signal and may not have a consistent scale factor. Thus, we need to align these components and adjust the scale in each frequency bin so that a separated signal in time domain is obtained from frequency components of the same source signal and with consistent amplitude. This is well known as the permutation and scaling problem of frequency domain convolutive BSS [8,9]. These built-in problems in frequency domain approaches make it worthwhile to reconsider ways of reducing the complexity of time-domain approaches and improving their convergence rates.
In recent years, several partial update adaptive algorithms were proposed to model single-channel systems with reduced overall system complexity by updating only a subset of coefficients. Within these partial update algorithms, the MMax NLMS in [10] was reported to have the closest performance to the full update case for any given number of coefficients to be updated. In [11], the MMax selective-tap strategy was extended to the two-channel case to exclusively select coefficients corresponding to the maximum inputs as a means to reduce interchannel coherence in stereophonic acoustic echo cancellation rather than as a way to reduce complexity. Simulation results for this exclusive maximum adaptive algorithm show that it can significantly improve the convergence rate compared with existing stereophonic echo cancellation techniques.
In this paper, we propose using these reduced complexity approaches in time-domain BSS to address the complexity and slow convergence problems. First, we propose an MMax natural gradient-based partial update time-domain convolutive BSS algorithm (MMax BSS). In this algorithm, only a subset of coefficients in the separation system gets updated at every iteration. We demonstrate that the partial update scheme applied in the MMax LMS algorithm for a single channel can be extended to multichannel time-domain convolutive BSS with little deterioration in performance and a possible saving in computational complexity. By employing the selective-tap strategies used for stereophonic acoustic echo cancellation [11], we propose an exclusive maximum selective-tap time-domain convolutive BSS algorithm (XM BSS). The exclusive tap-selection update procedure reduces the interchannel coherence of the tap-input vectors and improves the conditioning of the autocorrelation matrix so as to accelerate the convergence rate and reduce the misalignment. The computational complexity is reduced as well since only half of the tap inputs are selected for updating (note that some overhead is needed to select the set to be updated). Simulation results show a significant improvement in convergence rate compared with existing techniques. As far as we know, the application of partial update and selective-tap update schemes to time-domain BSS algorithms is in itself novel.
BSS algorithms are generally preceded by a prewhitening stage that aims to reduce the correlation between the different input sources (as opposed to regular whitening where correlation between different samples of the same source is reduced). This decorrelation step leads to a subsequent separation matrix that is orthogonal and less ill-conditioned. The proposed partial update BSS algorithm incorporates this whitening concept into the separation process by adaptively reducing the interchannel coherence of the tap-input vectors.
The rest of this paper is organized as follows. In Section 2, we review blind source separation and its challenges in the time domain and the frequency domain. In Section 3, we review the single-channel MMax partial update adaptive algorithm for linear filters. In Section 4, we review the exclusive maximum selective-tap adaptive algorithm for stereophonic echo cancellation. We propose the MMax partial update time-domain convolutive BSS algorithm in Section 5 and the exclusive maximum update time-domain convolutive BSS algorithm in Section 6. The tools for assessing the quality of the separation are presented in Section 7, and simulation results for the proposed algorithms with generated gamma signals and speech signals are presented in Section 8. In Section 9, we draw our conclusions.

Instantaneous time-domain BSS
Blind source separation (BSS) is a versatile tool for signal separation in a number of applications, relying only on the observed mixtures and the independence assumption. For instantaneous mixtures, independent component analysis (ICA) can be employed directly to separate the mixed signals. The ICA-based algorithm for instantaneous blind source separation requires the output signals to be as independent as possible, and different algorithms are obtained depending on how this independence is measured. The instantaneous time-domain BSS structure is shown in Figure 1. In this paper, we use the Kullback-Leibler divergence to measure independence and obtain the BSS algorithm as follows. Let x be the vector of observed mixtures and y = Wx the output signal vector. The Kullback-Leibler divergence of the output signal vector is

$$D\big(p(y)\,\|\,q(y)\big) = \int p(y)\,\log\frac{p(y)}{q(y)}\,dy, \qquad q(y) = \prod_{i=1}^{N} p_i(y_i),$$

where p(y) is the probability density of the output signals, p_i(y_i) is the probability density of output signal y_i, and q(y) is the joint probability density that the output signals would have if they were mutually independent. Equivalently,

$$D\big(p(y)\,\|\,q(y)\big) = -H(y) + \sum_{i=1}^{N} H(y_i),$$

where H(·) is the entropy operation. Using the standard gradient,

$$\frac{\partial D}{\partial W} = -W^{-T} - E\{\phi(y)\,x^{T}\},$$

where ϕ(y) = [∂p_1(y_1)/∂y_1 / p_1(y_1), ..., ∂p_N(y_N)/∂y_N / p_N(y_N)]^T is a nonlinear function related to the probability density function of the source signals, the coefficients W in the unmixing system are then updated, with the expectation replaced by its instantaneous value, as follows:

$$W(k+1) = W(k) + \mu\,\Delta W, \qquad \Delta W = W^{-T} + \phi(y)\,x^{T},$$

where μ is the step size. However, BSS algorithms have traditionally used the natural gradient [4], which is acknowledged as having better performance. In this case, ΔW is given by

$$\Delta W = \big[I + \phi(y)\,y^{T}\big]\,W.$$
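To make the natural gradient update for instantaneous BSS concrete, the following is a minimal sketch of one stochastic update step of the form ΔW = (I + ϕ(y)yᵀ)W. It assumes a Laplacian source model, for which the score function p'(y)/p(y) reduces to −sign(y); this choice, the function names, and the step size are illustrative assumptions and are not taken from the paper. With f(y) = −ϕ(y), the same update is commonly written as (I − f(y)yᵀ)W.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def natural_gradient_step(W, x, mu):
    """One stochastic natural-gradient update of an N x N unmixing matrix W.

    W  : list of N rows, each a list of N floats
    x  : list of N floats (one sample of the observed mixtures)
    mu : step size (hypothetical value chosen by the caller)
    """
    n = len(W)
    # Outputs of the unmixing system: y = W x
    y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Score function phi_i = p_i'(y_i) / p_i(y_i).
    # ASSUMPTION: Laplacian density, so phi_i = -sign(y_i).
    phi = [-sign(v) for v in y]
    # G = I + phi(y) y^T
    G = [[(1.0 if i == j else 0.0) + phi[i] * y[j] for j in range(n)]
         for i in range(n)]
    # Natural-gradient direction: Delta W = G W
    dW = [[sum(G[i][k] * W[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    W_new = [[W[i][j] + mu * dW[i][j] for j in range(n)] for i in range(n)]
    return W_new, y
```

In practice the step would be applied once per sample (or per block, with the outer product averaged over the block), and the update vanishes on average once the outputs satisfy E{ϕ(y)yᵀ} = −I.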

Convolutive BSS algorithm
The convolutive BSS model is illustrated in Figure 2. N source signals {s i (k)}, 1 ≤ i ≤ N, pass through an unknown N-input, M-output linear time-invariant mixing system to yield the M mixed signals {x j (k)}. All source signals s i (k) are assumed to be statistically independent.
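Defining the vectors s(k) = [s_1(k), ..., s_N(k)]^T and x(k) = [x_1(k), ..., x_M(k)]^T, the mixing model can be written as

$$x(k) = H * s(k),$$

where H is the M × N matrix of mixing filters and * is the convolution operation.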
The jth sensor signal can be obtained by

$$x_j(k) = \sum_{i=1}^{N}\sum_{l=0}^{L-1} h_{ji}(l)\,s_i(k-l),$$

where h_ji(l) is the impulse response from source i to sensor j and L is the number of taps of the FIR filters used to model this impulse response. The task of the convolutive BSS algorithm is to obtain an unmixing system such that the outputs of this system y(k) = [y_1(k), ..., y_N(k)]^T become mutually independent and serve as the estimates of the N source signals. The separation system typically consists of a set of FIR filters w_ij(k) of length Q each. The unmixing system can also be represented as

$$y(k) = W * x(k),$$

and the ith output of the unmixing system is given as

$$y_i(k) = \sum_{j=1}^{M}\sum_{q=0}^{Q-1} w_{ij}(q)\,x_j(k-q).$$

By extending the instantaneous BSS algorithm to the convolutive case, we obtain the time-domain convolutive BSS algorithm, in which W is the unmixing matrix with the FIR filters w_ij as its components. This approach is the natural extension and achieves good separation results once the algorithm converges. However, time-domain convolutive blind source separation suffers from high computational complexity and low convergence rate, especially for systems with long FIR filters.
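The convolutive mixing and unmixing operations described above are both multichannel FIR filtering. The sketch below shows one way to compute them in plain Python; the function names `fir_filter` and `mimo_fir` and the nested-list layout (`filters[j][i]` is the filter from input i to output j) are illustrative assumptions.

```python
def fir_filter(h, s):
    """Causal FIR filtering: out[k] = sum_l h[l] * s[k - l].

    The output is truncated to the length of the input signal s.
    """
    return [sum(h[l] * s[k - l] for l in range(len(h)) if k - l >= 0)
            for k in range(len(s))]


def mimo_fir(filters, inputs):
    """Apply a matrix of FIR filters to a list of input signals.

    filters[j][i] is the impulse response from input i to output j,
    so that output_j(k) = sum_i (filters[j][i] * inputs[i])(k).
    """
    n_samples = len(inputs[0])
    outputs = []
    for row in filters:
        acc = [0.0] * n_samples
        for h, s in zip(row, inputs):
            for k, v in enumerate(fir_filter(h, s)):
                acc[k] += v
        outputs.append(acc)
    return outputs
```

With a mixing system `H` and an unmixing system `W` stored in this layout, the mixtures are `x = mimo_fir(H, s)` and the separated outputs are `y = mimo_fir(W, x)`. The direct-form sum costs one multiplication per tap per output sample, which is the cost that grows with long separation filters.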
Convolutive BSS can also be performed in frequency domain by using short-time Fourier transform. This method is very popular for convolutive mixtures and is based on transforming the convolutive blind source separation problem into instantaneous BSS problem at every frequency bin. The advantage of frequency domain convolutive BSS lies in three factors. First the computational complexity is reduced since the convolution operations are transferred into multiplication operations by short-time FFT. Second, the separation process can be performed in parallel at all frequency bins. Finally any complex-valued instantaneous ICA algorithm can be employed to deal with the separation at each frequency bin. However, the permutation and scaling ambiguity in ICA algorithm, which is not a problem for instantaneous BSS, becomes a serious problem in frequency domain convolutive BSS.
This problem can be illustrated by Figure 3. Frequency domain convolutive BSS is performed by instantaneous BSS at each frequency bin separately. As a result, the order and the scale of the unmixed signals are random because of the inherent indeterminacy of ICA algorithms. When we transform the separated signals back from frequency domain to time domain, the components at different frequency bins may not come from the same source signal and may not have consistent scale. Thus, we need to align the permutation and adjust the scale in each frequency bin so that a separated signal in time domain is obtained from frequency components of the same source signal and with consistent amplitude. This is not a simple problem.

PARTIAL UPDATE ADAPTIVE ALGORITHM
The basic idea of partial update adaptive filtering is to allow for the use of filters with a number of coefficients L large enough to model the unknown system while reducing the overall complexity by updating only M coefficients at a time. This results in considerable savings for M ≪ L. Invariably, there are penalties for this partial update, the most obvious of which is a reduced convergence rate. The question then becomes which coefficients we should update and how we can minimize the impact of the partial update on the overall filter performance. In this section, we review the MMax partial update adaptive algorithm for linear filters [10] since it forms the basis of our proposed MMax time-domain convolutive BSS algorithm.
Consider a standard adaptive filter set-up where x(n) is the input, y(n) is the output, and d(n) is the desired output, all at instant n. The output error e(n) is given by

$$e(n) = d(n) - y(n) = d(n) - \mathbf{w}^{T}(n)\,\mathbf{x}(n),$$

where w(n) is the L × 1 column vector of the filter coefficients and x(n) = [x(n), x(n − 1), ..., x(n − L + 1)]^T is the L × 1 column vector of the current and past inputs to the filter, both at instant n. The ith element of w(n) is w_i(n) and it multiplies the ith delayed input x(n − i), i = 0, ..., L − 1. The basic NLMS algorithm is known for its extreme simplicity, with the coefficient update given by

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\frac{e(n)\,\mathbf{x}(n)}{\|\mathbf{x}(n)\|^{2}},$$

where μ is the step size determining the speed of convergence and the steady state error.
In the single-channel MMax NLMS algorithm [10], for an adaptive filter of length L, the set of M coefficients to be updated is selected as the one that provides the maximum reduction in error. It is shown in [10] that, with the standard NLMS update equation, this criterion reduces to selecting the coefficients that multiply the M inputs x(n − i) with the largest magnitude. This selective-tap updating can be expressed as

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\frac{\mathbf{Q}(n)\,\mathbf{x}(n)\,e(n)}{\|\mathbf{x}(n)\|^{2}},$$

where Q(n) is the tap-selection matrix

$$\mathbf{Q}(n) = \mathrm{diag}\{q_0(n), \ldots, q_{L-1}(n)\}, \qquad q_i(n) = \begin{cases} 1, & |x(n-i)| \in \{M \text{ maxima of } |\mathbf{x}(n)|\}, \\ 0, & \text{otherwise.} \end{cases}$$

An analysis of the mean square error convergence is provided in [10] based on a matrix formulation of data-dependent partial updates. Based on this analysis, it was shown that the MMax algorithm provides the closest performance to the full update case for any given number of coefficients to be updated. This was also confirmed in [12].
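The MMax selection rule can be sketched in a few lines: compute the error with all taps, then update only the M taps whose inputs have the largest magnitude. The function below is an illustrative sketch, not the implementation of [10]; the name `mmax_nlms_step`, the small regularization constant `eps`, and the use of a full sort at every step are assumptions made for clarity.

```python
def mmax_nlms_step(w, x, d, mu, M, eps=1e-8):
    """One MMax NLMS iteration.

    w   : current coefficients, length L
    x   : tap-input vector [x(n), x(n-1), ..., x(n-L+1)]
    d   : desired output at instant n
    mu  : step size
    M   : number of coefficients to update (M <= L)
    eps : small regularization to avoid division by zero (assumption)
    """
    # The error uses the full filter output
    e = d - sum(wi * xi for wi, xi in zip(w, x))
    # Normalization by the energy of the full tap-input vector
    norm = sum(xi * xi for xi in x) + eps
    # Indices of the M inputs with the largest magnitude
    selected = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:M]
    w_new = list(w)
    for i in selected:
        w_new[i] += mu * e * x[i] / norm
    return w_new, e, sorted(selected)
```

Because the tap-input vector is a sliding window, only one sample enters and one leaves per iteration, so an efficient implementation would maintain the ordering incrementally rather than re-sorting as done here.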

EXCLUSIVE MAXIMUM SELECTIVE-TAP ADAPTIVE ALGORITHM
Recently, an exclusive maximum (XM) partial update algorithm was proposed in [11] to deal with stereophonic echo cancellation. The XM algorithm was motivated by the MMax partial update scheme [10], as both select a subset of coefficients for updating in every adaptive iteration. However, in the XM partial update, the goal is not to reduce computational complexity. Rather, the exclusive maximum tap-selection strategy was proposed to reduce interchannel coherence in a two-channel stereo system and improve the conditioning of the input vector autocorrelation matrix. We review the algorithm in [11] here since it forms the basis of our proposed XM time-domain convolutive BSS algorithm.
In a stereophonic acoustic environment, the stereophonic signals x_1(n) and x_2(n) are transmitted to loudspeakers in the receiving room and coupled to the microphones in this room by the room impulse responses. In stereophonic acoustic echo cancellation, these coupled acoustic echoes have to be cancelled. Let the receiving room impulse responses for x_1(n) and x_2(n) be h_1(n) and h_2(n), respectively. Two adaptive filters ĥ_1(n) and ĥ_2(n) of length L in the stereophonic acoustic echo canceller are updated to estimate h_1(n) and h_2(n). The desired signal for the adaptive filters is

$$d(n) = \sum_{j=1}^{2} \mathbf{h}_j^{T}\,\mathbf{x}_j(n),$$

where

$$\mathbf{x}_j(n) = [x_j(n), x_j(n-1), \ldots, x_j(n-L+1)]^{T}, \qquad j = 1, 2.$$

Thus, the error signal is

$$e(n) = d(n) - \sum_{j=1}^{2} \hat{\mathbf{h}}_j^{T}(n)\,\mathbf{x}_j(n).$$

Adaptive algorithms such as LMS, NLMS, RLS, and affine projection (AP) can be used to update the two adaptive filters ĥ_1(n) and ĥ_2(n). The exclusive maximum tap-selection scheme is outlined in the following.
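(1) At each iteration, calculate the interchannel tap-input magnitude difference vector as p(n) = |x_1(n)| − |x_2(n)|, and order x_1 and x_2 according to the sorting of p in descending order.
(2) Select for updating the taps corresponding to the M = 0.5L largest elements of p in the first channel and the M smallest elements of p in the second channel, so that the same tap index is never selected in both channels.
It was shown in [11] that this update mechanism, applied to the LMS, NLMS, RLS, and affine projection (AP) algorithms, results in a significantly better convergence rate than the corresponding existing algorithms.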

PROPOSED MMAX PARTIAL UPDATE TIME-DOMAIN CONVOLUTIVE BSS ALGORITHM
From the description of MMax partial update in Section 3, we know that the principle of MMax partial update algorithm for single channel is to update the subset of coefficients which has the most impact on Δw. Our proposed MMax partial update convolutive BSS algorithm is based on the same principle.
In the MMax LMS algorithm [10], the update is Δw(n) = e(n)x(n). Since e(n) is common to all elements of Δw(n), the larger |x(n − i)| is, the larger its impact on the error. Thus, in the MMax LMS algorithm, the coefficients corresponding to the M largest values in |x(n)| are updated.
However, in time-domain convolutive BSS, the update ΔW is a matrix whose elements ΔW_ij are the update vectors of the individual FIR filters w_ij, and it does not reduce to a common error term multiplying the input vector. Following the same principle, the coefficients with the M largest values of ΔW_ij are the ones to be updated. We show this algorithm using a 2-by-2 system as an example in Algorithm 1. From the algorithm description, the challenge compared to the MMax LMS algorithm [10] is that we need to sort the elements in ΔW_ij in every iteration, as opposed to simply identifying the location of one new sample in an already ordered set. However, we only need to update the selected subset of coefficients, which results in some savings.
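As an illustration of the selection step, the sketch below applies a partial update to a single separation filter w_ij given its already computed update vector ΔW_ij. It assumes that "largest" is measured by magnitude and that M is chosen per filter; both the function name `mmax_partial_update` and these choices are assumptions for illustration rather than a transcription of Algorithm 1.

```python
def mmax_partial_update(w, dw, mu, M):
    """Update only the M coefficients of w with the largest |dw|.

    w  : coefficients of one separation filter w_ij, length Q
    dw : corresponding update vector Delta W_ij, length Q
    mu : step size
    M  : number of coefficients to update in this filter
    """
    # Rank all candidate updates by magnitude (full sort every iteration)
    selected = sorted(range(len(dw)), key=lambda q: abs(dw[q]), reverse=True)[:M]
    w_new = list(w)
    for q in selected:
        w_new[q] += mu * dw[q]
    return w_new, sorted(selected)
```

For a 2-by-2 system the same call would be made for each of the four filters w_11, w_12, w_21, and w_22. The sort is the overhead referred to in the text; the saving comes from applying only M of the Q coefficient updates.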

PROPOSED EXCLUSIVE MAXIMUM SELECTIVE-TAP TIME-DOMAIN CONVOLUTIVE BSS ALGORITHM
As we already know from Section 4, exclusive maximum tap selection can reduce interchannel correlation and improve the conditioning of the input autocorrelation matrix. In this section, we examine the effect of tap selection on interchannel coherence reduction and extend this idea to our multichannel blind source separation case.

Interchannel decorrelation by tap selection
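The squared coherence function of x_1 and x_2 is defined as

$$|C_{x_1 x_2}(f)|^{2} = \frac{|P_{x_1 x_2}(f)|^{2}}{P_{x_1 x_1}(f)\,P_{x_2 x_2}(f)},$$

where P_{x_1 x_2}(f) is the cross-power spectrum between the two mixtures x_1 and x_2, P_{x_1 x_1}(f) and P_{x_2 x_2}(f) are their auto-power spectra, and f is the normalized frequency [11]. A two-input two-output system is considered in this section. The mixing system used in the simulation involves a parameter γ and a term b, where b is an independent white Gaussian noise with zero mean.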
In the simulation, we set γ = 0.9 to reflect the high interchannel correlation found in practice between the observed mixtures in a convolutive environment. The two input signals s_1 and s_2 are generated as zero-mean, unit-variance gamma signals. The mixtures x_1 and x_2 are obtained by convolving s_1 and s_2 with this mixing system. The squared coherence for x_1 and x_2 with all taps selected is shown in Figure 4. In Figure 5, the squared coherence for inputs with taps selected according to the MMax selection criterion described in Section 3 is shown. We can see that the correlation is reduced, but not significantly. Figure 6 shows the squared coherence for signals with exclusive tap selection, that is, the selection of the same tap index in both channels is not permitted. We can see that the correlation is reduced significantly. This confirms that the exclusive tap-selection strategy does indeed reduce interchannel coherence and, as such, improves the conditioning of the input autocorrelation matrix even in the mixing environment of the blind source separation case.
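A squared coherence curve of this kind can be estimated by averaging periodograms over segments. The following sketch uses non-overlapping segments with a rectangular window and a direct DFT; these estimator choices, the segment length, and the function names are assumptions for illustration and are not the settings used to produce Figures 4 to 6.

```python
import cmath
import math


def dft(segment):
    """Direct DFT of a real segment for bins 0 .. N/2 (O(N^2), fine for short segments)."""
    n = len(segment)
    return [sum(segment[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n // 2 + 1)]


def squared_coherence(x1, x2, seg_len=32):
    """Estimate |P12(f)|^2 / (P11(f) P22(f)) by averaging over segments."""
    n_seg = min(len(x1), len(x2)) // seg_len
    n_bins = seg_len // 2 + 1
    p11 = [0.0] * n_bins
    p22 = [0.0] * n_bins
    p12 = [0j] * n_bins
    for s in range(n_seg):
        a = dft(x1[s * seg_len:(s + 1) * seg_len])
        b = dft(x2[s * seg_len:(s + 1) * seg_len])
        for f in range(n_bins):
            p11[f] += abs(a[f]) ** 2
            p22[f] += abs(b[f]) ** 2
            p12[f] += a[f] * b[f].conjugate()
    coh = []
    for f in range(n_bins):
        den = p11[f] * p22[f]
        # Guard against empty bins; report zero coherence there
        coh.append(abs(p12[f]) ** 2 / den if den > 0.0 else 0.0)
    return coh
```

Averaging over several segments is essential: with a single segment the estimate equals one at every frequency regardless of the signals. Values close to one indicate strongly coherent channels, which is the situation that exclusive tap selection is meant to avoid.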

Proposed XM update algorithm for time-domain convolutive BSS
As a result of the improved conditioning of the input autocorrelation matrix, we expect an improved convergence rate in time-domain convolutive BSS when using this update algorithm for a two-by-two blind source separation system. Based on the exclusive maximum tap-selection scheme proposed in [11], we propose the exclusive maximum time-domain convolutive BSS algorithm (XM BSS) as follows.
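Define p as the interchannel tap-input magnitude difference vector at time n as

$$\mathbf{p}(n) = |\mathbf{x}_1(n)| - |\mathbf{x}_2(n)|,$$

where x_1(n) and x_2(n) are the length-L tap-input vectors of the two channels and the magnitude is taken element by element. Sort p in descending order as

$$\tilde{\mathbf{p}}(n) = [\tilde{p}_1(n), \ldots, \tilde{p}_L(n)]^{T}, \qquad \tilde{p}_1(n) \ge \cdots \ge \tilde{p}_L(n),$$

and order x_1 and x_2 according to the sorting of p, such that the lth elements of the reordered vectors are the tap inputs at the index that produced the lth element of the sorted p. Taps corresponding to the M = 0.5L largest elements of the input magnitude difference vector p in the first channel and the M smallest elements of p in the second channel are selected for the updating of the output signal y_1. Taps corresponding to the M = 0.5L largest elements of the input magnitude difference vector p in the second channel and the M smallest elements of p in the first channel are selected for the updating of the output signal y_2. The detailed algorithm is shown in Algorithm 2.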
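The exclusive tap-selection rule described above can be sketched as a function that returns, for each output, one 0/1 mask per channel. The sketch assumes an even filter length L and breaks ties by the original tap order; the function name `xm_tap_selection` and the dictionary layout of the result are illustrative assumptions, not the notation of Algorithm 2.

```python
def xm_tap_selection(x1, x2):
    """Exclusive maximum tap selection for a two-channel system.

    x1, x2 : tap-input vectors of the two channels (same even length L)
    Returns a dict mapping 'y1' and 'y2' to (mask_channel1, mask_channel2),
    where a 1 marks a tap selected for that output.
    """
    L = len(x1)
    M = L // 2
    # Interchannel tap-input magnitude difference vector
    p = [abs(a) - abs(b) for a, b in zip(x1, x2)]
    # Tap indices sorted by p in descending order
    order = sorted(range(L), key=lambda i: p[i], reverse=True)
    largest = set(order[:M])
    smallest = set(order[L - M:])
    mask_large = [1 if i in largest else 0 for i in range(L)]
    mask_small = [1 if i in smallest else 0 for i in range(L)]
    return {
        # y1: M largest of p in channel 1, M smallest of p in channel 2
        "y1": (mask_large, mask_small),
        # y2: M smallest of p in channel 1, M largest of p in channel 2
        "y2": (mask_small, mask_large),
    }
```

Because the two index sets are complementary when M = 0.5L, no tap index is ever selected in both channels for the same output, which is the exclusivity property that lowers the interchannel coherence.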

Computational complexity of the proposed algorithm
The complexity is defined as the total number of multiplications and comparisons per sample period for each channel. In the XM convolutive BSS algorithm, we need to sort the interchannel tap-input magnitude difference vector. For an unmixing system with filter length L, we require at most 2 + 2 log_2 L comparisons per sample period using the SORTLINE procedure [13]. However, the number of multiplications required for computing the convolution per sample period is reduced from 4L to 2L for a two-by-two BSS system. Thus, the overall computational complexity is still reduced provided L > 2, which is always satisfied in the convolutive BSS case.
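The operation counts quoted above can be checked numerically. The short sketch below simply evaluates the two expressions from the text, 4L for the regular algorithm and 2L + 2 + 2 log_2 L for the XM algorithm; the function names are illustrative assumptions.

```python
import math


def regular_cost(L):
    """Multiplications per sample for the convolutions of a regular 2-by-2 system."""
    return 4 * L


def xm_cost(L):
    """Multiplications plus worst-case sorting comparisons per sample for XM BSS."""
    return 2 * L + 2 + 2 * math.log2(L)
```

For the filter length L = 64 used in the experiments, this gives 256 operations for the regular algorithm against 142 for the XM algorithm, and the XM count is strictly smaller for every L greater than 2.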

SEPARATION PERFORMANCE EVALUATION
In this section, we describe the separation performance measures used in our simulations.

Performance evaluation by signal-to-interference ratio
The performance of blind source separation systems can be evaluated by the signal-to-interference ratio (SIR) which is defined as the power ratio between the target component and the interference components [14].
In the basic instantaneous BSS model, the mixing system is represented by the matrix A and the unmixing system by the matrix W, so the global system can be written as P = WA. Each element in the ith row and jth column of P is a scalar p_ij. The SIR of output i is obtained as

$$\mathrm{SIR}_i = 10\log_{10}\frac{E\{(p_{ii}\,s_i)^{2}\}}{E\big\{\big(\sum_{j\neq i} p_{ij}\,s_j\big)^{2}\big\}}$$

for the instantaneous BSS case.
In the convolutive BSS model, the mixing system is represented by H and the unmixing system by W. We can express the global system as P = W * H, and each element of P is a vector p_ij.
The SIR of output i is obtained as

$$\mathrm{SIR}_i = 10\log_{10}\frac{E\{(p_{ii} * s_i)^{2}\}}{E\big\{\big(\sum_{j\neq i} p_{ij} * s_j\big)^{2}\big\}}$$

for the convolutive BSS case, where * is the convolution operation and E{·} is the expectation operation.
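The SIR of a convolutive system can be computed by forming the global system from the mixing and unmixing filters and then comparing the power of the target and interference components at each output. The sketch below is a minimal illustration that replaces the expectation by a time average, reports the ratio in dB, and assumes no permutation (output i targets source i); these choices, the nested-list filter layout, and the function names are assumptions.

```python
import math


def conv_full(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def add_padded(a, b):
    """Element-wise sum of two sequences, zero-padding the shorter one."""
    n = max(len(a), len(b))
    return [(a[k] if k < len(a) else 0.0) + (b[k] if k < len(b) else 0.0)
            for k in range(n)]


def global_system(W, H):
    """P = W * H, with P[i][j] = sum_k W[i][k] * H[k][j] (filter convolutions)."""
    P = []
    for i in range(len(W)):
        row = []
        for j in range(len(H[0])):
            acc = [0.0]
            for k in range(len(H)):
                acc = add_padded(acc, conv_full(W[i][k], H[k][j]))
            row.append(acc)
        P.append(row)
    return P


def power(sig):
    """Time-averaged power, used here in place of the expectation."""
    return sum(v * v for v in sig) / len(sig)


def sir_db(P, sources, i):
    """SIR of output i in dB: target is source i, all other sources interfere."""
    n = len(sources[i])
    target = conv_full(P[i][i], sources[i])[:n]
    interference = [0.0] * n
    for j, s in enumerate(sources):
        if j != i:
            interference = add_padded(interference, conv_full(P[i][j], s)[:n])
    return 10.0 * math.log10(power(target) / power(interference))
```

Typical use is `sir_db(global_system(W, H), [s1, s2], 0)` evaluated at regular intervals during adaptation to obtain a convergence curve. The function raises an error when the interference power is exactly zero, which corresponds to perfect separation.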

EURASIP Journal on Audio, Speech, and Music Processing

Performance evaluation by PESQ
When the target signal in our simulations is a speech signal, we also use PESQ (perceptual evaluation of speech quality) as a measure confirming the quality of the separated signal. The PESQ standard [15] is described in ITU-T Recommendation P.862 as a perceptual evaluation tool for speech quality. The key feature of the PESQ standard is that it uses a perceptual model analogous to the assessment by the human auditory system. The output of PESQ is a prediction of the subjective quality of the degraded signal, rated as a value between −0.5 and 4.5 on a mean opinion score (MOS) scale. The larger the score, the better the speech quality.

Experiment setup
In the following simulations, our source signals s_1 and s_2 are generated as gamma signals or speech signals. The gamma signals are generated with zero mean and unit variance. The speech signals used in our simulations comprise three female and three male speech recordings, sampled at 8000 Hz, which form 9 combinations. A simple mixing system is used in our simulations to demonstrate and compare separation performance.
The mixture signals are obtained by convolving the source signals with the mixing system. The filter length in the separation system is set to 64. In the following, we compare the separation performance of the regular convolutive BSS algorithm, the MMax partial update BSS algorithm, and the XM selective-tap BSS algorithm.

MMax partial update time-domain BSS algorithm for convolutive mixture
In this simulation, we test the performance of the MMax partial update time-domain BSS algorithm for convolutive mixtures. In the following figures, "reg" means the regular time-domain BSS algorithm; "par56" means the MMax partial update time-domain BSS algorithm with M = 56; "par48" means the MMax partial update time-domain BSS algorithm with M = 48; and "par32" means the MMax partial update time-domain BSS algorithm with M = 32, where M is the number of coefficients updated at each iteration in a given channel.
In the first experiment, we use generated gamma signals as the original signals and use (9) to get the mixture signals. The performance of regular time-domain convolutive BSS algorithm and MMax partial update convolutive BSS algorithm evaluated by the SIR measure defined in (26) is shown in Figures 7 and 8. From these diagrams, we can see that as expected, the MMax partial update convolutive BSS algorithm converges slightly slower than the regular BSS algorithm while only a subset of coefficients gets updated. However, it converges to similar SIR values.
In the second experiment, we use speech signals as the original signals and use the same mixing system to get the mixture signals. In Figures 9 and 10, we show the performance of the regular time-domain convolutive BSS algorithm and the MMax partial update convolutive BSS algorithm for one combination of speech signals, with the separation performance evaluated by SIR. The performance for other combinations of speech signals is similar to that shown in Figures 9 and 10.
Since we used speech signals in the second experiment, we also use PESQ to evaluate the separation performance. In the following, we use the PESQ score to evaluate the similarity of the mixtures and of the separated signals from the regular and MMax BSS algorithms to the original source signals. Table 1 shows the average PESQ evaluation results for different combinations of female and male speech signals, where (S1, S2) denote the original source signals; (mix1, mix2) denote the mixture signals; (regular out1, regular out2) denote the separated signals from the regular BSS algorithm; (partial M = 56 out1, partial M = 56 out2) denote the separated signals from the MMax BSS algorithm with M = 56; (partial M = 48 out1, partial M = 48 out2) denote the separated signals from the MMax BSS algorithm with M = 48; and (partial M = 32 out1, partial M = 32 out2) denote the separated signals from the MMax BSS algorithm with M = 32.
From Table 1, we can see that the separation performance evaluated by PESQ is consistent with the SIR results. The separation algorithms make the separated signals more biased to one source signal and away from the other source signal. The separation performance evaluated by PESQ and SIR is also consistent with our informal listening tests.
From the above simulation results, we can see that similar to MMax NLMS algorithm for single-channel linear filters, there is a slight deterioration in performance of the proposed MMax partial update time-domain convolutive BSS algorithm as the number of updated coefficients is reduced. However, the performance at 50% coefficients updated is still quite acceptable.

Time-domain exclusive maximum selective-tap BSS for convolutive mixture
In this simulation, we test the performance of the XM selective-tap time-domain BSS algorithm for convolutive mixtures.
In the first experiment, we use generated gamma signals as the original signals and use (9) to get the mixture signals. The performance of regular time-domain convolutive BSS algorithm and XM selective-tap convolutive BSS algorithm evaluated by SIR is shown in Figures 11 and 12.
From Figures 11 and 12, we can see that XM BSS algorithm has much better convergence rate compared with regular BSS algorithm for generated gamma signals.
In the second experiment, we use speech signals as the original signals and use the same mixing system to get the mixture signals. In Figures 13 and 14, we show the performance of the regular time-domain convolutive BSS algorithm and the XM selective-tap convolutive BSS algorithm for one combination of speech signals, with the separation performance evaluated by SIR. The performance for other combinations of speech signals is similar to that shown in Figures 13 and 14.
From the plots, we can see that the XM BSS algorithm has much better convergence rate compared with the regular BSS algorithm for both generated gamma signals and speech signals.
Since we used speech signals in the second experiment, we also use PESQ to evaluate the separation performance. In the following, we use the PESQ score to evaluate the similarity of the mixtures and of the separated signals from the regular and XM BSS algorithms to the original source signals. Table 2 shows the average PESQ evaluation results for different combinations of female and male speech signals, where (S1, S2) denote the original source signals; (mix1, mix2) denote the mixture signals; (regular BSS out1, out2) denote the separated signals from the regular BSS algorithm; and (XM BSS out1, out2) denote the separated signals from XM BSS. The performance evaluation by PESQ is consistent with that measured by SIR. The separation performance evaluated by PESQ and SIR is also consistent with our informal listening tests. Based on the above simulation, we can see that the XM BSS algorithm significantly improves the convergence rate compared with the regular time-domain convolutive BSS algorithm.

CONCLUSION
In this paper, we investigate the time-domain convolutive BSS algorithm and propose two novel algorithms to address the slow convergence rate and high computational complexity of time-domain BSS. In the proposed MMax partial update time-domain convolutive BSS algorithm (MMax BSS), only a subset of coefficients in the separation system gets updated at every iteration. We show that the partial update scheme applied in the MMax LMS algorithm for a single channel can be extended to multichannel natural gradient-based time-domain convolutive BSS with little deterioration in performance and a possible saving in computational complexity. In the proposed exclusive maximum selective-tap time-domain convolutive BSS algorithm (XM BSS), the exclusive tap-selection update procedure reduces the interchannel coherence of the tap-input vectors and improves the conditioning of the autocorrelation matrix so as to accelerate the convergence rate and reduce the misalignment. Moreover, the computational complexity is reduced as well since only half of the tap inputs are selected for updating. Simulation results show a significant improvement in convergence rate compared with existing techniques. The extension of the proposed XM BSS algorithm to more than two channels is still an open problem.