Acoustic object canceller: removing a known signal from monaural recording using blind synchronization

In this paper, we propose a technique for removing a specific type of interference from a monaural recording. Nonstationary interferences are generally challenging to eliminate from such recordings. However, if the interference is a known sound, such as a cell phone ringtone, music from a CD or streaming service, or a radio or TV broadcast, its source signal can be easily obtained. In our method, we define such interference as an acoustic object. Even if the sampling frequencies of the recording and the acoustic object do not match, we compensate for the mismatch and use maximum likelihood estimation with the auxiliary function method to remove the interference from the recording. We compare several probabilistic models for representing the object-canceled signal. Experimental evaluations confirm the effectiveness of the proposed method.


Introduction
Unlike multichannel recordings, to which various array signal processing techniques can be applied, monaural recordings make it generally challenging to remove nonstationary noise. Some noise suppression algorithms [1][2][3] are based on estimating a noise power spectrum. Because these algorithms assume stationary noise, their noise estimates are inaccurate for nonstationary interference.
However, the situation is different when the source waveform of the interfering sound is known in advance. For instance, it is feasible to obtain the signal waveforms of specific sounds such as mobile phone ringtones, commercially distributed music, television broadcasts, and similar sounds. We define these signals as acoustic objects. As with general noise removal, various applications can be considered for removing these acoustic objects. For example, one might wish to eliminate mobile phone ringtones or alarms that were inadvertently included in a recording, remove music to circumvent copyright issues, or attenuate interfering noise to improve the accuracy of speech recognition and acoustic scene recognition. Additionally, it may be desirable to remove announcements that are specific to certain locations in order to anonymize the location of the recording. This study aims to achieve high-precision removal of an acoustic object from a monaural recording by utilizing the known source signal.
We treat the obtained acoustic object as a new channel and apply array signal processing. Note that the recording contains the acoustic object regardless of when or where it was acquired. However, the sampling frequencies of the recording and the available acoustic object can be mismatched even when their nominal sampling frequencies are the same. The time drift due to the sampling frequency mismatch makes the frequency response time-variant, which differs from the assumption in array signal processing.
In this study, we propose an "acoustic object canceller," a framework to remove an acoustic object from a monaural recording. The monaural recording and the obtained acoustic object are treated as components of an asynchronous microphone array, and we apply one of the blind synchronization methods [7] to compensate for the sampling frequency mismatch. Then, the frequency response of the acoustic object is determined by maximum likelihood estimation with the auxiliary function method, also known as the majorization-minimization (MM) algorithm [20], so that the acoustic object is removed from the recording.
This paper is partially based on a conference paper [21] in which we proposed the framework of the acoustic object canceller. In summary, the main contributions of this paper are as follows.
• We consider three types of model for the object-canceled signal in the maximum likelihood estimation: the generalized Gaussian distribution, the multivariate Laplace distribution, and the local Gaussian distribution.
• We experimentally investigate the dependence of the performance on the model parameters using three types of desired sound and four types of acoustic objects.
• To confirm the effectiveness of the acoustic object canceller, we compare it with the amplitude-based noise suppression method [22].
• We also evaluate the sound quality of the proposed method using two speech quality metrics: Perceptual Evaluation of Speech Quality (PESQ) [23] and Short-Time Objective Intelligibility (STOI) [24].
The rest of this paper is organized as follows. In Section 2, we describe the problem setting. In Section 3, the acoustic object canceller is described. In Section 4, we carry out evaluations from the following four perspectives: (i) the effectiveness of the synchronization, (ii) the performance of the models for frequency response estimation, (iii) comparison with the conventional method, and (iv) the evaluation of sound quality. We conclude the paper in Section 5.

Problem setting
We assume a situation where an acoustic object interferes with a monaural recording. The recorded signal is modeled as

x(t) = s(t) + (h ∗ o)(t),   (1)

where s(t) is the target signal, o(t) is the acoustic object signal, h(t) is the impulse response of the acoustic path from the loudspeaker to the microphone, and ∗ denotes convolution.

Acoustic object canceller
We propose a framework, the "acoustic object canceller," that removes the acoustic object signal from the monaural recorded signal. Figure 2 shows an overview of the signal processing blocks that make up the acoustic object canceller. The acoustic object canceller has two inputs and one output. The two inputs are the monaural recorded signal x[n] and the obtained acoustic object signal o[n]. The output is the estimated target signal ŝ[n], from which the acoustic object signal is removed.
The acoustic object canceller consists of three major processes: time shift compensation, sampling frequency mismatch compensation, and frequency response estimation. In time shift compensation, we synchronize the recorded and acoustic object signals by a rough time shift (detailed in Section 3.1). In sampling frequency mismatch compensation, we compensate for the sampling frequency mismatch using a blind synchronization technique proposed for ad-hoc microphone arrays [7] (detailed in Section 3.2). In frequency response estimation, the frequency response of the acoustic object signal is obtained by maximum likelihood estimation, assuming a model of the target signal (detailed in Section 3.3).

Sampling frequency mismatch compensation
We compensate for the sampling frequency mismatch between the monaural recorded signal x[n] and the time-shifted acoustic object signal ô[n]. In this study, we use the sampling frequency mismatch compensation technique in [7], adopting the same assumptions and approximations as in its previous applications [17, 18]. We assume that the sources have stationary amplitudes and are motionless, and we approximate the phase difference between channels caused by the sampling frequency mismatch as constant within each time frame m. Then, the sampling frequency mismatch ǫ_o is compensated by a linear phase shift in the STFT domain. The signals compensated using the accurate ǫ_o,

X(k, m; ǫ_o) = [X(k, m), Ô(k, m; ǫ_o)]^⊤,

are stationary at each discrete frequency k. Here, Ô(k, m; ǫ_o) is the acoustic object signal with sampling frequency mismatch compensation by the linear phase shift and is expressed as

Ô(k, m; ǫ_o) = O(k, m) exp(−j 2π k ǫ_o m N_shift / N_FFT),

where N_FFT and N_shift are the frame length and shift length of the STFT, respectively. We assume that the monaural recorded and acoustic object signals X(k, m; ǫ_o) follow a multivariate Gaussian distribution with covariance matrix V(k), and accurate compensation of ǫ_o recovers the stationarity of X(k, m; ǫ_o). The log-likelihood function is then expressed as

J(ǫ_o) = −Σ_{k,m} [ X(k, m; ǫ_o)^H V(k)^{−1} X(k, m; ǫ_o) + log det(π V(k)) ],   (7)

where {·}^H denotes the conjugate transpose and the covariance matrix V(k) is the parameter of the log-likelihood function. V(k) is obtained by sample estimation using X(k, m; ǫ_o):

V(k) = (1/M) Σ_m X(k, m; ǫ_o) X(k, m; ǫ_o)^H.   (8)

Here, M is the total number of time frames. Substituting Eq. (8) into Eq. (7), the first term of Eq. (7) becomes constant:

Σ_{k,m} X(k, m; ǫ_o)^H V(k)^{−1} X(k, m; ǫ_o) = Σ_k Tr( V(k)^{−1} Σ_m X(k, m; ǫ_o) X(k, m; ǫ_o)^H ) = M Σ_k Tr(I) = 2MK,

where I and K indicate the 2 × 2 identity matrix and the total number of frequency bins, respectively, and Tr(·) denotes the trace of a matrix. In this derivation, we use the matrix formula Tr(ABC) = Tr(BCA) for any matrices A, B, and C such that ABC is a square matrix. Excluding the constant term, the log-likelihood function simplifies to

J(ǫ_o) = −Σ_k log det V(k).
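As a concrete illustration, the linear phase-shift compensation can be sketched in a few lines of NumPy. This is a minimal sketch under our own naming assumptions (`O_stft` for the object STFT, `eps` for ǫ_o as a ratio); the sign convention of the phase term follows our reconstruction of the equation, not the authors' code.

```python
import numpy as np

def compensate_mismatch(O_stft, eps, n_fft, n_shift):
    """Compensate a small sampling-frequency mismatch by a per-frame
    linear phase shift in the STFT domain.

    O_stft : complex STFT of the acoustic object, shape (K, M).
    eps    : sampling frequency mismatch as a ratio (e.g. 62.5e-6 for 62.5 ppm).

    After m frames the accumulated drift is roughly eps * m * n_shift samples,
    which corresponds to a phase term linear in the frequency bin index k.
    """
    K, M = O_stft.shape
    k = np.arange(K)[:, None]      # frequency bin index, column vector
    m = np.arange(M)[None, :]      # frame index, row vector
    drift = eps * m * n_shift      # accumulated drift in samples per frame
    phase = np.exp(-2j * np.pi * k * drift / n_fft)
    return O_stft * phase
```

With `eps = 0` the function is an identity, and a nonzero `eps` only rotates phases, leaving magnitudes untouched.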
When the sampling frequency mismatch ǫ_o is not compensated accurately, the log-likelihood function J(ǫ_o) is small owing to the reduced stationarity caused by the drift. Therefore, we can estimate ǫ_o by maximizing J(ǫ_o). Unfortunately, an estimate of ǫ_o that maximizes the likelihood J(ǫ_o) cannot be obtained analytically. We therefore perform a rough full search of ǫ_o followed by a golden section search [7].
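The two-stage search can be sketched as follows; here `likelihood` stands for any unimodal objective such as J(ǫ_o) evaluated on the compensated signals, and the ±62.5 ppm grid range and 25-point resolution are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def golden_section_max(f, lo, hi, tol=1e-9, max_iter=100):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    gr = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - gr * (b - a), a + gr * (b - a)
    for _ in range(max_iter):
        if b - a < tol:
            break
        if f(c) > f(d):
            b, d = d, c              # maximum lies in [a, d]; old c becomes new d
            c = b - gr * (b - a)
        else:
            a, c = c, d              # maximum lies in [c, b]; old d becomes new c
            d = a + gr * (b - a)
    return (a + b) / 2.0

def estimate_eps(likelihood, coarse_range=62.5e-6, n_grid=25):
    """Rough full (grid) search over eps, then golden-section refinement
    around the best grid point."""
    grid = np.linspace(-coarse_range, coarse_range, n_grid)
    best = grid[int(np.argmax([likelihood(e) for e in grid]))]
    step = grid[1] - grid[0]
    return golden_section_max(likelihood, best - step, best + step)
```

The coarse grid guards against the golden-section search converging to a local optimum; the refinement stage only needs the objective to be unimodal within one grid step of the best point.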

Frequency response estimation
From Eq. (1), when the length of the impulse response h(t) is sufficiently shorter than the window length of the STFT, the recorded signal in the STFT domain is described as

X(k, m) = S(k, m) + H(k) Ô(k, m; ǫ_o),   (11)

where H(k) is the frequency response of the acoustic object signal. Note that H(k) may differ from the frequency response of the actual impulse response h(t) owing to the effect of τ described in Eq. (4). We assume that S(k, m) and Ô(k, m; ǫ_o) are uncorrelated. The target signal S(k, m) can be obtained by rewriting Eq. (11) as

S(k, m) = X(k, m) − H(k) Ô(k, m; ǫ_o).   (12)

Since the time-invariant H(k) is the only unknown factor in Eq. (12), we focus on how to estimate it.
In this study, we adopt maximum likelihood estimation to estimate H(k) instead of using power minimization, which has been commonly used in conventional echo and noise cancellers. It is known that assuming a suitable distribution p(x) that represents the statistical characteristics of the desired sound is effective in various applications, such as echo cancellers [25] and blind source separation [26][27][28]. We assume three distributions often used in blind source separation: the generalized Gaussian distribution [26], the multivariate Laplace distribution [27], and the local Gaussian distribution with variance represented by nonnegative matrix factorization (NMF) [28] (see Table 1).
In maximum likelihood estimation, we estimate the frequency response as

Ĥ(k) = argmin_{H(k)} C(H(k)),

where C(H(k)) is the negative log-likelihood function,

C(H(k)) = −Σ_m log p( X(k, m) − H(k) Ô(k, m; ǫ_o) ).

In the following sections, we derive update formulae to estimate the frequency response.

Generalized Gaussian distribution
The probability density function of the generalized Gaussian distribution is given as

p(S(k, m)) ∝ exp( −(|S(k, m)|/α)^β ),

where α and β are the scaling and shape parameters, respectively. It includes the Gaussian distribution when β = 2 and the Laplace distribution when β = 1. Hereafter, we consider 0 < β ≤ 2, where β < 2 corresponds to a super-Gaussian distribution. Under the above assumptions, the objective function to be minimized, that is, the negative log-likelihood function, is given by

C(H(k)) = Σ_m |X(k, m) − H(k) Ô(k, m; ǫ_o)|^β,   (16)

where parameter-independent terms are omitted. Note that, in the case of β = 2, minimizing Eq. (16) is equivalent to minimizing the power of the target signal S(k, m), as has been commonly done in conventional echo and noise cancellers.
The optimization problem of minimizing Eq. (16) with respect to H(k) has no closed-form solution except in the case of β = 2. We apply the auxiliary function method, also known as the majorization-minimization (MM) algorithm [20]. In the auxiliary function method, we define an auxiliary function that is an upper bound of the objective function and is easier to optimize. Given the auxiliary function, we can derive an efficient algorithm that minimizes the objective function by iteratively minimizing the auxiliary function instead of the objective function.
An auxiliary function for Eq. (16) is obtained by the theorem described in [29]. According to the theorem, for a continuous and differentiable even function G(x), if G′(x)/x is continuous, positive, and monotonically decreasing for x > 0, then

G(x) ≤ (G′(x_0)/(2x_0)) x² + G(x_0) − (x_0 G′(x_0))/2   (17)

holds for any x and x_0, with equality when x = ±x_0.
From Eq. (17) with G(x) = x^β, the auxiliary function of Eq. (16) is calculated as follows, where H_0(k) is an auxiliary variable, S_0(k, m) = X(k, m) − H_0(k) Ô(k, m; ǫ_o), and terms that do not depend on H(k) are omitted:

Q(H(k), H_0(k)) = Σ_m (β |S_0(k, m)|^{β−2} / 2) |X(k, m) − H(k) Ô(k, m; ǫ_o)|².   (20)

Table 1 Model of S(k, m) and parameters. In the local Gaussian distribution based on NMF, we assume that the variance r(k, m) can be expressed by NMF: r(k, m) = Σ_c a(c, m) b(c, k)

Model | p(S(k, m)) | Parameters
Generalized Gaussian distribution | ∝ exp(−(|S(k, m)|/α)^β) | α, β
Multivariate Laplace distribution | ∝ exp(−(1/σ)√(Σ_k |S(k, m)|²)) | σ
Local Gaussian distribution (NMF) | (1/(π r(k, m))) exp(−|S(k, m)|²/r(k, m)) | a(c, m), b(c, k)

Equation (20) has a closed-form solution for H(k) because it is quadratic with respect to H(k). The following update equation is obtained by differentiating Eq. (20) with respect to H(k), setting it to 0, and substituting the frequency response before the update for H_0(k):

H(k) ← [Σ_m |S_0(k, m)|^{β−2} X(k, m) Ô*(k, m; ǫ_o)] / [Σ_m |S_0(k, m)|^{β−2} |Ô(k, m; ǫ_o)|²],

where {·}* denotes the complex conjugate operator. The estimated target signal Ŝ(k, m) is obtained by applying these updates a sufficient number of times.
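The resulting loop amounts to iteratively reweighted least squares. The sketch below follows our reconstruction of the update rule (weights |S_0|^{β−2}); `X` and `O` denote the compensated STFTs, and the small floor `eps` guarding zero magnitudes is our own addition, not part of the paper.

```python
import numpy as np

def update_H_gg(X, O, beta=1.0, n_iter=20, eps=1e-12):
    """MM updates of the frequency response H(k) under a generalized
    Gaussian target model. X, O: complex STFTs of shape (K, M).
    Returns the estimated H and the object-canceled signal S = X - H O."""
    K, M = X.shape
    H = np.ones(K, dtype=complex)          # H_init(k) = 1, as in the experiments
    for _ in range(n_iter):
        S0 = X - H[:, None] * O            # target estimate with the current H
        w = np.maximum(np.abs(S0), eps) ** (beta - 2.0)  # MM weights
        num = np.sum(w * X * np.conj(O), axis=1)
        den = np.sum(w * np.abs(O) ** 2, axis=1) + eps
        H = num / den
    return H, X - H[:, None] * O
```

With `beta=2` the weights are constant and a single iteration reduces to the ordinary least-squares (power-minimization) solution.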

Multivariate Laplace distribution
The probability density function of the multivariate Laplace distribution is given as

p(S(m)) ∝ exp( −(1/σ) √( Σ_k |S(k, m)|² ) ),   (23)

where S(m) = [S(0, m), S(1, m), …, S(K, m)]^⊤ and σ is the scaling parameter. Equation (23) depends on the norm of the vector S(m), which assembles all frequency components of the target signal into one vector.
Using this probability density function, the objective function to be minimized, the negative log-likelihood function, is obtained as

C(H) = Σ_m √( Σ_k |X(k, m) − H(k) Ô(k, m; ǫ_o)|² ),   (24)

where terms that do not depend on H(k) are omitted. In Eq. (24), Σ_k |S(k, m)|² is inside the square root, so there is no closed-form solution for H(k). Therefore, we apply the auxiliary function method to Eq. (24) to obtain the solution (see Appendix). We obtain the update rules

w(m) ← 1 / √( Σ_k |X(k, m) − H(k) Ô(k, m; ǫ_o)|² ),   (25)

H(k) ← [Σ_m w(m) X(k, m) Ô*(k, m; ǫ_o)] / [Σ_m w(m) |Ô(k, m; ǫ_o)|²].   (26)

By sufficiently iterating Eqs. (25) and (26), the target signal is obtained as Ŝ(k, m).
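The structure mirrors the generalized Gaussian case, except that each frame receives a single weight computed from its norm over all frequencies. A minimal sketch under the same naming assumptions as before (our reconstruction, not the authors' code):

```python
import numpy as np

def update_H_mlaplace(X, O, n_iter=20, eps=1e-12):
    """MM updates of H(k) under a multivariate Laplace target model.
    One weight per frame couples all frequency bins through the frame norm."""
    K, M = X.shape
    H = np.ones(K, dtype=complex)
    for _ in range(n_iter):
        S0 = X - H[:, None] * O
        norms = np.sqrt(np.sum(np.abs(S0) ** 2, axis=0))  # per-frame norm
        w = 1.0 / np.maximum(norms, eps)                  # one weight per frame
        num = np.sum(w[None, :] * X * np.conj(O), axis=1)
        den = np.sum(w[None, :] * np.abs(O) ** 2, axis=1) + eps
        H = num / den
    return H, X - H[:, None] * O
```

Because the weight is shared across frequencies, frames in which the whole spectrum of the current residual is small dominate the estimate of H(k).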

Local Gaussian distribution based on NMF
The probability density function of the local Gaussian distribution is given as

p(S(k, m)) = (1/(π r(k, m))) exp( −|S(k, m)|² / r(k, m) ),   (27)

where r(k, m) is the variance of the local Gaussian distribution. We assume that the variance r(k, m) can be expressed by NMF,

r(k, m) = Σ_c a(c, m) b(c, k),   (28)

where a(c, m) and b(c, k) denote the activation and the basis in NMF, respectively, and c denotes the index of the basis. The objective function, the negative log-likelihood function, is given by

C = Σ_{k,m} [ |X(k, m) − H(k) Ô(k, m; ǫ_o)|² / r(k, m) + log r(k, m) ],   (29)

where parameter-independent terms are omitted. Equation (29) is a quadratic form in H(k) and has a closed-form solution. On the other hand, Eq. (29) has no closed-form solutions for a(c, m) and b(c, k) because the first and second terms are an inverse function and a logarithmic function of them, respectively.
Therefore, we apply the auxiliary function method to obtain the solutions for a(c, m) and b(c, k) (see Appendix). We obtain the following update formulae:

a(c, m) ← a(c, m) √( [Σ_k b(c, k) |S(k, m)|² / r(k, m)²] / [Σ_k b(c, k) / r(k, m)] ),   (30)

b(c, k) ← b(c, k) √( [Σ_m a(c, m) |S(k, m)|² / r(k, m)²] / [Σ_m a(c, m) / r(k, m)] ),   (31)

H(k) ← [Σ_m X(k, m) Ô*(k, m; ǫ_o) / r(k, m)] / [Σ_m |Ô(k, m; ǫ_o)|² / r(k, m)].   (32)

The estimated target signal is obtained by sufficiently updating Eqs. (30), (31), and (32).
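The alternation between the NMF variance model and the variance-weighted closed-form H(k) update can be sketched as below. The multiplicative-update form with the 1/2 exponent, the random initialization, and the `eps` flooring are our assumptions; the authors' actual derivation is in their Appendix.

```python
import numpy as np

def update_local_gauss_nmf(X, O, n_basis=4, n_iter=200, eps=1e-12, seed=0):
    """Alternating updates under the local Gaussian model with NMF variance
    r(k, m) = sum_c a(c, m) b(c, k). Returns H and S = X - H O."""
    rng = np.random.default_rng(seed)
    K, M = X.shape
    H = np.ones(K, dtype=complex)
    a = rng.uniform(0.5, 1.5, size=(n_basis, M))   # activations a(c, m)
    b = rng.uniform(0.5, 1.5, size=(n_basis, K))   # bases b(c, k)
    for _ in range(n_iter):
        S = X - H[:, None] * O
        P = np.abs(S) ** 2 + eps                   # observed target power
        r = b.T @ a + eps                          # model variance, shape (K, M)
        a *= np.sqrt((b @ (P / r**2)) / (b @ (1.0 / r) + eps))
        r = b.T @ a + eps
        b *= np.sqrt((a @ (P / r**2).T) / (a @ (1.0 / r).T + eps))
        r = b.T @ a + eps
        w = 1.0 / r                                # per-bin weights from variance
        H = np.sum(w * X * np.conj(O), axis=1) / (
            np.sum(w * np.abs(O) ** 2, axis=1) + eps)
    return H, X - H[:, None] * O
```

The H(k) step is an ordinary weighted least-squares solve, so the extra cost relative to the other two models lies entirely in fitting the variance model, which is why more iterations are needed in the experiments.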

Experimental conditions
In this experiment, we created a dataset by simulation.

Initially, we generated s[n] and o[n] ∗ h[n]. To make them, we used Pyroomacoustics [30]. A 4.1 × 3.8 × 2.8 m³ virtual room with a T60 of 0.40 s was considered. The speech and acoustic object sources and a microphone were randomly positioned in the virtual room, and the impulse responses were generated.
There were three types of target signal s[n]: (i) a speech signal convolved with the impulse response, (ii) an environmental sound signal, and (iii) a mixture of the speech signal convolved with the impulse response and the environmental sound signal at 5 dB. As the speech signal, we used utterances from the Japanese Newspaper Article Sentences (JNAS) corpus [31]. This corpus includes utterance signals of Japanese sentences. To obtain an utterance signal longer than the object signal, we concatenated utterance signals of the same speaker. As the environmental sound, we used the TUT Acoustic Scenes 2016, Evaluation dataset [32]. This dataset includes 10-s environmental sound signals. To obtain an environmental sound signal longer than the object signal, we concatenated the environmental sound signals of the sequential scene "Grocery store." In target (i), S(k, m) is known to follow a super-Gaussian distribution [33]. For target (ii), S(k, m) may follow a Gaussian distribution owing to the presence of various sounds. In target (iii), S(k, m) is considered to follow a distribution closer to Gaussian than in target (i).
The acoustic object signals o[n] were the following four types of sound: Electronic Alarm, BGM, Broadcast, and Announce. In the Electronic Alarm case, we used a Windows notification sound signal. In the BGM case, we used the mixture signal of "ANiMAL-ClinicA" in DSD100 [34]. In the Broadcast case, we used an audio signal from a YouTube video¹. In the Announce case, we used a train announcement signal of a JR East Yamanote Line in-train automatic announcement [35]. The signal length of the Electronic Alarm was 6 s. We clipped the other signals (BGM, Broadcast, Announce) to 30 s. Electronic Alarm, BGM, Broadcast, and Announce corresponded to a short duration of music, a long duration of music, a combination of long durations of speech and music, and a long duration of speech, respectively. The sampling frequency of all signals was unified at 16,000 Hz.
To make a recorded signal x[n], we randomly determined a time difference t_d and mixed s[n] and o[n] ∗ h[n] at an input signal-to-noise ratio (SNR)

SNR_input = 10 log₁₀( Σ_n s[n]² / Σ_n (o ∗ h)[n]² ).

Here, the sum over n is taken for the period when either the target signal or the acoustic object signal is not silent. The sampling frequency mismatch was simulated by resampling the recorded signals. For each combination of input SNR and mismatch, we generated ten recorded signals with random time differences and source placements.
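Mixing at a prescribed input SNR reduces to scaling the convolved object signal. A minimal sketch, with the simplification that the sums run over the whole signals rather than only the non-silent periods:

```python
import numpy as np

def mix_at_snr(s, v, snr_db):
    """Scale the interference v so that s + gain * v has the requested
    input SNR in dB; return the mixture and the applied gain."""
    gain = np.sqrt(np.sum(s ** 2) / (np.sum(v ** 2) * 10.0 ** (snr_db / 10.0)))
    return s + gain * v, gain
```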
For evaluation in Sections 4.2, 4.3, and 4.4, we used the SNR improvement, which is the difference between the input SNR SNR_input and the output SNR SNR_output. We define the output SNR as

SNR_output = 10 log₁₀( Σ_n s[n]² / Σ_n (ŝ[n] − s[n])² ),

where ŝ[n] is the estimated target signal.
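The evaluation metric can then be computed as follows; this matches our reading of the output-SNR definition (residual-error power in the denominator) and may differ in detail from the paper's exact equation:

```python
import numpy as np

def snr_db(target, estimate):
    """SNR in dB: target power over residual-error power."""
    err = estimate - target
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(err ** 2))

def snr_improvement(target, recorded, estimate):
    """SNR improvement: output SNR of the estimate minus input SNR
    of the recording, both measured against the same target."""
    return snr_db(target, estimate) - snr_db(target, recorded)
```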
For the STFT, the fast Fourier transform was performed at 8192 points with a 4096-sample Hamming window, and the shift length was half the window length. The number of update iterations was 20 for the generalized Gaussian and multivariate Laplace distributions to attain sufficient enhancement. However, for the local Gaussian distribution based on NMF, 20 iterations were insufficient, and we iterated the updates 200 times to obtain sufficient enhancement. We set the initial frequency response to H_init(k) = 1.

Effectiveness of mismatch compensation
In this experiment, we evaluated the effectiveness of sampling frequency mismatch compensation from the following two perspectives: the difference in acoustic object type and the difference in sampling frequency mismatch. We compared "w/o sync." and "w/ sync.", which indicate the SNR improvement without and with blind synchronization, respectively. We used recorded signals x[n] where the target signal s[n] was target (iii), and the acoustic object signals o[n] were Alarm, BGM, Broadcast, and Announce. We set SNR_input to 0 dB. The sampling frequency mismatches ǫ_o were ±31.25 and ±62.5 ppm. We assumed the multivariate Laplace distribution, which has no parameter dependence in the frequency response estimation, because we investigate the parameter dependence of the models for the three types of target signals in Section 4.3.
Figure 3 shows the SNR improvement for the different acoustic object types (BGM, Broadcast, Announce, and Alarm). We focus on the results for recorded signals where the sampling frequency mismatch was 62.5 ppm. In the box plots, the box extends from the first to the third quartile of the SNR improvements, with a line at the median; the whiskers extend from the box by 1.5× the interquartile range, and outliers are plotted past the ends of the whiskers. Figure 3 demonstrates that the performance was significantly improved by applying the blind synchronization technique. On the other hand, the difference in SNR improvement with and without blind synchronization was almost insignificant when the acoustic object was Alarm. This suggests that when the acoustic object signal is short, the time drift due to the sampling frequency mismatch has little impact on performance.
Figure 4 shows the removal performance for the different sampling frequency mismatches (±31.25, ±62.5 ppm), where positive and negative ppm correspond to upsampling and downsampling, respectively. We focus on the results where the acoustic object signals were BGM, Broadcast, and Announce because these acoustic objects were significantly affected by the sampling frequency mismatch. Figure 4 demonstrates the performance improvement obtained by applying the blind synchronization technique. In addition, we confirmed that, without the synchronization method, the difference in SNR improvement between the absolute mismatch values of 62.5 ppm and 31.25 ppm is about 2 dB.
We have demonstrated that, in environments where mismatches occur, the removal performance is affected by the signal length of the acoustic object (see Fig. 3) and the amount of sampling frequency mismatch (see Fig. 4), and that the blind synchronization technique can reduce these effects. We also confirmed these findings for targets (i) and (ii) in preliminary experiments.
Figure 5 shows examples of spectrograms. The upper left shows the recorded signal, where we used target (iii) and BGM and set the input SNR to 0 dB and the sampling frequency mismatch to 62.5 ppm. The upper right shows target (iii). The lower left shows the estimated target signal without blind compensation for the sampling frequency mismatch. The lower right shows the estimated target signal with blind compensation for the sampling frequency mismatch.
From Fig. 5, we confirm that the acoustic object signal was almost completely removed by the proposed method with blind synchronization (lower right) compared with that without synchronization (lower left).
Fig. 3 SNR improvement between four types of acoustic objects (BGM, Broadcast, Announce, and Alarm), where the target signal was target (iii), the input SNR was 0 dB, and the sampling frequency mismatch was 62.5 ppm

Performance with change in the model
We compared the SNR improvement of the models with various parameters for frequency response estimation. We used recorded signals x[n] where the target signals s[n] were targets (i), (ii), and (iii), and the acoustic object signals o[n] were the 30-s signals (BGM, Broadcast, and Announce), whose lengths are the same. We set SNR_input to −5, 0, 5, and 10 dB, and the sampling frequency mismatches ǫ_o were ±31.25 and ±62.5 ppm. Table 2 shows the SNR improvement averaged over each target for each model. Here, the generalized Gaussian distribution is denoted as "L-GG," the multivariate Laplace distribution as "M-Laplace," and the local Gaussian distribution based on NMF as "L-G-NMF." According to Table 2, the averaged SNR improvement differed depending on the model and parameters.
First, we focused on the results for L-GG. For (i), the speech signal, the highest performance was attained when β < 2, which corresponds to a super-Gaussian distribution. For (ii), the environmental sound signal, the peak performance was obtained with β = 2, which corresponds to a Gaussian distribution. We have demonstrated that the parameters that maximize the SNR improvement change depending on the target signal type.
Second, we focused on the results for L-G-NMF. There was no significant difference in SNR improvement when the number of bases c was changed. When the number of bases is small, the model might have insufficient capability to represent the target signal. On the other hand, when the number of bases is large, the model might represent not only the target signal but also the acoustic object signal. It would be reasonable to infer that the results did not change significantly because of this tradeoff.
Finally, we focused on the results for M-Laplace. We have demonstrated that the SNR improvement with M-Laplace was greater than with the other models when the target signal was target (i). This suggests that M-Laplace represents the co-occurrence relationship of the spectra and that the speech signal fits the model.

Comparison with amplitude-based method
In this experiment, we compared the proposed method with the conventional amplitude-based method [22] (see Appendix) from the following two perspectives: the difference in input SNR and the difference in target type. We compared the SNR improvement among three approaches: "w/o sync." (without blind synchronization), "w/ sync." (with blind synchronization), and "Amp." (the amplitude-based method). We utilized the recorded signals previously employed in Section 4.3. We assumed the multivariate Laplace distribution as the target model in the proposed method, since Table 2 shows no significant performance differences and the multivariate Laplace distribution is parameter-independent.
Figure 6 shows the SNR improvement for the four different input SNRs (−5, 0, 5, 10 dB). Figure 6 demonstrates that the performance of the proposed method ("w/ sync.") was the best. In the high input SNR case, o(t) is smaller than s(t), which may reduce the estimation accuracy of τ and h(t) and lead to a decrease in SNR improvement. We also confirmed that the conventional amplitude-based method showed little performance improvement when the input SNR was 10 dB.
Figure 7 shows the SNR improvement for each target type. According to Fig. 7, the SNR improvement of the conventional method is greater than that without synchronization. On the other hand, the result of the method with blind synchronization (the proposed method) is higher than that of the conventional amplitude-based method. We have thus demonstrated the effectiveness of the proposed method with blind synchronization.
Figure 8 shows examples of spectrograms. The upper left shows the recorded signal, where we used target (iii) and BGM and set the input SNR to 0 dB and the sampling frequency mismatch to 62.5 ppm. The upper right shows target (iii). The lower left shows the estimated target signal of the conventional amplitude-based method. The lower right shows the estimated target signal of the proposed method.
In Fig. 8, we can confirm that both the conventional and proposed methods almost completely removed the acoustic object signal. However, a small target signal component was also removed by the conventional method, which may explain the performance difference.

Table 2 Average SNR improvement of different target models. The generalized Gaussian distribution is denoted as "L-GG," the multivariate Laplace distribution as "M-Laplace," and the local Gaussian distribution based on NMF as "L-G-NMF." Parameters are the shape parameter β for L-GG and the basis number c for L-G-NMF

Evaluation of sound quality
In this experiment, we evaluated the sound quality of the proposed method by employing two speech quality metrics: Perceptual Evaluation of Speech Quality (PESQ) [23] and Short-Time Objective Intelligibility (STOI) [24].
Figure 9 shows the average PESQ for the four different input SNRs (−5, 0, 5, 10 dB). According to Fig. 9, the PESQ of the estimated target signals was higher than that of the recorded signal. We also confirmed that the PESQ of the target signals estimated by the amplitude-based method was lower than that of the proposed method. In particular, the larger the input SNR, the more significant the difference in PESQ between the proposed and amplitude-based methods, which might be due to speech distortion in the amplitude-based method. These results confirm the effectiveness of the proposed method.
Figure 10 shows the average STOI for the four different input SNRs (−5, 0, 5, 10 dB). According to Fig. 10, the STOI of the estimated target signals was higher than that of the recorded signal. We also confirmed that the STOI of the target signals estimated by the amplitude-based method was lower than that of the proposed method. In particular, the larger the input SNR, the larger the STOI difference between the proposed and amplitude-based methods. When the input SNR was 10 dB, the STOI of the target signal estimated by the amplitude-based method showed little change from that of the recorded signal, which might be due to speech distortion in the amplitude-based method. These results confirm the effectiveness of the proposed method.

Conclusion
In this study, we proposed the acoustic object canceller, a framework for removing an acoustic object signal from a monaural recorded signal. In the acoustic object canceller, we first synchronize the monaural recorded signal and the available acoustic object signal. Second, we estimate the frequency response of the acoustic object by maximum likelihood estimation, assuming three types of distribution: the generalized Gaussian distribution, the multivariate Laplace distribution, and the local Gaussian distribution based on NMF. In the experiments, we demonstrated the effectiveness of applying the synchronization technique and investigated the performance of the model types.

Fig. 1
Fig. 1 Problem setting. The acoustic object o(t) radiated through an acoustic path interferes with the monaural recording. Here, the loudspeaker has a D/A converter that converts o[n] to o(t), and the microphone has an A/D converter that converts x(t) to x[n]

Fig. 2
Fig. 2 Overview of procedures in the proposed acoustic object canceller. First, we synchronize the recording and acoustic object (time shift compensation and sampling frequency mismatch compensation). Then the acoustic object is removed using an estimated frequency response

Fig. 4
Fig. 4 SNR improvement between four types of sampling frequency mismatch (±31.25, ±62.5 ppm), where the target signal is target (iii), the input SNR was 0 dB, and the acoustic object signals were BGM, Broadcast, and Announce

Fig. 5
Fig. 5 Examples of spectrograms. The upper left shows the recorded signal, where we used target (iii) and BGM and set the input SNR to 0 dB and the sampling frequency mismatch to 62.5 ppm. The upper right shows target (iii). The lower left shows the estimated target signal without blind compensation for the sampling frequency mismatch. The lower right shows the estimated target signal with blind compensation for the sampling frequency mismatch

Fig. 8 Examples of spectrograms. The upper left shows the recorded signal, where we used target (iii) and BGM and set the input SNR to 0 dB and the sampling frequency mismatch to 62.5 ppm. The upper right shows target (iii). The lower left shows the target signal estimated by the conventional amplitude-based method. The lower right shows the target signal estimated by the proposed method