De-Noising of Sparse Signals Using Mixture Model Shrinkage Function

In this work a new thresholding function referred to as ’mixture model shrinkage’ (MMS) based on the minimization of a convex cost function is proposed. Normally, thresholding functions underestimate larger signal amplitudes during the de-noising process. The proposed model is a more flexible shrinkage function as it solves the underestimation problem to a greater extent and thus efficiently de-noises the signal without affecting signal amplitudes. The Expectation minimization (EM) algorithm is used to find the model parameters along with the majorization-minimization (MM) algorithm that minimize the monotonic cost function. The proposed model is then applied for de-noising group sparse signals and Shepp Logan phantom images. Our experimental study shows that MMS outclasses current thresholding functions and overlapping group shrinkage algorithm without results suffering from underestimation. Furthermore, the proposed model has the smallest Root Mean Square Error (RMSE) for de-noising group sparse signals.


I. INTRODUCTION
Different sparsity-based algorithms have been developed in the past to de-noise and recover sparse signals and images, that is, soft thresholding [1], hard thresholding [2], [3], [4], firm thresholding [5], non-negative garrote thresholding [6], hyperbolic tangent thresholding [7], logarithmic thresholding [8], hankel sparse low-rank approximation [9], proximal operators [10], [11], [12], alternating direction method of multipliers [13], [14], block thresholding [15], and overlapping group shrinkage (OGS) [16]. Along with these established techniques, some new techniques are also used for de-noising of specific image types. Jawad in [7] uses hyperbolic tangent thresholding and Hayat in [8] uses logarithmic based thresholding for de-noising biomedical images. An adaptive thresholding method based on neural networks is used for de-noising of Gaussian and speckle noise in natural, The associate editor coordinating the review of this manuscript and approving it for publication was Kathiravan Srinivasan . ultrasound, and SAR (synthetic aperture radar) images [17]. Liu uses a convolutional neural network approach and guided filtering for SAR image de-noising [18] and hybrid frequency modulations for the removal of speckle-noise from such images [19]. In this paper, we consider the estimation of signal x from its noisy version h.
where δ is the zero-mean white Gaussian noise which is independent of x. h is the noisy observed signal, and we seek the possible noise-free signal x. The problem can be formulated in the transformed domain as s represents noisy coefficients, (e.g., wavelet coefficients), and r represents the noiseless coefficients. If the transform is orthogonal, then the noise in the transform domain has the same correlation function as the original noise in the signal domain, therefore, when the transform is orthogonal, VOLUME 11, 2023 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ white noise in the signal domain becomes white noise in the transform domain. To approximate a suitabler (the estimated signal), basis pursuit de-noising [13], [20] and LASSO (least absolute shrinkage and selection operator) [21] are used in literature, which result in a soft threshold function.This function is frequently used for the de-noising and recovery of sparse signals. Chen previously used OGS for de-noising of group sparse signals [16]. OGS is a fine thresholding method that uses the clustering property of a given signal or coefficients of a vector r. The algorithm does not act on r block-wise but acts on the whole r and minimizes the cost function. The OGS thresholding function is derived using the MM algorithm [22], [23]. OGS also underestimates the recovered signal. In this paper, we develop an algorithm referred to as 'Mixture Model Shrinkage' (MMS), which is also derived using the MM algorithm. As an application example, we consider here the de-noising and recovery of group sparse signals and Shepp-Logan phantom images. To achieve this, we are modelling the wavelet coefficients using the mixture of a Gaussian distribution. Our goal is to estimate r from noisy observations s. The estimated signal is denoted byr. As the estimated signalr is dependent on the observed noisy signal s, so, it can be denoted asr(s). This can be estimated by using the maximum aposteriori (MAP) estimator [24], [25], [26], which is based on the probability density function (PDF) of r. For example, if s is the observed signal, then the MAP estimator has to estimate the value of r as to which is more likely to occur (that is, the MAP estimator is seeking the specific value of r, where the probability of r is the peak value of r). The MAP estimator can be defined aŝ where 'arg max' is the argument value of the point where the peak of the function occurs. The PDF p r|s (r|s) is the distribution of r, given a specific value of s. In section III, the algorithm is discussed in detail. The MAP estimator is given in (3). The problem is solved through different methods [24], [27], [28]. In [28], coefficients of the signal are modelled as Laplacian PDFs, and a close form or explicit solution to the MAP estimator is obtained in the form of a soft thresholding function. Although this technique removes noise from sparse and group sparse signals, it badly affects and over-shrinks the signal of interest. Natural signals and images actually possess group sparsity in some transform domains, for example, in the case of the wavelet transform and Fourier transform domain. In these cases, linear shrinkage functions cannot properly separate the coefficients with larger amplitudes [29], [30], [31]. In the body of literature, different authors use different properties of signals and images to de-noise data. Some of these used methods utilize the group sparsity of the signal for de-noising and recovery. Liu uses block structure sparsity of the data for de-noising by creating small signal patches to de-noise them, then block structure sparsity properties are used to dynamically group the patches to produce better de-noising results [32], [33]. OGS is a recent technique used for the de-noising of group sparse signals [16]. The method is derived by minimizing a sparsity promoting convex cost function. OGS minimizes the cost function monotonically and has no explicit formulation, such as soft thresholding, hard-thresholding, or firm thresholding. The technique is used for de-noising and recovery of group sparse signals. It performs better than soft, hard, and firm thresholdings but still shrinks larger values of the coefficients to a certain extent and it is evident from the de-noising of one dimensional signal in experiments section. It can also be applied for de-noising and recovery of speech signals. We use wavelet based de-noising. The main part of Wavelet de-noising is thresholding or shrinkage functions. The wavelet coefficients smaller than threshold value are discarded and the larger coefficients are retained or shrinked and the signal is then reconstructed. Soft, hard, and firm thresh are classical thresholding function. Hard thresholding is keep or kill rule. Soft thresholding shrinks the wavelet cofficients with larger absolute values than the selected threshold value. Firm thresholding modest way between hard and soft threhsoldings.
To properly solve the problem of underestimation of the larger amplitudes of sparse signals, this paper develops a new shrinkage/thresholding function called mixture model shrinkage function. The main contributions of the research are given here.
The mixture model shrinkage function is derived by using mixture of Gaussian distributions and maximizationminimization. The nonzero coefficients are modeled as a parameterized mixture of Gaussian variables. The parameters are then estimated using maximum likelihood minimization. The Gaussian mixture model decomposes the noisy signal into high and low frequencies. It classifies data based on statistical characteristics.
The motivation is the wavelet-transformed signal is sparse and most of the coefficients are zero or close to zero. The energy of the signal is contained by small and large coefficients in the wavelet transform. Noise is spread along all the coefficients. The larger coefficients carry the important features and information of the signal. The mixture model thresholding function is applied in the transform domain to further sparsify and de-noise the signal. The signal can be recovered without affecting its larger amplitudes.

II. RELATED WORK
Sparse signals de-noising and recovery is a challenging problem. Many prior methods have been proposed over the years to solve this problem, e.g. Convex optimization and greedy algorithms. Convex optimization-based techniques offer higher accuracy but greedy techniques are computationally more efficient and hence better fit for large-scale problems. Sparse approximate solutions to linear equations are classically obtained through L1 norm regularized least squares, but this method often underestimates the true solution. The low amplitude coefficients are discarded for sparsity requirements and the other nonzero coefficients are shrunk towards zero which causes an underestimation of the larger coefficients. In our de-noising/reconstruction of the sparse signals and images, we addressed this problem. Some of the methods used for de-noising and recovery of sparse signals are soft thresholding [1], hard thresholding [2], firm thresholding [5], and hyperbolic tangent thresholding [7]. An improved total variation regularization is applied to remove salt and pepper noise from images [36]. Mahdaoui used a compressive sensing approach based on regularization constraints for image de-noising [37]. Additive white gaussian noise is removed from images by using neural networks and soft thresholding in wavelet domain [38]. A novel image de-noising method is proposed by Onur by using Otsu's thresholding and wavelet transform to improve the de-noised image quality [39]. To increase the noise removal level and improve the quality of the de-noised images using singular value decomposition and thresholding [40]. Although these methods effectively reduce the signal noise but also underestimate the larger amplitudes of the signal, which consequently destroys some of the significant information of the signal. This paper proposes a mixture model shrinkage function that avoids the systematic underestimation characteristic of L1 norm regularization. The problem is stated as a mixture of Gaussian distribution and is then solved by the MM method. The proposed technique is evaluated based on its efficiency with respect to the de-noising performance and the computational timing.

A. PROPOSED METHOD
Wavelet coefficients are modeled as mixture of a Gaussian distribution. The problem is then solved by MM technique. This gives a closed form solution called mixture model shrinkage function. The complete method is given below.
Based on the discussed results in [22], [25], and [26] neighbouring coefficients have statistical dependencies even when they are uncorrelated, as in the case of wavelet transform. A wavelet coefficient in the vicinity of a larger coefficient will be larger even if they are mutually uncorrelated. Here, the behaviour of wavelet coefficients is formulated using mixture of Gaussian probability density functions.
where, 0 ≤ a ≤ 1, 'r' represents the noiseless coefficients of the signal, and a is the mixing proportion. σ 1 and σ 2 are the standard deviations associated with components one and two, respectively. To find r for which p r|s (r|s) has maximum amplitude [28], consider and so, rearranging terms, we get Equation (7) shows the Bayes ′ rule. Therefore, we get The term p s (s) does not depend on r, the value of r that maximizes right-hand side is not dependent on the denominator. The denominator does not influence right hand side of (8). So, (8) can be written without the denominator [28]. Therefore, the MAP estimate of r is given bŷ , is the PDF of the Gaussian random variable with mean equal to zero. That is, p s|r (s | r) = p δ (s − r). The MAP estimate of r can be written as: The maximum value of a function F(r) is not changing by applying a monotonic function G(r) (that is, if a specific value of r maximizes F(r), then the value of r will also maximize G(F(r))) [28]. The logarithmic function is a monotonically increasing function, therefore (9) can also be written as; or equivalently, The use of the logarithm simplifies the differentiation. It is assumed that the noise is the zero mean Gaussian with variance σ δ , By using (13), (12) can be written as [28]; Suppose f (r) = log(p r (r)), then (14) becomeŝ By setting the derivative w.r.t. r to zero, we find the MAP estimate of r.
We need a probabilistic distribution model p r (r) to properly represent the transform-domain coefficients vector, r. Previously used probabilistic models are given in [22], [23], [24], [25], [26], and [27]. Here, we model the coefficients as a mixture of a Gaussian distribution; Thus by using (4), (14) becomes To solve this problem, we use the majorization-minimization technique.
We apply the majorization-minimization technique to the mixture model given in (16). We get the majorizer of a function h given by the following relation [35], By using (16) and (30): As Putting (30) in (30), we get where c, d, c ′ and d ′ are given in (30), ) .
Setting the derivative equal to zero, we get, where r and s are noiseless and noisy coefficient vectors respectively. σ δ is the standard deviation of the noisy signal. σ 1 and σ 2 are the standard deviations of the noise-free transform coefficients. c ′ and d ′ are the points where the function f M has its minimum value and f M becomes equal to f . The parameters σ 1 , σ 2 can be calculated through expectation minimization in this work.
A schematic diagram of wavelet based de-noising using the proposed shrinkage function is given in Fig. 1. Wavelet transform is applied to decompose the noisy input signal/image. By applying wavelet transform, the total energy of the signal/image is distributed into few coefficients. A specific threshold value is set according to the requirements of the problem to hold or shrink certain values of the coefficients and discard others. Selection of threshold values is further discussed in the experiment section. Data consistency is performed in the transformed domain. Then the de-noised signal/image is examined. The algorithm is run until the maximum number of iterations reached.

B. ESTIMATING MODEL PARAMETERS
We use an expectation maximization (EM) [34] method for finding parameters of the mixture model. The linear superposition of Gaussian: where N (r|µ k , σ k ) denotes a normal distribution. K denotes the total number of Gaussian mixtures, the mixing coefficient or weight of each distribution is a k , µ k represents mean and σ k is variance of k th Gaussian. 0 ≤ a k ≤ 1, and  N denotes the number of observations. We calculate parameters in (28) 2σ 2 2 , c and d are constants.

Iterations:
Step-1: Take wavelet transform of the noisy signal to distribute energy in a few wavelet coefficients.
Step-2: Apply mixture model shrinkage function in the wavelet transform to threshold all low frequency sub-band coefficients and sparsify the input signal/image.
Step-5: Calculate peak signal to noise ratio and root mean square error of the output.
Step-6: Compare the results. N it = N it + 1 Repeat steps 1 through 5 until stop criteria are met. Output:r mixing coefficients in (30) have prior probabilities for the components. We estimate the corresponding posterior probabilities of r. Using Bayes rule where γ k is the latent variable, a k = N k N , and N k are the samples belonging to cluster k. EM is an iterative optimization algorithm and consists of two main steps (that is, the estimation step and the maximization step). We compute the expected values of the latent variable in the estimation step from the given values of the parameter. The parameters are updated in the maximization step using a maximum likelihood (ML) estimation. Here, our aim is to maximize the likelihood function w.r.t. the current parameters consisting of mean and co-variance of the elements and the mixing coefficients. The EM algorithm for finding parameters of the mixture model is summarized in Algorithm 2 and schematic diagram of EM algorithm is given in Fig. 2. We run the EM algorithm in MATLAB to find the parameters of the model. Values of the parameters are given in table 1.

Algorithm 2 Expectation Maximization for Finding Parameters of the Mixture Model
Step-1: Initialize µ j , σ j and a j and find the value of (30). µ j , σ j and a j are the means, variance, and weight of each distribution respectively.

Sep-2 (E-step):
Evaluate the responsibilities using the values of the parameters used in step-1. ,
Definition 2: The hard threshold function is defined as Definition 3: The firm threshold function firm: R → R with parameters λ > 0 and µ > λ is defined as As µ → λ the firm threshold function approaches hard threshold and µ → ∞ the firm threshold function approaches soft threshold function.

D. COMPUTATIONAL COMPLEXITY AND CONVERGENCE OF THE ALGORITHM
As the input signal s is sparse with sparsity k, so the computational complexity per iteration of MMS threshold is of order k, k ≪ N , N denotes size of vector s. The threshold function is derived by using the MM algorithm, the convergence of f (r) is guaranteed and decreases monotonically from iteration to iteration (that is, f (r(k + 1)) < f (r(k))). Fig. 3, it is clear that MMS is graphically close to hard thresholding and firm thresholding but is more flexible. In case of soft thresholding; if s is independent and identically distributed (iid) with s ∼ N (0; 1) and r is output of soft-thresholding then r is given by r = soft(s; λ). r is a vector with too many zeros (that is, the values of s, for which the magnitude of s ≤ λ are mapped to 0).

From
In case of mixture model thresholding, if s is iid with s ∼ N (0; 1) and r is the output of MMS, then it is given by r = mixture(s; a; σ 1 ; σ 2 ; σ δ ; N it ). Here, r is again a vector with too many zeros. s is the noisy signal, r is the noise-free signal, a is the mixing proportion, and σ 1 and σ 2 are the standard deviations associated with components of mixture one and two, respectively, while σ δ is the standard deviation of noise, and N it is the number of iterations.

IV. EXPERIMENTS A. ONE-DIMENSIONAL SIGNAL
As an illustration, the proposed MMS algorithm is applied to de-noising the 1D group sparse signal in Fig. 4(a). We simulate the 1D group sparse signal in MATLAB: N = 100;  x is noiseless signal and N is length of the signal. The x and y-axis are showing the time and amplitude, respectively. Fig. 4(b) show the independent white Gaussian noise with standard deviation σ = 0.5. The noisy signal is given in Fig. 4 (c). We apply the classical thresholding functions along with the proposed MMS function to de-noise the 1D group sparse signal. The results of the hard and firm thresholdings are given in Fig. 4(d) and 4(e) respectively, which are not satisfactory. The noise is removed up to a great extent in case of both the soft thresholding and OGS. The soft thresholding removes the noise contents but also badly affects the larger amplitudes of the original signal. OGS removes almost all noise but also attenuates the larger amplitudes to some extent. The root mean square error (RMSE) is a quality measurement parameter, while minimizing RMSE does not always lead to the most favourable de-noising results in practice. In our experiment, the proposed MMS model negligibly attenuates the larger amplitudes of the original signal when compared to soft thresholding and OGS, and the RMSE improved from 0.521 to 0.311, meaning all three methods (that is, soft thresholding, OGS, and MMS) remove the total noise contents. The de-noising results of soft-thresholding, OGS, and MMS is illustrated in Fig. 4(f), (g), and (h).

B. IMAGE DE-NOISING
To demonstrate the validity of our proposed model on biomedical images, we tested on Shepp-Logan phantom image of size 256 by 256. A 256 by 256 mask is applied to each original Shepp-Logan phantom image. The original image, the mask, and the resulting Fourier domain noisy images are shown in Fig. 6 (a), (b), and (c) respectively. We have reconstructed the images using soft, hard, firm, and the proposed thresholding models. We take the inverse Fourier transform of the given noisy image and apply our proposed thresholding function in the wavelet domain. That is,x t = Threshold(W * x i ; a; σ 1 ; σ 2 ; σ δ ; N it ), where 'W ' denotes the wavelet transform,x t is the thresholded value, and x i represents the individual samples of the image. The values below a certain threshold are discarded. Threshold values can be selected according to λ = 3σ [16]. It can also be selected as to reduce σ to a specified level. E.g., If σ ranges from 0.1 to 0.5, λ ranges from 0.30 to 1.5. We vary λ from 0.1 to 4. λ can also be selected to improve PSNR values. It depends on nature of the problem. Larger values of λ make the solution sparser but shrink the coefficient amplitudes. After selection of threshold value, the inverse wavelet transform is applied to recover the noisy image. Data consistency in the transform domain is enforced, and the algorithm is run up to the maximum numbers of iterations. The whole process and the projection onto convex sets (POCS) is detailed in [31]. Fig. 5 shows the point-wise error between actual and recovered data for the one-dimensional group sparse signal. Let r s , r OGS , r h , r f , and r m show the recovered values through soft, OGS, hard, firm and mixture model thresholding. The magnitude of the point-wise errors between actual and recovered data for each case are given by error s = abs(r − r s ), error OGS = abs(r − r OGS ), error h = abs(r − r h ), error f = abs(r − r f ), error m = abs(r − r m ). Fig. 5 shows better classification clustering behaviour of MMS as compare to the pre-existing classical techniques. In Fig. 5, the error points in the case of hard-thresholding and firm-thresholding are more scattered. In the case of OGS and soft-thresholding, the error points are mostly close to the origin, while some points are scattered. In the case of the proposed model, the error points are mostly around the origin and very few are scattered. Our proposed model exploits clustering properties of the coefficients of the signals and leads to minimum possible error. This shows the superiority of the proposed model. The number of errors of the proposed model is very low, as only a few values are out of order because it has the best clustering property as compared to other classical techniques. It is easier to accurately separate two or more different types of signals using our proposed model. The technique can be used for data classification.

V. RESULTS AND DISCUSSION
Our proposed shrinkage function has better tendency to separate different types of signal from a mixture. It is clear from Fig. 4 (h) that almost all noise was removed by our proposed model without affecting the larger signal amplitudes. VOLUME 11, 2023  The model has a minimum RMSE of 0.311, which is the best among all compared thresholding functions in this paper. For OGS, the RMSE is 0.521, for soft-threshold it is 0.818, for hard-threshold it is 0.433, and for firm-threshold the RMSE is 0.844.
In example 2, the reconstructed images by applying hard, soft,firm, and mixture model thresholding are given in Fig. 6 (d), (e), (f), and (g) respectively. We use the wavelet transform as a sparsifying transform in example 2. The quality of the recovered image through MMS is very high. The peak signalto-noise ratio is used as a quality measurement parameter to examine the accuracy of our proposed model. The PSNR curves for Firm, hard, soft, and mixture model thresholdings are given in Fig. 7 The higher the value of the PSNR, the better the recovery of the image, and the more accurate the recovery model is. The PSNR of the noisy image is 35.2791. The PSNR value for the proposed model is 87, which is far better than firm (76), soft-thresholding (85) and hard-thresholding (83).  The run time for OGS is 7.7 milliseconds while for MMS it is 5.3 milliseconds. MMS converges faster as compared to OGS. MMS reaches the peak value in just 8 iterations while soft-threshold takes 20 iterations to reach the peak value.

VI. CONCLUSION
This paper introduced a computationally-efficient algorithm for the de-noising of sparse and group sparse signals. In our approach, we derived a mixture model shrinkage using the MM algorithm, which is then used for de-noising of group sparse signals and Shepp-Logan phantom images as shown in the experiments. Our method is derived as a minimization of a convex cost function. The herein proposed shrinkage function overcomes the underestimation problem of larger values of the signals. MMS leads to better results when compared to soft-thresholding, OGS, hard-thresholding, and firm-thresholding in fewer iterations. The proposed model also has lower RMSE values than the other classical methods, which is one of the measures for better recovery of data. In the case of image de-noising, our proposed model achieves larger PSNR values than soft and hard-thresholding.
It can be used for de-noising and classification of sparse and group sparse signals. It can also be used for the recovery of MRI. The proposed model is not efficient for de-noising and classification of audio sources. We used our proposed model for speech enhancement (de-noising). The noisy speech is a male utterance, which is corrupted by additive white Gaussian noise. The main problem in speech enhancement is the audibility of the noise as musical noise. In speech enhancement, it is desirable to suppress the noise content to a high extent regardless of the RMSE. The proposed method can be used in parallel with l 1 norm (l 1 fused mixture model) for further improvement for speech de-noising in future. It would be very interesting to use the proposed model with some amendments for segmentation, classification, and motion detection.

ACKNOWLEDGMENT
The author would like to thank his foreign supervisor, Prof. Ivan W. Selesnick, for this academic mentorship, professional advice, and personal encouragement that he has provided throughout his research at the Tandon School of Engineering, New York University. He have been extremely lucky to have VOLUME 11, 2023 a supervisor who cared so much about his work, and who responded to his questions and queries so quickly. HAYAT  He is currently on leave from the Department of Electrical Engineering, Faculty of Engineering and Technology, International Islamic University Islamabad. He is also a Postdoctoral Researcher with the Department of Automation and Mechanical Engineering, Tampere University, Finland. He has many publications in the field of control and signal processing. His research interests include distributed control of multiagent systems, nonlinear control, robust and optimal control, signal processing, graph signal processing, optimization, machine learning, and computer vision.
SUHEEL ABDULLAH MALIK received the B.E. degree in electrical and electronics from Bangalore University, Bengaluru, India, in 1997, the M.S. degree in electronic engineering from Muhammad Ali Jinnah University (MAJU), Pakistan, and the Ph.D. degree in electronic engineering from International Islamic University Islamabad (IIUI), Pakistan. Since 2007, he has been with IIUI, where he is currently working as the Chairperson and an Associate Professor with the Department of Electrical Engineering (DEE). He has authored more than 20 journals and conference publications. His research interests include control systems, numerical investigation of nonlinear problems, application of nature-inspired metaheuristic algorithms for solving nonlinear problems, and signal processing. He has served as a member and the chair for some international conferences. He is also a reviewer of some journals.