Medical Image Denoising Based on Biquadratic Polynomial With Minimum Error Constraints and Low-Rank Approximation



I. INTRODUCTION
Medical imaging has been widely applied in clinical disease diagnosis with the advent of digital imaging technologies. However, owing to the requirement of low radiation exposure and some existing technical limitations, images acquired at a low radiation dose are more or less contaminated by noise [1]. Noise degrades the visual quality of medical images and negatively affects the accuracy of clinical diagnosis. In general, the noise increases as the radiation dose decreases. Therefore, as an important preprocessing step that improves the quality of medical images, denoising is highly desirable for proper medical image analysis. (The associate editor coordinating the review of this manuscript and approving it for publication was Krishna Kant Singh.)
Formally, the acquired noisy image X ∈ R^{m×n} can be modeled as

X = Y + N, (1)

where Y denotes the noise-free image and N is additive noise with standard deviation τ. The aim of image denoising is to restore the noise-free image Y from its noisy version X as accurately as possible. Over the past decades, the problem of noise reduction has been extensively studied and numerous denoising algorithms have been proposed in the literature. Most existing algorithms can be formulated as the following minimization problem

arg min_Y ‖X − Y‖²₂ + λ R(Y), (2)

where the first term ‖X − Y‖²₂ is the data-fidelity term that constrains Y to be consistent with X as accurately as possible, the second term R(Y) is a regularization term that constrains Y according to some prior, and λ is a balance parameter. Earlier efforts mainly focused on least-squares-based priors, such as the total variation (TV) prior [2], [3], the local autoregressive prior [4], and others. By penalizing local discontinuity or dissimilarity, these regularization priors lead to smooth denoised images. As a result, some subtle details are partly lost, although they are crucial information for clinical disease diagnosis. To overcome the limitation of these local priors, Buades et al. [5] proposed the well-known non-local means (NLM) prior, which assumes that each pixel can be represented by a weighted average of its non-local similar pixels in the image. In fact, NLM is a non-local generalization of the local autoregressive prior, modeled as

y_i = ( Σ_{j∈Ω_i} w_{i,j} x_j ) / ( Σ_{j∈Ω_i} w_{i,j} ), with w_{i,j} = exp( −‖P y_i − P y_j‖²₂ / σ² ), (3)

where y_i is the i-th pixel of an image Y, w_{i,j} is the weight, σ is a similarity control parameter, and P and Ω_i denote a patch extraction operator and a search window centered at the i-th pixel, respectively. To improve the shape-adaptivity of NLM, a variant was developed in [6]. In essence, both local and non-local priors attempt to model the spatial relationship of pixels [7] and apply this relationship to estimate each pixel.
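As an illustration of the NLM averaging in Eq. (3), the following NumPy sketch computes each output pixel as a patch-similarity-weighted average over a search window (the paper's implementation is Matlab; the patch size, search-window size, boundary handling, and default `sigma` here are illustrative choices, not the authors' settings):

```python
import numpy as np

def nlm_denoise(x, patch=3, search=7, sigma=0.5):
    """Non-local means: each pixel becomes a weighted average of pixels whose
    surrounding patches are similar, with weights
    w_ij = exp(-||P_i - P_j||^2 / sigma^2)."""
    h, w = x.shape
    r, s = patch // 2, search // 2
    pad = np.pad(x, r, mode="reflect")
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            p_ref = pad[i:i + patch, j:j + patch]  # reference patch at (i, j)
            num = den = 0.0
            for di in range(-s, s + 1):
                for dj in range(-s, s + 1):
                    ii = min(max(i + di, 0), h - 1)  # clamp to image bounds
                    jj = min(max(j + dj, 0), w - 1)
                    p = pad[ii:ii + patch, jj:jj + patch]
                    wt = np.exp(-np.sum((p_ref - p) ** 2) / sigma ** 2)
                    num += wt * x[ii, jj]
                    den += wt
            out[i, j] = num / den  # normalized weighted average
    return out
```

Running this on a weakly noisy constant image reduces the pixel variance, which is the behavior the prior is designed to produce.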
Unlike the spatial-domain denoising methods described above, noise reduction can also be conducted in the transform domain. The basic assumption is that an image can be sparsely represented by a set of basis functions, such as wavelets [8], curvelets [9], or a learned representation dictionary [1], [10]. The power of transform-based methods stems from the sparsity of the transform coefficients, i.e., the sparse representation prior, which makes noise easier to distinguish. A widely used sparse representation prior is defined as

R(Y) = ‖α‖₁, with Y = Dα, (4)

where α is a sparse representation vector and D denotes a representation dictionary that can be constructed using fixed wavelet/curvelet bases or adaptively learned by a greedy algorithm from the noisy images or from given noise-free images [11]. One disadvantage of sparse-representation-based methods is their high computational complexity, although they achieve more effective denoising performance than traditional spatial methods.
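The transform-domain idea can be sketched with a fixed orthonormal basis: transform the image, suppress small coefficients as noise, and invert. The block below uses a 2-D DCT built from an orthonormal DCT-II matrix with hard thresholding (an illustrative stand-in for the wavelet/curvelet/dictionary transforms cited in the text, not the authors' method):

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix: C @ C.T = I.
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def dct_threshold_denoise(X, thr):
    """Sparsify in a fixed orthonormal basis: small transform coefficients
    are treated as noise and set to zero (hard thresholding)."""
    C = dct_matrix(X.shape[0])
    A = C @ X @ C.T               # forward 2-D DCT (square image assumed)
    A[np.abs(A) < thr] = 0.0      # keep only significant coefficients
    return C.T @ A @ C            # inverse transform
```

With a zero threshold the transform pair is exact, which verifies the orthonormality that the sparsity argument relies on.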
Recent works have shown that low-rank priors are powerful models for noise reduction [12]–[14], in which an image is represented by a low-rank matrix. A simple and effective representation of the low-rank prior is

rank(Y) = r, (5)

where rank(·) denotes the rank function and r is the rank of the matrix Y. When r is unknown, finding the lowest-rank matrix under a rank constraint is NP-hard. In [15], the authors model the low-rank prior using the nuclear norm, defined as the sum of the singular values of a matrix. In theory, the nuclear norm is the best convex approximation of the rank function. Under the low-rank assumption, problem (2) can be translated into

arg min_Y ‖X − Y‖²₂ + λ‖Y‖_*, (6)

where ‖Y‖_* is the nuclear norm of Y. This model has been widely used for various image restoration problems, and many variants, such as the weighted nuclear norm and combined low-rank and sparse priors, have been proposed to further improve its performance [16]–[18]. The minimization problem (6) is tractable by an iterative singular value thresholding algorithm, but the iterative algorithm is computationally expensive. To address this issue, a fixed-rank strategy can be adopted to avoid the iterative process and reduce running time. The goal of the fixed-rank strategy is to solve

arg min_Y ‖X − Y‖²₂, s.t. rank(Y) = r. (7)

This problem has a closed-form solution when the rank r is given [19]. Several feasible rank estimation schemes have been developed to determine the value of r [20], [21], resulting in efficient algorithms for low-rank image denoising. More recently, there has been growing interest in using deep learning for medical image denoising [22]–[24], since deep-learning-based denoising methods have shown great promise. However, such methods require large training data sets and incur high training time complexity. Therefore, in this paper we still focus on traditional prior-based denoising.
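The closed-form solution of the fixed-rank problem (7) is the rank-r truncated SVD (by the Eckart-Young-Mirsky theorem). A minimal NumPy sketch:

```python
import numpy as np

def fixed_rank_denoise(X, r):
    """Closed-form solution of min ||X - Y||^2 s.t. rank(Y) = r:
    keep the r largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]
```

On a rank-1 matrix perturbed by small noise, the rank-1 truncation recovers the clean matrix more closely than the noisy input, which is exactly the denoising effect exploited later in Section IV.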
Inspired by the combined image denoisers [25], we present a two-stage image denoising method. Specifically, the method first applies a biquadratic polynomial surface reconstruction algorithm to derive an initial denoised image, and then the residual noise in the initial result is further reduced by a singular value shrinkage strategy.

II. OVERVIEW OF THE PROPOSED METHOD
In general, the noise in medical images is relatively small, which means that most of the important features in the images are preserved. Therefore, it is possible to reduce the noise in medical images by surface fitting techniques. The biquadratic polynomial surface representation of image patches is an effective tool for image restoration problems [26], [27]; the key issue is how to define the constraint conditions used to construct a fitted surface with more textural detail and little noise. In addition, medical images always focus on a specific tissue and exhibit high self-similarity, which results in a low-rank representation of image patches, so it is well suited to apply singular value decomposition theory to reduce noise. The proposed denoising method consists of two stages that rely on different regularization priors. The first stage exploits a biquadratic polynomial surface to derive an initial estimate of the noise-free image, in which the surface is constructed by dividing its coefficients into two groups: under the reconstruction error constraint, one group is used to minimize the gradient of the surface, and the other to minimize the approximation error of the surface. Based on the self-similarity of medical images and their intrinsic low-rank property, the second stage optimizes the singular values to further reduce the residual noise in the initial estimate. Figure 1 illustrates the denoising process of the proposed method.

III. DENOISING USING BIQUADRATIC POLYNOMIAL WITH MINIMUM ERROR CONSTRAINT
By using the gradient error as a regularization term, the minimization problem (2) can be formulated as

arg min_Y ‖X − Y‖²₂ + λ|∇Y|, (8)

where |∇Y| denotes the gradient term that enforces smoothness of the image Y. Numerical optimization algorithms have been developed to solve Eq. (8). Unlike the aforementioned traditional methods, we propose a new method that solves it by decomposing it into two simple linear sub-problems. More specifically, the data fidelity is approximated by a biquadratic polynomial, and the gradient regularization term is treated as a constraint used to mine the pixels with low noise. Therefore, Eq. (8) can be represented as a constrained least-squares problem.

A. BIQUADRATIC POLYNOMIAL CONSTRUCTION
For simplicity of presentation, we assume X is an n×n noisy image with pixels x_{i,j} (i, j = 1, 2, ..., n), and y_{i,j} is the corresponding noise-free pixel of x_{i,j}. From the perspective of geometry, x_{i,j} is viewed as the function value at the point (i, j) in the ouv plane; that is, the corresponding 3-D point is (i, j, x_{i,j}). With the geometrical symmetry, y_{i,j} can be approximated by x_{i,j} and its neighbors (here we use twenty-five pixels), i.e., x_{i+l,j+k} (l, k = −2, −1, 0, 1, 2). Figure 2 illustrates the neighbor relationship of the pixels used to calculate y_{i,j}. In this paper, we assume that each 5×5 patch centered on y_{i,j} can be approximated by a biquadratic polynomial surface. Under this assumption, the image denoising problem is recast as a biquadratic polynomial surface reconstruction; in other words, we exploit twenty-five pixels to approximate the noise-free pixel y_{i,j}. Formally, the biquadratic polynomial is defined as

f_{i,j}(u, v) = a_0 u²v² + a_1 u²v + a_2 uv² + a_3 u² + a_4 v² + a_5 uv + a_6 u + a_7 v + a_8. (9)

In theory, nine points suffice to construct a unique surface, so it is possible to construct a high-precision surface f_{i,j}(u, v) from the known twenty-five pixels. A key issue is how to find the pixels with low noise among these twenty-five pixels; obviously, the significance of a pixel in the surface construction should be inversely proportional to its noise level. We adopt the following method to calculate the polynomial coefficients in Eq. (9).
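To make the surface model concrete, the following NumPy sketch fits the nine coefficients of a biquadratic surface to a 5x5 patch by ordinary least squares over local coordinates u, v in {-2, ..., 2} (an unweighted illustration; the paper's actual coefficient calculation, described next, splits the coefficients into two groups and uses weighted fits):

```python
import numpy as np

def biquadratic_fit(patch):
    """Least-squares fit of
    f(u,v) = a0*u^2*v^2 + a1*u^2*v + a2*u*v^2 + a3*u^2 + a4*v^2
             + a5*u*v + a6*u + a7*v + a8
    to a 5x5 patch; returns the nine coefficients [a0, ..., a8]."""
    u, v = np.meshgrid(np.arange(-2, 3), np.arange(-2, 3), indexing="ij")
    u, v = u.ravel(), v.ravel()
    M = np.stack([u**2 * v**2, u**2 * v, u * v**2, u**2, v**2,
                  u * v, u, v, np.ones_like(u)], axis=1).astype(float)
    a, *_ = np.linalg.lstsq(M, patch.ravel().astype(float), rcond=None)
    return a  # a[8] = f(0, 0), the estimate of the centre pixel
```

A patch generated from a known polynomial is recovered exactly, confirming that the 25 samples over-determine the 9 coefficients.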

B. COEFFICIENT CALCULATION
Ideally, the pixels y_{i+l,j+k} (l, k = −2, −1, 0, 1, 2) located in each image patch should be points on the surface f_{i,j}(u, v). This implies that we can define the gradient term in Eq. (8) using the gradient information of the surface f_{i,j}(u, v); by minimizing the gradient of f_{i,j}(u, v), the parameters in Eq. (9) are adaptively determined. Specifically, we first calculate the parameters a_1, a_2, a_6 and a_7. For ease of presentation, let f_{i+s,j+t} = f_{i,j}(i+s, j+t). We form eight differences e_1, ..., e_8 of surface values along the eight directions indicated by the arrows in Fig. 2 (Eq. (10)). Based on weighted least-squares theory, we define the objective function

G(a_1, a_2, a_6, a_7) = Σ_{i=1}^{8} w_i e_i², (12)

where w_i (i = 1, 2, ..., 8) are the weights defined in Eq. (13), in which 2 ≤ α ≤ 3 is a control parameter that adjusts the influence of the eight differences in Eq. (10) on the objective function G(a_1, a_2, a_6, a_7). By setting the partial derivatives of G(a_1, a_2, a_6, a_7) with respect to a_1, a_2, a_6 and a_7 to zero (Eq. (14)), the optimal values of the parameters a_i (i = 1, 2, 6, 7) can be obtained.
If, among the twenty-five pixels x_{i+l,j+k} (l, k = −2, −1, 0, 1, 2), only x_{i,j} is noisy, the optimal parameter values a_i (i = 1, 2, 6, 7) derived from Eq. (14) are accurate. Otherwise, we need to find the pixels with low noise that should dominate the determination of these parameters. It should be pointed out that the weights defined in Eq. (13) do not reflect the noise level of each pixel. In the following, we discuss how to define new weights w_i (i = 1, 2, ..., 8) whose influence on the parameters is inversely proportional to the noise level. Since noise causes inaccurate estimates of the parameters a_i (i = 1, 2, 6, 7), we define their estimation errors ω_i as in Eq. (15). For the first difference e_1 in Eq. (10), if the pixels x_{i+1,j} and x_{i−1,j} are accurate, the error ω_1 should be very small; otherwise, ω_1 takes a large value. It follows that w_1 should be inversely proportional to ω_1, and likewise w_i (i = 2, 3, ..., 8) should be inversely proportional to ω_i (i = 2, 3, ..., 8). Therefore, we update the weights as

w_i = 1 / (ω_i + δ), i = 1, 2, ..., 8, (16)

where δ is a very small constant that avoids division by zero. By combining Eqs. (12) and (16), we obtain the updated parameters a_i (i = 1, 2, 6, 7). These parameters are then used to update the errors ω_i in Eq. (15), which in turn yields new weights w_i by Eq. (16). This is an iterative updating process; we empirically found that three iterations are sufficient. Through the weight determination algorithm described above, the pixels with low noise are mined indirectly.
To determine the remaining parameters a_0, a_3, a_4, a_5 and a_8, we reformulate Eq. (9) as

f_{i,j}(s, t) = a_0 s²t² + a_3 s² + a_4 t² + a_5 st + a_8 + d(s, t), (17)

where d(s, t) = a_1 s²t + a_2 st² + a_6 s + a_7 t. For each 5×5 image patch consisting of twenty-five pixels, the proposed method applies the biquadratic polynomial surface f_{i,j}(u, v) to approximate it in the least-squares sense. Let

H(a_0, a_3, a_4, a_5, a_8) = Σ_{s,t=−2}^{2} w_{s,t} ( f_{i,j}(s, t) − x_{i+s,j+t} )², (18)

where w_{s,t} = 1/(1 + β(s² + t²)) are the weights and β is a control coefficient. Minimizing the objective function H(a_0, a_3, a_4, a_5, a_8) yields the values of the parameters a_0, a_3, a_4, a_5, a_8. However, H(a_0, a_3, a_4, a_5, a_8) depends only on the distances and does not reflect the noise level of each pixel. Similar to Eqs. (15) and (16), we redefine the weight w_{s,t} as

w_{s,t} = 1 / [ (1 + β(s² + t²))(ω_{s,t} + δ) ], s, t = −2, −1, 0, 1, 2, (19)

where ω_{s,t} is essentially the error between f_{i,j}(s, t) and the pixel x_{i+s,j+t}. As discussed above, if x_{i+s,j+t} contains little noise, the error ω_{s,t} is small and the weight w_{s,t} is large. By combining Eqs. (18) and (19), we obtain the parameters a_i (i = 0, 3, 4, 5, 8). As before, these parameters are used to further update the weights w_{s,t}; after running this iterative updating process three times, we obtain the final values of the parameters a_i (i = 0, 3, 4, 5, 8). From Eq. (9), a_8 is the estimate of y_{i,j}, namely ŷ_{i,j} = a_8. The whole calculation procedure can be summarized as follows:
S1. Construct a biquadratic polynomial by Eq. (9) for a noisy pixel x_{i,j};
S2. Calculate its eight differences using Eq. (10);
S3. Calculate the parameters a_i (i = 1, 2, 6, 7) by performing the following steps:
S3-1. Define an objective function G(a_1, a_2, a_6, a_7) according to Eq. (12);
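The reweighted fitting loop above can be sketched as follows. This is a simplified NumPy illustration, not the authors' Matlab code: it fits all nine coefficients jointly rather than in two groups, and it assumes ω_{s,t} = |f_{i,j}(s,t) − x_{i+s,j+t}| and illustrative values for β and δ, which the text does not fix:

```python
import numpy as np

def robust_centre_estimate(patch, beta=0.5, delta=1e-6, iters=3):
    """Reweighted surface fit: distance weights 1/(1+beta*(s^2+t^2)) are
    combined with error weights 1/(omega+delta) over a few iterations, so
    noisy pixels lose influence; a8 = f(0,0) is the denoised centre pixel."""
    s, t = np.meshgrid(np.arange(-2, 3), np.arange(-2, 3), indexing="ij")
    s, t = s.ravel().astype(float), t.ravel().astype(float)
    M = np.stack([s**2 * t**2, s**2 * t, s * t**2, s**2, t**2,
                  s * t, s, t, np.ones_like(s)], axis=1)
    x = patch.ravel().astype(float)
    w = 1.0 / (1.0 + beta * (s**2 + t**2))       # distance-only weights
    for _ in range(iters):
        sw = np.sqrt(w)[:, None]
        a, *_ = np.linalg.lstsq(sw * M, np.sqrt(w) * x, rcond=None)
        omega = np.abs(M @ a - x)                # per-pixel fitting error
        w = 1.0 / ((1.0 + beta * (s**2 + t**2)) * (omega + delta))
    return a[8]
```

On a noise-free constant patch the fit is exact in the first iteration and the reweighting leaves it unchanged, as expected.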

IV. RESIDUAL NOISE REDUCTION USING SVD-BASED LOW-RANK APPROXIMATION
The biquadratic-polynomial-based denoising method is a locally adaptive smoothing technique. It removes most of the noise in a noisy image, but a small amount of residual noise remains in the denoised result. To handle this problem, an alternative is to exploit the non-local self-similarity of medical images and their intrinsic low-rank property to further reduce the residual noise. For each noisy image patch, we search for its similar patches and reshape them to form a similar-patch matrix (denoted P ∈ R^{n×n} for ease of discussion). This patch matrix is then factorized by singular value decomposition (SVD) and approximated by truncating the singular values and corresponding singular vectors. This process can be formulated as

P ≈ Σ_{i=1}^{r} σ_i u_i v_i^T, (20)

where σ_i are the singular values of P, u_i and v_i denote the corresponding left and right singular vectors, and the rank r of P is a key parameter that needs to be preset. In [20], an estimate of the rank r is derived using the Eckart-Young-Mirsky theorem. The self-similarity of medical images makes the similar-patch matrix P low-rank. Owing to the energy compaction property of the SVD, most singular values of P are approximately zero, so only a few of the largest singular values preserve most of the important information. In essence, this truncated SVD can be considered an adaptive group sparse representation method, in which the singular values σ_i are the representation coefficients and the outer products of the left and right singular vectors, u_i v_i^T, are the basis functions. The truncation operator is equivalent to the well-known hard thresholding operator [28].
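The grouping-and-truncation step can be sketched in NumPy as follows (the paper's code is Matlab; the patch size, group size, and exhaustive similarity search here are illustrative choices, not the authors' exact settings):

```python
import numpy as np

def group_similar_patches(img, i, j, psize=4, num=8):
    """Collect the `num` patches most similar to the reference patch at
    (i, j) (squared-distance search) and stack them as columns of a
    similar-patch matrix P."""
    h, w = img.shape
    ref = img[i:i + psize, j:j + psize].ravel()
    cands = []
    for a in range(h - psize + 1):
        for b in range(w - psize + 1):
            p = img[a:a + psize, b:b + psize].ravel()
            cands.append((np.sum((p - ref) ** 2), p))
    cands.sort(key=lambda c: c[0])               # most similar first
    return np.stack([p for _, p in cands[:num]], axis=1)

def truncate_rank(P, r):
    # Hard-truncated SVD: keep only the r leading singular triplets.
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]
```

For a constant image, all grouped patches are identical, the patch matrix has rank one, and the rank-1 truncation reproduces it exactly.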
Although this truncated-SVD-based residual noise reduction is simple and efficient, the truncation loses some important details, especially in medical images. The main reason is that the patch matrix P is not an exactly low-rank matrix. Figure 3 illustrates the distribution of the singular values of a patch matrix P: the singular values σ_i (i = 10, ..., n) are very small but not equal to zero. These small singular values contain the noise together with part of the important detail information. To address this issue, we apply the classical soft shrinkage strategy to estimate these small singular values instead of setting them to zero. With this shrinkage strategy, the noise-free estimate of a patch matrix P can be formulated as

P̂ = Σ_{i=1}^{r} σ_i u_i v_i^T + Σ_{i=r+1}^{n} T(σ_i) u_i v_i^T, (21)

where the shrinkage operator T(·) with thresholds µ_i ≥ 0 (i = r+1, ..., n) is defined by

T(σ_i) = max(σ_i − µ_i, 0). (22)

Note that we empirically set r = 9 due to the high self-similarity of medical images. Although this strategy is very simple, we found it effective for denoising the test images used in our experiments. Next, we discuss how to determine the thresholds µ_i. According to the Eckart-Young-Mirsky theorem, the error E between the patch matrix P and its estimate P̂ is

E = P − P̂ = Σ_{i=r+1}^{n} µ_i u_i v_i^T. (23)

It is easy to see that the sum S of all elements of E can be represented by

S = c_{r+1} µ_{r+1} + c_{r+2} µ_{r+2} + ... + c_n µ_n. (24)
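The shrinkage of Eqs. (21)-(22) amounts to leaving the r leading singular values intact and soft-thresholding the tail. A minimal NumPy sketch (the thresholds `mu` are passed in; their determination is the subject of the next paragraphs):

```python
import numpy as np

def shrink_tail_svd(P, r, mu):
    """Keep the r leading singular values and soft-shrink the tail:
    T(sigma_i) = max(sigma_i - mu_i, 0) for i > r, instead of zeroing it."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    s = s.copy()
    s[r:] = np.maximum(s[r:] - mu[:len(s) - r], 0.0)  # shrink tail only
    return (U * s) @ Vt
```

For a diagonal test matrix with singular values (3, 2, 1), keeping r = 1 and shrinking the tail by 0.5 yields singular values (3, 1.5, 0.5), i.e., detail-bearing small components are attenuated rather than discarded.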
Ideally, the error should be white Gaussian noise, which means the sum of all elements of E should equal zero. Without loss of generality, let c_{r+1} ≠ 0. Then, setting S = 0 in Eq. (24) gives

µ_{r+1} = d_{r+2} µ_{r+2} + d_{r+3} µ_{r+3} + ... + d_n µ_n, (25)

where d_i = −c_i / c_{r+1} (i = r+2, ..., n). The method aims at minimizing a constrained objective function G(µ_{r+2}, µ_{r+3}, ..., µ_n)

s.t. µ_{r+1} = d_{r+2} µ_{r+2} + d_{r+3} µ_{r+3} + ... + d_n µ_n. (26)

Since the singular values satisfy σ_i ≥ σ_{i+1}, the thresholds µ_i should decrease progressively; under this condition, the objective function (26) is modified accordingly (Eq. (27)). If d_i (i = r+2, ..., n) are known, then owing to the convexity of G(µ_{r+2}, µ_{r+3}, ..., µ_n), the thresholds µ_i can be calculated by Eq. (28). In our experiments, we empirically set d_i = τ² − σ_i σ_{r+1}. Due to the overlapping regions among neighboring patches, multiple estimates of each pixel are obtained. Therefore, after computing the estimate of each patch matrix P, we aggregate all the denoised patches to produce the final denoised image. The widely used aggregation method is weighted averaging; although many sophisticated strategies for adaptively selecting the weights have been proposed [6], we adopt uniform weights for computational simplicity, which can be expressed as

ŷ_{i,j} = (1 / N_{i,j}) Σ_{k=1}^{N_{i,j}} y^k_{i,j}, (29)

where y^k_{i,j} is the k-th estimate of y_{i,j} and N_{i,j} is the number of its estimates. Based on the above discussion, the residual noise can be reduced by the following steps:
S1. Divide the initial denoised image Y_0 into many overlapping patches, and reshape each patch into a vector;
S2. For each patch, search for its similar patches to construct a patch matrix P;
S3. Apply the SVD to each patch matrix P to obtain the singular values σ_i and the corresponding left and right singular vectors;
S4. Shrink the singular values using the thresholding function in Eq. (22);
S5. Calculate the denoised version of the patch matrix P by Eq. (21).
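The uniform-weight aggregation of overlapping patch estimates can be sketched as an accumulate-and-count pass (a generic NumPy illustration of the averaging scheme, with hypothetical argument names):

```python
import numpy as np

def aggregate(shape, patches, coords, psize):
    """Uniform-weight aggregation: every pixel estimate contributed by an
    overlapping denoised patch is accumulated, then divided by the number
    of estimates at that pixel."""
    acc = np.zeros(shape)   # sum of estimates per pixel
    cnt = np.zeros(shape)   # number of estimates per pixel
    for p, (i, j) in zip(patches, coords):
        acc[i:i + psize, j:j + psize] += p
        cnt[i:i + psize, j:j + psize] += 1.0
    return acc / np.maximum(cnt, 1.0)  # avoid division by zero where uncovered
```

Two overlapping 2x2 patches of ones on a 3x3 canvas yield 1.0 at the doubly covered pixel (sum 2, count 2) and 0.0 where no patch lands.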

A. DATASETS
In our experiments, we use a real clinical data set that includes 30 brain images of one anonymous patient. The noisy images with different noise levels (σ = 3, 5, 7) are produced by adjusting the radiation dose. Information about these medical images is provided in Tables 1-3, and several images are shown in Figures 2-4.

B. EVALUATION CRITERIA
To quantitatively evaluate the performance of the proposed method, three commonly used objective metrics are employed: the peak signal-to-noise ratio (PSNR) [29], the structural similarity (SSIM) index [30], and the feature similarity (FSIM) index [31]. The mathematical definitions of these metrics are as follows.
• The PSNR is defined as

PSNR = 10 log₁₀ (MAX² / MSE),

where MAX is the maximum possible pixel value of an image and MSE is the mean squared error between the image and its distorted version; PSNR thus measures the ratio between the maximum signal power and the power of the distortion that affects quality. The higher the PSNR, the better the quality of the denoised image.
• The SSIM index takes the form

SSIM(x, y) = [ (2µ_x µ_y + C_1)(2σ_{xy} + C_2) ] / [ (µ_x² + µ_y² + C_1)(σ_x² + σ_y² + C_2) ],

where µ_x, σ_x and σ_{xy} denote the mean, standard deviation and cross-correlation of a signal, respectively, and C_1 and C_2 are small constants that characterize the saturation effects of the visual system in low-luminance and low-contrast regions. Compared with PSNR, SSIM is closer to the quality perception of the human visual system.
• The FSIM index is formed as

FSIM = Σ_{x∈Ω} S_L(x) · PC_m(x) / Σ_{x∈Ω} PC_m(x),

where PC_m(·) and Ω denote the phase congruency value and the whole image domain, respectively, and S_L(·) is the similarity metric that combines the phase congruency similarity and the gradient similarity.

C. DENOISING RESULTS
To evaluate the noise reduction performance of our method, we compare the proposed method with three classical noise reduction methods: Gaussian filtering [32], non-local means (NLM) filtering [5], and a hybrid singular value thresholding [33]. All methods are implemented in the Matlab programming language. In our experiments, we run these methods on 30 test images with different noise levels (σ = 3, 5, 7), and evaluate the results in terms of both objective quantitative metrics and subjective visual quality.
For an objective comparison, Tables 1-3 list the quantitative results of the four denoising methods in terms of PSNR, SSIM and FSIM. The proposed method outperforms Gaussian filtering and NLM filtering and achieves the best quantitative performance, especially in terms of PSNR: it obtains gains of up to 0.20 dB over NLM filtering and exceeds Gaussian filtering by 0.73 dB on average. We also observe that the performance of the hybrid singular value thresholding is not stable. Compared with these two methods, the SSIM results show that our method achieves performance gains of up to 3.4% and 6.3%, respectively. In addition, the proposed method is slightly inferior to NLM filtering and hybrid thresholding in FSIM, but is significantly superior to Gaussian filtering.
For visual comparison, Figures 4-6 show the denoised results for six different images at different noise levels, together with zoomed local regions. As can be observed, the results of our method and Gaussian filtering are significantly better than those of NLM filtering and hybrid thresholding, especially in detail preservation. The reason is that the non-local averaging behavior of NLM leads to over-smoothed results, and the minimum-variance estimate used in hybrid thresholding also smooths detail information. Note that Gaussian filtering clearly outperforms NLM in visual quality, while NLM obtains higher quantitative scores than Gaussian filtering. This observation differs markedly from the case of natural images. One possible reason is that medical images can be effectively modeled by the local prior used in Gaussian filtering. It also suggests that the quantitative metrics (PSNR, SSIM, and FSIM) widely used in medical image denoising may not be sufficient for evaluating image quality.
We have also observed an interesting phenomenon. Intuitively, the output of a denoising method should depend on the input; in other words, a denoised result obtained from an input with a higher noise level should generally have lower quality. However, Tables 1-3 show that the performance of NLM and Gaussian filtering is relatively stable even though the PSNR values of the inputs (noisy images) fluctuate considerably. A possible reason for this phenomenon is the high self-similarity of medical images, which makes NLM and Gaussian filtering more stable. Nevertheless, despite its stability, NLM suffers from detail loss.

VI. CONCLUSIONS
This paper presents a new method for medical image denoising that combines biquadratic polynomial surface construction and low-rank approximation. To the best of our knowledge, this is the first use of a biquadratic polynomial with minimum error constraints to reduce noise in medical images. Under the reconstruction error constraint, the surface coefficients are determined by minimizing the gradient and the approximation error of the surface. To improve the performance of the biquadratic-polynomial-based denoising, a new singular value thresholding strategy is applied to further suppress the residual noise, in which the thresholds are obtained by optimizing a constrained objective function. The feasibility and effectiveness of our method have been demonstrated by experimental results on a real clinical data set. In future work, we will focus on further improving the real-time performance of the method.