Patch-based models and algorithms for image denoising: a comparative review between patch-based images denoising methods for additive noise reduction

Background Digital images are captured using sensors during the data acquisition phase, where they are often contaminated by noise (an undesired random signal). Such noise can also be produced during transmission or by poor-quality lossy image compression. Reducing the noise and enhancing the images are considered the central process to all other digital image processing tasks. The improvement in the performance of image denoising methods would contribute greatly on the results of other image processing techniques. Patch-based denoising methods recently have merged as the state-of-the-art denoising approaches for various additive noise levels. In this work, the use of the state-of-the-art patch-based denoising methods for additive noise reduction is investigated. Various types of image datasets are addressed to conduct this study. Methods We first explain the type of noise in digital images and discuss various image denoising approaches, with a focus on patch-based denoising methods. Then, we experimentally evaluate both quantitatively and qualitatively the patch-based denoising methods. The patch-based image denoising methods are analyzed in terms of quality and computational time. Results Despite the sophistication of patch-based image denoising approaches, most patch-based image denoising methods outperform the rest. Fast patch similarity measurements produce fast patch-based image denoising methods. Conclusion Patch-based image denoising approaches can effectively reduce noise and enhance images. Patch-based image denoising approach is the state-of-the-art image denoising approach.


Introduction
The noise level in digital images may vary from being almost imperceptible to being very noticeable. Image denoising techniques attempt to produce a new image that has less noise, i.e., closer to the original noise-free image. Image denoising techniques can be grouped into two main approaches: pixel-based image filtering and patch-based It is now common in image denoising field to utilize patch-based models and algorithms instead of pixelbased approaches to produce most promising estimate of the noise-free images. However, there are both advantages and disadvantages in the use of patch-based models and algorithms. There are several advantages of patchbased approaches. Smoothing flat regions is the most important aspect. Redundancy between patches enable patch-based approaches to properly smooth flat reigns. A second advantage of using patch-based models and algorithms approaches is that it can preserve fine image details and sharp edges. However, there could be some disadvantages for patch-based models and algorithms. First, although similarity between patches assists in estimating flat regions, so is the averaging. It is, therefore, quite timeconsuming to group and compare similar patches. This might mean that each patch has multiple estimates and patches are overlapped. Secondly, while it may be that patterns and textures seem clear with less noise, patchbased models and algorithms usually exploit large number of parameters, which can be extremely difficult to adjust properly. We believe that the advantages of patch-based methods far outweigh their disadvantages, as modern computers are significantly fast, and have large memory spaces.
In this work, the patch-based image denoising schemes are analyzed from two different aspects: (1) the performance of patch-based denoising techniques in terms of image denoising quality and (2) the performance of patchbased denoising techniques in terms of computational time, where various patch-based denoising techniques are addressed.
A literature survey was conducted to evaluate the most recent patch-based denoising improvements for additive noise. Following the literature survey, there is an empirical study, which is used to evaluate the performance of various patch-based denoising techniques in terms of their accuracy and run times at various noise levels.

Patch-based image filtering
In patch-based denoising techniques, the input noisy image is divided into patches (i.e., blocks). The blocks are then manipulated separately in order to provide an estimate of the true pixel values. In this section, various patch-based image denoising algorithms are presented and their efficiency with respect to image denoising are studied.

Averaging patch-based: non-local means
Non-local means (NL-Means) is a patch-based filter proposed by Buades et al. [6] as a modification to the pixelwise bilateral filter [4,16,38,50,58,61]. Like the bilateral filter, the NL-Means filter blurs the homogeneous areas and preserves edges. The NL-Means filter divides the input image into sub-images and then filters each subimage separately in a technique that is referred to as being patch-wise. Each sub-image contains several patches. As in the bilateral filter, similarity is measured based on two measurements: (1) the Euclidean distance between the centers of the patches and (2) the luminance distance between the patches. In contrast to the bilateral filter, patches are compared within a search window instead of with the pixels of the neighbors. This is why it is called a non-local method. Patches with similar gray levels have larger weights when they are averaged. Figure 2a shows the NL-Means patches and how to find the similar patches in a raster scan in a search window. Figure 2b illustrates the fact that patches with a similar gray level, for example, Fig. 2 Similarity between patches. a NL-Means patches as a raster scan in a search window. b Patch P3 is similar to P1 more than patch P2; hence, P3 will get a weight larger than P2 weight P1 and P3, should be assigned a larger weight than that to be assigned to P2. The edges in NL-Means filtering are preserved regardless of their direction.
The estimated value NL-Means[v] i , for a pixel i, is computed as: where [v] i and [v] j are pixel intensities at locations i and j, respectively, and ω(i, j) is a similarity measure between the pixels i and j. The similarity measure weight satisfies the condition 0 ≤ ω(i, j) ≤ 1 and j ω(j, i) = 1. The similarity weight depends on the gray level similarity and the Euclidean distance between vectors N [v] i and N [v] j , where N[v] k denotes a square neighborhood of fixed size and centered at a pixel k. The weights are described as: where Z(i) is a normalization factor and h is a filtering parameter set depending on the noise level. The level of noise determines the sizes needed for the search window and patches. For a robust comparison between patches, the size of the patches increases when the noise level is high. Accordingly, the value of the filtering parameter h increases as the size of the patch is increased. Meanwhile, the size of the search window must be increased in order to find more similar patches.
NL-Means filter is considered the cornerstone of many patch-based denoising methods. It can be adapted easily to many other applications, such as, multi-view image denoising. Nevertheless, there are some disadvantages of NL-Means filter. The filter is computationally expensive due to the large amount of weight computations between similar patches. Another disadvantage is that the NL-Means filter is a spatial domain filter though convolution can be easily implemented in the frequency domain.
NL-Means filter has too many modifications. Improving the way of assigning the weights between patches would improve the performance of the NL-Means method. Hedjam et al. [25] improved the process of adjusting the weights in the NL-Means by using Markovian clustering. Wu et al. [63] used a statistical shrinkage perspective when assigning the weights in NL-Means via using James-Stein [30] shrinkage estimator. Lai and Dou [35] introduced an improved neighborhood pre-classification strategy for optimized weight kernels of NL-Means filter. Khan and El-Sakka [32] introduced a variant of the NL-Means scheme by using a thresholding step to reduce the number of similar patches before weight averaging the patches.
NL-Means is a spatial domain filter, transferring this filter to the frequency domain would help in suppressing more noisy signal. Enríquez and Ponomaryov [17] transferred the patches of the NL-Means to the frequency domain and used discrete cosine transform with a threshold to estimate true patches. Zhong et al. [69] combined the NL-Means with Lee filter [36] for SAR image enhancing. Chan et al. [7] incorporated a median filtering operation indirectly in the NL-Means method for denoising low signal noise ratio (SNR) images. Maruf and El-Sakka [43] projected NL-Means patches into a global feature space before performing a statistical t test to reduce the dimensionality of this feature space. They gathered similar patches globally. Irrera et al. [28] adapted NL-Means for denoising X-ray images (XNL-Means), and then they applied an additional multi-scale contrast enhancement in the frequency domains.
NL-Means filter could be adapted to improve other image processing applications (e.g., segmentation, recognition, and video denoising). Zhan et al. [66] introduced an extension to the NL-Means method for ultrasonic speckle reduction. They assigned the patch similarity weights iteratively in a lower dimensional subspace using principal component analysis (PCA). Xu et al. [65] adapted the NL-Means to be use for microscopy cell images via a frequency transform. Genin et al. [20] adapted a modified version of the NL-Means filter for detecting small objects by background suppression. Background pixels are estimated by a weighted average depending on the similarity between neighborhoods pixels. Kim et al. [33] adapted the NL-Means filter for noise reduction and enhancement of extremely low-light video. They use a motion adaptive temporal filter using gamma correction with adaptive thresholds before the NL-Means filter. Xu et al. [64] adapted the idea of patching from the NL-Means for filtering polarimetric synthetic aperture radar (POL-SAR) images; they use simultaneous sparse coding for transferring the patches into the frequency domain before assigning the weights.

Probabilistic patch-based filter
The probabilistic patch-based (PPB) filter, which works in the spatial domain, was proposed by Deledalle et al. [12] as an extension of the NL-Means filter. The PPB approach is one of a few denoising techniques that can provide a general denoising methodology for various noise models. Thus, it is more general than the NL-Means and can be applied where there is additive noise or multiplicative speckle noise. The PPB filter is a statistically based similarity scheme that depends on the distribution model of the noise. The weighted average is used for the Gaussian noise distribution in the NL-Means, but the PPB filter applies smoothing based on the maximum likelihood estimator (MLE). The PPB is expressed as a weighted maximum likelihood estimation (WMLE) problem. The weight is derived from the data by improving the isotropy of the filter-non-iterative probabilistic patch-based filter (Non-iPPB)-and it can be iteratively defined based on the similarity of the patches-iterative probabilistic patch-based filter (It-PPB).

Weighted maximum likelihood estimator
Using a weighted maximum likelihood estimation for image denoising is not a new technique; it was first used for image denoising by Polzehl and Spokoiny [51,52]. PPB redefined the weights in term of a patch-based approach. PPB image denoising is considered to be an estimationû of the true image u which originates from the noisy image υ. The images are defined over a discrete regular grid . A pixel value is described as i, and its neighbor is j at the location (x, y) ∈ . The noise model is considered as being defined by the parametric noise distribution "likelihood" p(i | θ * i ), where θ * i is an unknown space-varying parameter. The denoising of an image is equivalent to finding the best estimationθ i for θ * i for all pixels. The MLE at each location (x, y) estimatesθ i from a set S θ * i of the distributed random variables around it by: = arg max The indicator function has been derived from the data as weights ω(i, j) ≥ 0 in Polzehl and Spokoiny [52] and Polzehl and Tabelow [51], where it is used as weight for adaptive pixel-wise filters. However, the indicator function in PPB is used as weight function to form the WMLE:

Defining the weight between patches
In Subsection 2.1, the weights in the non-local means filter are defined by comparing the similarities of the two patches [v] i and [v] j centered around the two locations i and j, respectively. A weighted Euclidian distance between the two patches defines the level of similarity. The objective of the PPB filter is to generalize and extend the idea of the Euclidean distance weight used in the non-local means filter so that it can be adapted to non-additive noise models. The weights used in the probabilistic patchbased method are estimated using the probability of the two patches in a noisy image having the same parameters. By following the same idea as the weight in the non-local means filter and assuming equal values for i and j in the two statically similar patches [v] i and [v] j , PPB weights would be defined as: where θ * [v] i and θ * [v] j are the patches extracted from the image θ * and h is larger than 0 , which indicates the size of the patch in PPB. The h acts as σ in the NL-Means algorithm in order to control the filtering amount. The probability of the similarity of the patches pixels is decomposed into a product of the probabilities of k neighbors:

Iterative WMLE denoising
In order to improve the performance of the PPB algorithm, the probability of a similarity is estimated iteratively. The weight, at each iteration, is expressed as the product of two terms: (1) the probability of the similarity between the noisy patches as described in Subsection 2.2.2 and (2) the probability of the similarity derived from the previous iteration. Assume that the previous estimation at t iteration iŝ θ t−1 for θ * . Then, the formula in Eq. 5 can be expressed as: This is similar to what was achieved in Subsection 2.2.2, where the probability of the similarity is decomposed into a product of the probabilities of the k neighbors: From the Bayesian framework, the naïve Bayes model can be fitted with the maximum likelihood concept. The probability is estimated using the prior probability and can be presumed to be proportional to the likelihood: The similarity likelihood is computed using: The likelihood term is to compute the degree of similarity between the patches, and the prior term is to compare the two probability distributions from the previous iteration, similar to Polzehl and Spokoiny [51].
The scheme, Fig. 3, shows the procedure of iteratively competing the weights in the PPB algorithm. The procedure for defining the weights is estimated iteratively by (1) the PPB weights estimator (PPBWE) uses the likelihood term and the estimated value from the previous iteration in order to compute the prior term (Eq. 8), (2) WMLE uses the PPBWE estimation and the noisy image in order to estimate the new weight (Eq. 4), and (3) the PPBWE and the WMLE steps are repeated until there is no difference in the estimations made from the two steps.

Algorithm used in the case of Gaussian noise
The PPB filter can be used for filtering additive white Gaussian noise (AWGN). By assuming the AWGN model, the values of the pixels I of the patch [v] i are distributed based on the Gaussian distribution ℵ u, σ 2 . Here, u is the noiseless image and σ the noise variance. The noiseless image u can be estimated by the weighted average that maximizes the WMLE defined in Eq. 4: In order to estimate the weighted average ω i, j , two terms are considered: the likelihood and the prior terms. The likelihood function is discretized as: and the prior term is discretized as: By combining the two terms in Eqs. 10 and 11, the weight at any iteration is defined as: where n is number of pixels and T is a constant similar to h in Eq. 2. When there is no iteration "posterior term = 0, " the filter performs similar to the NL-Means filter.

Fig. 3
Weights in PPB algorithm: computing iteratively the weights between two pixels s and t in the probabilistic patch-based (PPB) filter. The PPB weights estimator (PPBWE) uses the noisy image and the estimation values from the previous iteration in order to estimate the weight PPB filter has several advantages and disadvantages. The main advantage of PPB filtering is that it is a statisticalbased approach, which can be utilized for suppressing additive Gaussian noise and/or multiplicative speckle noise. Also, this filter can be adjusted as an iterative filter or not. Nevertheless, the main drawback of this filter is the suppression of fine and dark details when denoising. Another disadvantage of the filter is the high computational cost when used as an iterative filter.
A further extend patch log likelihood (EPLL) filter, similar to probabilistic patch-based filter, was proposed recently by Papyan and Elad [47] via considering a multiscale prior.

Dictionary learning
Dictionary learning (DL) is utilized as a replacement for the use of a fixed dictionary for representing data. From the 1970s, data can be represented by using a fixed dictionary, for instance, Fourier of Boussinesq [5] and Wavelets of Haar [23]. In 1996, Olshausen and Field [45] proposed an approach to learn the dictionary from a data in order to optimize the sparsity of the data. The dictionary learning or k-means singular value decomposition (K-SVD) was first adapted to image denoising in 2006 by Aharon et al. [1]. Dictionary learning method finds the best dic- In order to filter a noisy image, each signal y j is considered as a patch extracted from the noisy image. The sparse code of signal data y = y j for j = 1, . . . , n is obtained by minimizing a constrained optimization 0 : where k > 0 controls the amount of sparsity, and 0 pseudo-norm is defined by: In dictionary learning, optimization is performed on the dictionary D and the coefficients X = x j m j=1 ∈ R p×m for j = 1, . . . , n, where the set of coefficients is x j of the data y j . The joint optimization is written as: where is the constraint set: The sparsity constraint is set on χ k , which is the unit normalization of the dictionary columns: Peyré and Fadili [49] proposed using a block-coordinate descent minimization approach used by Tseng [59] in order to minimize X and D.
The dictionary learning algorithm depends mainly on three steps: (1) patch extraction, (2) sparse coding, and (3) patch construction. In the first step, several patches are randomly extracted from the whole input image. In the sparse coding step, the energies of the X and D dictionaries are iteratively minimized. Patch averaging and reconstruction occurs in the patch reconstruction step. Each of the three steps has stages: 1. Patch extraction step: • The mean of each patch is removed from each pixels value. • Patches are sorted based on their energy, those with a high level of energy are kept by thresholding. • Patches are reshaped as columns in order to form Y.

Sparse coding step:
• Each atom in the columns is normalized in order to form the initial dictionary D. • Number of the columns is reduced again, before computing the X coefficients. • The X dictionary is initially started with zeros.
• The coefficients of dictionary X is updated by: where T is the thresholding.
• The dictionary D is updated again by: • K is the number of iterations, which is used to minimize the X and D dictionaries by updating them iteratively. • Finally, the coefficients of dictionary X are multiplied by dictionary D.

Patch construction step:
• The result of the multiplication is a new array, Y 1. • The columns of Y 1 are reshaped to form patches. • Re-inserting the averages to the patches comes before averaging the patches to replace the noisy patch.
The scheme in Fig. 4 shows the steps of the dictionary learning method for denoising images.
The most important advantage of the DL method is the sparse representation of the input data. One advantage of the DL is the low number of parameters when compared to other patch-based denoising methods. However, DL has some drawbacks. The main disadvantage of DL is the computational burden due to several iterations for computing the singular value decomposition. Moreover, DL does not handle images with many flat patches very well because the singular value decomposition is more sensitive to textured patches.
The DL method has many modifications. Tian and Wang [57] made DL more sparsely representative in the case of less observation values by proposing an adaptive orthogonal matching pursuit to adaptively ensure the sample size. Some of the modifications aim to adapt the idea of denoising based on dictionary learning to other image processing applications. Chen et al. [10] generalized the idea of the learning dictionary to explore identity information in multiple frames of videos. They generated a sparse representation from multiple video frames for face and body part recognitions. Fu et al. [19] proposed an effective model based on DL for hyperspectral image (HSI) denoising by considering sparsity across the spatial-spectral domain, high correlation across spectra, and non-local self-similarity over space. Kang et al. [31] proposed a feature-based approach for assessing similarity between images. After extracting feature points from an image, they utilize dictionary learning. Then, they measured the similarity between images in terms of sparse representation. A novel self-learning based image decomposition framework was presented by Huang et al. [27]. Their framework performs unsupervised clustering on the observed dictionary via affinity propagation that allows effectively to identify images components with similar context information. The framework can automatically determine the undesirable random noisy components from true image components directly from a noisy image. Dictionary learning algorithm was adapted to filter Chinese character images by Shi et al. [54]. They divided the image frequency to low and high frequencies. While a Butterworth low-pass filter was utilized to filter low frequency, the K-SVD dictionary learning algorithm was proposed to filter high-frequency parts which consists of structure of Chinese characters.

Patch-based PCA
Recently, over-complete dictionaries with sparse representation techniques became very widespread in image denoising [1,40,41], and they are one of the state-ofthe-art denoising algorithms. These methods use overcomplete dictionaries derived from enormous image sets or from the noisy image itself. They outperform other denoising techniques due to their ability to provide an appropriate basis for separating noisy signals from the true image signals, so they suppress more noise and preserve edges. Despite the fact that over-complete dictionaries are frequently used for image denoising, such dictionaries are sophisticated and quite expensive in terms of memory usage and time. However, patch-based principal component analysis (PB-PCA) of Deledalle et al. [13] is a modification of the dictionary methods.
PB-PCA uses simple orthogonal dictionaries constructed by using principal component analysis. The results of PB-PCA still do not outperform the overcomplete dictionary methods, but PB-PCA shows how simple orthogonal dictionaries can achieve excellent results with less sophistication. PB-PCA simply learns the orthogonal dictionaries from the noisy image via principal component analysis. The next step is to threshold the patches' coefficients in the dictionaries. This idea is similar to the wavelet denoising methods used in [8,9,14], in which they use either hard thresholding or soft thresholding for zeroing the coefficients. Figure 5 shows extracting patches used in the PB-PCA method from an image and grouping them before computing the PCA.
For the problem of denoising an image that is interrupted by AWGN, the patch model has the following formula: where [u] i is the true image patch, z i is the AWGN noise, [v] i is the noisy patch, and n is the number of patches. By assuming [v] i , .., [v] i−1 are a group of overlapped patches of size N × N extracted from the noisy image υ, the covariance matrix is the sum of: wherē In PCA, the singular value decomposition (SVD) of the covariance matrix is processed. Moreover, the eigenvalues g 1 , · · · , g n−1 of the covariance matrix and the corresponding eigenvectors G 1 , · · · , G n−1 are calculated. Eigenvectors are called the principal components "axis" of the processed data and are used to form an orthogonal basis, G i is the the ith principal axis of the data. Due to the orthogonal basis of the principal components, an image patch can be decomposed as Figure 6b, c shows the first and last 16 principal axes of the all patches obtained from the house image shown in Fig. 6a.
By assuming that the true image pixels have a lowdimensional subspace and the noise is spread in all directions, projecting the axes into the first axis would suppress the noise in the noisy image. Projecting the axes is called coefficient thresholding, and it is done by using an appropriate shrinkage function. A general formula for estimating a true image is: where η is the shrinkage function. PB-PCA filtering method has been tested with four shrinkage functions: (1) soft thresholding (ST); (2) hard thresholding (HT); (3) Keep or Kill (KoK), and (4) Wiener filter [62]. Figure 7 shows a comparison between the four different projection methods into the basis by PCA for the House and Cameraman images. From Fig. 7, it can be seen that hard thresholding with a large number of axes is the best of the four projection methods. Using a Wiener filter as a shrinkage function for the PCA has been proposed by Muresan and Parks [44,67]. The Zhang et al. [67] algorithm will be discussed in Subsections 2.4.1.
PB-PCA has three variants based on how each set of patches is collected from the input noisy image before the PCA process. In PB-PCA, the three variants for collecting patches are globally, locally, or hierarchically. Figure 8 shows the best means of collecting the patch sets globally, locally, or hierarchically. These variants are discussed in the following subsections.

Patch-based global PCA
Collecting patches for a PCA can be done globally from the entire noisy image in patch-based global PCA (PGPCA). This approach is faster than the other approaches, which need time to divide the image into subimages before computing the PCA coefficients. However, it has less filtering quality. Collecting patches globally for PCA filtering has been used in Bacchelli and Papi [2] and Zhang et al. [67]. The Bacchelli and Papi algorithm uses a linear transformation, "a wavelet transform, " before computing the PCA in order to achieve better results. The Zhang et al. algorithm collects patches globally, but it computes the PCA in two stages.  Fig. 6 The principal components "axis" of the house image: a is the input image, b is the first 16 principal axes of the all patches obtained from the house image, and c is the last 16 principal axes of the all patches obtained from the house image [13] The PGPCA approaches cannot compete with other local or semi-local approaches, which consider the high level of redundancy occurring between neighboring patches. In the global PGPCA, one original basis for the whole image, which impacts negatively on the denoising process. The global PGPCA does not identify the rare patches because they do not exert a strong influence on the total variance. However, an allowance can be made for these limitations by considering the local redundancy between patches.

Patch-based local PCA
In the patch-based local PCA (PLPCA) approach, patches are collected locally in order to overcome the limitations of the global PGPCA approach. The local collection of patches means that the patches are collected within a small region of interest in the noisy image. A fixed search window N × N is applied to the whole image. Since the patches are overlapped in PLPCA, there will be multiple estimates for a single pixel. Averaging is used to compute a single pixel's value.
The advantage of this approach is that the orthonormal basis is adapted only to the sub-image and not to the whole image. However, this approach has two limitations: the overfitting and the fact that it is time-consuming. The overfitting problem is due to the limited number of patches on which to compute the PCA. PLPCA is extremely time-consuming because the PCA needs to be computed repeatedly.
Fei et al. [18] collected patches locally, but they improved their approach by using geometric structure clustering to guarantee that only patches with similar properties were gathered. Pal et al. [46] considered patch redundancy in order to improve on the global two-stage PCA approach of Zhang et al. A sliding window that moves with a step s = W p −1 2 , where s is the step size and W p is the window's current location was utilized as a modification in order to reduce the time-consuming element of the PLPCA approach. The computational time is divided by s 2 without losing the denoising quality. Zhang et al. [68] proposed using similar patch-based local PCA  filter with an extended step where a Wiener filter is finally applied.

Patch-based hierarchical PCA
In the patch-based hierarchical PCA (PHPCA) approach, an algorithm builds a hierarchy cluster of the patches. Clustering is the task of grouping a set of patches into the same cluster, i.e., set. There are different cluster models; each model has several clustering algorithms. The models include connectivity models (e.g., hierarchical clustering) and centroid models (e.g., k-means). The hierarchical clustering is based on the concept of the grouping of patches according to a maximum distance between patches. Patches are represented as a dendogram, which is a Greek word meaning a tree diagram that illustrates the arrangement of the patches. In centroid models, vectors are assigned usually to a number k of fixed clusters. For more information about the clustering models, readers are referred to Chapter 17 in Introduction to Information Retrieval by Manning et al. [42].
The objective of the PHPCA approach is to offer a solution that can provide a result between the PGPCA and PLPCA approaches, which is less time-consuming than the local approaches and is more adapted to local subimages. PHPCA uses a geometric partitioning to first divide the image into four areas, and then it estimates the principal axis for each area. Each area has its own principal components. This process is repeated until the end of the tree is reached. Several dictionaries share the first axes. Figure 8c shows how an image can be divided to sub-images.
The most important advantage of patch-based PCA filter is its ability to produce a solution for image denoising using a modest orthogonal dictionary with PCA of the input data. Moreover, a patch-based PCA filter provides various grouping stages to fit user needs-local, global, or hierarchical. The main disadvantage of PB-PCA is its susceptibility to overfitting due to few training patches when performing the PCA locally. Also, PB-PCA is quite expensive in terms of time since PCA needs to be performed repeatedly.

Sparse 3D transform-domain collaborative filtering
Block matching 3D algorithm (BM3D) is the state-of-theart denoising technique by Dabov et al. [11]. It is based on modified sparse representation in the frequency domain. BM3D groups the patches into 3D data arrays instead of into 2D arrays, then it applies a modified sparse representation in the frequency domain. Collaborative filtering is used to deal with the 3D arrays. BM3D's algorithm depends on two steps: (1) collaborative hard thresholding and (2) the collaborative Wiener filtering. The two steps Fig. 9 BM3D filtering: the two steps of BM3D filtering [11]  allow the BM3D to suppress more noise and to preserve more detail. The amount of noise is suppressed in the thresholding step, and the details are restored in the second step. Collaborative hard thresholding has three functions: (1) 3D transform, (2) shrinkage, and (3) 3D inverse transform. The patches in the 3D arrays are overlapped, so a weighted average is used to obtain one estimation for each pixel. Aggregation is the averaging procedure. A significant filter is obtained by using the BM3D algorithm. The scheme in Fig. 9 shows the two steps of BM3D filtering. Below, the two steps of the BM3D algorithm are described. First, the collaborative hard thresholding step is explained. Then, using the collaborative Wiener filtering is discussed.

Step 1: thresholding
Grouping: Similar to the NL-Means, a search window is used to determine the similarity between the patches. The search window is used in order to benefit from the high redundancy among the neighboring patches. There are several different grouping techniques. A number of these techniques have been discussed in this survey. Some other useful techniques for grouping could be considered for patch grouping, such as vector quantization [21], k-means clustering [39], self-organizing maps [34], and others discussed in this survey [29]. However, grouping in BM3D is based on the similarity distance between patches, the "Euclidian distance. " The grouping stage is the first of the two steps in which similar patches are gathered to form 3D arrays. Similarity is computed according to the distances between the patches. Patches with a distance that is below a fixed threshold are considered to be similar and are grouped into the 3D array. Before measuring the distance, a coarse pre-filtering is used to linearly transform the patches using a 2D linear transformation such as multiple wavelet transforms [15,53]. The formulation in Eq. 21 is used to compute the similarity distance between patches, where [v] i , [v] j are respectively the reference patches at i and its neighbors at j, T 2D hard is the 2D linear transform, γ 2D is a hard-thresholding operator equals to λ 2D × σ , and N hard 1 2 is the patch size N × N. σ is the estimated noise standard deviation. γ 2D makes all coefficients with absolute value less than the threshold (λ 2D × σ ) equal to zero, and it leaves the other coefficients unchanged. After computing the Euclidian distance, grouping the similar patches into a 3D array is required. The formulation in Eq. 22 is used for gathering similar patches.
where 3D S hard i is the constructed 3D array contains similar patches and T hard match is the maximum distance between two similar patches. The maximum grouped patches size are restricted to N hard 2 . The next stage is to apply the collaborative filter by (1) performing a 2D linear transform then a 1D linear transform, (2) shrinkage, and (3) inverting the 1D transform and the 2D linear transform.
Collaborative filtering: Once the 3D array is built, a collaborative filter is used for suppressing the noise. A 3D transform is applied to the 3D array, before the shrinkage of the transforming coefficients. The 2D transformation in the grouping stage is applied along both horizontal and vertical lines for each patch, and then a third transformation is conducted along the third diminution of the 3D array for the 3D transform. The formulation of the collaborative filter is: where T 3D hard is the 3D linear transform of the first (hard) step, T 3D −1 hard is the inverse of 3D transformation, and γ 3D is a hard-thresholding operator equals to λ 3D ×σ . γ 3D makes all coefficients with absolute value less than the threshold (λ 3D × σ ) equal to zero.
Aggregation weights: At this stage, overlapped patches in the 3D array 3Dû S hard i have multiple estimates for each pixel in the reference patch at the location i. A weighted averaging procedure is required to provide an estimate for each pixel. Weights in BM3D are inversely proportional to the total variance of the patches in the 3Dû S hard i array. When the total variance is high, a small weight is assigned to the patch.
The amount of the additive noise is independent when processing the collaborative filter in step 1 and step 2. Thus, the total variance is not the same after applying the collaborative filter in the first and second steps. In step 1, the total variance is computed by σ 2 × N hard non−zero , where N hard non−zero is the number of non-zero coefficients after the hard thresholding. The total variance calculated in step 2 depends on the results of the Wiener filter coefficients, the weights of step 2 will be explained on the facing page. However, the weights for step 1 is equal to:

Step two: Wiener filter coefficients
Grouping: Grouping in the second step is in some ways similar to the grouping in the first step; but here, the power spectrums of the first step are grouped, not just the patches from the noisy image. The same formula is used: are the estimated sub-images from step 1 respectively at locations i and j, respectively. At this stage there are two groups: (1) a group of similar patches derived from the noisy image and (2) a group of similar patches derived from the first step.
Collaborative filtering: After grouping the patches, a 3D transform is applied to the 3D array of the grouped patches. A Wiener shrinkage is applied to the transform coefficients of the 3D array. The definition of the Wiener shrinkage coefficients of the power spectrum of the first step is shown in the equation: where T Wiener where 3D ν i is the 3D transform coefficients of the noisy data.
Aggregation weights: Adjusting the weights in this step is not like the first step, which depends on the number of non-zero coefficients reached after the hard thresholding. Table 2 The performance of the denoising algorithms with various noise levels (σ ) The weights, here, depend on the Wiener shrinkage coefficients; the weights are assigned as: The parameters set for the original BM3D is shown in Table 1; the table has some non-discussed parameters: N hard step1 : step size for searching the patches inside the search window N hard step2 : step size for moving the search window N hard Prev. : small search window width for fast BM3D β hard : the Kaiser window function one parameter for reducing the borders effect Haar: a Haar transform DCT: a discrete cosine transform Bior1.5: a biorthogonal wavelet BM3D filter has advantages and disadvantages. The main advantage of BM3D filtering is that it yields great results with less loss of detail due to the smoothing and sharping stages. The BM3D filter is a fast method because it computes similarity between patches before the actual filtering procedure. Despite its many advantages, BM3D has several drawbacks. The performance of BM3D decreases with a high noise level (σ > 40), and it produces images with many artifacts. BM3D is more complex and less flexible to be adapted for domain-specific image processing applications. In addition, it is not easy to be parallelized. In addition, it has many parameters, and adjusting them optimally is a challenging task.
BM3D has two filtering steps and more than 20 parameters. Improving the way of adjusting any of the 20 parameters would participate on improving the output of the BM3D method. BM3D modifications would be categorized into four main categorizes: (1) modifications consider improving the shrinkage function, (2) modifications consider improving the transforms, (3) modifications consider improving the image similarity measures when collecting similar patches, and (4) modifications consider improving the Wiener filtering stages. Suwabe et al. [55] modified the way of collecting similar patches in the BM3D from non-locally to globally. They proposed using iterative filtering with Chebyshev polynomial approximation (CPA) in order to collect the patches from the whole noisy image. Bashar and El-Sakka [3] replaced the fixed hard thresholding scheme with a learning-based adaptive hard thresholding scheme that considers the context of corresponding blocks. Hasan and El-Sakka [24] improved the Wiener filter of BM3D by maximizing the structural similarity (SSIM) [60] between patches instead of using the mean square error (MSE). Moreover, they introduced a 3D zigzag thresholding.

Results and discussions
In this section, various denoising methods are compared aiming to reduce additive white Gaussian noise (AWGN). The objective of this section is to experimentally study the performance of these methods, where the performance is Table 3 Execution time in seconds shows the speed of the denoising methods applied to the four images at various noise levels (σ ) assessed at various noise levels. The issue of time consumption is also addressed. Four images are used to run this experiment. The images have been chosen carefully to help in distinguishing between the methods. The first two of the four images are natural scene images, Barbara and House; the other two are synthetic images, CurvedBand and Chessboard. The Chessboard image is a binary image while the other three images are gray-scale images.
The four images are shown in Fig. 10. The fine details in Barbara image helps in demonstrating how various methods preserve the image clarity, whereas the sharp edges in the House image helps in demonstrating how various methods preserve edges. The gray gradations in CurvedBand image provide insight into the amount of smoothing that has been applied to images. The methods are also tested with the binary pattern repetitions in the Chessboard image. MatLab is used for this experiment. The computer's processor is an Intel (R) Core(TM) i7 CPU @ 3.40 GHz. In Subsections 3.1 and 3.2, the methods are evaluated both quantitatively and qualitatively.

Quantitative evaluation
In order to make an objective comparison between the results, two image similarity matrices are used: (1) the structural similarity and (2) the peak signal-to-noise ratio (PSNR). These quality metrics have been chosen for their capability to assess the level of the additive Gaussian noise degradation. The best result for SSIM is 1, while the PSNR has good result when its value is high. Equations 29 and 30 show the formulas for these two quality metrics, respectively: where x is the true reference image, y is the noisy image, μ x and μ y are the mean of the true reference image block and the noisy image block, respectively, σ x and σ y are the variance/covariance of the true reference image block and the noisy image block, respectively, and C 1 and C 2 are constants used to avoid instability. The peak signal-to-noise ratio is defined as: where MSE is the mean squared error and n is an integer number representing the number of bits per pixel. When n = 8, i.e., in case of gray-scale images, the PSNR formula is reduced to: A study conducted by Hore and Ziou [26] has revealed that SSIM is less sensitive to additive noise than PSNR. They used F-score test to compare between SSIM and PSNR which works for AWGN. Thus, the final conclusion in our study is driven based on SSIM.
The experimental results of the denoising methods are shown in Tables 2 and 3. The tables show the performance of the patch-based denoising methods along with the famous pixel-based denoising methods: anisotropic diffusion (AD) by Perona and Malik [48] and the bilateral filter. In both tables, the methods are sorted from the oldest to the most recent.
The results in Table 2 are computed by measuring the differences between the original images and the denoised images. The default parameters shown in Table 1 were used for the BM3D method, these values are suggested by the authors of BM3D. The noise standard deviation of the noisy images, which methods depends heavily on it, is required to be adjusted accurately before the denoising process. In case the noise estimation was not given, Ghazi and Erdogan [22], Tai and Yang [56], and Liu et al. [37] offer more information about noise estimation. The highest values of SSIM and PSNR are highlighted with a bold font. Table 3 shows the execution time of the methods in seconds. The fastest patch-based methods are highlighted with a bold font.
The performance of the denoising methods varies depending on the noise level and the scene details inside noisy images. For example, the pixel-based bilateral filter outperforms the famous patch-based NL-Means method  Table 2. Figure 11 shows the performance of the methods with various levels of noise on each of the four images. The charts show that BM3D is the best method (from SSIM point of view) when the noise level is high. BM3D is the best whether it is used for denoising natural scene or synthetic images. PCA patch-based methods come second The two lines at the bottom of each chart presents the performance of pixel-wise methods: anisotropic diffusion and the bilateral filter. From the charts, we can conclude that the block-wise denoising methods perform better than the pixel-wise methods. When using σ = 10 and σ = 20, the results are similar except with the pixel-wise methods. The performance of the methods will be discussed in the following two paragraphs.
Bar charts are used in Fig. 12 to represent the methods performance when the noise level is low σ = 10. The charts show that patch-based denoising methods are similar. The pixel-wise denoising methods perform better when denoising the synthetic images due to its flat areas and the absent of details. In contrast, the patch-based denoising methods achieved better results when denoising the natural scene images. When the noise level is low σ = 10, the top two denoising methods for denoising the natural scene images are PHPCA and BM3D, whereas the top two denoising methods for denoising the synthetic images are BM3D and bilateral filter. Yet, the differences are insignificant.
The charts in Fig. 13 show the methods performance when σ = 20. By increasing the noise level, the contrast between methods becomes obvious unlike when the noise is low σ = 10. BM3D achieved better results than other methods in all images. The patch-based denoising methods perform better than pixel-wise methods. Figure 14 contains charts that illustrate the average execution time aspect for each image. The K-SVD methods are very time-consuming. They are about ten times as expensive as any other methods. Thus, K-SVD is excluded from the charts to make it easier to distinguish between methods. The NL-Mean comes second after the K-SVD. Although, the BM3D uses two stages to perform

Qualitative evaluation
The evaluation in this subsection is a subjective evaluation, where the quality of the denoised images is addressed via the visual perception. Denoised images with AWGN (σ = 40) are chosen to perform this evaluating. Figures 15, 18, 19, and 21 show the denoised images.
The results of denoising Barbara's image are shown in Fig. 15. Figure 15g shows the best achieved result by using BM3D method. While homogeneous regions are properly smoothed in BM3D, the sharp edges are preserved. The K-SVD and the PCA patch-based methods  Fig. 15f, j, k, l shows K-SVD and PCA patch-based results. They apply a good smoothing to the image, but some edges are destroyed. The non-local mean method applies less smoothing; thus, the textures shown in Fig. 15e in Barbara's pants are preserved; a zoomed version of Fig. 15e is shown in Fig. 16e. Figure 15h, i shows how the PPB methods apply a good smoothing, but they fail to preserve sharp edges and textures, i.e., Barbara's eye cover fold. Zoomed versions of Fig. 15h, i are shown in Fig. 17h, i. Figure 18 shows the results of denoising the House image. The best results are achieved by using BM3D and PPB methods, Fig. 18g-i shows these results. K-SVD and PLPCA methods have disappointing results, unlike their results when they are used for denoising Barbara image. Figure 18f, k show K-SVD and PCA methods results. The edges are preserved when using K-SVD, but K-SVD fails to smooth properly flat areas, i.e., the sky in the House image. Thin edges are not preserved with PCA patch-based methods. Figure 19 shows the denoised CurvedBand image. PPB methods and BM3D have the best results, the results are shown in Fig. 19g-i. Unlike BM3D, PPB methods succeed more in smoothing the gray gradations; zoomed versions of Fig. 19g-i are shown in Fig. 20g-i. K-SVD and PCA patch-based methods have similar results, Fig. 19f, j, k, l shows these results. They preserve parts of the curve, but they do not smooth properly flat areas; zoomed versions of Fig. 19f, j, k, l are shown in Fig. 20f, j, k, l. Figure 21 shows the denoised Chessboard image. Figure 21g shows BM3D result, BM3D achieves the best result when denoising the binary Chessboard's image.  Figure 21e shows the result of NL-Means method. With the K-SVD method shown in Fig. 21f, edges are preserved and the flat areas are smoothed appropriately. Disappointed results are achieved by using PCA patch-based methods; Fig. 21j, k, l shows the result of using PCA patch-based methods.

Summary of contribution
This paper provides a review of the state-of-art patchbased denoising techniques and compares their effectiveness for denoising natural, synthetic, and binary images. Effort is drawn on details like smoothing flat regions and objects and preserving details such as edges, lines, and textures. The features, strengths, and limitations of the patch-based denoising techniques are also presented. Moreover, the review covers the issue of time. Finally, based on years of experience, we believe that this review will be helpful for researchers to choose suitable denoising techniques to be adapted further for their image processing applications.

Future research directions
Like any other image denoising approaches, many important research directions should remain in patch-based image denoising. Improving patch similarity measures is suggested for grouping accurately similar patches. Moreover, the research should also produce tools with better shrinkage functions to suppress noise and preserve fine details. Finally, developing effective image transform strategies that meet the needs to differentiate between a true and noisy signal is strongly recommended.

Conclusions
Among the best denoising methods is patch-based denoising method, which includes BM3D, NL-Means, and K-SVD. This paper has dealt with the efficiency of each of these methods when compared to other patch-based denoising methods. Experimentally, BM3D method gives the best result; it performs very well on all images and at all levels of noise. K-SVD and PCA methods come second to BM3D. Although the sparsity is learned from the data itself in K-SVD, K-SVD fails to compete with BM3D. NL-Means gives an encouraging result at low levels of noise. Furthermore, PPB methods preserve the fine details but fails to do so with respect to the sharp edges.
K-SVD is incomparable with the other compared methods in terms of its time consumption. As it is very expensive. In contrast, BM3D execution time is the best among the patch-based denoising methods because it computes all similarity between whole patches first before starting the actual denoising process.