Patch-based anisotropic diffusion scheme for fluorescence diffuse optical tomography—part 1: technical principles

Fluorescence diffuse optical tomography (fDOT) provides 3D images of fluorescence distributions in biological tissue, which represent molecular and cellular processes. The image reconstruction problem is highly ill-posed and requires regularisation techniques to stabilise and find meaningful solutions. Quadratic regularisation tends either to oversmooth or to generate very noisy reconstructions, depending on the regularisation strength. Edge-preserving methods, such as anisotropic diffusion regularisation (AD), can preserve important features in the fluorescence image and smooth out noise. However, AD has limited ability to distinguish an edge from noise. In this two-part paper, we propose a patch-based anisotropic diffusion regularisation (PAD), where the regularisation strength is determined by a weighted average according to the similarity between patches around voxels within a search window, instead of a simple local neighbourhood strategy. This method has higher computational complexity, however, and hence we wavelet compress the patches (PAD-WT) to speed it up, while simultaneously taking advantage of the denoising properties of wavelet thresholding. The proposed method thus combines the nonlocal means (NLM), AD and wavelet shrinkage image processing methods. Therefore, in this first paper, we used a denoising test problem to analyse the performance of the new method. Our results show that the proposed PAD-WT method provides better results than the AD or NLM methods alone. The efficacy of the method for the fDOT image reconstruction problem is evaluated in part 2.

Keywords: fluorescence diffuse optical tomography, image reconstruction, anisotropic diffusion, nonlocal means, structural information, multimodal imaging, regularisation

Introduction
Fluorescence diffuse optical tomography (fDOT) is an optical imaging modality that provides three-dimensional (3D) images of fluorescent source distributions inside biological tissue (Ntziachristos 2006). The image reconstruction in fDOT is very challenging due to the ill-posedness of the inverse problem, which is a consequence of the multiple scattering nature of biological tissues. To overcome this problem, regularisation methods are commonly used to stabilise the solution. Usually, a quadratic (L2 norm) regularisation or penalty term is added to the least squares objective function to enforce smoothness in the reconstructed image (Hansen 1998, Zacharopoulos et al 2009). In Correia et al (2011) we proposed a nonlinear anisotropic diffusion regularisation method, which has the ability to smooth out noise while preserving edges in images, and showed that the spatial localisation and size of fluorescence inclusions can be accurately estimated. In this method, a reconstruction step alternates with the regularisation step, i.e. a nonlinear anisotropic diffusion (AD) filtering step (Perona and Malik 1990). The AD method is a well-known and widely used image processing technique that provides satisfactory noise suppression and edge enhancement results. Buades et al (2005a) proposed the nonlocal means method (NLM), an image processing algorithm that surpasses the AD method. In the NLM method, pixels are averaged according to the similarity between patches around pixels within a search window W, instead of a simple averaging strategy within a local neighbourhood. NLM exploits the redundancy of information within an image, by assuming that patches from different regions contain similar patterns, so that averaging them effectively reduces noise.
The NLM has superior denoising performance to local-based methods, but at the expense of higher computational complexity. Several methods have been proposed to accelerate the NLM without loss of denoising performance, such as a block-wise implementation of the NLM (Buades et al 2005b, Coupé et al 2008), preselection of similar patches based on the mean and average gradient of patches (Mahmoudi and Sapiro 2005) or the mean and variance (Coupé et al 2008), arrangement of patches in a cluster tree (Brox et al 2008), singular value decomposition (SVD) (Orchard et al 2008) and principal component analysis (PCA) (Tasdizen 2008). NLM using PCA-compressed patches not only reduces the computational time, but also improves the denoising performance (Tasdizen 2008).
Another popular denoising/compression method is wavelet shrinkage, which is based on the process of transforming an image into the wavelet domain (Donoho and Malik 1994, Donoho 1995, Donoho and Johnstone 1995). Important information is encoded by large wavelet coefficients, whereas most of the small coefficients represent noise and can be removed using thresholding techniques. The denoised image is obtained by transforming the thresholded coefficients back into the image domain. The NLM method has been successfully used in medical imaging, for example, as a denoising technique (Coupé et al 2012, Chen et al 2012, Dutta et al 2013, Chan et al 2014) and in image reconstruction as a regularisation method (Chen et al 2008, Chun et al 2012, Wang and Qi 2012, Nguyen and Lee 2013). Here, motivated by the effectiveness of the NLM method in medical imaging applications, we propose a modification to our nonlinear anisotropic diffusion regularisation method for fDOT image reconstruction (Correia et al 2011). In this work, instead of the previous local strategy, we consider a patch-based approach where we use a robust edge-preserving potential function. We refer to this method as patch-based anisotropic diffusion (PAD). Moreover, we propose to reduce patch dimensions using wavelet compression, not only to reduce computational complexity but also to increase the robustness of the method to noise. Therefore, the new method combines the advantages of the NLM, AD and wavelet shrinkage methods. We refer to this method as patch-based anisotropic diffusion with wavelet patch compression (PAD-WT). Since our new method is based on the NLM method, which was initially proposed for image denoising, we begin by assessing the performance of the PAD(-WT) method using a 2D denoising test problem (an ill-posed inverse problem). In the second part of this two-part paper, we study the performance of the PAD-WT method as a regulariser in fDOT image reconstruction.

Methods
Consider a noisy image of the form

f^noise = f^true + ε,    (1)

where f^true is the true image and ε is the noise. Denoising algorithms aim at removing noise from the observed noisy image f^noise, returning an image as close as possible to f^true. The NLM and AD methods are examples of widely used image denoising methods.
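The additive model above can be simulated directly. The sketch below (NumPy; the test image and 20% noise level are arbitrary illustrative choices, not the images used in the paper) generates a noisy observation of the kind used in the evaluation later on.

```python
import numpy as np

rng = np.random.default_rng(0)

# A simple piecewise-constant "true" image with a bright square inclusion.
f_true = np.zeros((64, 64))
f_true[20:44, 20:44] = 1.0

# Additive Gaussian noise epsilon; here its standard deviation is 20% of
# the image's dynamic range, mimicking the noise levels used in the tests.
sigma = 0.2 * f_true.max()
epsilon = sigma * rng.standard_normal(f_true.shape)
f_noise = f_true + epsilon
```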

Anisotropic diffusion
For a 2D image f, the discretised anisotropic diffusion method is given by the following iterative scheme (Perona and Malik 1990):

f_i^{k+1} = f_i^k + (τ/|W_i|) Σ_{j∈W_i} g(|∇f_ij^k|) ∇f_ij^k,    (2)

where k is the iteration number, ∇f_ij^k = f_j^k − f_i^k, |W_i| is the number of neighbours (here |W_i| = 4, i.e. 2 neighbours per direction) and g is an edge-preserving function such as the Huber, Tukey, Total Variation or Perona-Malik functions (Correia et al 2011). Here, we use the Perona-Malik function, defined by

g(x) = 1 / (1 + (x/T)²).

The parameter T is the threshold and can be selected using the normalised cumulative histogram (NCH) of the gradient (Correia et al 2011), an approach commonly used in edge detection problems. The NCH indicates the probability ℘ of a gradient taking a value less than or equal to the value X that the bin represents, i.e. ℘(X) = P(|∇f| ≤ X). It increases monotonically, and the smoothness/sharpness of the curve indicates how smooth/sharp the edges are. The threshold can be calculated from the NCH by setting it at, for example, 90 percent.
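As a rough illustration of the scheme above, one explicit AD iteration could be implemented as follows. This is a minimal NumPy sketch, not the authors' code: the Perona-Malik function and the 4-neighbour averaging follow the discretisation described here, while the replicated-border (zero-flux) boundary handling is our assumption.

```python
import numpy as np

def perona_malik_g(x, T):
    """Perona-Malik edge-preserving function g(x) = 1 / (1 + (x/T)^2)."""
    return 1.0 / (1.0 + (x / T) ** 2)

def ad_step(f, T, tau=1.0):
    """One explicit anisotropic diffusion iteration on a 2D image f,
    using the 4 nearest-neighbour differences (2 per direction)."""
    # Neighbour differences with replicated borders (zero flux at the edge).
    pad = np.pad(f, 1, mode='edge')
    grads = [pad[:-2, 1:-1] - f,   # north
             pad[2:, 1:-1] - f,    # south
             pad[1:-1, :-2] - f,   # west
             pad[1:-1, 2:] - f]    # east
    flow = sum(perona_malik_g(np.abs(d), T) * d for d in grads)
    return f + (tau / 4.0) * flow
```

In practice T would be chosen from the NCH of the gradient magnitudes, e.g. as their 90th percentile.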

Patch-based anisotropic diffusion
We propose the following modification to the previous method:

f_i^{k+1} = f_i^k + τ Σ_{j∈W_i} w_ij g(|∇f_ij^k|) ∇f_ij^k,    (3)

where w_ij is the weight that measures the similarity between two neighbourhoods (patches of fixed size P = N × N) centred at pixels i and j, within a search window of size (2W + 1) × (2W + 1) centred at pixel i, where W is the maximum distance from pixel i (see figure 1 for an example). The weight w_ij is defined as

w_ij = (1/C(i)) exp(−‖f(N_i) − f(N_j)‖² / h²),    (4)

where N_i and N_j are the neighbourhoods centred at pixels i and j, respectively. The parameter h controls the exponential decay and C(i) is a normalising constant given by

C(i) = Σ_{j∈W_i} exp(−‖f(N_i) − f(N_j)‖² / h²),    (5)

with the patch distance

‖f(N_i) − f(N_j)‖² = Σ_p (f_ip − f_jp)²,    (6)

where f_ip and f_jp represent the pth elements of the patches f(N_i) and f(N_j), respectively. Therefore, the weights w_ij are large when patches are similar and small when they are very different.
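A direct, unoptimised computation of the weights w_ij for a single centre pixel might look like the following NumPy sketch (the reflective padding at the image border is our assumption; the parameter values are illustrative):

```python
import numpy as np

def patch_weights(f, i, N=3, W=3, h=0.1):
    """Similarity weights w_ij for one centre pixel i of a 2D image f.

    Patches are N x N; the search window contains all pixels j at most W
    away from i. Returns a dict mapping coordinates j -> normalised w_ij.
    """
    half = N // 2
    pad = np.pad(f, half + W, mode='reflect')
    r, c = i[0] + half + W, i[1] + half + W        # centre in padded coords
    centre = pad[r - half:r + half + 1, c - half:c + half + 1]
    raw = {}
    for dr in range(-W, W + 1):
        for dc in range(-W, W + 1):
            rr, cc = r + dr, c + dc
            patch = pad[rr - half:rr + half + 1, cc - half:cc + half + 1]
            dist2 = np.sum((centre - patch) ** 2)  # patch distance
            raw[(i[0] + dr, i[1] + dc)] = np.exp(-dist2 / h ** 2)
    Ci = sum(raw.values())                         # normalising constant C(i)
    return {j: wv / Ci for j, wv in raw.items()}
```

By construction the weights over the window sum to one, and identical patches receive equal weight.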

Nonlocal means
The new patch-based anisotropic diffusion method is based on the nonlocal means denoising method (Buades et al 2005a, 2005b):

NLM(f)_i = Σ_{j∈W_i} w_ij f_j.    (7)

NLM can be considered as one step of a fixed-point iteration (Brox et al 2008), and denoising results can be improved by performing a few iterations:

f_i^{k+1} = Σ_{j∈W_i} w_ij f_j^k.    (8)

The iterative NLM method computes a weighted average of pixel intensities at each iteration. In our proposed method, the pixel intensities f_j^k are replaced by the edge-preserving flow terms g(|∇f_ij^k|)∇f_ij^k of equation (3). Note that the patch-based anisotropic diffusion (PAD) method becomes the local AD when N = 1 and W = 1.

Patch compression
NLM methods have superior denoising performance to local methods, but at the expense of computational complexity. Principal component analysis (PCA) has been used to reduce data dimensionality and speed up the NLM method (Tasdizen 2008). Furthermore, the accuracy of the solution was shown to improve, since principal components that represent noise, i.e. those with small variance, are removed. We propose instead to use a wavelet transform (WT) compression method. In this method, the high-frequency coefficients are removed, preserving the main information in the data while removing noise. Compression is applied to the patches, which are first converted into one-dimensional arrays to simplify and speed up computations.
The patch-based anisotropic diffusion method with wavelet patch compression (PAD-WT) is our proposed method.

Principal component analysis.
PCA is a statistical method widely used in data analysis and compression. PCA identifies similarities and differences in the data, by representing the signal in terms of a set of new orthogonal variables called principal components. The first and last components have the largest and smallest variance, respectively.
Consider a patch f(N_i) with P elements, the mean of its elements f̄_i and covariance matrix C_i. The covariance matrix can be decomposed as

C_i = U Λ Uᵀ,

where Λ is a diagonal matrix whose elements are the eigenvalues and the columns of the orthonormal matrix U are the eigenvectors of the covariance matrix. The principal components are the eigenvectors ordered by descending magnitude of the corresponding eigenvalues. The first component is the principal component and represents the largest variance of the data. The dimensionality of the data can be reduced to d by projecting the patch onto the d largest eigenvectors, i.e. the first d columns of U, while the less significant components are discarded.
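Under these definitions, PCA compression of a set of vectorised patches can be sketched as follows (NumPy; computing one covariance matrix over the whole patch population is an implementation choice of this sketch):

```python
import numpy as np

def pca_compress_patches(patches, d):
    """Project vectorised patches onto their d leading principal components.

    patches : (n_patches, P) array, one vectorised patch per row.
    Returns an (n_patches, d) array of compressed patch representations.
    """
    mean = patches.mean(axis=0)
    centred = patches - mean
    # Sample covariance matrix of the patch population (P x P).
    cov = centred.T @ centred / (patches.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    U_d = eigvecs[:, order[:d]]              # keep the d leading eigenvectors
    return centred @ U_d
```

The first output dimension then carries the largest variance, the last the smallest, matching the ordering described above.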

Wavelet transform.
The fast wavelet transform (FWT) decomposes a signal or function into different frequency subbands (Mallat 2008). It uses a low-pass filter H and a high-pass filter G to obtain the approximation ϖ and detail ϖ^d coefficients. The approximation coefficients represent the approximation of the signal at a resolution 2^r, where r is an integer that specifies the resolution level. The detail coefficients contain the details of the original signal, i.e. the high-frequency information. For a signal f_i(n) = ϖ_{r+1}(n), with n samples and at a starting scale r + 1:

ϖ_r(n) = Σ_m H(m − 2n) ϖ_{r+1}(m),
ϖ_r^d(n) = Σ_m G(m − 2n) ϖ_{r+1}(m).

Thus, the FWT consists of a convolution, followed by downsampling by a factor of 2 (↓2), i.e. keeping the even-index samples. The wavelet transform can be implemented as a decomposition filter bank, where the initial signal goes through a series of filters. Note that the filters are related to each other by G(n) = (−1)^n H(1 − n).

Figure 1. Similar to the NLM method, the patch-based anisotropic diffusion method is based on patch similarity. A square search window is defined surrounding pixel i. The method estimates pixel i by taking a weighted average of all pixels j within the window. The weight w_ij is calculated based on the similarity between two neighbourhoods or patches f(N_i) and f(N_j), centred around pixels i and j, respectively. Similar patches give larger weights. In this 2D example, patches have dimensions P = 3 × 3, hence the patch length is N = 3, and the window is 7 × 7, i.e. W = 3, meaning that the maximum pixel distance from pixel i is 3 pixels.

Data compression is used to reduce the dimensions of the patches for computational efficiency. If a two-level 1D FWT is applied to a patch f(N_i), the wavelet coefficients obtained are {ϖ_{r−1}(n), ϖ_{r−1}^d(n), ϖ_r^d(n)}. The vectorised patch is compressed by keeping the largest coefficients, which contain most of the relevant information. Thus, the weights in equations (3) and (4) are calculated using

‖ϖ_i − ϖ_j‖² = Σ_p (ϖ_ip − ϖ_jp)²,

where ϖ_ip and ϖ_jp represent the pth wavelet coefficients of the wavelet-transformed patches f(N_i) and f(N_j), respectively.
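The decomposition and compression steps can be sketched as follows. For brevity this sketch uses the Haar filter pair rather than the Daubechies wavelets selected later in the paper, and the patch length and number of retained coefficients are illustrative.

```python
import numpy as np

def haar_level(x):
    """One FWT level with the Haar filter pair: the pairwise average is the
    low-pass (H) output and the pairwise difference the high-pass (G) output,
    equivalent to convolution followed by downsampling by 2."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # coarse approximation coefficients
    detail = (even - odd) / np.sqrt(2.0)   # high-frequency detail coefficients
    return approx, detail

def compress_patch(patch_vec, keep):
    """Two-level 1D FWT of a vectorised patch, keeping only the `keep`
    largest coefficients by magnitude; the rest are set to zero."""
    a1, d1 = haar_level(patch_vec)
    a2, d2 = haar_level(a1)
    coeffs = np.concatenate([a2, d2, d1])
    idx = np.argsort(np.abs(coeffs))[:-keep]   # indices of smallest coeffs
    coeffs[idx] = 0.0
    return coeffs
```

For a smooth patch the detail coefficients are small, so most of the signal energy survives the compression.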

Evaluation
We used an image denoising problem to test the hypotheses that the proposed method is superior to NLM and anisotropic diffusion filtering, and that the WT is an efficient patch compression method. We used several images corrupted with Gaussian noise in the denoising test. First, we analysed the performance of the PAD-WT method for different wavelet types. Then, we solved the denoising problem using the following iterative methods: (1) NLM (see equation (8)), (2) NLM with patch compression using PCA (NLM-PCA), (3) NLM with patch compression using wavelets (NLM-WT), (4) AD, i.e. local PAD (see equation (2)), (5) PAD (see equation (3)), (6) PAD with patch compression using PCA (PAD-PCA) and (7) PAD-WT. We considered two tests: Test 1 used a test image (figure 2) corrupted by Gaussian noise at different noise levels, ranging from 10% to 50% in steps of 10%, and Test 2 used several standard test images (figure 2) with the same noise level (20% Gaussian noise). All the images had 320 × 320 pixels. We assessed the denoising quality with a figure of merit widely used in image processing, the peak signal-to-noise ratio (PSNR) between the original image f^true and the denoised image f:

PSNR = 10 log₁₀( max(f^true)² / MSE ),

where MSE = (1/NP) Σ_i (f_i^true − f_i)² is the mean squared error and NP is the total number of pixels in the image.
We solved the denoising problem for 20 noise realisations for each noise level and image. For all the methods, the iterative process was automatically stopped when PSNR^{k+1} < PSNR^k. We averaged the resulting PSNRs and calculated the corresponding standard deviations.
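The PSNR figure of merit and the automatic stopping rule can be sketched as follows (NumPy; `step` is a placeholder for one iteration of any of the compared methods, and the `max_iter` guard is our addition):

```python
import numpy as np

def psnr(f_true, f):
    """Peak signal-to-noise ratio (dB) between reference and denoised image."""
    mse = np.mean((f_true - f) ** 2)
    return 10.0 * np.log10(f_true.max() ** 2 / mse)

def denoise_until_psnr_drops(f_noise, f_true, step, max_iter=100):
    """Apply one denoising iteration `step` repeatedly, stopping as soon as
    the PSNR decreases and returning the last iterate before the drop."""
    f = f_noise
    best = psnr(f_true, f)
    for _ in range(max_iter):
        f_next = step(f)
        p = psnr(f_true, f_next)
        if p < best:
            break
        f, best = f_next, p
    return f
```

Note that in the actual study this oracle stopping rule is usable because the true image is known; in practice a different criterion would be needed.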
As mentioned previously, the parameters T and h can be calculated from the NCH by setting the threshold at a certain percentage. The PSNR was calculated for different combinations of percentages (for NLM-based methods we only calculated h). The parameters that resulted in the highest PSNR were used to generate the results presented here. The time step was set to τ = 1 for the PAD and AD methods. All methods were run on a Linux PC with an Intel Xeon E5-2665 CPU @ 2.4 GHz using Matlab Mex-files.

Wavelet types
We used Test 1 (left image in figure 2 with different noise levels) to identify a suitable wavelet type for the patch compression. The wavelet types tested were (Salomon and Motta 2009): Haar, Vaidyanathan, Daubechies of order 2, 4 and 6, Battle-Lemarié of order 1, 3 and 5, and Coiflet of order 1, 3 and 5. The patch and search window sizes of the PAD-WT method were set to N = 3 and W = 5, respectively. The largest d = ⌈(N × N)/4⌉ wavelet coefficients were kept, where ⌈·⌉ denotes the ceiling function. For an N = 3 patch, d = 3, which is considered to be the maximum compression for this patch size. Table 1 shows the averaged PSNRs (and standard deviations) obtained using the proposed method with different wavelet types to remove noise from an image corrupted with 5 different noise levels. The best results were obtained using Daubechies wavelets of order 4, which were therefore used to compress the patches in the following analysis.

Comparison between denoising methods
First, we used Test 1 to compare the iterative NLM method with PAD and their local versions. As mentioned previously, the methods are considered local when N = 1 and W = 1. We also compared the performance of these methods with their equivalents with patch compression, using PCA or the WT.
We calculated the averaged PSNRs and standard deviations of images obtained using the NLM and PAD methods, with and without patch compression, for window sizes W = {1, 3, 5, 7, 15} and patches of length N = {3, 7}. We retained a total of d = ⌈(N × N)/4⌉ wavelet coefficients for the NLM(PAD)-WT methods. For the NLM(PAD)-PCA methods, we kept 1/2 of the principal components. Table 2 shows the denoising results for Test 1 obtained using equation (8) with N = 1 and W = 1 (local filtering), and the NLM, NLM-PCA and NLM-WT methods. Table 3 shows the results obtained using the AD, PAD, PAD-PCA and PAD-WT methods, and the respective elapsed time for one iteration. The results show that the NLM-based methods perform better than local filtering. This test also shows that the PAD method performs better than the conventional NLM or AD methods. Similarly, the PAD-WT method shows superior performance compared to the NLM-WT method. Better results were obtained using PAD(NLM)-WT than PAD(NLM)-PCA. The PAD-WT method provides results close to those of PAD without patch compression for lower noise levels, and gives better results for higher noise levels (>30%), with the advantage of being faster.
These results show that the size of the search window affects the performance of the PAD(NLM) method. This test suggests that for the PAD-WT method the optimal window and patch sizes are W = 7 and N = 3, respectively. The results obtained using W = 5 are almost as good, with the advantage of being approximately two times faster to compute. For a window of size W = 10 and a patch of length N = 7, as used by Buades et al (2005b), the method is slower and the PSNR values are not higher.
In this study, we did not intend to perform an extensive study of the optimal number of wavelet coefficients that results in the highest PSNR value. Nevertheless, we tested other values and found no notable changes in the results (not shown). Tasdizen (2008) found that, for a patch of size N = 7, choosing d = 6 yields the highest PSNR. The performance of the NLM(PAD)-PCA methods for d = 6 and d = ⌈t(N × N)⌉, where t = {3/8, 1/4}, was also evaluated. The PSNR values (not shown) do not vary significantly and, hence, the conclusions regarding the performance of the denoising methods remain the same. Nevertheless, for consistency, d was kept fixed, since the number of retained wavelet coefficients was also kept constant. Figure 3 shows the test image (Test 1) contaminated with different levels of noise and the best denoising results obtained using the PAD, PAD-PCA and PAD-WT methods. The proposed PAD-WT method is qualitatively very similar to PAD and performs much better than the PAD-PCA method. Note, however, that the image background can be recovered using PAD-PCA if different T and h parameters are used, but at the expense of decreased PSNR values. Both PAD and PAD-WT demonstrate high denoising performance for noise levels up to 20%. For higher noise levels, low-contrast features cannot be recovered. However, for noise levels up to 40% the quality of the denoised images is still quite high. Iteration times are faster when wavelet patch compression is used compared to the other PAD-based methods (table 3).
Test 2 was used to further evaluate the performance of the PAD-WT method. Denoising results for standard test images using PAD, PAD-PCA and PAD-WT are shown in table 4 and figure 4. For each method we used the window and patch sizes that gave the best Test 1 results for a noise level of 20%. The PAD-WT method returns slightly lower PSNRs than the PAD method for the Barbara and cameraman images, but higher values for the CT and MRI images. This denoising study suggests that the proposed method is suitable for medical imaging applications.
One of the advantages of the AD method is its computational speed. However, NLM-based methods can easily be parallelised and implemented with GPU (graphics processing unit) acceleration (Cuomo et al 2014) to greatly reduce execution times. Alternatively, a semi-implicit scheme can be used instead of the explicit discretisation used here, so that τ can be relatively large without causing numerical instability, which is particularly advantageous for reducing the number of iterations (Correia et al 2011).

Conclusion
In this study, we proposed a modification to the AD method, motivated by the effectiveness and simplicity of the NLM method for image denoising. The proposed PAD method uses a patch-based approach instead of a local neighbourhood strategy. Additionally, patches are wavelet compressed to speed up the method.
We used a denoising problem to compare our PAD method with the AD method, the iterative NLM and its corresponding local model. The results obtained show that combining AD with NLM, i.e. the PAD method, provides better denoising results than these methods alone. Moreover, the results show that the proposed PAD-WT method has superior denoising performance compared to the analogous method with PCA patch compression. In addition, this method is faster and achieves comparable results or, for high noise levels, even better results than those obtained when patch compression is not used.
Medical images are often corrupted by noise and artifacts intrinsic to the image acquisition process. Our study suggests that the proposed method is particularly suitable for medical imaging applications.
In the second part of this paper, we use the PAD-WT as a regularisation method in fDOT image reconstruction. A split-operator method is used to solve the fDOT inverse problem: a two-step method where a reconstruction step alternates with the regularisation step, i.e. the PAD-WT filtering step.