A fully automated, faster noise rejection approach to increasing the analytical capability of chemical imaging for digital histopathology

Chemical hyperspectral imaging (HSI) data is naturally high dimensional and large. There are thus inherent manual trade-offs in acquisition time, and the quality of data. Minimum Noise Fraction (MNF) developed by Green et al. [1] has been extensively studied as a method for noise removal in HSI data. It too, however entails a manual speed-accuracy trade-off, namely the process of manually selecting the relevant bands in the MNF space. This process currently takes roughly around a month’s time for acquiring and pre-processing an entire TMA with acceptable signal to noise ratio. We present three approaches termed ‘Fast MNF’, ‘Approx MNF’ and ‘Rand MNF’ where the computational time of the algorithm is reduced, as well as the entire process of band selection is fully automated. This automated approach is shown to perform at the same level of accuracy as MNF with now large speedup factors, resulting in the same task to be accomplished in hours. The different approximations produced by the three algorithms, show the reconstruction accuracy vs storage (50×) and runtime speed (60×) trade-off. We apply the approach for automating the denoising of different tissue histology samples, in which the accuracy of classification (differentiating between the different histologic and pathologic classes) strongly depends on the SNR (signal to noise ratio) of recovered data. Therefore, we also compare the effect of the proposed denoising algorithms on classification accuracy. Since denoising HSI data is done unsupervised, we also use a metric that assesses the quality of denoising in the image domain between the noisy and denoised image in the absence of ground truth.


Introduction
Chemical imaging is an emerging technology in which every pixel or voxel of an image contains hyperspectral data, often consisting of hundreds or thousands of data points. The spectrum at each pixel resolves the chemical components at that point and, thus, provides the molecular profile of the sample [2][3][4]. Computer algorithms that can process the data to information useful for a particular problem often require a specific data quality, at that spectral resolution, that often determines scanning (signal averaging) time. In addition to the chemical signature of the data, another benefit of modern technologies including machine learning is that workflows can be automated with fully digital structure and function analysis of the data [5][6][7][8]. For example, Fourier Transform Infrared (FT-IR) spectroscopic imaging is emerging as an automated alternative to human examination in studying disease development and progression by using statistical pattern recognition [9][10][11][12][13][14]. For a practical protocol for tissue imaging, as demonstrated in at least one instance of tissue histopathology, the signal-to-noise ratio (SNR) of 4cm −1 resolution spectral data needs to be more than 1000: 1 [13]. To achieve this SNR, especially for the emerging high definition IR imaging [15][16][17], extensive signal averaging is required. The need for signal averaging increases processing time ðSNR � ffi ffi t p Þ, in turn, increasing acquisition time [18] to the extent that clinical translation becomes impractical. Signal processing approaches to reduce noise has previously been suggested to mitigate this crippling increase in integration time by mathematical methods to utilize correlations in data to reduce noise but suffer from two major drawbacks. First, given the large size of the data, the mathematical operations require computer processing often comparable to the acquisition time itself [19]. Second, such methods invariably try to separate data into informative and noisy components; subsequently, a manual selection step is required to identify the information-bearing components thus compromising the automation benefits of using spectroscopic imaging for tissue analysis [20].
One class of mathematical transform techniques for noise reduction utilize the property that noise is uncorrelated whereas spectra (signals) have a high degree of correlation. In a transform domain, hence, the signal becomes largely confined to a few eigenvalues whereas the noise is spread across all. Noise reduction can be achieved by retaining eigenvalue images that correspond to high signal content and computing the inverse transform. All the eigenvalue data contain signal and noise but the relative proportion of the signal to noise which forms a threshold criterion for inclusion of specific eigen-images in the inverse transform. Inclusion of too many will not allow for significant noise rejection, while inclusion of too few would result in loss of fine spectral features. Hence, identifying eigenvalues corresponding to high signal content is an important step in the noise reduction process.
One widely used algorithm was provided by Green et al. [1] that applies Minimum Noise Fraction (MNF) to order spectral components in terms of SNR in the transformed space. It assumes that the covariance for the raw data S Y and the noise S δ can both be estimated. Similar to principal component analysis (PCA) that orders the components in terms of variance, after transformation in MNF space, the top components are chosen and filtered and rest are zeroed out. This reduced basis is then used for inverse transformation into the signal space. Noise Adjusted Principal Components (NAPC) transform by Lee et al. [21] is a reformulation of the MNF transform in terms of noise whitening process. While MNF and NAPC transforms are mathematically equivalent, the latter consists of a sequence of two principal component transforms: First to whiten the data (de-correlate noise from data); Second to perform eigen decomposition on the modified covariance matrix, to order the underlying data by SNR.
Our goals in this work are to address the major challenges in noise rejection using mathematical methods. Specifically, first, we aim to provide criteria for unsupervised band selection, thereby maintaining objectivity of the data analysis protocol and reducing analysis time within the data processing pipeline by dispensing with the need for manual intervention. Second, we aim to re-examine the mathematical formulation of the MNF approach to speed up the process of computation of the forward and inverse transform MNF vectors. Specifically, we examine two novel approaches: (a) use of a truncated singular value decomposition (SVD) variant that is computationally more efficient, and (b) the use of a randomized variant of the above that is also memory efficient and has a reconstruction accuracy-memory trade-off depending on the application. Finally, we seek to improve the signal processing methods to provide a higher confidence in the consistency and accuracy of the noise rejection pipeline. We propose a comparison between acquired and denoised biomedical images using a robust metric, thereby providing better denoising guarantees in terms of both root mean square error (RMSE) and structural similarity.

Sample preparation and data collection
A paraffin embedded breast tissue microarray (BR1003) consisting of 101 cores were obtained from US Biomax, Inc. The unstained sections of the TMA were placed on a BaF 2 salt plate for IR imaging. The sections were deparaffinized using a 24h hexane bath. High Definition data was acquired using the Agilent Stingray imaging system with 0.62 numerical aperture and a 128 × 128 focal plane array. A spectral resolution of 4cm −1 along with a pixel size of 1.1μm was obtained at the sample plane. The final FTIR sample has a spatial dimension of 11620 × 11620 pixels and spectral dimension of 1506 channels.

Classification
A random forest classifier was used to differentiate between the different histologic classes of a tissue sample. Labeled pixels for each class were obtained by the cases annotated by a pathologist. The pathologist distinguishes dense from loose collagen just by looking at the density of the collagen fibers that are elucidated by the Hematoxylin and Eosin (H&E) stain. Pre-cancerous and malignant epithelium are also identified using the H&E stained images as these images are high resolution images that provide contrast in different cell types in the tissue. These images are used to delineate nuclear architectures, shapes and sizes. These cellular features along with the presence of basal cells and the extent of cellular growth are used to confirm the disease state. On the other hand, while infrared imaging does not provide as much spatial detail but offers a wide variety of molecular signatures that help in disease detection despite the lower resolution.
In this study, we have used a four class model separating benign epithelium from malignant epithelium. Finally, to assess the performance of the classifier, sensitivity and specificity is calculated for all the classes to generate the receiver operating characteristic curve. The area under this curve signifies the diagnostic potential of the model.

Methods
In practical situations, noiseless data d j is recorded by instruments as a noisy signal estimate y j , due to fluctuations in detector current, backgrounds and source/instrument factors, which is modeled as: where y j 2 R S is the actual data collected by the apparatus and d j 2 R S is the noise in the same pixel. The goal is to estimate δ j given y j , so we can best estimate the true signal d j such that the relevant features (peak positions, peak heights, relative peak spacing etc.) in the spectrum are preserved. Given a HSI Y orig 2 R W�H�S , where W, H are the spatial dimensions (width and height respectively), it has been restructured into a 2D matrix Y 2 R N�S , where N = (W × H) is the number of pixels, S is the number of spectral channels. Each column y i , 8i = [1: S] represents the reshaped image for the i th spectral band and each row y j , 8j = [1: N] represents the spectral signature for the j th pixel. The i th entry of the spectral vector y j 2 R S of each pixel y ij determines the absorbance value of the tissue at that wavenumber. Wavenumbers ν are defined as inverse of wavelength and have units cm −1 . The recorded value at each wavenumber has unit absorbance/au, where au is arbitrary unit.

Mathematical background
Assuming additive noise only, raw data can be represented as . ., Y S } and D and δ are the uncorrelated signal (actual spectral data with baseline included) and noise components of the raw data Y. Cov{Y} = S Y = S D + S δ , where S D and S δ are the covariance matrices of D and δ respectively. Noise Fraction (NF) for the i th band is defined as the ratio of noise variance to the total variance for that band. Similarly Signal to Noise ratio (SNR) for the i th band is defined as the ratio of signal variance to the noise variance for that band.
MNF is the set of linear transformation (Y MNF ) i = Yϕ i , for i = 1, . . ., S, such that the SNR for (Y MNF ) i is maximum among all linear transformations orthogonal to (Y MNF ) j , for j = i + 1, . . ., S. All the transformation vectors in MNF space follow Maximization of the noise fraction leads to a numbering of bands that gives decreasing image quality with increasing component number. The SNR for (Y MNF ) i in MNF space can be formulated: The Noise fraction itself can then be re-factored as follows: The vectors ϕ i are thus the real, symmetric eigenvectors of the eigenvalue problem: Hence ϕ i are the eigenvectors of S Y S À 1 d , and λ i , eigenvalue corresponding to ϕ i , equals to the noise fraction in (Y MNF ) i . Also, λ 1 � λ 2 � . . . � λ S , so that the components show decreasing image quality. Thus SNR is given by λ i − 1.

Geometric interpretation
As the data D and noise δ are additive and uncorrelated, we can write: Let the spectral decomposition of S δ be S δ = EΛ δ E T . Rotating S Y in Eq 7 with eigenvector matrix E and rescaling it using the inverse-square root of the noise singular values Λ δ , results in new covariance matrix where the contribution of noise component has been turned into an identity matrix.
Let the eigen decomposition of S W be S W = GΛ MNF G T . Rotating S W in Eq 8 with eigenvector matrix G results in: The series of transforms that we used to change the original data covariance matrix S Y into Λ MNF is given by the transformation vector F ¼ EL À 1=2 d G which we call the MNF projection vectors and Λ MNF estimates of SNR of the data.

Optimizing MNF
Expanding the entire MNF transform, we can infer the following: Since R is a block identity matrix R ¼ , introducing an extra R T term keeps the value of the expression unaltered as R � R T = R. Writing the MNF transform in this way, we ensure that we skip the costly matrix inversions of the MNF vectors. Also G � R is effectively choosing the top K eigenvalues of G. Therefore we can reduce the computation cost by finding the reduced rank-K SVD of S Y . The Λ δ matrix inversion can also be replaced by more efficient versions due to its diagonal structure.

Automatic band selection
The optimal value of K can be determined by inspecting the entries of Λ MNF = SNR + 1 which is a diagonal matrix. The Rose criteria [22] states that an SNR of at least 5.0 is needed to be able to distinguish image features at 100% certainty. We select the top K bands in the MNF space for which SNR = Λ MNF − 1 � 5.0. Automating this process is the main computational speed factor that brings down the processing time from days down to a few hours.

Fast MNF
By exploiting the MNF formulation (sec. Optimizing MNF), we can avoid all inverse operations and replace them with transpose, thereby making computations faster. Owing to the symmetric structure of covariance matrices, we also compute the singular value decomposition using eigen decomposition which is faster. Also, the transformation matrices are of size (S × K) instead of (S × S) where K � S. This is the main factor responsible for the algorithmic speedup.

Approx MNF
Since K � S, it is inefficient to compute the full spectral decomposition of the covariance matrix. Empirically, it was observed over different datasets, that the optimal value of the automatically selected K is 2 − 3% of the total number of bands S. Hence we compute only a rank K truncated SVD of the whitened covariance matrix. This results in reduced computation time as well as memory. The standard solutions to truncated SVD include the power iteration algorithm and the Krylov subspace methods. Since power iteration is unstable at times due to the structure of the singular values, we use a version of the Block Lanczos method [23]. We set K ¼ 0:03 � S and compute the rankK -SVD, then let the band selection criteria to decide the optimal K.

Rand MNF
Although the block Lanczos algorithm can attain machine precision, it inevitably goes many passes through S W , and it is thus slow when S W is large or does not fit in memory. To circumvent this scenario, we use a faster randomized and memory efficient version which computes theK -SVD of S W up to 1+ � Frobenius norm relative error [24].

Error metric
In the absence of ground truth images of denoised data, we use a non-reference image quality metric this is simple and easy to use. The Method Noise Image (MNI) [25] metric aims at maximizing the structure similarity between the input noisy image and the estimated image noise around homogeneous regions and the structure similarity between the input noisy image and the denoised image around highly-structured regions, and is computed as the linear correlation coefficient of the two corresponding structure similarity maps.

Setup
For all the experimentation, a standalone machine with Intel Xeon E5 − 1660@3.20GHz CPU and 64GB of RAM was used. Software for the simulations, results and plots include Matlab and ENVI. Table 1 shows the algorithmic time and space complexity in terms of memory usage for the different MNF versions. The best algorithm in terms of time-space-accuracy is Approx MNF. This is because it computes the best rank-K SVD with little loss in its approximation or memory usage. Depending on how much one wants a memory-accuracy trade-off, one may choose to switch to use Rand MNF, as it is a randomized version with approximation error guided by its parameters. For a typical FTIR spectrum with S � 1500 bands, the algorithm estimated the optimal number of bands to be K � 30, resulting in an efficiency factor of 50×.

Timing analysis
Standard MNF is worst in terms of time complexity as it computes the full eigen decomposition of S W ðOðS 3 ÞÞ and uses all the S eigenvectors to transform the input data into MNF space ðOðNS 2 ÞÞ. In comparison, Fast MNF still computes the full decomposition of S W ðOðS 3 ÞÞ, but uses the Band selection criteria to retain only the top K eigenvalues. Hence only the top K eigenvectors are used to transform the data into MNF space ðOðNSKÞÞ. For Approx MNF, we only compute a truncated rankK -SVD of S W , since it was empirically observed that the top K chosen bands lie withinK ¼ 2 À 3% of S, thereby highly reducing the computational cost ðOðS 2 KÞÞ. Again, only the top K eigenvectors are used to transform the data into MNF space ðOðNSKÞÞ. Rand MNF uses a randomized algorithm to compute a truncated rankK -SVD of S W , hence it is much more efficient in terms of speed ðOðnnzðS W ÞÞKÞ, as the values in S W will only be non-zero for channels which are correlated in terms of signal. Only the top K eigenvectors are used to transform the data into MNF space ðOðNSKÞÞ. Refer to Fig 1 for runtime. In all the three variants with S � 1500 and K � 30, the algorithmic scaling of SK instead of S 2 reduces runtime by � 50×. For the BR1003 data, the entire MNF denoising process was reduced from � 1 month to � 2 hours. This clearly shows that the larger the data size, the better is the scaling of the proposed algorithms, thereby massively reducing the computational time of denoising for large TMA and associated datasets. Note: Since the structure of the noise is unknown, we cannot instinctively perform the rank-K approximation of S δ , because depending on experiment and instrument there is no guarantee on the strength of noise present in the raw data. For this reason, in the first step of eigen decomposition, we choose to retain bands which accounts for 99% variance in the noise bands. This further reduces the computational time of the process (� 3× extra speedup).

Space analysis
Standard MNF computes the full SVD decomposition ðOðS 2 ÞÞ. The transformed data in MNF space contains all S bands ðOðNSÞÞ. Fast MNF again does the full SVD decomposition ðOðS 2 ÞÞ. However the data in transformed domain only contain K bands ðOðNKÞÞ. Both Approx MNF and Rand MNF compute the truncated rank K decomposition, hence there are K eigenvectors each of S dimension ðOðSKÞÞ. The data in the transformed domain contains only K bands ðOðNKÞÞ. For the FTIR data with S � 1500 bands and K � 30, we achieve a RAM space saving of � 50×, allowing us to process more data simultaneously in one go.

Denoising profiles
The improvements offered by the different versions of the MNF presented in this study are illustrated in Fig 2. The extent of denoising both in the spectral and spatial domain is approximately the same for all the different MNF algorithms . Fig 2A. depicts the spatial detail offered by different MNF versions with zoomed in sections in Fig 2C and 2B. shows the horizontal signal profile across the sample. Next, spectral profiles are compared across the different algorithms with reference spectrum (without MNF) to illustrate the extent of noise removal in  (Fig 2D). It can be seen that even with a speed up factor of � 10 there is no significant reduction in the spectral and spatial image quality.

Error metric
Along with evaluating the performance of the presented MNF versions by examining the tissue profile, we utilize the MNI (method noise image) metric [25] aiming to maximize the structural similarity between the input noisy image and the denoised image around highly-structured regions, in the absence of ground truth. Fig 3 shows the metric values for a core, over all the 1506 bands. A lower value of MNI indicate better denoising and structure preservation. This is evident from the fingerprint region (900 − 1800cm −1 ) which has very low values of Noise reduction and spectral chemical imaging MNI, while it is higher for the IR silent region (1800 − 2700cm −1 ). Since Standard MNF, Fast MNF and Approx MNF are almost the same in terms of reconstruction accuracy, all the plots from those three methods are same. The Rand MNF method which has a space-accuracy trade-off, produces similar results but with slight variations in most bands (notice the wavering in the plot).

Impact on tissue classification
Furthermore, we studied the effect of MNF based data processing on tissue classification models. In particular we have investigated the performance of different MNF algorithms against raw data for distinguishing cancer from benign breast cells. It can be seen in Fig 5, that for all the MNF versions, the Area Under Curve (AUC) value of the malignant epithelium (cancer) class is the same and there is a 10% drop in accuracy without the use of MNF. This suggests that for the development of highly accurate and efficient diagnostic models with HSI data, MNF gives better performance. Also, the MNF techniques presented in this paper (Fast MNF, Rand MNF and Approx MNF), offers the same performance as compared to standard MNF with a factor of 60× reduction in the processing time and 50× reduction in memory space required for computation. The case shown in Fig 5 belongs to a typical hyperplasia disease state that is known to be correlated with malignant signature but the extent is not well understood. This leads to slight variations in the projections as the some cells may be at the interface of benign and malignant classes. Further to illustrate that the projections are mainly different because of borderline pixels, projections of a benign case is also shown in Fig 6 where the epithelial assignments are similar for all the MNF approximations except at the edges.

Conclusion
In this paper, we demonstrate how to automate the band selection process in the MNF space, which drastically reduces the workflow duration of MNF denoising of TMA's from almost a month down to a matter of hours. We introduced three different optimizations of the MNF algorithm depending on the speed-memory-accuracy trade-off, resulting in a 60× runtime improvement and 50× memory efficiency. A well established error metric is also used which helps us decide the quality of denoising, in the absence of ground truth images. Similar classification performance of the suggested approaches as compared to conventional techniques suggesting the potential of the developed methods for computationally efficient analysis of big datasets for diagnostic applications. As a future work, we would like to make better approximations of the noise model itself, so that we can apply approximations for the eigen decomposition of the noise covariance matrix, hence further reducing the computational time of the process.
Supporting information S1 Dataset. A small set of the data (S13_noisy.mat) we used for our experiments has been uploaded as a supplementary file. Also in the supplementary files please find all our matlab code files (S_1_file-S_12_file) (.m). (ZIP)