Deep convolutional neural network recovers pure absorbance spectra from highly scatter‐distorted spectra of cells

Infrared spectroscopy of cells and tissues is prone to Mie scattering distortions, which grossly obscure the relevant chemical signals. The state‐of‐the‐art Mie extinction extended multiplicative signal correction (ME‐EMSC) algorithm is a powerful tool for the recovery of pure absorbance spectra from highly scatter‐distorted spectra. However, the algorithm is computationally expensive and the correction of large infrared imaging datasets requires weeks of computations. In this paper, we present a deep convolutional descattering autoencoder (DSAE) which was trained on a set of ME‐EMSC corrected infrared spectra and which can massively reduce the computation time for scatter correction. Since the raw spectra showed large variability in chemical features, different reference spectra matching the chemical signals of the spectra were used to initialize the ME‐EMSC algorithm, which is beneficial for the quality of the correction and the speed of the algorithm. One DSAE was trained on the spectra, which were corrected with different reference spectra and validated on independent test data. The DSAE outperformed the ME‐EMSC correction in terms of speed, robustness, and noise levels. We confirm that the same chemical information is contained in the DSAE corrected spectra as in the spectra corrected with ME‐EMSC.


| INTRODUCTION
Infrared spectroscopy has become a popular tool in biology and medical and material sciences for investigating the structure and chemistry of intact materials [1][2][3][4][5][6][7]. The information content provided by infrared techniques for the investigation of biological samples such as cells and tissues is overwhelming and twofold: (a) Chemical functional groups absorb in the mid-infrared region, thus the obtained absorption signal is a precise and interpretable fingerprint of the chemical composition of the sample. (b) Cells and tissues are highly scattering samples and thus the acquired spectral data is distorted by scattering signatures, which carry information about cell morphology and shapes of the cell components [8]. In infrared microspectroscopy, infrared radiation impinges on thin biological materials and the radiation transmitted in forward direction (or over a numerical aperture) is collected by a detector, which is either a single element or a focal plane array (FPA) detector. This results in spectra that quantify the loss of radiation in forward direction with respect to the incoming radiation. However, radiation is lost in forward direction not only due to absorption by the material, but also as a result of scattering. The loss of radiation due to scattering is considerable in infrared microspectroscopy of cells and tissues, since strong scattering effects appear when the wavelength of the infrared radiation matches the size of the cell and tissue components. Scattering in infrared spectroscopy of cells and tissues can be well estimated by the Mie theory [9][10][11][12][13][14], which provides a formalism for calculating the so-called extinction efficiency determining the loss of radiation due to scattering and absorption as a function of the wavelength. From the extinction efficiency, the apparent absorbance spectrum can be calculated. 
We call it "apparent," since scattering effects increase the absorbance, implying that features of the absorbance spectrum are not only due to absorption, but also due to scattering. In the ideal case, when scattering effects are absent, absorption can easily be quantified by the absorbance, which is, according to Beer-Lambert's law, linearly dependent on the concentration of chemical components that absorb at a certain wavelength. However, scattering effects, in particular Mie-type scattering, lead to highly non-linear absorbance data and have therefore been considered a major obstacle for the interpretation and data analysis of spectra in vibrational microspectroscopy [15]. Scattering and absorption are entangled by highly non-linear mechanisms, which can be modeled by electromagnetic theory. For interpretation, data needs to be scatter-corrected, which involves the separation of scattering and absorption contributions in infrared spectra.
Scatter correction of infrared microspectroscopy and imaging data involves solving an inverse scatter problem, which aims at recovering the frequency-dependent refractive index from the measured scatter spectrum. An iterative method based on extended multiplicative signal correction (EMSC) meta-modeling (i.e., emulation by data-driven modeling) of the Mie equations has been developed and provided to the infrared spectroscopy community [1,10,11,12,13,14,16]. The state-of-the-art algorithm for estimating Mie-type scattering in infrared microspectroscopy and imaging of cells and tissues, the so-called Mie extinction EMSC (ME-EMSC) algorithm, was published recently and is openly accessible [1]. However, a drawback of the existing ME-EMSC algorithm is that it is computationally expensive, and it is therefore of interest to investigate alternative and less computationally expensive routes. Machine learning has in recent years been applied in most disciplines of science and has proven itself to be a useful tool for solving a wide variety of problems. In particular, deep learning has recently been used to solve an inverse scattering problem for image reconstruction in diffraction tomography [17].
In the field of vibrational spectroscopy, deep learning has been employed as an unsupervised technique for preprocessing of spectra [18,19]. In the current paper, we investigate if a supervised representation learning approach could replace the existing ME-EMSC algorithm, that is, if the process of Mie scatter correction could be learned by a Deep Learning model from an experimental dataset, which has been corrected by the ME-EMSC algorithm. For this purpose, we correct a dataset of infrared imaging data by the ME-EMSC algorithm and train a convolutional descattering autoencoder that predicts pure absorbance spectra from strongly Mie-distorted spectra.

| Convolutional neural networks
Deep learning is a subset of machine learning which utilizes neural networks, which can discover highly complex structures in large datasets and learn representations of data with several levels of abstraction. Neural networks are models containing multiple processing layers, which are usually optimized through backpropagation of errors [20]. The backpropagation indicates how to alter the internal parameters used to compute the representation in each layer from the representation in the previous layer, in order to minimize a predefined loss function quantifying the network's performance [21].
Convolutional neural networks (CNNs) are shift-invariant, locally connected networks that were initially introduced to reduce overfitting and the number of free parameters, as compared to fully connected neural networks [22]. CNNs utilize the convolution operation and turned out to be an efficient way of exploiting compositional hierarchies, where high-level features are composed of lower-level features [21].
The convolution operation computes a linear combination of the local input in a small window, and the set of weights for this linear combination is called the convolution kernel. The convolution kernel slides over the input and yields feature maps which preserve spatial structure.
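The sliding-window operation described above can be sketched in a few lines of plain Python. This is only an illustration for a one-dimensional input such as a spectrum; the function name `conv1d_valid`, the toy spectrum, and the smoothing kernel are assumptions made for the example and are not taken from the study. As in most deep learning libraries, the kernel is applied without flipping, so strictly speaking this is a cross-correlation.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` and return one weighted sum per window.

    Only windows that fit completely inside the signal are evaluated,
    so the output has len(signal) - len(kernel) + 1 entries.
    """
    k = len(kernel)
    if k == 0 or k > len(signal):
        raise ValueError("kernel must be non-empty and no longer than the signal")
    return [
        sum(w * x for w, x in zip(kernel, signal[start:start + k]))
        for start in range(len(signal) - k + 1)
    ]


# Hypothetical toy spectrum with a single band, and a 3-point smoothing kernel.
spectrum = [0.0, 0.0, 3.0, 0.0, 0.0]
smoothing_kernel = [1 / 3, 1 / 3, 1 / 3]
feature_map = conv1d_valid(spectrum, smoothing_kernel)

# A difference kernel responds to slopes instead of levels.
slopes = conv1d_valid([1.0, 2.0, 4.0, 7.0], [-1.0, 1.0])
```

In a CNN the kernel weights are not fixed by hand as they are here; they are the parameters that training adjusts.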
Using convolutions for analyzing images allows for efficiently extracting information and is widely used in image processing. It is, however, difficult to decide a priori which kernels are optimal for a given problem, and hand-engineering algorithms based on convolutions is tedious and often demands domain expertise.
With CNNs, we circumvent this problem, since the kernels' weights are learned during training and the CNN itself decides which kernels to use to get optimal results for the task at hand. Having multiple convolutional layers allows the CNN to learn complex spatial hierarchical structures in the data. This strategy has, in conjunction with the increase in computational power and availability of data, yielded state-of-the-art results on several benchmark datasets and led to CNNs becoming ubiquitous in analyzing images, video, speech, and audio [21,23,24].
CNNs can be applied in a plethora of cases for one- or multi-dimensional data, which have a spatial or temporal structure, and have also been applied in the field of spectral analysis. It has been demonstrated in several studies that CNNs can be used to extract salient features from raw spectral data [25].

| Autoencoders
Autoencoders are a particular class of neural networks whose main objective is to learn robust representations of complex data [26]. Generally, an autoencoder is considered an unsupervised learning technique, which attempts to map its input onto itself. It consists of an encoder-decoder network sequentially defined as

$$e: X \to L, \qquad d: L \to X,$$

where the encoder $e$ maps the original data into a latent space $L$ and the decoder $d$ maps from $L$ back into the original feature space $X$. The autoencoder is trained by minimizing the reconstruction loss

$$\sum_i R\left(x^{(i)}, (d \circ e)(x^{(i)})\right),$$

where $x^{(i)}$ is the training data and $R$ is a metric quantifying the reconstruction quality, such as, for example, the mean squared error. To ensure that the autoencoder does not learn the identity mapping, the dimension of the latent space, denoted $|L|$, is often smaller than that of the original space $|X|$. This creates a bottleneck, through which the data must be squeezed. During training, the encoder learns to extract the most salient features of the data, from which the decoder, as faithfully as possible, reconstructs the original data. Autoencoders can learn particularly robust representations of the data by reconstructing it from a corrupted version of itself [27]. If we let $F$ model the process of corruption and $x_c = F(x)$ be our corrupted data, we can train the autoencoder to minimize $R\left(x^{(i)}, (d \circ e)(x_c^{(i)})\right)$ and thereby learn to encode the corrupted input and undo the corruption process, that is, the autoencoder learns the inverse function $F^{-1}$.
Autoencoders have been used for de-noising blurry images, as well as removing text or watermarks from images [28,29], and have thus been established as a means of learning mappings from corrupted images to the original images.
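The key point of the denoising objective in this section is that the network sees the corrupted input while the loss is measured against the clean target. The following minimal sketch makes that explicit. All names are hypothetical, the corruption is a deliberately trivial baseline offset, and the "encoder" and "decoder" are hand-written stand-ins rather than trained networks.

```python
def mean_squared_error(target, prediction):
    """R in the text: here the mean squared error between two vectors."""
    return sum((t - p) ** 2 for t, p in zip(target, prediction)) / len(target)


def denoising_loss(encoder, decoder, corrupt, clean_batch):
    """Average of R(x, (d o e)(F(x))) over a batch of clean samples x."""
    losses = [
        mean_squared_error(x, decoder(encoder(corrupt(x))))
        for x in clean_batch
    ]
    return sum(losses) / len(losses)


def add_baseline(x):
    """Hypothetical corruption F: a constant offset added to every point."""
    return [v + 0.5 for v in x]


def identity(x):
    """Stand-in network that passes its input through unchanged."""
    return list(x)


def remove_baseline(x):
    """Stand-in network that happens to implement the inverse of F."""
    return [v - 0.5 for v in x]


batch = [[0.0, 1.0, 0.25], [2.0, 0.5, 0.0]]
loss_identity = denoising_loss(identity, identity, add_baseline, batch)
loss_inverse = denoising_loss(remove_baseline, identity, add_baseline, batch)
```

With this toy corruption, passing the input through unchanged leaves the full offset in the loss, whereas a mapping that inverts the corruption drives the loss to zero, which is the behaviour training is meant to approach.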

| Mie scattering
Mie scattering occurs when radiation impinges on scatterers whose morphology changes on the same scale as the wavelength of the incident radiation and which are, at least locally, approximately spherical.
Correcting Mie-distorted absorbance spectra involves the solution of a highly non-linear inverse scatter problem [1,10,12,13,14,15] to retrieve the pure absorbance spectrum. When the pure absorbance spectrum is known, it can be used to estimate the scatter-distorted measured spectrum by using rigorous Mie theory. In the following, we let this process of Mie-distortion of absorbance spectra be denoted as $\mathcal{M}$. The map $\mathcal{M}$ maps the pure absorbance spectrum onto the scatter-distorted spectrum. Thus, solving the inverse scatter problem involves estimating $\mathcal{M}^{-1}$.

| Mie extinction EMSC
The inverse process $\mathcal{M}^{-1}$ is very well approximated by the ME-EMSC algorithm [1], which estimates the pure absorbance spectrum from the Mie-distorted spectrum. This is achieved by an iterative process that gradually updates an estimate of the pure absorbance spectrum until the scatter-free spectrum can be predicted with high accuracy.
Mie scattering features in the measured spectra are estimated by adding scattering curves, denoted $p_i$, to a multiplicative signal correction (MSC) model,

$$Z_{\mathrm{app}}(\tilde{\nu}) = c + b\,Z_{\mathrm{ref}}(\tilde{\nu}) + \sum_{i=1}^{A_{\mathrm{opt}}} g_i\,p_i(\tilde{\nu}) + \varepsilon(\tilde{\nu}),$$

where $Z_{\mathrm{app}}$ is the measured spectrum, $Z_{\mathrm{ref}}$ is the reference spectrum and $A_{\mathrm{opt}}$ is the number of scattering curves. The parameters $c$, $b$, and $g_i$ are estimated by least squares regression of $Z_{\mathrm{app}}$ onto the model spectra. The residuals are denoted $\varepsilon$. The scattering curves derive from a PCA subspace model where Mie scattering first is simulated for many possible realizations of scattering of spheres with different radii and refractive indices. This is done to take into account that these physical parameters generally remain unknown. The subspace model is calculated from the van de Hulst approximation to rigorous Mie theory, given as

$$Q_{\mathrm{ext}}(\tilde{\nu}) = 2 - 4e^{-\rho\tan\beta}\,\frac{\cos\beta}{\rho}\,\sin(\rho-\beta) - 4e^{-\rho\tan\beta}\left(\frac{\cos\beta}{\rho}\right)^{2}\cos(\rho-2\beta) + 4\left(\frac{\cos\beta}{\rho}\right)^{2}\cos(2\beta),$$

where $\rho = 4\pi a\tilde{\nu}(n-1)$, $\tan\beta = n'/(n-1)$, $\tilde{\nu}$ is the wavenumber, $a$ is the radius, and $n$ and $n'$ are the real and imaginary parts of the refractive index, respectively. $Q_{\mathrm{ext}}$ defines the extinction efficiency in forward direction. While the real part of the refractive index, consisting of a constant and fluctuating part denoted $n_0$ and $n_{kk}$, respectively, relates to wave propagation, the imaginary part of the refractive index is directly connected to the absorption properties of the sample. In fact, $n'$ can be estimated directly from the absorbance spectrum of a sample, given that the absorbance spectrum is the so-called pure absorbance, $Z_{\mathrm{pure}}$, which is not affected by scattering. Since $Z_{\mathrm{pure}}$ is exactly what we are searching for when we seek to remove scattering features from the measured spectrum, the problem is an inverse problem. The solution to this problem has been to implement the Mie correction as an iterative algorithm which is initialized by providing a range for the physical parameters $a$ and $n_0$, and a reference spectrum. The role of the reference spectrum is twofold. Firstly, the reference spectrum is used to estimate the imaginary part of the refractive index. The fluctuating part of the real part of the refractive index is then found through $n'$, using the Kramers-Kronig relation. To facilitate the estimation of $n'$ and $n_{kk}$, it is beneficial to choose a reference spectrum that is close to the true underlying pure absorbance spectrum. Secondly, the reference spectrum is used in the EMSC model for estimating the model parameter related to the optical thickness of the material and to normalize the spectra with respect to the effective optical path length.
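As an illustration of the extinction efficiency used to build the scattering curves, the sketch below evaluates the commonly quoted form of the van de Hulst approximation for an absorbing sphere at a single wavenumber. The function name, argument names, and the example values (a 5 μm sphere with real refractive index 1.3 at 2000 cm⁻¹) are assumptions chosen for illustration; this is not the ME-EMSC implementation, which additionally builds a PCA subspace from many such curves.

```python
import math


def q_ext_van_de_hulst(wavenumber_cm, radius_cm, n_real, n_imag):
    """Approximate extinction efficiency of a sphere (van de Hulst form).

    wavenumber_cm : wavenumber in cm^-1
    radius_cm     : sphere radius a in cm
    n_real        : real part n of the refractive index (assumed > 1)
    n_imag        : imaginary part n' of the refractive index (>= 0)
    """
    rho = 4.0 * math.pi * radius_cm * wavenumber_cm * (n_real - 1.0)
    beta = math.atan(n_imag / (n_real - 1.0))  # tan(beta) = n' / (n - 1)
    damping = math.exp(-rho * math.tan(beta))
    ratio = math.cos(beta) / rho
    return (
        2.0
        - 4.0 * damping * ratio * math.sin(rho - beta)
        - 4.0 * damping * ratio ** 2 * math.cos(rho - 2.0 * beta)
        + 4.0 * ratio ** 2 * math.cos(2.0 * beta)
    )


# Hypothetical example: 5 um radius (5e-4 cm), n = 1.3, no absorption.
q_example = q_ext_van_de_hulst(2000.0, 5e-4, 1.3, 0.0)

# Adding absorption (n' > 0) damps the oscillating terms.
q_absorbing = q_ext_van_de_hulst(2000.0, 5e-4, 1.3, 0.05)
```

Evaluating this function on a grid of wavenumbers for many radii and refractive indices gives the family of curves from which a subspace model of the kind described above could be derived.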
The algorithm is initialized with a reference spectrum, which is assumed to be relatively close to $Z_{\mathrm{pure}}$. As the corrected spectrum is estimated iteratively, the residuals contain the chemical differences between the reference and the underlying pure absorbance spectrum. The residuals are therefore used to update the reference spectrum to a more suitable estimate of the pure absorbance spectrum. This implies that we use the corrected spectrum as a reference for the next iteration. With each iteration, the scatter-free pure absorbance spectrum is gradually retrieved.
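A single regression step of an EMSC-type model, followed by the computation of the corrected spectrum, can be sketched as follows. This is a simplified illustration under stated assumptions: the scattering curves are passed in as given lists rather than derived from Mie simulations, the least squares problem is solved through the normal equations with a small hand-written solver, and all function names and toy numbers are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def emsc_step(z_app, z_ref, scatter_curves):
    """Fit z_app = c + b*z_ref + sum_i g_i*p_i + eps by least squares.

    Returns (c, b, g, corrected), where the corrected spectrum is
    (z_app - c - sum_i g_i*p_i) / b, i.e. z_ref plus the scaled residual.
    """
    columns = [[1.0] * len(z_app), list(z_ref)] + [list(p) for p in scatter_curves]
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    moments = [sum(u * v for u, v in zip(ci, z_app)) for ci in columns]
    c, b, *g = solve_linear_system(gram, moments)
    corrected = []
    for k, value in enumerate(z_app):
        scatter = sum(gi * p[k] for gi, p in zip(g, scatter_curves))
        corrected.append((value - c - scatter) / b)
    return c, b, g, corrected


# Hypothetical toy data: a reference spectrum, one scattering curve, and a
# "measured" spectrum built from them with known parameters.
z_ref = [0.0, 1.0, 0.5, 2.0, 0.2, 0.0]
p_1 = [1.0, 0.8, 0.3, -0.2, -0.6, -1.0]
z_app = [0.1 + 2.0 * r + 0.5 * p for r, p in zip(z_ref, p_1)]
c_hat, b_hat, g_hat, z_corr = emsc_step(z_app, z_ref, [p_1])
```

In an iterative scheme of the kind described above, the returned corrected spectrum would serve as the reference for the next pass, with the scattering curves recomputed from it; that part is omitted here.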
The algorithm is normally initialized with the same reference spectrum and physical parameters for one dataset. However, already for the next iteration, the EMSC reference spectrum is updated in the iterative algorithm, since the residual term in the EMSC model used for updating the reference spectrum depends on the chemical signals that are characteristic for each measured spectrum. Therefore, the iterative process needs to be run separately for each spectrum in the dataset. In each iteration, the subspace of Mie realizations, the $Q_{\mathrm{ext}}$ curves, must be calculated since they depend on the updated reference spectrum, which makes the algorithm time-consuming. The computationally expensive formula for $Q_{\mathrm{ext}}$, the Hilbert transform used in the Kramers-Kronig relation, and the need for individual iterative calculations per spectrum together result in a time-consuming correction procedure.
The role of the reference spectrum and its effect on the correction has been discussed in a number of publications previously. It has been shown that the state-of-the-art algorithm for Mie correction [1] is less dependent on the reference spectrum than earlier versions of the algorithm [12,15]. However, it is expected that a high chemical variability may still require the use of multiple reference spectra. An algorithm that tests different reference spectra for one scatter-distorted spectrum and decides which is the best reference spectrum to use does not currently exist. Testing multiple reference spectra is expected to increase the computation time further, since the iterative algorithm has to be performed for each reference spectrum. However, when for example the tissue type is known for a given sample and respective reference spectra for this tissue type exist, different reference spectra could be used for different samples. This may improve the quality of the ME-EMSC correction if there is a high chemical variability in the dataset.

| Descattering autoencoder
In order to establish a deep convolutional descattering autoencoder (DSAE), we consider the effects of Mie-scattering as corruption of the pure absorbance spectra, and the ME-EMSC corrected spectra as a representation of the "non-corrupted" pure absorbance spectra. We train an autoencoder to correct for the scattering effects and reconstruct the non-corrupted spectra. Formally, we let $s_c^{(i)}$ be the raw spectra corrupted by Mie scattering and $s^{(i)}$ be the ME-EMSC corrected spectra, and we train an autoencoder to minimize the reconstruction loss

$$\sum_i R\left(s^{(i)}, (d \circ e)(s_c^{(i)})\right).$$

We want to let the autoencoder learn a robust representation of the chemical signals in the spectra and discard the signals coming from scattering. The concept is depicted in Figure 1.
We use a convolutional autoencoder architecture as depicted in Figure 2, consisting of blocks of convolutions, batch normalization, and ReLU-activations. After each block in the encoder, average-pooling is employed to down-sample the data, and in the decoder, we use inverse convolutions [30] for the up-sampling. We tried several architectures with different numbers of blocks, and in our final model we have 5 blocks in both the encoder and the decoder, and a total of 110 000 trainable parameters. The first blocks have convolutional layers with larger kernels and many channels, where we start with kernels of size 23 and 64 channels. Then, we gradually decrease both the kernel size and the number of channels toward the bottleneck in the latent space, where we have four channels with kernels of size 3, and thus a 22 × 4 dimensional latent space. To train the model we tried both mean squared error and mean absolute error, and combinations thereof, as training objectives, and used the RMSprop optimizer. Furthermore, we use L2-regularization to constrain the weights of the network. L2 regularization is done by adding a term with the L2 norm of all the weights of the network to the training objective, such that no weights will be too large. The training time of the DSAE which was used in the following was roughly 5 hours.
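A training objective of the kind described above, combining a squared-error term, an absolute-error term, and an L2 penalty on the weights, might be sketched as follows. The mixing factor `alpha` and the penalty strength `l2_strength` are hypothetical values chosen for illustration, and the penalty is written as the squared L2 norm, which is a common convention and an assumption here; the settings actually used in the study are not stated in the text.

```python
def training_objective(predicted, target, weights, alpha=0.5, l2_strength=1e-4):
    """Weighted mix of MSE and MAE plus an L2 penalty on the network weights.

    predicted, target : sequences of equal length (one spectrum each)
    weights           : flat sequence of all trainable weights
    alpha             : hypothetical mixing factor between MSE and MAE
    l2_strength       : hypothetical regularization strength
    """
    n = len(target)
    errors = [p - t for p, t in zip(predicted, target)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    l2_penalty = sum(w * w for w in weights)  # squared L2 norm (assumption)
    return alpha * mse + (1.0 - alpha) * mae + l2_strength * l2_penalty


# Toy numbers: MSE = 2.5, MAE = 1.5, squared weight norm = 5.
objective = training_objective([1.0, 2.0], [0.0, 0.0], weights=[1.0, 2.0])
```

Setting `alpha` to 1 or 0 recovers a pure squared-error or pure absolute-error objective, and larger values of `l2_strength` push the optimizer toward smaller weights.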
The rationale behind this approach is not to strictly mimic the ME-EMSC algorithm, but rather to use the ME-EMSC corrected spectra as training data to teach the model how to separate the chemical and physical signals in the measured spectra. The encoder should thus extract the chemical information encoded in the peaks of the scatter-polluted spectrum. The convolutional kernels run across the measured spectrum to extract information about spectral bands and baseline distortions, such as their shapes at different scales. During training, the DSAE learns which spectral information to extract and which to disregard. The extracted information is encoded into the feature maps and eventually reduced into a compressed representation of the spectrum's chemical information in the latent space. From the spectral information encoded in the latent space, the decoder is trained to build the scatter-free spectrum. In this way the DSAE learns to separate the physical and the chemical contributions to the measured absorbance spectrum and reconstruct the pure absorbance spectrum.

FIGURE 1 An illustration of the idea behind the proposed approach. We train a deep convolutional descattering autoencoder to correct for Mie-scattering in accordance with the ME-EMSC. We train the DSAE on a set of spectra which have been scatter-corrected with the ME-EMSC algorithm.

FIGURE 2 Illustration of the descattering autoencoder architecture. The DSAE consists of several blocks of convolutional layers, batch-normalization, and ReLU-activation functions. We down-sample with average pooling and up-sample with inverse convolutions. The length of the convolutional layers signifies the number of channels and the width the dimension of the data at that given layer.

| Fungal data
Our dataset consists of 16 FTIR hyperspectral images of single cells of the oleaginous fungal strain Mucor circinelloides VI 04473. The images were obtained by a microscope employing an FPA imaging detector, resulting in infrared images of a size of 128 × 128 spectra with 1505 wavenumbers. The data were obtained in a study with the aim to establish a sustainable production of polyunsaturated lipids using oleaginous filamentous fungi. The fungal strain Mucor circinelloides VI 04473 was grown on four different growth media, where the growth media differed in the content of inorganic phosphate salts. For every growth condition we have 4 hyperspectral images of different parts of the fungi, which were collected with an Agilent FPA microscope. All images were recorded with 15× spatial resolution, and with a spectral resolution of 4 cm⁻¹. For most of the images, 128 scans were averaged in each pixel. An averaging of 512 scans was also used for selected samples, to gain a higher signal-to-noise ratio.
Before correcting the spectra with ME-EMSC, suitable reference spectra needed to be established. Since the different growth conditions used for the cultivation of the filamentous fungi created strong spectral differences in some regions of the spectra, different reference spectra which corresponded to different growth conditions were used for the ME-EMSC correction. The reference spectra were recorded with a high throughput system (HTS) measurement of cell populations of the same fungal strain, grown under the same conditions [31]. The HTS spectra are obtained from the homogenized fungal biomass and constitute the average infrared fingerprint of the bulk cell constituents for the given condition. The HTS-FTIR spectra were measured for each sample in three technical replicates using a High Throughput Screening eXTension (HTS-XT) unit coupled to a Vertex 70 FTIR spectrometer (both Bruker Optik, Germany). The region between 4000 and 500 cm⁻¹ was recorded in transmission mode with an aperture of 5 mm, a spectral resolution of 6 cm⁻¹ and a digital spacing of 1.928 cm⁻¹. While the spectral resolution is due to the respective optical setup and different for the HTS and the FPA images, the digital spacing is the spacing between the wavenumber readings in the respective dataset, which is set in the Fourier transform algorithm. For each HTS spectrum, 64 scans were averaged.
The HTS spectra were normalized by a standard EMSC [32,33] before serving as reference spectra for the Mie correction. Using different reference spectra for the ME-EMSC correction is in general not applicable in a correction task. However, for a training dataset, the growth conditions or tissue types are in general known. We were interested in achieving the best possible correction for the DSAE training dataset, so that the trained model could later be used on new independent data without any prior knowledge about the growth condition. Therefore, the approach of assuming that the growth condition, tissue type, and so forth, is known, is valid.

| ME-EMSC correction of fungal data
We created our training data for the descattering autoencoder by using the ME-EMSC algorithm to correct roughly 60 000 spectra taken from 12 different images. We then evaluated our trained descattering autoencoder on approximately 20 000 spectra from four independent images, which were likewise corrected with ME-EMSC.
For all images, the ranges for the physical parameters $n_0$ and $a$ were set to [1.2, 1.5] and [3 μm, 9 μm], respectively. The maximum number of iterations was set to 15. While for many spectra the algorithm did not completely converge after 15 iterations, our experience is that the corrected spectra are scatter free and the main chemical features are retrieved. After correction, a strict filtering procedure was applied to the spectra. Using the quality test, the background spectra were discarded, and only spectra corresponding to the sample area were used to avoid building models on background spectra. The background filter was established by considering the scaling parameter from a basic EMSC on the raw spectra [34]. Spectra which converged fast in the ME-EMSC correction, after four or fewer iterations, were also discarded. This was done based on the knowledge that the correction requires more than four iterations in order to retain chemical features of the spectrum being corrected. Furthermore, an assessment was made on how successfully the scattering features were removed from the spectra. This was done by evaluating the root mean square error (RMSE) in the inactive regions from a basic EMSC on the Mie corrected spectra. A relatively high RMSE value indicates that there are still scattering features left in the spectra after correction, and such spectra are therefore discarded.
It is important to note that it is not possible to obtain pure absorbance spectra from raw spectra with low quality. If the absorbance signals are very weak, such as at the edges of the sample, the signal-to-noise ratio might be too low and the ME-EMSC algorithm is strongly affected by the reference spectrum, since there are no dominating chemical features in the raw spectrum. This can lead to corrected spectra which mainly adopt features from the reference spectrum, which is often detected by a low number of iterations before the algorithm terminates. To make sure that the DSAE is only trained on high quality spectra, the filtering routine explained above is applied.
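The three filtering criteria described in this section can be summarized in a short sketch. The record type, field names, and both threshold values (`min_scaling`, `max_rmse`) are hypothetical, since the text does not give numerical cut-offs; only the rule that spectra terminating after four or fewer iterations are discarded is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class CorrectionResult:
    emsc_scaling: float   # scaling parameter from a basic EMSC on the raw spectrum
    iterations: int       # ME-EMSC iterations used before termination
    inactive_rmse: float  # RMSE in chemically inactive regions after correction


def passes_quality_filter(result, min_scaling=0.2, max_rmse=0.01):
    """Return True if a corrected spectrum should be kept for training.

    min_scaling and max_rmse are hypothetical thresholds for illustration.
    """
    if result.emsc_scaling < min_scaling:
        return False  # weak signal: treated as background
    if result.iterations <= 4:
        return False  # terminated too early: likely dominated by the reference
    if result.inactive_rmse > max_rmse:
        return False  # scattering features remain after correction
    return True


# Hypothetical examples of the four possible outcomes.
kept = passes_quality_filter(CorrectionResult(0.8, 12, 0.002))
background = passes_quality_filter(CorrectionResult(0.05, 12, 0.002))
too_fast = passes_quality_filter(CorrectionResult(0.8, 4, 0.002))
residual_scatter = passes_quality_filter(CorrectionResult(0.8, 12, 0.05))
```

Applying such a filter before training keeps only pairs of raw and corrected spectra for which the ME-EMSC target is considered trustworthy.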

| RESULTS AND DISCUSSION
In Figure 3, the ME-EMSC correction setup is shown for FTIR images of the fungal strain Mucor circinelloides grown under different conditions. In biomedical applications, the different growth conditions may correspond to different tissue types or different tumor types [35]. In the training data, the tissue types and the tumor types are annotated and specific reference spectra could be used for the ME-EMSC correction. Reference spectra can be obtained as average spectra. In our study, reference spectra were obtained as HTS reference spectra and several images from each growth condition were corrected as training data. As further illustrated in Figure 3, we trained one single DSAE using the set of scatter-distorted raw spectra and the corresponding ME-EMSC corrected spectra from all four conditions. This approach allows the single model to be used for a wide range of input spectra without prior knowledge of the reference spectra. It is important to stress that the DSAE is trained on all of these images simultaneously resulting in one single DSAE, and not one per growth condition. The DSAE was thereafter validated on an independent set of images with one image from each of the four conditions. Figure 4 shows raw spectra, spectra corrected by the ME-EMSC algorithm, and spectra corrected with the DSAE. We can see that both the spectra which were corrected by the ME-EMSC and the spectra which were corrected by DSAE are very nearly scatter free. Figure 5A-D shows the comparison of single spectra that were corrected by the ME-EMSC together with the DSAE corrected spectra. The corresponding raw spectra are shown in Figure 5E-H, that is, in the row below such that corresponding raw and corrected spectra are below each other. One spectrum from each growth condition is shown. We can thus conclude that the DSAE is indeed able to reconstruct the pure absorbance spectra and remove severe Mie-scattering features in accordance with the ME-EMSC algorithm. It has learned from the ME-EMSC corrected spectra how to correct scatter-distorted spectra, and which reference spectrum implicitly to use for the correction.

FIGURE 3 The setup for training and validation of the descattering autoencoder. Spectra taken from 16 different samples from four different growth conditions are corrected, using the HTS spectrum for the given growth condition as a reference spectrum. The DSAE is trained on samples from all of the conditions simultaneously. Three images from each condition are used to train one single DSAE, and the remaining image from each condition constitutes the independent test set.
Cross-validation between the conditions was also considered, where one growth condition was removed from the training data at a time and the DSAE was used to correct spectra from the unknown growth condition. This corresponds to a situation where a model is applied to an unknown tissue type, which in most practical situations is not the case, since the tissue type is usually annotated in the dataset. Results are presented in Figure S1. When the growth conditions are unknown to the DSAE model, the heights of some absorbance peaks are scaled slightly differently compared to spectra corrected by ME-EMSC using the HTS spectra from that given growth condition, and the band ratios will thus differ slightly. However, the internal relations between different types of spectra are correctly represented, for example, the relation between band ratios in DSAE corrected spectra of lipid bodies and DSAE corrected spectra of hyphae is very similar to the relation in the same spectra corrected with ME-EMSC. That is, we can still easily differentiate between spectra of lipid bodies and hyphae.
Knowing that the ME-EMSC correction to some extent depends on the reference spectrum, this is exactly what we would expect. Cross-validation between growth conditions is thus essentially equivalent to ME-EMSC correction with a sub-optimal reference spectrum. So we recommend training the DSAE on spectra coming from several growth conditions and tissue types in order to obtain a model which can correct a wide range of raw spectra with large chemical variability.

FIGURE 4 Examples of FPA microspectroscopic imaging spectra of fungal strain Mucor circinelloides grown under different nutrient conditions. We show 100 raw spectra, ME-EMSC corrected spectra, and DSAE corrected spectra.

FIGURE 5 Results of Mie-scatter correction in filamentous fungi samples. A-D, Spectra corrected with the descattering autoencoder and with the ME-EMSC algorithm. E-H, The corresponding raw spectra.

| Speed
Correction of 1000 raw spectra with the descattering autoencoder takes 200 ms on a GPU and 750 ms on a CPU, while for the ME-EMSC algorithm it takes roughly 4-5 minutes. Thus the correction with the DSAE is about three orders of magnitude faster on a GPU and more than two orders of magnitude faster on a CPU. In order to train a DSAE, a training set of ME-EMSC corrected spectra is required. However, since a representative training set can be established using just a few images and the DSAE can be subsequently applied on a huge number of images, the speed improvement is considerable for the DSAE compared to the ME-EMSC. For routine analysis in a clinical setting, this means that the DSAE could perform the Mie correction in real time.
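The speed-up implied by the timings quoted above can be checked with a few lines of arithmetic. The sketch uses only the numbers given in the text (200 ms and 750 ms for the DSAE, 4 to 5 minutes for ME-EMSC, each per 1000 spectra); the variable and function names are illustrative.

```python
import math

DSAE_GPU_SECONDS = 0.200               # per 1000 spectra, from the text
DSAE_CPU_SECONDS = 0.750               # per 1000 spectra, from the text
ME_EMSC_SECONDS = (4 * 60.0, 5 * 60.0)  # "roughly 4-5 minutes" per 1000 spectra


def speedup_range(fast_seconds):
    """Lower and upper speed-up factor relative to the ME-EMSC timings."""
    return tuple(slow / fast_seconds for slow in ME_EMSC_SECONDS)


gpu_speedup = speedup_range(DSAE_GPU_SECONDS)  # about 1200x to 1500x
cpu_speedup = speedup_range(DSAE_CPU_SECONDS)  # 320x to 400x

# Orders of magnitude: log10 of the speed-up factors.
gpu_orders = tuple(math.log10(s) for s in gpu_speedup)
cpu_orders = tuple(math.log10(s) for s in cpu_speedup)
```

With these timings the GPU figure corresponds to slightly more than three orders of magnitude, and the CPU figure to roughly two and a half.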

| Noise reduction
We see clearly from Figures 4 and 5 that, in addition to predicting the ME-EMSC correction very well, the DSAE reduces noise. The noise reduction could be explained by the convolutional nature of the DSAE as well as the fact that the DSAE gradually compresses spectral information into a smaller latent space, in which only the most salient features of the spectra are contained and through which the noise is not easily transmitted. Using CNNs with a relatively simple loss function such as MSE/MAE could, in theory, obscure also some of the relevant chemical variations in the spectrum, but we have not detected any significant loss of chemical information for the bulk of the spectra, compared to the ME-EMSC for the dataset used in this study.
To quantify the noise reduction, we use the fact that nearby spectra in the image domain should generally be very similar, so that the difference between neighboring pixels can, to a large extent, be attributed to noise. Therefore, we consider the absolute value of the difference between every pixel and the mean of its eight nearest neighbors and itself. This difference should be very small for the large majority of the pixels in the image, since the chemical signals in the spectra mostly do not vary much between neighboring pixels. We calculate this difference for all pixels of the image after removing empty pixels with the EMSC b-parameter quality test [34] and use the median value within images of the samples as a measure of noise. We found the median difference to be about 0.003 for the DSAE corrected spectra and 0.013 for the ME-EMSC corrected spectra. Thus we can conclusively say that the DSAE yields less noisy spectra.
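The noise measure described above can be sketched as follows. This is a minimal illustration and not the implementation used in the study. The function name `neighborhood_noise`, the `mask` parameter, the restriction to interior pixels, and the averaging of the absolute difference over the wavenumber axis are all assumptions made here, since the text does not specify these details.

```python
import numpy as np

def neighborhood_noise(cube, mask=None):
    """Median over pixels of |spectrum - mean of its 3x3 neighborhood|.

    cube: array of shape (rows, cols, n_wavenumbers) with corrected spectra.
    mask: optional boolean (rows, cols) array, True for pixels to keep
          (e.g. pixels passing a quality test). Hypothetical parameter.
    """
    cube = np.asarray(cube, dtype=float)
    rows, cols, n_wn = cube.shape
    # Sum the nine shifted views of the image to get the 3x3 block sum
    # for every interior pixel (border pixels are skipped: an assumption).
    block_sum = np.zeros((rows - 2, cols - 2, n_wn))
    for dr in range(3):
        for dc in range(3):
            block_sum += cube[dr:dr + rows - 2, dc:dc + cols - 2, :]
    block_mean = block_sum / 9.0  # mean of the pixel and its eight neighbors
    center = cube[1:-1, 1:-1, :]
    # Assumption: collapse the wavenumber axis with the mean absolute difference.
    per_pixel = np.abs(center - block_mean).mean(axis=2)
    if mask is not None:
        # Simplification: masked-out pixels are dropped from the median,
        # but they still contribute to their neighbors' block means.
        per_pixel = per_pixel[np.asarray(mask, dtype=bool)[1:-1, 1:-1]]
    return float(np.median(per_pixel))
```

Applied to the ME-EMSC corrected and the DSAE corrected version of the same image, a lower return value would indicate a smoother, less noisy correction under these assumptions.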

| Spectral information
To confirm that the bulk of the DSAE and ME-EMSC corrected spectra contain the same chemical signatures, we performed a PCA on the two corrected and filtered sets of spectra. Figure 6A,B shows the PCA score plots of the first two principal components of the ME-EMSC and DSAE corrected spectra. We see that the first two score vectors of a PCA of the DSAE corrected spectra separate the spectra coming from the four different growth conditions fairly well, and in accordance with the PCA scores of the ME-EMSC corrected spectra. In Figure S2, we show the loadings of the PCA, and in Figure S3a, we also consider the PCA score plot of the DSAE latent space representation of all the spectra and find that these spectra also cluster very nicely according to nutrient conditions. This illustrates that the relevant chemical information is contained in the latent space (bottleneck) of the DSAE. To further show the chemical similarity of the spectra corrected with the two different methods, we consider the variability within single FTIR images. For this purpose, we perform a PCA on the spectra from each image, after removing the spectra which do not pass the filtering routine described above. We use K-means clustering to cluster the spectra of each single image into two classes based on the first and second principal components of the spectra of the respective image. For one image, the score plots for the first and second principal components are shown in Figure 7A,B for the ME-EMSC corrected and the DSAE corrected spectra, respectively. The mean spectra of the obtained classes are shown in Figure 7C,D, respectively. The K-means clustering resulted in classes which are very similar for the DSAE and ME-EMSC corrected spectra.
FIGURE 6 PCA score plot of the spectra from four different growth conditions. Spectra are corrected with the ME-EMSC in panel A and with DSAE in panel B. The scores are colored according to samples grown under different nutrient conditions
When evaluating four images, we found that, depending on the image, 65%-90% of the DSAE corrected spectra are clustered equivalently to the ME-EMSC corrected spectra. We observed that for the image that was used to obtain the results shown in Figure 7, both the DSAE and the ME-EMSC corrected spectra are clustered mainly on the basis of the ratio of the Amide I peak at 1650 cm⁻¹ to the polyphosphates peak at 1265 cm⁻¹, as well as the ester peak at 1745 cm⁻¹. Thus the main source of variability for both sets of corrected spectra is the same and they represent the same chemical information. In Figure S4, we also see that the PC loadings are very similar for the DSAE and the ME-EMSC corrected spectra and that the clusters co-localize in the image domain.
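One plausible way to compute the share of equivalently clustered spectra is sketched below. The text does not state how this percentage was obtained, so the function `two_class_agreement` and its handling of label swapping are assumptions. Because K-means assigns cluster indices arbitrarily, the two labelings are compared both directly and with the two class labels exchanged.

```python
def two_class_agreement(labels_a, labels_b):
    """Fraction of items assigned equivalently by two 2-class clusterings.

    labels_a, labels_b: sequences of 0/1 cluster labels for the same spectra,
    e.g. from K-means on ME-EMSC and on DSAE corrected spectra.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    same = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    frac = same / len(labels_a)
    # Cluster indices are arbitrary, so also consider the swapped labeling.
    return max(frac, 1.0 - frac)
```

With only two classes, the returned value can never fall below 0.5, which should be kept in mind when reading agreement figures of this kind.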
Furthermore, in Figure 8 the distributions of the absorbance peaks at two different wavenumbers, for all spectra from one image from each of the four nutrient conditions, are shown as heatmaps. We see that the distributions for the DSAE spectra are fairly similar to those for the ME-EMSC spectra, and that they change in the same way for different growth conditions. This shows that the DSAE can be used for all the different growth conditions and yields a correction similar to that of the ME-EMSC with the reference spectrum tailored to the growth condition.

| Chemical imaging
In Figure 9, DSAE corrected infrared images are shown. We used all the spectra of the infrared image as input for the DSAE without applying any quality test. We observe that the DSAE corrected images yield informative hyperspectral images. Figure 9A shows the -C=O stretch at 1650 cm⁻¹, also known as the Amide I peak, which is characteristic of proteins and of the cell wall components chitin and chitosan. Since the cell wall covers the whole cell and since proteins have a relatively homogeneous distribution in the cell except at the active lipid storage sites, we expect chitin/chitosan and protein related chemical signals in the complete cell area. We expect a higher intensity of the 1650 cm⁻¹ peak at the edges of the cells due to the increased thickness of the cell wall. A very strong signal related to chitin and chitosan molecules is also expected at 3274 cm⁻¹ (Figure 9D). The 3274 cm⁻¹ signal corresponds to the N-H stretch in chitin and chitosan. As before, we expect absorbance signals related to the N-H stretch over the whole cells and an even higher absorbance at 3274 cm⁻¹ at the edges of cells. Furthermore, the two most important lipid-related peaks were analyzed: (a) The peak at 1745 cm⁻¹ (Figure 9C) corresponds to the -C=O stretch in the esters in acylglycerides, which are the main lipid storage molecules in Mucor circinelloides. (b) The peak at 1710 cm⁻¹ (Figure 9B) corresponds to free fatty acids.
FIGURE 7 A and B, PCA score plots of the first and second component are shown for ME-EMSC corrected spectra and DSAE corrected spectra coming from one image of the independent test set. The scores are colored according to classes that were obtained by K-means clustering based on the first and second principal component. C and D, Mean spectra of two different classes are shown for ME-EMSC corrected spectra and DSAE corrected spectra, respectively
From the DSAE chemical images of the 1745 cm⁻¹ peak, we observe that the main lipid storage sites, that is, lipid droplets with readily synthesized acylglycerides, are located toward the center of the round-shaped cells, which is in accordance with microscopical observations previously published in Kosa et al. [36]. The chemical image of the free fatty acids peak at 1710 cm⁻¹ shows lipid synthesis active sites located at the edges of the cell close to the cell membrane and cell wall, where the formation of acylglycerides occurs [37]. Further details on the biological interpretation of these results will be published elsewhere.
It is apparent from the chemical images shown in Figure 9 that salient chemical features can be revealed by chemical images of DSAE corrected infrared images. The obtained images have a high contrast and are not corrupted by noisy spectra. The DSAE correction has a de-noising effect, as we already observed in Figures 4 and 5. The ME-EMSC corrected spectra can be rather noisy, and this noise is particularly detrimental in chemical imaging and results in less information-rich images. If the noise is of the same magnitude as the pure absorbance signal, creating chemical maps requires further processing and de-noising. Therefore, it is a large advantage that the DSAE produces smoother and less noisy images. We can conclude that the DSAE results in highly interpretable chemical images, even though the input spectra are strongly scatter distorted (see Figure 5). It is further important to note that the DSAE was able to establish the chemical images without any need for quality control of spectra, for example, of background spectra and spectra from edges of cells.
FIGURE 8 Heat map of the normalized distribution of the absorbances at the polyphosphates peak at 1265 cm⁻¹ (top) and the lipids peak at 1745 cm⁻¹ (bottom) for the ME-EMSC and DSAE for the four different growth conditions
FIGURE 9 Descattering autoencoder-corrected hyperspectral image at four different wavenumber channels
To further compare the DSAE to the ME-EMSC algorithm, we cluster pixels in images using raw spectra, ME-EMSC corrected spectra and DSAE corrected spectra. Before K-means clustering, we reduced the chemical dimension with PCA to six principal components and performed K-means on the scores of these six components. Figure 10 shows the result of the clustering as well as the mean spectra for each class. When inspecting the spectra of the classes, we find that the DSAE appears to be able to correct spectra from all parts of the cell. We find that the spectra close to the edges of the cell (cell wall) show lower absorbance in peaks that are characteristic of lipids and comparatively larger absorbance in chitin and chitosan related peaks, which is a typical indication of the cell wall structure. These biologically meaningful features are not as clearly provided by the ME-EMSC corrected spectra. Furthermore, we found that the ME-EMSC algorithm resulted in several high-noise spectra, which formed separate clusters in the K-means clustering and therefore complicated the clustering procedure for the ME-EMSC corrected spectra. This shows that the DSAE yielded a robust representation of the chemical information in the spectra and did not simply copy the ME-EMSC algorithm. We observe further in Figure 10 that the raw spectra can also be clustered by the same approach. However, clusters are mainly obtained according to shares of scatter contributions. It is important to note that scattering features can be correlated with chemical features. Therefore, it is expected that clustering according to scattering features (Figure 10C) results in similar clusters as clustering according to chemical features (Figure 10A). For example, we see from the mean spectra in Figure 10F that there is a distinct difference in scattering between spectra deriving from the edges of the lipid bodies (class 1) and spectra deriving from the center of the lipid bodies (class 3). These differences correlate with differences in chemistry between the center and the edges of the lipid bodies. Therefore, we expect that the chemical images obtained from raw spectra and the chemical images obtained from DSAE corrected spectra show similar features since scattering features and chemical features are correlated.
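The pixel clustering described above (PCA reduction to six components followed by K-means on the scores) can be sketched as follows. This is an illustrative NumPy-only version, not the code used in the study. The function names, the number of classes `k`, the random initialization, and the plain Lloyd iteration are assumptions, since the text does not specify the K-means settings.

```python
import numpy as np

def pca_scores(spectra, n_components=6):
    """Project mean-centered spectra onto their leading principal components."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one integer label per point."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize the centers with k randomly chosen points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_image(cube, k=3, n_components=6):
    """Cluster the pixel spectra of a (rows, cols, n_wavenumbers) image."""
    rows, cols, n_wn = cube.shape
    scores = pca_scores(cube.reshape(rows * cols, n_wn), n_components)
    return kmeans(scores, k).reshape(rows, cols)
```

Running `cluster_image` on the raw, the ME-EMSC corrected and the DSAE corrected version of the same image would give three label maps that can be compared side by side, in the spirit of Figure 10. The class indices themselves are arbitrary and may differ between runs and between inputs.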

| Independence of reference spectrum
We would like to emphasize that although one reference spectrum was used per growth condition when establishing the dataset with ME-EMSC, and all spectra of one image were corrected by initializing the algorithm with the same reference spectrum, the spectra within one image may still have a high chemical variability. This means that while a given reference spectrum may fit the majority of the spectra within one image, there could be some spectra within that image which are chemically different and for which the ME-EMSC algorithm may not work that well. The ME-EMSC in its present form needs to adhere to one reference spectrum for the entire image, while the DSAE was trained on a dataset that was established by using multiple reference spectra, and has therefore learned to correct spectra with a large chemical variability. We therefore expect that the DSAE will implicitly use different reference spectra when correcting spectra within the same image.
FIGURE 10 The results of K-means clustering on the pixel-spectra of the hyperspectral image, after the pixel-spectra had been reduced to their six first principal components. At the bottom the mean spectra in the different classes obtained from the clustering are shown
The influence of the reference spectrum on the ME-EMSC correction can also be observed in Figure 6, where the spectra from the different growth conditions separate very clearly in the score plot for the ME-EMSC corrected spectra without any overlap of the clusters, whereas the score plot of the DSAE corrected spectra shows some overlap of scores from different images. The reason for this is believed to be that the DSAE preserves the high chemical variability within each image, since it is implicitly using different reference spectra for different parts of the images. The high variability is the actual chemical variability in the images. We therefore consider the result of the DSAE correction to be superior to the result of the ME-EMSC correction. To investigate this further, we consider the distribution of the heights of the absorbance peaks in one image with a particularly large chemical variability, containing both lipid bodies and hyphae. We consider the peak distribution at two different wavenumbers, namely the Amide I peak and the ester peak, at 1643 and 1745 cm⁻¹, respectively. The high chemical variability in the image means that using only one reference spectrum may not be optimal for all spectra in the image. The histograms are shown in Figure 11, along with the height of the absorbance peaks for the reference spectrum.
We see that the absorbance values of the Amide I peak and the ester peak for the ME-EMSC corrected spectra are fairly tightly distributed around the absorbance of the reference spectrum. The DSAE corrected spectra show a stronger variability of the heights of these peaks. This indicates that the ME-EMSC corrections can be affected more by the reference spectrum, a tendency which could be successfully reduced in the DSAE correction. Therefore, we argue that the DSAE performs very well at correcting spectra from different parts of the image that have very different chemical features.

| CONCLUSION
In this paper, we have demonstrated the potential of a convolutional descattering autoencoder to correct for the scattering effects in FTIR microspectroscopy. We show that spectra pre-processed by the DSAE contain the same chemical features as spectra pre-processed with the gold-standard ME-EMSC, and that the DSAE additionally removes noise from the spectra. Moreover, our approach is much faster than the ME-EMSC algorithm, which is an important property if the method is to be used by pathologists in clinics who need to pre-process the images in real time.
FIGURE 11 Comparison of distributions of absorbances in the ester peak at 1745 cm⁻¹ in panels A and B and for the Amide I peak at 1643 cm⁻¹ in panels C and D for ME-EMSC and DSAE corrected spectra. All the spectra are from the same image. The dashed red vertical line shows the absorbance of HTS reference spectra at the peak in question. Background spectra have been filtered out beforehand with the EMSC b-parameter quality test [34]
The ME-EMSC is based on the Mie formalism and involves transformations such as the Kramers-Kronig transformation and singular value decomposition. The DSAE learns to perform Mie correction from data that is corrected by the ME-EMSC algorithm for a given application and is not expected to work for any type of input spectra. Therefore, the rationale behind our approach was not to fully replace the ME-EMSC algorithm for arbitrary input spectra, but rather to train the DSAE using ME-EMSC corrected spectra for a given application. The DSAE learns to extract pure absorbance spectra from spectra with a chemical variability which is inherent in a given application and defined by the cell or tissue types considered. In practice, this means that a DSAE needs to be trained for each application in the same way as classifiers need to be trained for each application. For a new application, where input spectra display completely different chemical signatures, a new DSAE needs to be trained.
We have shown that the DSAE is able to address and correct noise in the ME-EMSC corrected spectra and that it generally yields smoother and more stable corrections, which is particularly important for producing informative hyperspectral maps. The ME-EMSC in its current form is not designed for using multiple reference spectra, where reference spectra are adapted to a tissue type, condition, and so forth, since the tissue type, condition, and so forth are usually only known in the training stage. However, for establishing the DSAE, the tissue type only needs to be known for the training data. The trained DSAE then implicitly uses the most appropriate reference spectrum for a given correction task. For the establishment of the training set, we used different reference spectra in the ME-EMSC that were adapted to the respective growth conditions of the filamentous fungi. The DSAE could then be trained on spectra corrected by the ME-EMSC with the best possible reference spectrum, and it implicitly learned which reference spectrum to use in a correction task. The need for multiple reference spectra is a common issue in the analysis of tissues. For example, tissue sections for pathology contain different tissue types that can be chemically very different. We therefore suggest using different reference spectra for establishing a training set by ME-EMSC, that is, one reference spectrum for each tissue type. This training set can be used to train a descattering autoencoder which can then be used for correcting spectra from any tissue type.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study will be made openly available in the Norwegian Center for Research Data at https://nsd.no.