Scattering correction for samples with cylindrical domains measured with polarized infrared spectroscopy

Scattering artifacts are one of the most common effects distorting transmission spectra in Fourier-Transform Infrared spectroscopy. Their increased impact, strongly diminishing the quantitative and qualitative power of IR spectroscopy, is especially observed for structures with a size comparable to the radiation wavelength. To tackle this problem, a range of preprocessing techniques based on the Extended Multiplicative Scattering Correction method was developed, using physical properties to remove scattering presence in the spectra. However, until recently those algorithms were mostly focused on spherically shaped samples, for example, cells. Here, an algorithm for samples with cylindrical domains is described, with additional implementation of a linearly polarized light case, which is crucial for the growing field of polarized IR imaging and spectroscopy. The approach is tested on a polymer fiber and on human tissue collagen fiber. An open-source code with GPU based implementation is provided, with a calculation time of several seconds per spectrum. Optimizations done to improve the throughput of this algorithm allow the application of this method into the standard preprocessing pipeline of small datasets.


Introduction
Fourier-Transform Infrared Spectroscopy (FT-IR) and Imaging are rapidly developing and highly utilized methods, mostly due to the wealth of biochemical information encoded in infrared (IR) spectra.Reasonable speed of measurements, high Signal-to-Noise Ratio (SNR) and good sensitivity cause IR based techniques to be widely used in such fields as material science [1], [2], biology and medicine [3]- [5], archeology [6], [7], environmental science [8], forensic [9]- [11] and others.Apart from qualitative analysis targeted at compound identification, there is also a strong quantitative aspect of IR spectroscopy.Both areas of applications demand spectra to contain only absorption related information.Unfortunately, this requirement is very often not fulfilled because of the scattering artifacts.Such artifacts are especially pronounced for the spherical and cylindrical samples with a size comparable to the wavelength which is often the case for cells, tissue fibers and others.Scattering theory for spherical samples and small particles [12], [13] has been developed decades ago and explains the basics of the artifacts observed in IR spectra, allowing to distinguish between absorptive and scattering origin of absorbance variations.Many implementations have been developed with e.g.Extended Multiplicative Signal Correction (EMSC) algorithms based on full Mie theory or van de Hulst approximation targeted at scattering effects removal [14]- [17].In the past, those algorithms mostly focused on cells, which due to their morphology, can create perfect conditions for Mie scattering.Yet, over time, they were also extended to aid scattering by tissue samples [18]- [20].By nature, EMSC based algorithms are relatively computationally expensive, which despite the hardware development, makes their application on a large scale still problematic.Thus new advances using Neural Networks and augmentation approaches are also explored [21], [22].
A second important geometry for analysis, that is gaining attention, are samples with cylindrical domains [23]- [25].From collagen fibers in tissues, hair to microfibers, the cylindrical shape will distort the analytical IR signal.The theoretical approach for cylinder's scattering is well described [26], [27].However, the application of full Mie type scattering theory for infinite cylinders in EMSC algorithm was done only lately by Rasskazov et al. [28], considering unpolarized light and including a new workflow to optimize computation time.Yet, the polarized light case was still not addressed, which, considering cylinder's reduced symmetry, cannot be omitted as it is in the case of spherically symmetrical samples.Figure 1 presents how predominant are scattering artifacts for cylindrical samples considering unpolarized (a), parallelly (b), and perpendicularly (c) polarized light.This is an example of a Polycaprolactone (PCL) fiber model sample, specifically made for this study, with its dimensions strongly inducing scattering artifacts.
IR studies with linear polarization control are currently of interest [29], as many researchers apply it in molecular orientation studies [30]- [33], including studies focused on cylindrically shaped samples [34].Not only spectra themselves are relevant, but more advanced analysis of the integral intensity of vibrational bands allow calculation of in-plane orientation functions and more recently, retrieval of full 3D angles of a transition moment orientation [35]- [37].Considering that, there is a growing need for proper correction of polarized IR light scattering from cylindrical domains, in this research, an opensource, EMSC based algorithm for scattering artifacts correction in cylindrical samples has been developed.The algorithm considers different states of linear polarizations as an extension of the unpolarized light case reported by Rasskazov et al. [28].Optimization of the potentially extended calculation time was done via GPU based calculations allowing to introduce this method into the preprocessing pipeline of relatively small datasets.The developed algorithm was also tested with an isolated collagen fiber sample.An open-source, GPU based, version of MATLAB implementation is available, and referenced in the Code availability section.

PCL fiber sample preparation
Polycaprolactone (PCL) of Mn = 80 000 g/mol was purchased from Sigma Aldrich Sp. z o.o.(Poznan, Poland).Chloroform for analysis (99.8%), amylene stabilized was obtained from Idalia Sp. j. (Radom, Poland).In the first step, a polymer solution was obtained by dissolving 0.2 g of PCL in 40 mL of chloroform.The system was left on a magnetic stirrer for 24 h at room temperature to allow complete dissolution.The solution was cast onto a glass laboratory dish and placed in a fume hood for evaporation for the next 24 h.The obtained thin film was melted at 80°C as PCL has a melting point of about 60°C.At the same time, a calcium fluoride (CaF2) slide was heated up to 60°C.The thin PCL fiber was manually pulled out using a steel needle from the melt and placed on the CaF2 slide.Next, the sample was rapidly cooled down to room temperature so the fiber would not melt.

Tissue fiber sample
Breast tissue microarray (TMA) was purchased from Biomax Inc.One slice was placed on a BaF2 plate for transmission measurements, whereas the adjacent slice was mounted on a regular glass and H&E stained for visual inspection and structures identification.An isolated fiber from one of the tissue cores was selected for further FT-IR analysis.

IR measurements and data preprocessing
Samples were measured in transmission mode with Bruker Vertex80v spectrometer coupled to a Hyperion3000 Microscope with 64x64 FPA MCT detector and 15x objective (NA = 0.4) giving 2.7 μm projected pixel size.Spectral range cover region from 3850 cm -1 to 900 cm -1 , with 8 cm -1 resolution and zero filling factor of 2, giving a spectrum with 765 points.Data were collected for unpolarized light and with four linear polarizations: 0°, 45°, 90°, and 135° (corresponding to the fiber's orientation).The stage was specifically programmed to repeat the same positions, thus all five datasets (for each sample) are spatially aligned (to the accuracy of the stage movement, i.e. below 1 µm) and spectral information might be compared between measurements.During PCL fiber measurements sample and background signals were co-averaged 32 and 64 times, respectively, whereas tissue sample was scanned 4 times for the signal and 64 times for the background.Background measurements were done individually for each polarization.Datasets were individually denoised with Principal Component Analysis with 15 components used for signal reconstruction [38].

Resonant Mie scattering EMSC
The current status of EMSC based algorithms for Mie scattering have decades of development behind them, starting from simple cases to later on extending the original algorithm to be more accurate and tackle more realistic cases in terms of physics.In that context, Resonant Mie Scattering (RMieS) EMSC algorithm has been relatively recently developed, as a result of the progress in the comprehension of dispersion artifact [15], [39], responsible for the sharp decrease in absorbance band on the high wavenumber side.As described by Bassan et al., a change in real refractive index near the absorption band causes a reduction of resonant Mie Scattering efficiency [15], [39].Thus, the non-resonant Mie scattering algorithm, previously proposed by Kohler et al. [16], was extended by the implementation of the Kramers-Kronig (KK) transform for real refractive index calculations.An example of the KK transform for the determination of the real refractive index  based on IR spectrum for PCL is presented in Figure 2b.A simplification has been done by denoting IR spectrum as , due to its relationship with the imaginary part (absorptive) of the complex refractive index.This, along with ranges of possible values for average refractive index and particle radius are later used to calculate a set of scattering efficiency   curvesone curve for each specific combination of mentioned parameters.General model for RMieS-EMSC allows to decompose (correct) recorded signal  ⃗  as follows [14], [15]: where  ⃗ is the constant baseline,  is the gradient of the sloping baseline,  ⃗ is the wavenumber vector, ℎ is the magnifying factor of the reference spectrum  ⃗  ,  is the number of principal components (7 gives satisfactory level of explained variance),   and  ⃗  are loading and its weights (describing scattering efficiency impact), respectively, and  ⃗⃗ are residuals.In general, correction of the recorded spectrum is done by finding values of , , ℎ and   using linear regression.However, this is an iterative process, graphically presented in Figure 2a.The first iteration requires a reference spectrum for the refractive index calculations, whereas, further iterations use information from the previous step.Crucial step in each iteration is finding  ⃗  (loadings) which contain the scattering information.This is done by PCA decomposition of the set of scattering curves   , usually 1000 curves, calculated for permutations of physical parameters such as particle radius and refractive index.This RMieS-EMSC algorithm has been thoroughly described in previous publications, thus more detailed explanation can be found in [14], [15].

Scattering efficiency 𝑄 𝑠𝑐𝑎 for cylindrical samples
Scattering efficiency calculations are the crucial part of described RMieS-EMSC method, providing physical background for the correction algorithm.Up to recently, most papers focused on Mie scattering correction for spherical samples, as it has a wide range of applications, for example during cells measurements.In this case, two possible options for Mie scattering efficiency calculations are available.First of all, using full Mie theory [12], which is physically more accurate, but at the same time very computationally expensive due to the need for Bessel's spherical functions calculations [14].On the other hand, there is a van der Hulst approximation [40] giving major a time advantage, yet, being less accurate [15], [16].Currently, a focus has been pointed on the extension of the algorithm to an infinite cylinders case, which could be a good approximation of fibrous samples.Recently, Rasskazov et al. published an EMSC based approach where the theory for an infinite cylinders scattering is applied [28].One of the differences from spherical samples is that cylinders have rotational symmetry, thus, light polarization can no longer be omitted in this case.Even though neither discussed nor applied, theoretical approach for cylinders' scattering of linearly polarized light was provided in [28] and it is as follows: where  ,∥ and  ,⊥ are scattering efficiencies per infinite cylinder unit length for the parallel and transversal illumination, respectively,  is a size parameter and  is a spherical function order.Other parameters -  and   along with  are described in the supplementary data.Presented formulas are also used to calculate the scattering efficiency for unpolarized light [28]: An extension for linear polarizations other than parallel and transversal is also provided in Supplementary Materials.An example of scattering efficiencies (  ,  ,⊥ and  ,∥ ) for a theoretical PCL cylinder with 7.5 μm radius is shown in Figure 2c.It is also compared to a scattering of a spherical sample of the same radius, calculated through the van de Hulst approximation used in RMieS-EMSC [15].It is clearly visible, that there is a major difference between sphere's and cylinder's scattering and how different polarization cases influence scattering.Note that for a cylinder, a high frequency fine structure resulting for scattering is superimposed on the spectrum.Such a high frequency signal may overlap with absorption bands and hinder quantitative spectral analysis..More examples of quantitative and qualitative differences in spheres' and cylinders' scattering character, including the impact of the varying angle of incidence ζ, are provided in [28].

Algorithm development and GPU acceleration
The development of a scattering correction algorithm for cylindrical samples was based on an open source code provided by Bassan et al. for RMieS-EMSC.The key part was to replace van de Hulst approximation of  for spheres, with the scattering formulas for cylinders described in the previous paragraphs.Also, a necessary parameter ζ, describing the angle of incidence was added and the normal incidence was assumed for all further calculations (ζ = 90°).A general workflow of the new algorithm is presented in Figure 2a.Apart from data to be corrected, its input requires a reference spectrum, with PCL film spectrum (almost free from scattering) used in this case.Implementation of the full theory for scattering efficiency (Mie type) calls for advanced programming solutions due to the time and computing power needed for Bessel's spherical functions calculations.Rasskazov et al. [28] suggested to use a new way based on interpolation of   from data coarsely simulated for a wide range of  and mean refractive index.Here, a classical approach with GPU calculations was applied, considering major progress in graphic card development.It is worth mentioning, that not only Bessel and Hankel's functions were transferred to GPU, but also other parts of code required for   calculations (differentiation and coefficients calculations).This required further optimizations to apply code vectorization, but more information is provided and discussed in the Supplementary Material.Calculations and time estimation were done using two machines: 1) Desktop working station Dell Precision 5820 with Intel Core i9-10900X processor, 256 GB RAM memory and NVIDIA Quadro P6000 24 GB GPU 2) Server Dell PowerEdge R740 with 2x Intel Xeon Gold 6254, 12x 64GB RDIMM RAM memory and single NVIDIA Tesla v100 32GB GPU.
As the Quadro P6000 GPU is not an optimal card for double precision scientific calculations, the algorithms were developed using the first (desktop) computing station, but later their final performance was tested with a Tesla v100 card, dedicated to clusters in supercomputing centers.Assuming a spectrum of 765 variables, the calculation time per single spectrum with a Tesla card was 3.98 s, which gives 0.398 s per iteration (10 iterations), and 9.62 s for Quadro (0.96 s per iteration).This time scale is already optimized, but it will give even faster results for samples with smaller radii (the size parameter defines the order of Bessel's functions as described in Supplementary Materials).Here, the calculations were done assuming the cylinder's radius varied within the 5 to 15 μm range.Whereas, for smaller samples, for example from 2 to 8 μm range, the calculation time was 2.6 s per spectrum for Tesla.

Results and discussion
As was mentioned in the Materials and Method section, a model PCL fiber sample with a radius varying from 7.5 μm to 12 μm, was crafted specifically to exhibit strong scattering properties.An optical microscope image of the sample is presented in Figure 3a.The goal here is to test the developed algorithms for unpolarized and linearly polarized IR radiation using a highly scattering model sample, observing at the same time if the scattering pattern is similar to scattering efficiency calculated theoretically.Additionally, results are compared to the RMieS-EMSC method for spherical samples, which also has an open source code available.

Unpolarized IR radiation
Figure 3 presents the correction done for the unpolarized light case.As the cylindrical sample has a varying size (radius varying from 7.5 to 12 um), it makes perfect conditions to analyze scattering with different physical conditions (different radii).Here, a case of 7.5 μm and 11 μm radius was analyzed and is shown in Figure 3 a-d and Figure 3 e-h, respectively.Based on theoretically calculated scattering efficiency for a 7.5 μm radius of the cylindrical sample shown in Figure 3a, major scattering artifacts were expected to appear in the experimental spectra.This is exactly what is observed for non-corrected spectra coming from the cylinder's cross-section and shown in Figure 3b (legend for spectra is provided in Figure 3a, along with spatial location marked on IR image).As expected, strong scattering is observed for those spectra, having the specific shape predicted by  curves.Artifacts are most prominent in the 3000-1700 cm -1 region, with a small number of absorption bands.However, the fingerprint region is also strongly distorted, although, the distortion is covered by absorption bands.After visual inspection of correction results, it might be concluded that the algorithm eliminated most of the scattering component (Figure 3c), which is indicated by the absence of characteristic shape and strong intensity reduction observed around 1000 cm -1 .Moreover, Cylinders EMSC results look very similar to the RMieS-EMSC algorithm results (Figure 3d) with subtle qualitative and qualitative differences in absorption bands' intensities.Both algorithms had a problem with spectrum number 5 (grey), as the strong scattering region falls around CH2 absorption bands.RMieS-EMSC seems to be doing slightly better in this case.On the other hand, both algorithms seem to struggle in the 3500-3000 cm -1 region, with Cylinders EMSC doing a little better.Very similar tendencies are observed for a larger radius case.Theoretical scattering efficiency curves changed their sine-shaped characteristics for higher frequency (Figure 3e), which is also clearly observed in the experimental spectra (Figure 3f).Smaller wavy distortions appearing on top of a main characteristic, became sharper and narrower.Thus, less likely to be observed in the spectra.As more spectra have a scattering peak around the CH2 band, both correction algorithms are not ideal in this case, with Cylinders EMSC doing definitely better in the 3500-2500 cm - 1 region.Overall, considering extremely high scattering artifacts (enhanced by almost perfect physical conditions for their occurrence) present in experimental spectra, the developed algorithm for cylindrical samples is highly effective.Its introduction into the preprocessing pipeline allows performing quantitative analysis using corrected spectrawhich was not possible before.Scales on visible and IR images are the same and the dimensions might be referenced using the fiber's radius.

Linear polarization of IR radiation
Analysis similar to the one presented for unpolarized light was conducted for linear polarization of incident IR radiation case.Again, scattering from the cylindrical sample with radii equal to 7.5 μm and 11 μm is analyzed.However, instead of the cylinders cross section, spectra from the same pixel coming from different polarization datasets are presented in Figure 4.This requires for cylinder's orientation to be well defined to determine the polarization orientation.Interesting differences in scattering efficiency are observed for different polarization cases (Figure 4a and e).Those differences might be crucial for the spectroscopic bands' appearance in experimental absorption spectra.Observed structures also show how distinct is RMieS-EMSC scattering model for spheres and expose the need for cylindrical model applications.The characteristic structure (wave) most observed for perpendicular polarization can be observed in Figure 1c.In both cases, raw spectra show extremely high scattering artifacts with the sinusoidal pattern well predicted by the theory.The second case of 11 μm radius exhibits a higher frequency of scattering pattern changes (Figure 4f) than is observed in Figure 4b.Also, as the spectra come from the same spatial location, the phase of the sinusoidal shape is very similar for all polarization datasets.Analyzing the effectiveness of the algorithms, PCylinders EMSC seems to be doing definitely better than RMieS-EMSC, especially in the 3500-3000 cm -1 region.Furthermore, for the 11 μm radius case, the carbonyl stretching region around 1720 cm -1 seems to be problematic for both algorithms (Figure 4 g-h), but with PCylinders EMSC doing better.

Tissue fiber validation
PCylinders EMSC algorithm implementation with linear polarization of incident light was tested using a tissue sample.The region of isolated collagen fiber, presented in Figure 5a, was used for evaluation due to its high scattering properties.Spectra from a single pixel measured with different linear polarization are presented in Figure 5b.Despite the fact that strong scattering is observed in this case, there is no specific sinusoidal shape in the baseline.However, spectra perpendicular (90°) to the fiber's axis show the biggest scattering level, whereas parallel one (0°) is the smallest.45° and 135° polarization have very similar scattering characteristicsas expected since the physical conditions are the same.The PCylinders EMSC algorithm manages to correct those spectra perfectly, which is mostly visible in the 2700-1800 cm -1 region (Figure 5c), and provides slightly better correction than RMieS-EMSC (Figure 5d).Furthermore, as polarized light absorption depends on its alignment with the dipole moment, all spectra should show variability in the absorbing parts, especially in fingerprint regions.RMieS-EMSC seems to force spectral alignment as if it is following the reference spectrum.In general, PCylinders EMSC seems to be doing very well in the case of tissue fiber, preserving absorptive differences in spectra coming from polarization orientation with respect to dipole moments orientation.This highlights the potential of this method in molecular orientation studies, based on polarized FT-IR.

Conclusions
As the polarized FT-IR field is rapidly gaining new applications, especially in molecular orientation studies, targeted preprocessing methods considering physical properties and parameters need to be developed.Therefore, an EMSC based scattering correction algorithm using full Mie-type scattering theory for samples with cylindrical domains (Cylinders EMSC) and considering the state of light linear polarization (PCylinders EMSC) was developed, with MATLAB open source code available.Due to computational requirements, GPU was used for the code acceleration resulting in a calculation time of several seconds per spectrum.This allows the introduction of this algorithm into a standard preprocessing pipeline of small FT-IR datasets.Algorithm performance was tested with a specifically prepared PCL model sample, exhibiting strong scattering properties, along with isolated tissue fiber.
Comparison with the widely used RMieS-EMSC algorithm shows an advantage of the new algorithm.

Figure 1 .
Figure 1.Spectra from a PCL fiber sample highlighting scattering artifacts, for unpolarized (a), parallel (b), and perpendicular (c) polarization of incident IR light with respect to the cylinder's axis.Spectra of each color come from the same location (the same chemical composition) as all datasets are spatially aligned.

Figure 2 .
Figure 2. a) Workflow of the Cylinders EMSC algorithm.b) Examples of the real refractive index calculated from IR spectrum of PCL () using KK transform.c) Theoretical scattering efficiency for cases of cylindrical sample (unpolarized   and with parallel  ,∥ and perpendicular  ,⊥ polarizations) and spherical sample   of the same radius.

Figure 3 .
Figure 3. Scattering correction results for model PCL cylinder sample, for two cases of sample's radius.Spectra from cylinder's cross-sections are analyzed from left to right, with 2.7 μm step (pixel size).a)-d) Results for 7.5 μm radius region of the sample.e)-h) Results for 11 μm radius region of the sample.Legends for spectra in plots b)-d) and f)-h) are provided in sections a) and e) respectively.Scales on visible and IR images are the same and the dimensions might be referenced using the fiber's radius.

Figure 4 .
Figure 4. Scattering correction results for linearly polarized light.A model PCL cylinder sample with varying radii was used as a reference and a single pixel (marked with star) with different polarizations is shown.a)-d) Results for 7.5 μm radius region of the sample.e)-h) Results for 11 μm radius region of sample.Legends for spectra in plots b)-d) and f)-h) are provided in sections a) and e) respectively.Scales on visible and IR images are the same and the dimensions might be referenced using the fiber's radius.

Figure 5 .
Figure 5. Scattering correction results for tissue fiber sample (single pixel marked with star) and linearly polarized light.a) Tissue sample used to validate the developed algorithm.b) Raw spectra coming from the single pixel (marked with a red star) and different polarization sets.c) Correction results for PCylinders EMSC algorithm.d) Correction results for RMieS-EMSC algorithm.