Auto-detection of cervical collagen and elastin in Mueller matrix polarimetry microscopic images using K-NN and semantic segmentation classification

We propose an approach for discriminating fibrillar collagen fibers from elastic fibers in the mouse cervix in Mueller matrix microscopy using convolutional neural networks (CNN) and K-nearest neighbor (K-NN) for classification. Second harmonic generation (SHG), two-photon excitation fluorescence (TPEF), and Mueller matrix polarimetry images of the mouse cervix were collected with a self-validating Mueller matrix micro-mesoscope (SAMMM) system. The components and decompositions of each Mueller matrix were arranged as individual channels of information, forming one 3-D voxel per cervical slice. The classification algorithms analyzed each voxel and determined the amount of collagen and elastin, pixel by pixel, on each slice. SHG and TPEF were used as ground truths. To assess the accuracy of the results, mean-square error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) were used. Although the


Introduction
Preterm birth (PTB), defined as any birth prior to 37 completed weeks of gestation, is responsible for 35% of the annual 3.1 million global neonatal deaths [1][2][3]. Many survivors will face life-long challenges including neurological disorders, long-term cognitive impairment, defects in hearing, vision, and digestion, as well as respiratory disease.
Unfortunately, there is an absence of clinical tools for early and accurate detection of spontaneous preterm birth risk, in part due to a lack of understanding of the molecular events that drive a term or preterm birth. Understanding the cervical remodeling process in a term or preterm pregnancy is critical to define therapeutic targets and to develop clinical tools. Given the demonstrated reorganization of the cervical extracellular matrix through pregnancy [4] and its direct correlation with the mechanical function of the cervix [5], we aim in this study to develop an improved methodology to characterize collagen and elastic fibers in the mouse cervix.
An extracellular matrix (ECM) lies under the epithelium of the cervix and consists of roughly 70% collagen fibers [6][7][8]. Numerous researchers have studied cervical collagen [9][10][11][12][13][14][15][16] and
In this study, reflectance-mode measurements of SHG, TPEF, and near-infrared total reflectance images were performed using the SAMMM system described in [41,42]. Note that only total reflectance was used to study polarimetry in this work. The excitation source is a pre-compensated, mode-locked beam from a Ti-Sapphire broadband femtosecond laser with a central wavelength of 800 nm (FWHM = 100 nm). The SHG signal at 400 nm (FWHM = 30 nm), the TPEF signal at 500 nm (FWHM = 20 nm), and the total reflectance images are collected by appropriate photomultiplier tube detectors and a single data acquisition board sampling at up to 125 MHz (Vidrio Technologies LLC, VA). Each Mueller matrix is constructed using a polarization state analyzer (PSA, four polarization states) and a polarization state generator (PSG, six polarization states), resulting in a set of 24 images in each of the three channels (SHG, TPEF, and total reflectance). All SAMMM raw images of the mouse cervix had a resolution of 1000 × 1000 pixels and were taken with a 5X objective.

Image decomposition
The Mueller matrix $M$ of the medium was decomposed using the Lu-Chipman decomposition method, in which $M$ is expressed as the product of three basic matrices [43]:
$$M = M_\Delta M_R M_D \tag{1}$$
where $M_\Delta$, $M_R$, and $M_D$ are the depolarization, retardance, and diattenuation Mueller matrices, respectively. Following the decomposition, scalar terms such as the diattenuation ($d$), depolarization coefficient ($\Delta$), and linear retardation ($\delta$) of the medium are determined (Eqs. (2)-(4)).
In addition to polar decomposition, the differential matrix formalism of the Mueller calculus was used to retrieve total retardance (linear and circular) [44]. In this case, the medium polarization properties are contained in a single differential matrix $m$, which relates the Mueller matrix $M$ to its spatial derivative along the direction of light propagation $z$ [45,46]:
$$\frac{dM}{dz} = mM \tag{5}$$
For a medium that is homogeneous along $z$, Eq. (5) integrates to $M = \exp(mz)$, so the matrix logarithm $L = \ln M = mz$ carries the accumulated polarization properties. Splitting $L$ into its Lorentz antisymmetric part $L_m$ and its Lorentz symmetric part $L_u$, we have:
$$L = L_m + L_u \tag{6}$$
where
$$L_m = \tfrac{1}{2}\left(L - G L^{T} G\right) \tag{7}$$
$$L_u = \tfrac{1}{2}\left(L + G L^{T} G\right) \tag{8}$$
In Eqs. (7) and (8), $G = \mathrm{diag}(1, -1, -1, -1)$ is the Minkowski metric tensor. For a depolarizing medium, the off-diagonal elements of $L_m$ represent the mean values of the elementary medium polarization properties over the path length $z$, and the off-diagonal elements of $L_u$ express their respective uncertainties. The Lorentz components of these matrices are used to retrieve the linear ($\delta_L$), circular ($\delta_C$), and total ($R$) retardation and the angle of orientation ($\theta$) (Eqs. (9)-(12)).
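For orientation, the scalar terms of Eqs. (2)-(4) are commonly written in the Lu-Chipman literature in the form sketched below. This is a sketch of the conventional expressions, not necessarily the exact ones used here: it assumes 1-based element indices, a matrix $M_\Delta$ normalized so that its (1,1) element equals 1, and the subscripted $M_{R,ij}$ notation, which is introduced only for this illustration.

```latex
d = \frac{1}{M_{11}} \sqrt{M_{12}^{2} + M_{13}^{2} + M_{14}^{2}}

\Delta = 1 - \frac{\left| \operatorname{tr}(M_\Delta) - 1 \right|}{3}

\delta = \cos^{-1}\!\left( \sqrt{\left(M_{R,22} + M_{R,33}\right)^{2} + \left(M_{R,32} - M_{R,23}\right)^{2}} - 1 \right)
```

In this convention $d$ and $\Delta$ both lie between 0 and 1, which makes them convenient as image channels after normalization.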

Section preparation for SAMMM imaging
In this paper, a total of 16 cervical slices were obtained from 5 mice (one non-pregnant mouse, two pregnant mice at gestation day 6, and two at gestation day 18) in accordance with the Institutional Animal Care and Use Committee protocol. The tissue was snap frozen at -80°C in optimal cutting temperature (O.C.T.) compound (Tissue Tek, Elkhart, Indiana). The entire length of the cervix was cryosectioned transversely at -20°C using a cryostat (Leica CM3050). Sections were mounted on glass slides and left to dry for 1 hour at room temperature. Unwanted residues were washed away with phosphate buffered saline (PBS).

Confirmation of elastin on TPEF images with immunofluorescence
While the SHG of collagen in the cervix is optimal in the 400 nm wavelength region [47,48] with excitation at 800 nm, elastin and NADH have overlapping fluorescence emission in the 500 nm region. Other components, such as glycosaminoglycans (GAGs) and proteoglycans, could also contribute to the TPEF signal. Therefore, to confirm that cervical elastin is responsible for the TPEF signal in our samples, we performed indirect immunofluorescence on a selected slice using rabbit anti-mouse tropoelastin antibody (Elastin Products Company, PR385) as the primary antibody and Alexa Fluor 546-conjugated antibody (Life Technologies, A11035) as the secondary antibody. Slides were washed with 20 mM Tris (pH = 8.0) for 15 minutes, and the section was then treated with 100 mM iodoacetamide (Sigma-Aldrich Inc., I5161) in the dark for 15 minutes. Goat serum diluted to 10% (ThermoFisher Scientific, 31872) was used to block the section for 1 hour at room temperature. The section was then incubated with a 1:250 dilution of primary antibody in 1% goat serum overnight at 4°C. Finally, the section was washed with PBS and incubated in a 1:500 dilution of secondary antibody in 1% goat serum for 30 minutes at room temperature. The section was imaged using a commercial linear microscope (Olympus BX61) with a laser at 550 nm and a 10X objective. As shown in Fig. 1, the TPEF signal (Fig. 1(a)) obtained with SAMMM correlates strongly with the immunofluorescence images of elastin (Fig. 1(b)-(e)). On this basis, TPEF images of the mouse cervix were used as the ground truth for elastin in this study.

Data processing
Fig. 2 presents the process used to detect collagen or elastin from Mueller matrix data and its decompositions. The process extracts the ground truth (GTruth), reduces the cervix images to 200 × 200 pixel images, normalizes them, extracts features, applies two classification methodologies, and compares the results for accuracy. Two commonly used machine vision classifiers [31] were chosen: K-NN and semantic segmentation neural networks. Both methods are extensively used to classify images [32,49], with neural networks being especially useful for imaging diagnostics [35,36,39].

Preprocessing
For each data sample, the corresponding SHG and TPEF images are obtained. The SHG data correspond to the ground truth for collagen, while the TPEF data correspond to the ground truth for elastin. Each ground truth is represented by one image in which the intensity is related to the density of the respective tissue.
The Mueller matrix elements ($M_{11}$, $M_{12}$, ..., $M_{44}$) of the data samples, along with their decomposition values, are arranged as a voxel where the channels of information correspond to the elements or the decompositions. All elements and decompositions are aligned so that each pixel has 26 initial channels of information. Some of the voxel's channels do not bring useful discriminating information. Channels with very low standard deviation are removed since they are mostly noise. Additionally, the information in some channels is very similar to that in others; redundant channels were also removed after comparing the channels with each other using the structural similarity index [50] (Fig. 3). The following variables were used as data channels of the voxel: diattenuation (Eq. (2)), depolarization coefficient (Eq. (3)), linear retardation (Eq. (4)), total retardation (Eq. (11)), orientation (Eq. (12)), and all Mueller matrix elements except $M_{22}$, $M_{33}$, and $M_{44}$.
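The channel pruning described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the two thresholds (`min_std`, `max_ssim`), and the use of a single-window SSIM over the whole image (instead of the locally windowed index of [50]) are assumptions made for the example.

```python
import numpy as np

def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM over two whole images (simplified stand-in for [50])."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def prune_channels(stack, min_std=0.01, max_ssim=0.95):
    """stack: (H, W, C) array scaled to [0, 1]. Returns indices of kept channels.

    min_std and max_ssim are hypothetical thresholds chosen for illustration.
    """
    kept = []
    for c in range(stack.shape[2]):
        chan = stack[:, :, c]
        if chan.std() < min_std:
            continue  # nearly constant channel, treated as noise
        if any(global_ssim(chan, stack[:, :, k]) > max_ssim for k in kept):
            continue  # too similar to a channel that is already kept
        kept.append(c)
    return kept
```

Channels are visited in order, so when two channels are redundant the first one encountered is the one that survives.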
The SHG images, TPEF images, and data voxels are reduced in resolution to 200 × 200 pixels so that the computation is more efficient. Due to errors in the sampling process or data gathering, outlier pixels are present. Pixels with very high values compared to their neighbors can skew the classification process. Outliers were removed from the voxel, the SHG, and the TPEF data using the Grubbs method, and each outlier pixel was replaced by the mean of its neighbors. Raw values varied greatly among images, so, to make them comparable to each other, all images were normalized independently based on the maximum and minimum values of each one. The value of any pixel within an image was then between 0 and 1.
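A minimal sketch of these three preprocessing steps is given below. Several choices are assumptions for illustration only: the text does not say how the resolution was reduced (block averaging is used here), a fixed z-score cutoff stands in for the Grubbs method, and the function names and the `z_max` value are hypothetical.

```python
import numpy as np

def downsample(img, factor=5):
    """Block-average downsampling, e.g. 1000 x 1000 -> 200 x 200 for factor 5.

    Assumes both image dimensions are divisible by factor.
    """
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def replace_outliers(img, z_max=4.0):
    """Replace unusually bright pixels by the mean of their 8 neighbors.

    A global z-score cutoff is used here in place of the Grubbs method.
    """
    z = (img - img.mean()) / (img.std() + 1e-12)
    mask = z > z_max
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    # Sum of the 3 x 3 neighborhood, then subtract the center pixel
    neighbor_sum = sum(padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                       for dy in (-1, 0, 1) for dx in (-1, 0, 1)) - img
    out = img.astype(float)
    out[mask] = neighbor_sum[mask] / 8.0
    return out

def minmax_normalize(img):
    """Scale one image independently to the range [0, 1]."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo)
```

Removing outliers before normalization matters because a single very bright pixel would otherwise set the maximum and compress every other value toward 0.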

Ground truth processing
Although a binary (0 or 1) classification is possible, the SHG and TPEF images were quantized into 10 relative discrete levels representing degrees of density of collagen or elastin. These levels were associated with normalized values in the range [0, 1] and can be scaled to fit any range. High levels (6 to 10) were associated with a detection of collagen or elastin at different densities. Maintaining this resolution was important to preserve image details. Background pixels, or pixels from other tissues, were associated with low levels (1 to 4). All values of SHG and TPEF were rounded to one of the 10 discrete values to obtain an appropriate ground truth for classification (Fig. 4). A non-uniform quantization was used to account for the imbalance between the number of low-intensity pixels and the number of high-intensity pixels and to introduce bias to the classifiers. The 10-level quantization grid favored high-intensity pixels by starting high levels at lower intensity values. The grid's thresholds were [0, 0.05, 0.11, 0.19, 0.29, 0.36] for low levels and [0.47, 0.60, 0.85, 0.96, 1] for high levels. For the K-NN case, the ground truth image was arranged as a vector of GTruth levels, in which the index of each vector position was associated with the position of a pixel in the ground truth image (Fig. 5).
For the semantic segmentation case, each level of the ground truth acted as a layer superimposed on the sample data. The ground truth was an image of the same size as the input data with categorical values (levels) for each pixel, in essence, equal to the ground truth of the K-NN classifier.
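The quantization and the K-NN label vector can be sketched as below. The sketch assumes that the eleven thresholds quoted above are the bin edges of the ten levels, which is one possible reading of the text; the names `EDGES`, `quantize`, and `to_knn_labels` are hypothetical.

```python
import numpy as np

# Assumption: the eleven thresholds from the text are read as bin edges of ten levels.
EDGES = np.array([0, 0.05, 0.11, 0.19, 0.29, 0.36, 0.47, 0.60, 0.85, 0.96, 1.0])

def quantize(img):
    """Map a normalized [0, 1] image to integer levels 1..10 on the non-uniform grid."""
    # Only the interior edges are needed: values below 0.05 give level 1,
    # values of 0.96 and above give level 10.
    return np.digitize(img, EDGES[1:-1]) + 1

def to_knn_labels(levels):
    """Flatten row by row, so index i corresponds to pixel (i // W, i % W)."""
    return levels.reshape(-1)
```

The same level image can serve both classifiers: kept as a 2-D categorical map for semantic segmentation, or flattened into a label vector for K-NN.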

Feature extraction
K-NN classifiers take several observations and try to group them together based on the similarity of their features. A standard input for a K-NN classifier is a single matrix in which the rows represent independent observations and the columns are features associated with those observations. The voxel was arranged into a matrix of features in which each row was linked to a pixel and the columns were its corresponding channels (features). The number of features was reduced further using principal component analysis (PCA). By retaining the components that account for 95% of the discrimination information, the number of features was reduced to 4 (Fig. 6). For the semantic segmentation classifier, the data channels were reduced to only linear retardation and the Mueller matrix elements $M_{34}$ and $M_{43}$, since they were the most dissimilar among channels and carried more discriminant (classification) information. Thousands of features were extracted based on the pretrained ResNet18 and DeepLab v3+ [51] architectures (Fig. 7).
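The reshaping of the voxel into a feature matrix and the PCA reduction can be sketched as follows. This is an illustrative sketch, assuming that the 95% criterion refers to the cumulative explained variance of the principal components; the function names are hypothetical.

```python
import numpy as np

def voxel_to_features(stack):
    """(H, W, C) voxel -> (H*W, C) matrix: one row per pixel, one column per channel."""
    h, w, c = stack.shape
    return stack.reshape(h * w, c)

def pca_reduce(features, keep=0.95):
    """Project onto the fewest principal components whose explained variance >= keep."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    n = int(np.searchsorted(np.cumsum(explained), keep)) + 1
    n = min(n, len(s))
    return centered @ vt[:n].T, n
```

In practice the projection learned on the training pixels (the mean and `vt[:n]`) would be stored and reused on the test pixels, so that both sets share the same feature space.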

Classification training
Of the 16 mouse cervical slices, 11 were used for training and the remaining slices for testing. However, since the classifiers take each pixel vector (the pixels at the same coordinate position in all channels) as one observation and there are 40,000 pixel vectors per cervical slice, the sets of observations on which the classifiers were trained and tested contained close to 320,000 and 120,000 observations, respectively. The two sets were independent of each other.
Two K-NN (k = 5 neighbors) classifiers were trained based on the collagen and elastin ground truths, respectively. The algorithm produced two independent models, one to detect collagen and one to detect elastin. To save computational time and reduce the amount of training data needed in the semantic segmentation case, information was transferred from a pre-trained ResNet18 into a DeepLab v3+ [51] semantic segmentation architecture. Two models, one to detect collagen and the other to detect elastin, were trained independently with 11 cervical slices. The networks used encoder-decoder architectures, dilated convolutions, and skip connections to segment images. The data were artificially augmented to increase the number of samples (images) to a number similar to that in the K-NN case; random left/right reflection and random X/Y translation of +/-10 pixels were applied.
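The K-NN step can be illustrated with a brute-force sketch. It assumes Euclidean distance and a simple majority vote among the k = 5 nearest training pixels, neither of which is specified in the text; the function name and arguments are hypothetical. One such model would be fitted per tissue (collagen or elastin) by changing the label vector.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=5, n_levels=10):
    """Brute-force K-NN: majority level among the k nearest training pixels.

    train_x: (n_train, n_features), train_y: integer levels 1..n_levels,
    query_x: (n_query, n_features). Euclidean distance is an assumption.
    """
    # Squared distances between every query and every training row
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k smallest distances per query (unordered within the k)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    votes = train_y[nearest]
    counts = np.stack([(votes == lv).sum(axis=1)
                       for lv in range(1, n_levels + 1)], axis=1)
    return counts.argmax(axis=1) + 1  # ties resolve to the lower level
```

The full distance matrix grows with the product of the number of query and training pixels, so with hundreds of thousands of observations the queries would need to be processed in chunks or with a tree-based neighbor search.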

Collagen and elastin predictions
The two classification methodologies were applied to each of the cervical slices. Intensity levels from 1 to 10 represent the amount of tissue predicted, where 1 is no tissue and 10 is a high amount. The output of both the K-NN model and the semantic segmentation model was a predicted pixel-level image with the same dimensions as the original inputs. Some classification errors appeared as static (salt-and-pepper) noise.
A median filter was used to smooth the noise on the K-NN prediction in postprocessing. The image was more or less granular depending on the size of the filtering window. Two window sizes of [3 × 3] and [10 × 10] pixels were chosen. For the semantic segmentation case, no filter was used due to low noise level in the output images.
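The median filtering of the K-NN prediction can be sketched as below. The edge-replication padding and the handling of the even 10 × 10 window (one more padded row and column before than after) are assumptions for the example; the function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(img, size=3):
    """Median filter with a size x size window; output has the same shape as img."""
    before = size // 2
    after = size - 1 - before  # differs from `before` when size is even
    padded = np.pad(img, ((before, after), (before, after)), mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))
```

With an even window the median of a level map can fall halfway between two levels, so the filtered image may need rounding if integer levels are required downstream.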

Accuracy
The original ground truth and the predicted image were compared for measuring image quality, using three different methods: mean-squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) index [50]. MSE and PSNR are purely mathematical metrics and may not agree with human perception of the image quality. SSIM considers local contrast and luminance and agrees more closely with a subjective metric.
Using the MATLAB built-in function "immse.m", the MSE metric compared the data matrices of the original and predicted images pixel by pixel: the squared error was calculated for each pixel pair, the errors were averaged, and the result was subtracted from 1. The function "psnr.m" was used to find the peak signal-to-noise ratio, which indicates the ratio of the maximum pixel intensity to the power of the distortion. It compares the original and predicted images as if the classifier were a transmission system introducing noise, the original images were the transmitted signal, and the predicted images were the received signal. A PSNR > 40 dB is considered good; a PSNR < 10 dB is considered poor.
MATLAB built-in function "ssim.m" was used to find the structural similarity index which combines local image structure, luminance, and contrast into a single local quality score given as a percentage. Structures are selected as patterns of pixel intensities, particularly among neighboring pixels.
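The two purely mathematical metrics can be written out in a few lines. This is a NumPy sketch of the standard definitions, not a reproduction of the MATLAB functions; it assumes both images are scaled to [0, 1], so that 1 - MSE also lies in [0, 1] and the peak value is 1. The function names are hypothetical, and SSIM is omitted because it requires local windowing.

```python
import numpy as np

def mse_accuracy(truth, pred):
    """1 - mean squared error, for images scaled to [0, 1]."""
    return 1.0 - np.mean((truth - pred) ** 2)

def psnr(truth, pred, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((truth - pred) ** 2)
    if mse == 0:
        return np.inf  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a prediction that is off by 0.1 at every pixel has an MSE of 0.01, which corresponds to an MSE-based accuracy of 0.99 and a PSNR of 20 dB.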

Results
Section 3.1 presents visual results for 2 tested slices from day 18 (D18) and day 6 (D6); for collagen and elastin predictions; and for K-NN and Semantic Segmentation classifiers. Section 3.2 presents the average of all the test samples for each of the three metrics, for the semantic segmentation and the K-NN classifiers with filters. Overall accuracies, involving all the samples, have similar results but are not presented.
On an Intel Core i7 at 3.40 GHz with 32 GB of RAM, the K-NN model took 5 minutes to train and 30 minutes to predict a sample. In contrast, semantic segmentation took substantially more time to train (around 60 hours) but was much faster at prediction (3 seconds per sample).
The data used for classification consisted of the normalized Mueller matrix components and the decompositions described in Section 2.5.1. The ground truths SHG and TPEF for samples D18 and D6, along with the linear retardation, $M_{11}$, $M_{34}$, and $M_{43}$ features, are presented in Fig. 8. These features carried the most discriminant information and are presented on a 10-level heat scale.

Classifiers qualitative results
Classification results of two tested slices for collagen and elastin are presented in Fig. 9 and Fig. 10, respectively. In each figure, the first column presents the image that was used as a ground truth (SHG for collagen and TPEF for elastin); the second column presents the K-NN prediction pixel-by-pixel where some classification errors are seen as salt-and-pepper noise. The following two columns show the K-NN prediction after two different filters. The last column presents the semantic segmentation prediction. All images were normalized, and pixels scaled from 1 to 10 levels, with 1 defined as low density of the tissue and 10 a high density of the tissue.
In general, structures in the collagen predictions present less noise than structures in the elastin predictions. K-NN predictions had significant salt-and-pepper noise that was removed by a [3 × 3] median filter. The K-NN prediction with a [10 × 10] median filter shows the general areas in which the tissue structures are contained but lacks resolution. The semantic segmentation prediction does not require filtering and shows the general areas containing the tissue while providing good resolution.
Table 1 summarizes the classification results for independent classification of collagen and elastin on the test set (6 cervical slices for testing). All accuracies are above 80% for collagen cases, while elastin cases have accuracies close to 90%. Overall, semantic segmentation gives slightly better results, especially for the SSIM metric.

Discussion and conclusions
We proposed a methodology that integrates Mueller matrix polarimetry with convolutional neural network (CNN) and K-nearest neighbor (K-NN) techniques for successfully detecting and classifying cervical collagen and elastin. The methodology is reliable and low cost.
The original digitized samples are large, which can significantly increase computing time, especially for K-NN classifiers. We demonstrated that these images can be reduced to 200 × 200 pixel images without sacrificing accuracy. The reduction makes processing, training, and classification more manageable. Some Mueller matrix components or decompositions do not provide relevant classification information (depolarization with the differential method, total retardation, and the Mueller matrix components $M_{22}$, $M_{33}$, and $M_{44}$). Simple feature reduction techniques, such as average standard deviation or PCA, can reduce the components used and the size of the samples without detriment to the accuracy. The K-NN model is quick to train but significantly slower in making predictions. In contrast, semantic segmentation takes substantially more time to train but is very fast at predicting. Depending on the problem, the two methods can be used independently so that their results confirm each other. A CNN-K-NN hybrid is also proposed as a next step. Additionally, with a larger sample size, a more complex neural network, like U-net [52], can be trained.
The use of SHG and TPEF images as ground truths proved to be effective. However, outlier pixels from the initial sampling process can skew the quantization and introduce error into the classification. In addition, the ratio of high-level pixels to low-level pixels is very small, even after normalization, which creates a bias towards low-level pixel detections. A non-uniform quantization grid was used to compensate for this bias. Further work can focus on reducing the imbalance between high-level and low-level pixels by applying class noise reduction methodologies [53].
Predicted collagen images have greater qualitative similarity to the ground truth than elastic fiber images. This is in part because there is more collagen protein than elastin in the samples, so that more pixels with higher concentration are associated with collagen than with elastin. In addition, collagen structures occupy a larger area, making them easier to detect. However, the accuracies for collagen and elastin, based on independently detecting them on the same sample, are comparable. Each pixel of a sample has a collagen level and an elastin level associated with it after classification; those levels are not mutually exclusive and sometimes overlap. This could indicate a discrepancy in the classification, which would call for a joint classification approach (assigning a likelihood of collagen or elastin to each pixel), or it could mean that both tissues are present at the same location, since collagen and elastin fibers are frequently intertwined. This is a limitation of independently classifying collagen and elastic fibers and will be addressed in future work.
Measuring image quality is difficult since some metrics do not match subjective perception. Furthermore, the number of pixels representing collagen or elastin is significantly smaller compared to the "background" pixels which is why some accuracy metrics are relatively high, but the ground truth and predicted images do not look that similar. The classifiers are good at detecting the low-intensity "background" pixels, which are the majority, but they are not as good at properly classifying high-intensity pixels. Accuracy metrics that consider this imbalance or that favor subjective quality (like SSIM) are better suited for this kind of classification assessment.
A larger number of cervix slices is strongly recommended, especially for the semantic segmentation case. However, estimating the exact number of data samples needed to successfully train a neural network is very difficult, if not impossible. The minimum number of samples depends directly on the characteristics of the problem and the chosen CNN architecture. Some magnetic resonance imaging (MRI) diagnostic studies required more than 3,000 samples [52], while others required close to 300 [54,55]. Although the small number of samples is a limitation, the three accuracy metrics used in this study show overall good results compared to other approaches [30]. Notably, compared with K-NN, the semantic segmentation classifiers are more robust and less sensitive to noise.
Mueller matrix microscopy presents several advantages compared to other modalities used for the quantification of collagen and elastin, such as nonlinear microscopy or SHG. The modality is relatively low cost, easy to use, and fast, and it can be designed with low encumbrance. Combined with machine learning techniques, this modality could expand the toolkit for researchers studying the reproductive system and particularly preterm labor. Training of the system with SHG and TPEF is necessary with our proposed approach, yet beyond the training phase, classification of cervical elastin and collagen can be achieved through a Mueller matrix system alone. Future work will focus on expanding this approach to standalone systems, i.e., systems that are not co-registered as the SAMMM is.