Polarimetric measurement utility for pre-cancer detection from uterine cervix specimens

: Prior work demonstrated signiﬁcant contrast in visible wavelength Mueller matrix images for healthy and pre-cancerous regions of excised cervical tissue. This work demonstrates post-processing compressions of the full Mueller matrix that preserve detection performance. The purpose of this post-processing is to understand polarimetric measurement utility for computing mathematical observers and designing future imaging protocols. The detection performance of the full Mueller matrix, and both linear and non-linear parameters of the Mueller matrix will be compared. The area under the receiver operating characteristic (ROC) curve, otherwise known as the AUC , is the gold standard metric to quantify detection performance in medical applications. An AUC = 1 is perfect detection and AUC = 0 . 5 is the performance of guessing. Either the scalar retardance or the 3 smallest eigenvalues of the coherency matrix yield an average AUC of 0.94 or 0.93, respectively. When these four non-linear parameters are used simultaneously the average AUC is 0.95. The J-optimal Channelized Quadratic Observer (J-CQO) method for optimizing polarimetric measurements demonstrates equivalent AUC values for the full Muller matrix and 6 J-CQO optimized measurements. The advantage of this optimization is that only 6 measurements, instead of 16 for the full Mueller matrix, are required to achieve this AUC . data analysis to quantify polarimetric measurement utility and to evaluate the use of adaptive polarimetry for high grade cervical intraepithelial neoplasia (CIN 2-3 or pre-cancer) detection.

knowledge of the samples' optical properties and a well-defined scientific task to reduce the measurement requirement is known as 'adaptive polarimetry' [12] or 'partial polarimetry' [13,14]. Measurement utility relates the individual polarimetric measurements to the performance of a scientific task as an empirical method for exploring a samples' relevant optical properties [15]. This paper reports data analysis to quantify polarimetric measurement utility and to evaluate the use of adaptive polarimetry for high grade cervical intraepithelial neoplasia (CIN 2-3 or pre-cancer) detection.
In prior work on Mueller polarimetry of cervical tissue, scalar retardance was used as a test statistic [7]. The high detection performance of scalar retardance demonstrates that Mueller polarimetry data contains valuable information for cervical cancer detection but is not well-suited for adaptive polarimetry, since this parameter depends non-linearly on a full Mueller matrix. An important contribution of this work is demonstrating equivalent detection performance between: 1) polarimetric parameters calculated from the depolarizing part of the Mueller matrix (i.e., the 3 smallest eigenvalues of the coherency matrix) and 2) the scalar retardance which is computed from the non-depolarizing part of the Mueller matrix. This discovery is novel since it has not been demonstrated before that both the depolarizing and the non-depolarizing parts of the Mueller matrix are discriminatory for cervical pre-cancer screening. Cervical tissue is very depolarizing so this equivalent metric is potentially useful in measurement and calibration protocols.
In this paper maximum detection performance is achieved using polarimetric parameters (e.g., scalar retardance and the eigenvalues of the coherency matrix) which are non-linear functions of the full Mueller matrix. It is not possible to measure these parameters directly. Instead, these parameters are computed in post-processing from the full Mueller matrix. This work demonstrates a method to maximize the detection performance given a linear constraint between the parameters and the Mueller matrix. An even stricter constraint is also imposed: linear parameters of the Mueller matrix that can be optically measured directly. Equivalent detection performance when using the full Mueller matrix or only 6 coefficients of the Mueller matrix is also demonstrated. This result indicates that adaptive polarimetry is a viable optical technique for cervical cancer detection.  Fig. 1. Radiance images of 24 excised cervical specimens labeled with results from histopathology: CIN 2-3 (red), healthy (green) tissue. In total, 59,000 CIN 2-3 and 135,000 healthy pixels are labeled. Patient IDs are counted column-major. Three images contain both CIN2-3 and healthy pixels, these Patient IDs are: 18, 19, and 22. The size of the background square grid cell is 5mm x 5mm.

Experimental methods
A full description of the multispectral imaging Mueller polarimeter used for the ex vivo analysis of uterine cervix specimens is in prior work [7]. This Mueller imaging polarimeter operates in a backscattering configuration. An incoherent white light used to illuminate the sample delivered by a LED source is collimated by a convex lens and then passes through a Polarization State Generator (PSG). This latter consists of a linear polarizer followed by two tunable ferroelectric liquid crystal cells (FLCs) with a quarter waveplate inserted in between [16]. Each FLC has the effect of a fixed retardance waveplate with a fast axis orientation that switches between ≈ 0 • and ≈ 45 • . Thereafter, the light backscattered by the sample passes through a Polarization State Analyzer (PSA) made of the same elements in a reverse order compared to PSG assembly. Four linearly independent states of polarization generated by PSG are projected on four identical states of PSA after interaction with a sample. The sample is imaged through the PSA onto a CCD camera (Stingray F080B, 800 × 600 pixels). The stability of the emitted light during the measurement is monitored by a photodiode. The detection arm is set at normal to sample surface, while the illumination arm is set at ≈ 15 • to the normal. By using 20 nm bandpass interference filters, we chose the measurement wavelength between 450 and 700 nm in steps of 50 nm. The Mueller matrix of the sample at each wavelength is reconstructed from sixteen images acquired at all sixteen unique combinations of four different input and output polarization states. The acquisition of these 16 images takes less than 2 seconds due to the rapid polarization modulation of the FLCs. Calibration of the multispectral imaging Mueller polarimeter was performed pixelwise and at each wavelength using the Eigenvalue Calibration Method (ECM) [17]. This Mueller polarimeter was installed at the pathology department of the University Hospital Kremlin-Bicêtre, France.
Monochromatic (550 nm) radiance images of excised cervical tissue are shown in Fig. 1 with green and red labels of healthy squamous epithelium and high-grade cervical intraepithelial neoplasia (CIN 2-3) regions, respectively. Precancerous transformation of the cervix starts in the cells near the basal membrane at the bottom of the squamous epithelium, and then progresses toward the surface. The lesion is classed as CIN1 or CIN2 if this transformation affects one-third or two-thirds of the epithelium thickness, respectively, and CIN3 if the entire thickness of the epithelium is affected. Results from histopathology are used to label these selected pixels, and the method to register the two modalities has been described in [7]. In our prior studies the first statistical analysis was performed for specimens measured at 450 nm, 550 nm, and 600 nm [7]. The data acquired at 550 nm and 600 nm were quite similar but the best sensitivity and specificity of the optical diagnostics was achieved at 450 nm. In the current paper we present the extended statistical analysis for larger set of specimens measured at wavelength 550 nm. We have chosen this wavelength, because the variability of signal-to-noise ratio from patient to patient was much higher at 450 nm compared to the two other measurement wavelengths.

Mathematical and computational methods
Consider the relationship between an image and an object as g = H f + n. Here g is an M × 1 vector of measurements made by an imaging system that is represented as a continuous-to-discrete operator H. The measurements of the continuous object f are corrupted by measurement noise n. In imaging polarimetry a Mueller matrix (a 16 × 1 vector denoted m) is reconstructed from the raw measurements at each pixel [18]. Functions of the elements of m, or a subset of the elements of m, can be used to perform the detection task. In pattern-recognition, these functions are called features, and the transformation from data elements to features is called feature extraction. If the functions are linear transformations of the data, they are called linear features. In this work, the detection performance will be compared for: 1) a full Mueller matrix denoted as a 16 × 1 vector m; 2) a linear transformation or compression of the Mueller matrix, denoted Tm where T is a full rank L × 16 matrix where L < 16; 3) a constrained linear reduction of the Mueller matrix where the rows of T are constrained to satisfy PSA and PSG states on the Poincaré sphere [19]; and 4) parameter(s) derived from the Mueller matrix through a non-linear function. One of the most conventional methods of data processing in Mueller polarimetry (and an example in the 4 th category) is the Lu-Chipman (LC) decomposition [20]. Here the Mueller matrix is expressed as a product of Mueller matrices from a series of optical elements: a diattenuator, followed by a retarder, then followed by a depolarizer. Figure 2 shows the histogram of the depolarization index and scalar retardance values calculated pixel-wise using LC decomposition for all labeled regions of 24 patients. A mathematical observer computes a scalar-valued test statistic, for the binary classification, from the polarimetric parameter(s). The ideal observer computes this test statistic using the log of the ratio of the likelihoods, which can be written as where v are the parameter(s) used for the classification and pr 1 and pr 2 are the conditional probability density functions (pdfs), also called the likelihoods, of the vector v given each binary classification. Eq. (1) maximizes the area under the receiver operating characteristic (ROC) curve, otherwise known as the AUC, as well as other task-based figures of merit (FOM) [21]. AUC is the gold standard FOM to quantify detection performance [21]. The AUC ranges from 1.0 (i.e., classifier never makes a mistake) to 0.5 (i.e., classifier is guessing a decision).
In this work the likelihoods pr i (v) are modeled as either normal, or when the variable is positively-constrained, log-normal. It is notable that the likelihoods are not always wellapproximated by a Gaussian fit (e.g, in Figs. 3 and 4, some subplots look more normal then others). The normal model is selected for simplicity. Assuming that pr n (v) is a normal pdf with meanv n and covariance matrix K n , Eq. (1) becomes for n = 1, 2. A normal pdf results in the log likelihood ratio in Eq. (1) being quadratic in v and denoted as a quadratic classifier or quadratic observer. In prior work we introduced a mathematical method, called the J-optimal Channelized Quadratic Observer (J-CQO), to optimize the PSA/PSG measurement strategy [19]. J-CQO is a constrained linear reduction of the Mueller matrix of the form Here i is a vector of intensity measurements and the rows of T are constrained to satisfy PSA/PSG states on the Poincaré sphere [19]. If T, is a 16 × L matrix, where L is the number of intensity measurements made then each row of T can be written as a Kronecker product between the l th PSA/PSG state as in Here a l is a 4 × 1 vector of Stokes parameters describing the l th PSA, g l is a 4 × 1 vector of Stokes parameters for the l th PSG. If A is an m × n matrix and B is a p × q matrix, then the Kronecker product A ⊗ B is the mp × nq block matrix The J-CQO optimizes Jeffrey's divergence (J) between two normal distributions: pr 1 (i) and pr 2 (i), with respect to the PSA/PSG states and associated matrix T. Denote the mean of the non-Gaussian distribution on the Mueller matrix pr n (m) as a 16 × 1 vector m n and the covariance as a 16 × 16 matrix K n for n = 1, 2. Assume the intensity measurements are normally distributed where the mean of the n th class is an L × 1 vector i n and the L × L covariance matrix is C n = T t K n T.
The value of J between the L-dimensional normal distributions pr 1 (i) and pr 2 (i) is The dependence between the covariance matrix of the intensity measurements on the PSA/PSG could be made explicit by denoting C n (T), but in the above expression this dependence has been dropped for brevity. An important advantage of Eq. (6) as a merit function is that the scalar-value of J for two Gaussian distributions has a closed-form gradient [19]. The Bhattacharyya distance has also been used to optimize a single PSA/PSG measurement without a closed-form gradient [12]. The mathematical and empirical relationship between J and AUC has been described in prior work [22].
Instead of optimizing eight numbers (i.e., four elements of the vector a and four elements of the vector g) the optimization can be reduced to six numbers. Stokes parameters can be transformed to a total radiance I and three coordinates on the Poincaré sphere: for example a = [I a , ρ a cos(2ψ a ) cos(2 χ a ), ρ a sin(2ψ a ) cos(2 χ a ), ρ a sin(2 χ a )]; similarly for g. In the J-CQO optimization I a and I g are set to unity since any increase in total radiance would always increase J for an otherwise fixed PSA and PSG. The Poincaré coordinates are elements of the vector θ = [ρ, ψ, χ] with constraints 0 ≤ ρ ≤ 1, −π/2 ≤ ψ ≤ π/2, and −π/4 ≤ χ ≤ π/4. The elements of θ a and θ g are six constrained numbers which are optimized for J-CQO measurements. For many applications setting ρ a = ρ g = 1 will reduce the optimization to four numbers without compromising detection performance, see the Results section.
Statistical independence between training and testing sets is required to ensure an unbiased estimate of AUC. Patient data are required in three distinct steps of the computation of the quadratic log likelihood ratio: 1) estimatev i and K i in Eq. (2), i.e train the quadratic classifier, 2) patient data to test the performance of the classifier, i.e., the vector v in Eq. (1), and 3) when linear compression of the form v = T † m is used (see Equation 3) patient data is needed to optimize T. For steps 1 and 2 the patient data must be independent from one another to ensure an unbiased estimate of AUC [23]. In practice we have found that independence between patient data in steps 2 and 3 increases AUC by incorporating intraclass variability into the likelihood estimates. Pixels within a single patient image are correlated and pixels from different patient images are independent. Three non-overlapping sets are formed from patients 1-24; set A = {4, 5, 7, 10, 12, 18, 20, 22}, set B = {2, 3, 9, 13, 14, 15, 19, 24}, and set C = {1, 6, 8, 11, 16, 17, 21, 23}. Other choices for these sets are possible; this particular choice yields approximately equal quantities of labeled pixels (healthy and CIN 2-3) across the three sets. The variability in assigning sets A, B, and C to steps 1, 2, and 3 yields 6 estimates of AUC from which the mean and standard deviation are computed and reported in Table 1. Table 1 is a summary of the AUC estimates for 16 different polarimetric parameters and/or sets of parameters. This table is separated into parameters that are linear or non-linear functions of the Mueller matrix. The LC decomposition is a popular method which yields scalar retardance and depolarization index; both are non-linear parameters of the Mueller matrix [20]. The AUC for scalar retardance is 0.94 ± 0.03 and 0.49 ± 0.06 for depolarization index (see Fig. 2 Fig. 4. Histogram of sorted (λ 1 > λ 2 > λ 3 > λ 4 ) coherency matrix eigenvalues for labeled pixels in the population of 24 patients: red (healthy), blue (CIN 2-3). The y-axis is probability of occurrence and for brevity is unlabelled.

Results
histograms of these quantities). The scalar retardance AUC is high but is not guaranteed to be a global maximum. Histograms of all 16 elements of the Mueller matrix are shown in Fig. 3(a). When all elements of the Mueller matrix are used in the classifier the AUC is 0.90 ± 0.05 which, although less than for scalar retardance, shows the detection performance with minimal post-processing. Next, one-by-one, each element is discarded and the AUC of the reduced dataset is estimated. The element is retained only if the average AUC decreases greater than or equal to 0.01. Using this procedure an AUC of 0.90 ± 0.05 is maintained using only elements: m00, m20, and m21; see subplot labels in Fig. 3(b). A non-polarimetric measurement yields m00 but the other two elements cannot be measured individually. This demonstration, that using the full Mueller matrix or only 3 elements yields the same detection performance, indicates that adaptive polarimetry is indeed promising.
The coherency matrix is linearly related to the Mueller matrix and contains an equivalent amount of polarimetric information. The relationship between coherency and Mueller matrices has been studied by many authors [11,24]. The coherency matrix is a 4 × 4 Hermitian matrix of complex-values and 16 of these values are unique; see histograms in Fig. 3(b). The diagonal values of the coherency matrix are positively-constrained, therefore a log-normal likelihood model is used. The AUC using all 16 unique values of the coherency matrix is the same as the full Mueller matrix: 0.90 ± 0.05. Using the same procedure as described above the AUC is equivalent when only 4 elements: C11, C22, imaginary part of C02, and real part of C12 are retained. This results demonstrates that not all elements are equally informative, and that in consequence, there is room for simplifying the measurement strategy. However, we could not find any simple physical interpretation of the selected coefficients.
The eigenvalues of the coherency matrix are real and positive since it is Hermitian; see Fig.  4. Analysis of the eigenvalues of the coherency matrix has been studied by other authors [25]. For non-depolarizing samples the three smallest eigenvalues of the coherency matrix are zero by definition. Measurements of non-depolarizing samples will result in non-zero values for 2, 3, or all 4 eigenvalues. Since tissue is highly depolarizing (see Fig. 2) the average values of all four eigenvalues are non-zero (see Fig. 4). The average-value of each eigenvalue is only slightly different for healthy and CIN 2-3 populations in Fig. 4. Visual inspection of this figure alone is not enough to conclude whether these parameters are useful to discriminate these two populations. The correlation between these parameters is also important. Correlation between the parameters creates the population separation in Fig. 5 when the difference of a pair the parameters is plotted versus the difference in another pair.
The AUC using all four eigenvalues is 0.93 ± 0.03. The entropy of the eigenspectrum is a popular test statistics in select polarimetric applications [26]. The entropy of the eigenspectrum is related to the depolarization index. The depolarization index yielded low AUC estimates (see Table 1), so as expected the AUC of entropy was also low: 0.48 ± 0.10. As shown by Cloude's decomposition [24], the largest coherency matrix eigenvalue, and its associated eigenvector, define the nondepolarizing part M nd of the generally depolarizing Mueller matrix M. The other eigenvalues and eigenvectors account for the depolarizing part M dep , and M = M nd + M dep . The largest eigenvalues, and associated eigenvector, dominates the LC decomposition parameters such as scalar retardance. The AUC using only the three smallest eigenvalues is also 0.93 ± 0.02. This result indicates that the discriminating qualities of the depolarizing part of the polarimetric measurement are just as high as the nondepolarizing part. This is an important result which can be explored in optical diagnostics of tissue pathology. In applications where the sample is not highly depolarizing the three smallest eigenvalues can be very close to zero and very noisy. In this application, the high AUC from the three smallest eigenvalues is evidence that these measurements are informative and not dominated by noise. Fig. 5(a) is 2-D plots of differences in the 3 smallest eigenvalues for the three patients in this population with both CIN 2-3 and healthy regions. The difference λ 4 − λ 2 is plotted versus λ 2 − λ 3 . Distinctive differences between healthy and CIN 2-3 distributions are visible for each patient although the distribution patterns change for each patient. When scalar retardance and the three smallest eigenvalues of the coherency matrix are used together in the classifier the AUC increases slightly, relative to the separate values, to 0.95 ± 0.02. This is the highest AUC value reported in this work. This increase in AUC may look quite small. Nevertheless, we would like to emphasize that this improvement is achieved by an appropriate data post-processing and does not require any modifications of the measurement protocol. The next part of the discussion is dedicated to studies on optimal measurement protocol for adaptive polarimetry.
The detection performance for J-CQO versus number of measurements is shown in Fig. 6(a). Three J-CQO measurements yield an AUC of 0.88 ± 0.07 and 6 measurements yield an AUC of 0.90 ± 0.03 which is equivalent to the performance using the entire Mueller matrix. The optimal PSA/PSG states, for 3 measurements, are plotted on the Poincaré sphere in Fig.6(b-c) where numerical labels are given to each PSA/PSG pair. The optimal PSA/PSG states are always on the surface of the Poincaré sphere indicating that the optimal value for this dataset is ρ = 1. This result has been predicted in prior work and could be used as an a priori constraint to  reduce the number of parameters in the J-CQO optimization [27]. At a given measurement, the PSA/PSG pair appears approximately cross-polarized in the top-view shown in Fig.6(c). The side-view in Fig.6(b) shows that all PSA/PSG states deviate from purely linearly states towards right circular-polarization (RCP). Although it is faint, this deviation is significant since even small perturbations to the Poincaré coordinates can change the AUC. Measurements 1 and 2 are very close to one another on the Poincaré sphere, and it is difficult to distinguish these markers in Fig.6(b-c). Plots of the intensity measurements from these PSA/PSG states, for three patients, are shown in Fig. 5(b). Here the difference between the first and second measurement (i 1 − i 2 ) is plotted versus the difference between the third and the second measurement (i 3 − i 2 ). These three measurements are different in both the mean and covariance and quadratic observers are sensitive to both of these differences. Although i 1 − i 2 is approximately an order of magnitude less than i 3 − i 2 the separation between the two classes along this dimension is evident. The mean values of i 1 and i 2 are more similar for CIN 2-3 labelled pixels (i.e., centroid of the blue points is close to zero along i 1 − i 2 axis) as compared to the small but non-negligible difference in i 1 − i 2 for healthy labelled pixels (i.e., centroid of the red points is farther from zero along i 1 − i 2 axis). A similar centroid separation is evident on the i 3 − i 2 axis. The differences in the variances and covariances of i 1 , i 2 , and i 3 define the distribution of scattered points in Fig. 5(c).

Conclusion
This paper has demonstrated linear post-processing reductions of the full Mueller matrix that preserve detection performance and non-linear post-processing of the full Mueller matrix that increases detection performance. Element-by-element reductions of the Mueller or coherency matrix do not have direct interpretations concerning a sample's optical properties or measurement protocols. These reductions, which preserve detection performance, motivate the development of adaptive polarimetry for neoplasia diagnostics since not all elements are useful for the detection task. J-CQO optimization of PSA/PSG states achieves equivalent detection performance as the full Mueller matrix using only 6 measurements instead of the full 16. Reducing the number and, consequently, the time of measurements is of paramount importance for the future development of in vivo diagnostics. It will reduce the acquisition artifacts related to the patient's movements.
Experimental verification of the detection performance on reduced polarimetric measurement sets will be the subject of future work. A mathematical relationship between optimal PSA/PSG measurements and eigenanalysis of the coherency matrix will also be considered.