A three-parameter logistic model to characterize ovarian tissue using polarization-sensitive optical coherence tomography

In this paper, a logistic prediction model is introduced to characterize ovarian tissue. A new parameter, the phase retardation rate, was extracted from phase images of polarization-sensitive optical coherence tomography (PS-OCT). The difference in this parameter between normal and malignant ovarian tissues was statistically significant (p < 0.0001). Linear regression analysis showed that this parameter was positively correlated (R = 0.74) with collagen content, which is associated with the development of ovarian tissue malignancy. When this parameter, the optical scattering coefficient, and the phase retardation estimated from 33 ovaries were used as input predictors to the logistic model, 100% sensitivity and specificity in classifying malignant and normal ovaries were achieved. Ten additional ovaries were imaged and used to validate the prediction model, and 100% sensitivity and 83.3% specificity were achieved. These results show that the three-parameter prediction model based on quantitative parameters estimated from PS-OCT images could be a powerful tool to detect and diagnose ovarian cancer.


Introduction
Ovarian cancer has the highest mortality rate among all the gynecologic cancers because it is predominantly diagnosed at late stages, owing to unreliable early symptoms and poor screening techniques. Prophylactic oophorectomy (PO) has been accepted as the standard of care for high-risk women, and it reduces the risk of ovarian cancer by more than 50% [1,2]. However, there appears to be a higher mortality for premenopausal oophorectomy [3]. Therefore, there is an urgent need to develop effective tools to inspect ovaries, so that the mortality rate of ovarian cancer can be reduced and patients' quality of life can be improved.
Optical coherence tomography (OCT) is an emerging high-resolution imaging technique [4], which measures backscattered light generated from an infrared light source directed at the tissue. OCT typically achieves a resolution of several microns and a penetration depth of several millimeters, and has been used to image tissues in the body that can be accessed by endoscope or catheter. Polarization-sensitive OCT (PS-OCT) is a functional extension of OCT [5,6] that is capable of detecting birefringence changes caused by collagen, and collagen changes in the human ovary are indicators of malignancy [7,8]. Therefore, PS-OCT could be an effective tool to detect ovarian cancer. In our initial study [9], the optical scattering coefficient and phase retardation of 33 ex vivo ovaries obtained from 18 patients were extracted from time domain (TD) PS-OCT intensity and phase images, respectively. While the scattering coefficient was significant in predicting malignancy, the phase retardation achieved a low sensitivity of 43%. In this study, a more sensitive parameter, the phase retardation rate, was extracted from PS-OCT phase images and used together with the scattering coefficient and phase retardation to characterize ovarian tissue. In the literature, the PS-OCT phase retardation rate was introduced by M. C. Pierce et al. to quantify collagen denaturation in burned human skin [10]. In our study, these three parameters extracted from 33 ovaries were used as inputs to a logistic model to classify malignant and benign ovaries. In addition, 10 more ovaries from 5 patients were imaged with our upgraded Fourier domain (FD) PS-OCT system and used to test the model. To the best of our knowledge, this is the first study to use multiple parameters extracted from PS-OCT images as predictors for ovarian tissue characterization.

Ovary sample and histopathology
A total of 43 ovaries were extracted from 23 patients undergoing PO at the University of Connecticut Health Center (UCHC). Of these, 33 ovaries from 18 patients were imaged using TD-PS-OCT, while 10 ovaries from 5 patients were imaged using FD-PS-OCT. These patients were at risk for ovarian cancer, or they had an ovarian or pelvic mass suggesting malignancy. This study was approved by the Institutional Review Board of UCHC, and informed consent was obtained from all patients. The details of the imaging procedures and histological processing were described in [9]. A Sirius Red staining protocol was applied to the sectioned slides to analyze the collagen content. The amount of collagen was quantitatively analyzed using ImageJ software (National Institutes of Health). The average collagen area fraction (CAF) was measured as "stained collagen area / tissue area".
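The CAF ratio defined above can be illustrated with a minimal sketch. It assumes the stained collagen and the tissue have already been segmented into binary masks of the same shape; the function name, the masks, and their values are hypothetical and are not taken from the ImageJ workflow used in the study.

```python
def collagen_area_fraction(stained_mask, tissue_mask):
    """Return stained collagen area / tissue area for two same-shaped binary masks."""
    stained_pixels = 0
    tissue_pixels = 0
    for stained_row, tissue_row in zip(stained_mask, tissue_mask):
        for is_stained, is_tissue in zip(stained_row, tissue_row):
            if is_tissue:
                tissue_pixels += 1
                # Only count stained pixels that lie inside the tissue area.
                if is_stained:
                    stained_pixels += 1
    if tissue_pixels == 0:
        raise ValueError("tissue mask is empty")
    return stained_pixels / tissue_pixels


# Hypothetical 2 x 4 masks: 6 tissue pixels, 3 of them stained.
tissue = [[1, 1, 1, 0],
          [1, 1, 1, 0]]
stained = [[1, 0, 1, 0],
           [0, 1, 0, 0]]
caf = collagen_area_fraction(stained, tissue)
print(f"CAF = {caf:.1%}")
```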

PS-OCT systems
The TD-PS-OCT and upgraded FD-PS-OCT systems are shown in Fig. 1(a). The essential optical configurations of the two systems are the same. The technical details of the TD-PS-OCT system were described in our earlier study [9]. The main differences between the upgraded FD system and the TD system are: (1) the superluminescent diode source was replaced with a 110 nm bandwidth swept source (HSL-2000, Santec Corp., Japan) with a center wavelength of 1310 nm and a scan rate of 20 kHz; (2) the detectors were replaced with 75 MHz bandwidth photodetectors (Thorlabs PDB120C); (3) the reference mirror was fixed instead of being moved back and forth by a stepper motor. The conventional OCT intensity images were obtained by summing the squares of the two orthogonally polarized signals, and the phase retardation images were obtained by calculating the arctangent of the ratio between the vertical and horizontal polarization components [11].
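The two per-pixel quantities described above can be sketched as follows. This is an illustration only: it assumes the demodulated amplitudes of the horizontal and vertical channels are available as real numbers, and that the retardation is the arctangent of the vertical-to-horizontal amplitude ratio, mapped to 0-90 degrees. The function and variable names are hypothetical.

```python
import math


def intensity_and_retardation(a_h, a_v):
    """Return (intensity, phase retardation in degrees) for one pixel.

    a_h, a_v: amplitudes of the horizontal and vertical polarization channels.
    """
    # Conventional OCT intensity: sum of squares of the two orthogonal signals.
    intensity = a_h ** 2 + a_v ** 2
    # Retardation: arctangent of |vertical| / |horizontal|, in the range 0 to 90 degrees.
    retardation_deg = math.degrees(math.atan2(abs(a_v), abs(a_h)))
    return intensity, retardation_deg


# Equal amplitudes in both channels correspond to 45 degrees.
example_intensity, example_retardation = intensity_and_retardation(3.0, 3.0)
print(example_intensity, example_retardation)
```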

Phase retardation rate
During imaging, similar conditions for all ovary samples were obtained by mounting each ovary on a three-dimensional stage and adjusting the ovarian tissue surface to the same depth position. The phase retardation rate was obtained by linearly fitting the phase retardation depth profile. The region of interest (ROI) selection was consistent with that used in our earlier study [9] for calculating the scattering coefficient and phase retardation. Each image was evenly divided into several ROIs of 1 mm width. Values in all ROIs from all images of one ovary were averaged to obtain the phase retardation rate of that ovary. The same procedure was followed for all ovaries. An example of fitting the phase retardation rate of a normal ovary is shown in Figs. 1(b) and 1(c). Figure 1(b) is the phase retardation image, where dark blue represents a phase retardation of 0 degrees and dark red represents 90 degrees. The white dashed rectangular area was selected for fitting. The depth profile of the averaged A-lines in the selected area is shown as the blue curve in Fig. 1(c), and the numerical fit is plotted in red. The slope of the red line was taken as the phase retardation rate. The phase retardation decreases with depth beyond about 1.5 mm. This is because the ratio of the vertical to horizontal signals decreases as light penetrates deeper into the tissue. The fitting error of the phase retardation rate is estimated as the norm of the fitting residual divided by the norm of the original curve.
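The fitting step can be sketched as an ordinary least-squares line fit. The sketch assumes the averaged depth profile of one ROI is given as paired lists of depth (mm) and phase retardation (degrees); the function name and the sample profile are hypothetical. The slope is the phase retardation rate in degree/mm, and the fitting error is the norm of the residual divided by the norm of the original profile, as described above.

```python
import math


def fit_retardation_rate(depth_mm, retardation_deg):
    """Least-squares line fit; returns (slope in degree/mm, intercept, relative fitting error)."""
    n = len(depth_mm)
    mean_z = sum(depth_mm) / n
    mean_p = sum(retardation_deg) / n
    s_zz = sum((z - mean_z) ** 2 for z in depth_mm)
    s_zp = sum((z - mean_z) * (p - mean_p) for z, p in zip(depth_mm, retardation_deg))
    slope = s_zp / s_zz
    intercept = mean_p - slope * mean_z
    # Residual between the measured profile and the fitted line.
    residual = [p - (slope * z + intercept) for z, p in zip(depth_mm, retardation_deg)]
    error = math.sqrt(sum(r * r for r in residual)) / math.sqrt(sum(p * p for p in retardation_deg))
    return slope, intercept, error


# Hypothetical averaged A-line profile within one ROI (not measured data).
depth = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
profile = [14.5, 17.6, 22.3, 25.8, 30.4, 33.9]
rate, offset, fit_error = fit_retardation_rate(depth, profile)
print(f"rate = {rate:.1f} degree/mm, fitting error = {fit_error:.2%}")
```

Averaging the returned slope over all ROIs and images of one ovary would then give the per-ovary value used in the analysis.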

Optical scattering coefficient and phase retardation
The quantification of the scattering coefficient and phase retardation was described in our earlier publication [9]. The scattering coefficient was estimated by numerically fitting the compounded conventional OCT depth profile to a single scattering model based on Beer's law. A-lines over a 1 mm width of tissue were averaged to minimize the speckle noise. The phase retardation was obtained by calculating the average phase value from the PS-OCT phase image of the same area.
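A minimal sketch of the Beer's law fit is given below. It assumes the single scattering model takes the form I(z) = I0 exp(-2 mu z), where the factor of 2 accounts for the round trip of the light; the exact model form used in [9] is not reproduced here, so this form, the function name, and the synthetic profile are assumptions. Taking the logarithm turns the fit into a straight-line fit whose slope is -2 mu.

```python
import math


def fit_scattering_coefficient(depth_mm, intensity):
    """Estimate mu (1/mm) from I(z) = I0 * exp(-2 * mu * z) by a log-linear least-squares fit."""
    # All intensities must be positive for the logarithm to be defined.
    log_i = [math.log(value) for value in intensity]
    n = len(depth_mm)
    mean_z = sum(depth_mm) / n
    mean_l = sum(log_i) / n
    s_zz = sum((z - mean_z) ** 2 for z in depth_mm)
    s_zl = sum((z - mean_z) * (l - mean_l) for z, l in zip(depth_mm, log_i))
    slope = s_zl / s_zz
    # Assumed round-trip model: slope of ln I versus z equals -2 * mu.
    return -slope / 2.0


# Synthetic, noise-free profile generated with mu = 3.0 per mm (hypothetical value).
depth = [0.0, 0.1, 0.2, 0.3, 0.4]
synthetic_intensity = [100.0 * math.exp(-2.0 * 3.0 * z) for z in depth]
mu_estimate = fit_scattering_coefficient(depth, synthetic_intensity)
print(f"mu = {mu_estimate:.3f} 1/mm")
```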

Logistic model and receiver operating characteristic curve
Logistic regression belongs to the class of generalized linear models (GLMs) based on the exponential family of distributions. It is a statistical model that describes the relationship of several predictor variables $X_1, X_2, \ldots, X_k$ to a dichotomous response variable $Y$ (0 or 1) [12]. The probability of occurrence of one of the two possible outcomes of $Y$ can be described by the following equation:

$$P(Y = 1 \mid X_1, \ldots, X_k) = \frac{1}{1 + \exp\left[-\left(\beta_0 + \sum_{n=1}^{k} \beta_n X_n\right)\right]}$$

Given the data $Y, X_1, X_2, \ldots, X_k$, the unknown coefficients $\beta_n$, $n = 0, 1, \ldots, k$, can be estimated using the maximum likelihood method. In this paper, we use three predictors (scattering coefficient, phase retardation, and phase retardation rate) to classify normal and malignant ovarian tissue. The MATLAB GLMFIT function was used to fit the logistic model using the predictors and the response (0 represents normal and 1 represents malignant). The coefficients $\beta_n$, $n = 0, 1, \ldots, k$, of the model that best follow the actual diagnosis were estimated and used to calculate the estimated responses (numbers between 0 and 1) using the GLMVAL function. The GLMFIT function also computed the deviance, which is a generalization of the residual sum of squares (it compares the log-likelihood of the fitted model with that of a perfectly fitting model). The deviance was used to compare different prediction models, in which different parameter combinations were used as predictors to classify normal and malignant ovaries. The deviance value decreases as the model fit improves.
The quality of the logistic prediction model was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). The estimated responses from different prediction models were used to compute the ROC curves and AUCs using the R package pROC [13]. We also estimated the 95% confidence interval (CI) using the bootstrap method with 10,000 stratified bootstrap replicates. The optimal threshold provided by pROC was used to calculate the sensitivity, specificity, and positive and negative predictive values (PPV, NPV). To further evaluate the logistic prediction model and the testing results, we also investigated the correlation coefficients $R_{\mathrm{train}}$ and $R_{\mathrm{test}}$ between the calculated responses and the actual diagnosis.
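To make the fitting and deviance steps concrete, the following is a minimal pure-Python sketch of maximum likelihood estimation by Newton's method. It is not the MATLAB GLMFIT/GLMVAL implementation; the function names and the toy data are hypothetical. A single predictor with deliberately overlapping classes is used so that the estimate is finite, although fit_logistic accepts any number of predictors per sample.

```python
import math


def sigmoid(t):
    """Numerically safe logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(rows, labels, iterations=25):
    """Return [beta_0, beta_1, ..., beta_k] maximizing the Bernoulli log-likelihood."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    beta = [0.0] * p
    for _ in range(iterations):
        probs = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in design]
        # Gradient and negative Hessian of the log-likelihood.
        grad = [sum((y - q) * row[j] for row, y, q in zip(design, labels, probs))
                for j in range(p)]
        hess = [[sum(q * (1 - q) * row[j] * row[k] for row, q in zip(design, probs))
                 for k in range(p)] for j in range(p)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def predict(beta, row):
    """Estimated response (probability of class 1) for one sample."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))


def deviance(beta, rows, labels):
    """-2 * log-likelihood; for 0/1 responses the perfectly fitting model has log-likelihood 0."""
    total = 0.0
    for row, y in zip(rows, labels):
        q = predict(beta, row)
        total += -2.0 * (y * math.log(q) + (1 - y) * math.log(1 - q))
    return total


# Hypothetical toy data: one predictor, overlapping classes (0 = normal, 1 = malignant).
rows = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
labels = [0, 0, 1, 0, 1, 1]
beta = fit_logistic(rows, labels)
fitted = [predict(beta, r) for r in rows]
model_deviance = deviance(beta, rows, labels)
null_deviance = deviance([0.0, 0.0], rows, labels)
print(beta, model_deviance, null_deviance)
```

Comparing model_deviance across different predictor sets mirrors how the deviance is used above to rank parameter combinations: a smaller value indicates a closer fit to the labels.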
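The ROC quantities can be illustrated with a short sketch. It does not reproduce the pROC package: the AUC is computed as the fraction of (malignant, normal) pairs in which the malignant case has the higher estimated response, with ties counted as one half, and the diagnostic metrics are computed at a given threshold. The function names and the example responses are hypothetical, and the bootstrap confidence interval is omitted.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def diagnostic_metrics(scores, labels, threshold):
    """Sensitivity, specificity, PPV and NPV when score >= threshold is called malignant."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical estimated responses (0 = normal, 1 = malignant).
scores = [0.1, 0.2, 0.6, 0.4, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
example_auc = auc(scores, labels)
example_metrics = diagnostic_metrics(scores, labels, threshold=0.5)
print(example_auc, example_metrics)
```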

Statistical results of 33 ovaries imaged by TD-PS-OCT
A total of 33 ex vivo ovaries from 18 patients were imaged using the TD-PS-OCT system. Of these, 26 ovaries were diagnosed as normal and 7 were diagnosed as malignant. Normal ovaries show higher average values of scattering coefficient and phase retardation than malignant ones, with normal/malignant ratios of 1.36 and 1.11, respectively [9]. For the phase retardation rate, the average fitting range of the normal group is 36.7-329.8 µm from the tissue surface, and that of the malignant group is 38.4-347.3 µm. The average values range from 28.8 to 154.8 degree/mm in the normal group and from 8.4 to 121.6 degree/mm in the malignant group. The normal group has a mean value of 79.5 degree/mm (± 19.0), which is higher than that of the malignant group, with a mean value of 45.0 degree/mm (± 19.6). The normal/malignant ratio of the phase retardation rate is 1.77. The phase retardation rate shows a larger difference between normal and malignant ovaries (p < 0.0001) than the other two parameters. The fitting errors of the phase retardation rate for the normal and malignant groups are 5.13% (± 0.82%) and 4.69% (± 0.96%), respectively.
The scatter plot in Fig. 2(a) shows the average phase retardation rate of each ovary for the normal and malignant groups. By setting a threshold on the phase retardation rate at 55 degree/mm, we could achieve 85.7% sensitivity and 92.3% specificity. In contrast, by using phase retardation as a classifier, we could only achieve 42.9% sensitivity [9]. The linear regression analysis is shown in Fig. 2(b). A positive correlation was found between phase retardation rate and collagen content, with Pearson's correlation coefficient R = 0.74 (p < 0.0001), which is higher than those for the scattering coefficient (R = 0.57, p < 0.0001) and phase retardation (R = 0.47, p < 0.01) [9]. A multiple linear regression shows that the three parameters together correlate positively with collagen content with R = 0.76, which is higher than that obtained using any single parameter. Collagen is associated with the development of ovarian cancers; the collagen amount and structure are quite different between normal and malignant ovaries. The normal and malignant groups have CAF values of 46.0% (± 9.1%) and 28.4% (± 8.3%), respectively [9]. Since CAF, measured from Sirius Red staining of ovary samples, directly assesses collagen, the strong positive correlation suggests that the phase retardation rate may reflect the complex collagen changes that accompany the development of ovarian cancer.

Training results based on logistic model using the 33-ovary data
The three parameters extracted from the images of the 33 ovaries were used to train the logistic classifier. As shown by the ROC curves in Fig. 3(a), the use of all three parameters gives much better performance than each parameter alone. The detailed prediction results of the different parameter combinations, including sensitivity, specificity, PPV, NPV, AUC (95% CI), correlation coefficient $R_{\mathrm{train}}$ between estimated and actual responses (p value), and deviance, are summarized in Table 1. When only one parameter was used as a predictor, none of the models could achieve perfect sensitivity and specificity. With any combination of two parameters, except the one using phase retardation and phase retardation rate, or with all three parameters as predictors, 100% sensitivity and specificity are achieved. The deviance obtained using all three parameters is smaller than that obtained using two parameters, which indicates that the three-parameter model is more reliable.

Testing results of 10 ovaries imaged by FD-PS-OCT
Ten ovaries (6 normal and 4 malignant) from 5 patients were imaged using the upgraded FD system and were tested using the logistic prediction models based on the different parameter combinations described above. The testing results are summarized in Table 2. The same threshold as for the training group was used for this testing group to calculate the sensitivity, specificity, PPV, and NPV. $R_{\mathrm{test}}$ and AUC are highest ($R_{\mathrm{test}}$ = 0.893, p < 0.001, AUC = 1.0) when using the three-parameter prediction model. Note that although the three-parameter model achieved AUC = 1, the sensitivity (100%) and specificity (83.3%) are not both perfect. This is because we set a threshold of 0.5 for the training and testing groups for classifying normal and malignant ovaries. If we set a threshold of 0.7, we could achieve 100% sensitivity and specificity. However, the estimated responses of normal cases in the training group are very close to the normal response 0, and all estimated responses of malignant cases are very close to the malignant response 1, so it makes more sense to set the midpoint 0.5 as the threshold based on the training results.
[Figure legend: phase retardation rate, AUC = 0.958; all three parameters, AUC = 1.00; AUC = 0.5.]
In this study, only 10 ovaries were tested using our logistic model; more ovary data are being collected to validate the initial results. Because all parameter extraction and processing are currently performed offline, future work also includes automating our data processing procedures so that these parameters can be obtained and input to the prediction model in real time. To translate this technique from bench to bedside, a fiber-based PS-OCT system for in vivo evaluation of ovarian tissue needs to be developed.
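The threshold effect noted in the testing results (AUC = 1 together with 83.3% specificity at a threshold of 0.5) can be reproduced with a small sketch. The estimated responses below are invented for illustration and are not the values from Table 2; they are chosen so that one of the 6 normal cases falls between 0.5 and 0.7 while every malignant case scores higher than every normal case.

```python
def sensitivity_specificity(normal_scores, malignant_scores, threshold):
    """Call a case malignant when its estimated response is >= threshold."""
    true_pos = sum(1 for s in malignant_scores if s >= threshold)
    true_neg = sum(1 for s in normal_scores if s < threshold)
    return true_pos / len(malignant_scores), true_neg / len(normal_scores)


# Hypothetical estimated responses for 6 normal and 4 malignant ovaries.
normal = [0.02, 0.05, 0.10, 0.20, 0.30, 0.62]
malignant = [0.80, 0.90, 0.95, 0.99]

# Every malignant response exceeds every normal response, so the ROC AUC is 1.
perfectly_ranked = min(malignant) > max(normal)

sens_05, spec_05 = sensitivity_specificity(normal, malignant, 0.5)
sens_07, spec_07 = sensitivity_specificity(normal, malignant, 0.7)
print(f"threshold 0.5: sensitivity {sens_05:.1%}, specificity {spec_05:.1%}")
print(f"threshold 0.7: sensitivity {sens_07:.1%}, specificity {spec_07:.1%}")
```

The AUC depends only on the ranking of the responses, whereas sensitivity and specificity depend on where the threshold falls, which is why the two can disagree.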

Summary
The phase retardation rate quantitatively extracted from PS-OCT significantly improves ovarian cancer diagnosis when it is used together with the optical scattering coefficient and phase retardation. By using a new three-parameter logistic prediction model, we achieved 100% sensitivity and specificity in the training group, and 100% sensitivity and 83.3% specificity in the testing group. The initial results demonstrate that the three-parameter prediction model based on PS-OCT could be a powerful tool to evaluate ovarian tissue.