Machine learning-based rapid diagnosis of human borderline ovarian cancer on second-harmonic generation images

Regarding growth pattern and cytological characteristics, borderline ovarian tumors fall between benign and malignant tumors, but they tend to progress toward malignancy. Currently, it is difficult to accurately diagnose ovarian cancer using common medical imaging methods, and histopathological examination is routinely used to obtain a definitive diagnosis. However, such examination requires experienced pathologists, is labor-intensive and time-consuming, and is prone to interobserver bias. By combining second-harmonic generation imaging with a k-nearest neighbors classifier optimized by the automated machine learning tree-based pipeline optimization tool, we developed a computer-aided diagnosis method to classify ovarian tissues as malignant, benign, borderline, or normal, obtaining areas under the receiver operating characteristic curve of 1.00, 0.99, 0.98, and 0.97, respectively. These results suggest that diagnosis based on second-harmonic generation images and machine learning can support the rapid and accurate detection of ovarian cancer in clinical practice.


Introduction
The mortality rate due to ovarian cancer is the highest among gynecologic cancers [1]. Approximately 90% of malignant ovarian cancers originate from epithelial ovarian tissue lesions [2]. According to their characteristics, epithelial ovarian tumors can be divided into three types: malignant, benign, and borderline [3]. These three types of tumors have different treatments and prognoses. Therefore, it is essential to accurately distinguish them. Although cancer antigen 125 is a serum marker commonly used to diagnose ovarian tumors, its specificity is low [4]. In clinical practice, transvaginal ultrasonography is adopted as a noninvasive in vivo imaging modality for ovarian cancer screening, but its resolution is too limited to detect lesions at the cellular level. Moreover, this method cannot be used to distinguish between benign and borderline tumors [5]. Therefore, conventional histopathological examination, the gold standard for disease diagnosis [6], should be used to provide the ground truth for physicians. However, the cumbersome procedure, which includes tissue fixation, embedding, sectioning, and special staining, imposes a considerable wait time (1-3 days) [7], making histopathological examination time-consuming, laborious, and unsuitable for rapid treatment planning. Furthermore, sample preparation and histological image interpretation by pathologists may introduce artifacts and interobserver bias [8], possibly undermining the diagnostic accuracy. Therefore, a rapid, label-free, non-destructive, and accurate method should be devised to improve the efficiency and accuracy of ovarian cancer diagnosis.
Second-harmonic generation (SHG) imaging, which is based on a second-order nonlinear optical contrast mechanism, allows practitioners to perform label-free and non-destructive visualization of tissue structures (e.g., collagen [9], muscles [10], microtubules [11]) at the cellular level. Meanwhile, the extracellular matrix, which mainly consists of collagen, is important for regulating the function of cells and tissues in the body [12], and structural collagen remodeling has been associated with the initiation and progression of many diseases [13,14], especially ovarian cancer [15]. Hence, SHG imaging may be suitable for label-free and non-destructive discrimination of borderline ovarian tumors from benign and malignant tumors at the cellular level.
SHG imaging is capable of non-destructively visualizing the structural remodeling of collagen in lesion tissues. The resulting SHG images must then be properly analyzed and interpreted for diagnosis. However, advances in imaging technologies have increased the complexity of medical images. As with the diagnostic process of histopathology, the analysis of SHG images by specialists is laborious and susceptible to interobserver bias. Although Wen et al. proposed a texton-based approach to classify six types of ovarian tissues with 83-91% accuracy, borderline ovarian cancer was not considered [16]. Moreover, borderline ovarian cancer, a subtype of ovarian cancer distinct from malignant ovarian cancer (e.g., low-grade serous carcinoma and high-grade serous carcinoma), degenerates into low-grade serous carcinoma at a rate of 5% [17]. Thus, it is worth identifying borderline ovarian cancer rapidly and accurately. In this study, we developed a computer-aided diagnosis method (Fig. 1) that combines SHG images and a k-nearest neighbors classifier with the automated machine learning tree-based pipeline optimization tool (TPOT) for rapid and accurate classification of fresh surgically excised human ovarian tissues into malignant, benign, borderline, and normal tissues, achieving areas under the receiver operating characteristic curve (AUROCCs) of 1.00, 0.99, 0.98, and 0.97, respectively. Combining SHG imaging and machine learning may improve the efficiency and accuracy of ovarian cancer diagnosis.

Specimen preparation
This study was approved by the Institutional Review Board of the Affiliated Cancer Hospital of Fujian Medical University. After diagnosis by experienced pathologists, 7 malignant, 7 benign, 6 borderline, and 6 normal ovarian tissues were selected from 20 patients. Before SHG imaging, the tissues were rinsed with phosphate-buffered saline to remove residual blood and prevent tissue shrinkage. The samples were then placed on a glass slide for SHG imaging. Furthermore, to ensure that the tissue types in the imaged areas were consistent with the diagnosis results, the imaged areas of the ovarian tissues were marked and then examined by a pathologist after SHG imaging.

SHG imaging
SHG imaging was performed using a commercial nonlinear optical microscope consisting of a confocal laser scanning microscope (LSM 880; Zeiss, Jena, Germany) and a mode-locked Ti:sapphire femtosecond laser (140 fs, 80 MHz) (Chameleon Ultra II; Coherent, Santa Clara, CA, USA). From the tunable range of 680-1080 nm, an excitation wavelength of 810 nm was selected for this study. The excitation laser was focused on the ovarian tissues through a plan-apochromat objective (20×, numerical aperture of 0.8, part number: 420650-9903-000, Zeiss). The emission spectrum of the SHG signal appears as a narrow peak at half the excitation wavelength, with a bandwidth (full width at half maximum) consistent with the excitation laser spectral width. The backscattered SHG signal was collected by the objective and spectrally separated by passing through a grating onto the Quasar detector, which consists of 2 photomultiplier tubes plus a 32-channel GaAsP array (detection range: 371-739 nm, part number 410136-1104-230, Zeiss). To select the SHG signal, the detection range of the Quasar detector was set to 395-415 nm. The SHG signal emitted from collagen is pseudo-colored green in the obtained SHG images. The field of view of the SHG images was 425.1 × 425.1 µm² (512 × 512 pixels), and the corresponding pixel dwell time was 2.05 µs. Therefore, acquiring each image took approximately 0.5 s. To ensure that the proposed computer-aided diagnosis method suitably discriminated the different ovarian tissue types, only the lesion area of the tissue was imaged.
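The acquisition figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the values stated in the text; the variable names are illustrative, and the frame time counts pixel dwell only, ignoring any scanner overhead (an assumption).

```python
# Acquisition parameters quoted in the text.
PIXELS_PER_SIDE = 512
PIXEL_DWELL_S = 2.05e-6          # 2.05 microseconds per pixel
FIELD_OF_VIEW_UM = 425.1         # side length of the square field of view
EXCITATION_NM = 810.0
DETECTION_BAND_NM = (395.0, 415.0)

# Frame time from pixel dwell alone (scanner flyback is ignored: assumption).
frame_time_s = PIXELS_PER_SIDE ** 2 * PIXEL_DWELL_S

# Lateral sampling: micrometers covered by one pixel.
pixel_size_um = FIELD_OF_VIEW_UM / PIXELS_PER_SIDE

# SHG emission is centered at half the excitation wavelength.
shg_nm = EXCITATION_NM / 2.0
in_band = DETECTION_BAND_NM[0] <= shg_nm <= DETECTION_BAND_NM[1]

print(f"frame time: {frame_time_s:.3f} s")
print(f"pixel size: {pixel_size_um:.3f} um")
print(f"SHG peak: {shg_nm:.0f} nm, inside detection band: {in_band}")
```

The dwell-only estimate is consistent with the stated acquisition time of approximately 0.5 s, and the 405 nm SHG peak falls inside the 395-415 nm detection window.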

Collagen fiber alignment analysis
The collagen fiber alignment of the ovarian tissues was analyzed using the fast Fourier transform and semicircular von Mises distribution computed by FiberFit (version 2.0) [18], an open-source software package that quantifies the degree of fiber alignment through parameter k, where larger (smaller) values of k indicate more aligned (disordered) collagen fibers. Recently, FiberFit has been successfully employed to analyze collagen fiber alignment in human colon tissues [19]. Before FiberFit was applied, the SHG images were processed as follows. First, the images were sharpened (3×3 weighted average), background-subtracted (rolling ball background subtraction), smoothed (3×3 mean filter), and intensity-normalized. Then, each preprocessed image was converted into an 8-bit grayscale image. Finally, a despeckle operation (3×3 median filter) was applied to remove noise from the converted images. The preprocessed images were then imported into FiberFit. A detailed tutorial of FiberFit is available at https://github.com/NTMatBoiseState/FiberFit.
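To illustrate how the concentration parameter k relates to alignment, the following minimal sketch estimates k directly from a list of fiber orientation angles. It is not FiberFit's procedure, which fits the distribution to the radial sum of the FFT power spectrum. Here the angles are doubled because fiber orientations are axial (defined on a semicircle), and k is obtained from the mean resultant length with a standard piecewise approximation to the von Mises maximum-likelihood estimate. The function name and the sample angles are hypothetical.

```python
import math


def semicircular_concentration(angles_deg):
    """Estimate mean orientation (degrees, in [0, 180)) and concentration k.

    Assumes orientations follow a semicircular von Mises density
    proportional to exp(k * cos(2 * (theta - mu))).
    """
    # Axial data: theta and theta + 180 describe the same fiber direction,
    # so angles are doubled before computing circular statistics.
    doubled = [math.radians(2.0 * a) for a in angles_deg]
    c = sum(math.cos(d) for d in doubled) / len(doubled)
    s = sum(math.sin(d) for d in doubled) / len(doubled)
    r = math.hypot(c, s)  # mean resultant length: 0 disordered, 1 aligned
    mean_deg = (math.degrees(math.atan2(s, c)) / 2.0) % 180.0

    # Piecewise approximation to the maximum-likelihood estimate of k.
    if r < 0.53:
        k = 2.0 * r + r ** 3 + 5.0 * r ** 5 / 6.0
    elif r < 0.85:
        k = -0.4 + 1.39 * r + 0.43 / (1.0 - r)
    else:
        denom = r ** 3 - 4.0 * r ** 2 + 3.0 * r
        k = float("inf") if denom <= 0 else 1.0 / denom
    return mean_deg, k


# Synthetic angles for illustration only (not measured data).
aligned = [88, 90, 92, 89, 91]
disordered = [0, 30, 60, 90, 120, 150]
print(semicircular_concentration(aligned))
print(semicircular_concentration(disordered))
```

Tightly clustered angles give a large k, whereas angles spread evenly over the semicircle give a k close to zero, matching the interpretation of k given above.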

Collagen fiber morphology analysis
The collagen fiber morphology was quantified using the curvelet-transform fiber-extraction algorithm (CT-FIRE V2.0 Beta) [20], which has been successfully used to analyze collagen fiber morphology in the uninvolved colon lamina propria [19]. First, images were denoised using a fast discrete curvelet transform [21]. Then, the fiber extraction algorithm [22] was used to determine the morphological features (e.g., length, width) of the collagen fibers in pixels. In this study, the length and width were used to characterize the collagen fiber morphology. A detailed description of this algorithm is available at https://eliceirilab.org/software/ctfire/.

Computer-aided diagnosis method
After feature extraction, the k-nearest neighbors classifier [23] was adopted to discriminate malignant, benign, borderline, and normal ovarian tissues. The classifier was implemented using the scikit-learn library [24]. To optimize the classifier performance, TPOT [25], which implements a genetic algorithm [26], was used. A detailed description of TPOT is available at http://epistasislab.github.io/tpot/. The patients enrolled in this study were split into disjoint training/validation and test datasets at a ratio of 3:1. In addition, fivefold cross-validation was used during training. The machine learning algorithms were executed in Spyder configured with Python 3.6.5. The parameters for optimization using the genetic algorithm are listed in Table 1.
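The two ideas in this paragraph, a patient-level 3:1 split and k-nearest neighbors voting, can be sketched with the Python standard library alone. This is an illustrative stand-in rather than the study's pipeline, which used scikit-learn's classifier tuned by TPOT. The helper names, the patient identifiers, the feature values, and the choice of three neighbors are all hypothetical.

```python
import math
import random
from collections import Counter


def split_by_patient(patient_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no patient appears in both sets."""
    patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx


def knn_predict(train_x, train_y, query, n_neighbors=3):
    """Majority vote among the n_neighbors closest training points (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]


# 20 hypothetical patients with 4 images each; a 3:1 split holds out 5 patients.
patient_ids = [f"P{i // 4:02d}" for i in range(80)]
train_idx, test_idx = split_by_patient(patient_ids)
print(len(train_idx), len(test_idx))

# Synthetic two-feature points for illustration only (not measured data).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
train_y = ["normal"] * 3 + ["malignant"] * 3
print(knn_predict(train_x, train_y, (0.05, 0.1)))
```

Splitting on patient identifiers rather than on individual images keeps images from the same patient out of both sets at once, which is what makes the two datasets disjoint at the patient level.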

SHG imaging of ovarian tissues
We collected 335 SHG images from the fresh ex vivo ovarian tissues without staining, obtaining 85, 80, 87, and 83 images of diagnosed malignant, benign, borderline, and normal ovarian tissues, respectively. Figure 2 shows representative examples of the ovarian tissue types, which present different collagen fiber organizations and morphologies. Normal ovarian tissue is characterized by straight, thin, and mesh-like collagen fibers. In malignant ovarian tissue, the fibers are more aligned and display a wave-like pattern. Borderline ovarian tissue is characterized by discrete, thicker, and disordered fibers compared with normal tissue. In benign ovarian tissue, the collagen fibers are denser and more tightly packed. The characteristics of collagen organization in malignant, benign, and normal ovarian tissues are consistent with the results of our previous study [27]. These morphological and structural differences between ovarian tissue types, as reflected in the SHG images, are what the machine learning model uses to distinguish them.

Collagen fiber alignment analysis
We employed FiberFit to quantify the alignment of collagen fibers in malignant, benign, borderline, and normal ovarian tissues. Figure 3 shows representative collagen alignment quantification results for the four tissue types. The power spectra of the original SHG images were obtained to highlight pixel intensity changes (Fig. 3(b), 3(f), 3(j), and 3(n)). Then, the orientation of the collagen fibers was obtained using the radial sum [28] (Fig. 3(c), 3(g), 3(k), and 3(o)). The alignment of collagen fibers was subsequently calculated by fitting semicircular von Mises distributions to the data (Fig. 3(d), 3(h), 3(l), and 3(p)). The resulting distributions are parameterized by k. Figure 4 shows the mean k value per ovarian tissue type. A two-tailed Mann-Whitney test was applied to determine significant differences between tissue types (p < 0.05 was considered significantly different). Only benign and normal tissues exhibit similar collagen alignment. Moreover, as can be inferred from Figs. 3 and 4, the collagen in malignant ovarian tissues is characterized by more aligned fibers, which can facilitate cancer cell migration and tumor growth [29]. Thus, parameter k is a representative feature for classifying the ovarian tissue types.
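For readers unfamiliar with the two-tailed Mann-Whitney test used here, the sketch below computes the U statistic and a two-sided p-value with the normal approximation. It is a simplified illustration under stated assumptions: ties receive midranks, but the variance omits the tie correction and no continuity correction is applied, so statistical packages may report slightly different p-values. The function name and the sample values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U for sample x, p-value).
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for t in range(i, j) if pooled[t][1] == 0)
        i = j

    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p


# Synthetic k values for two tissue groups (illustration only, not study data).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
group_b = [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
u_stat, p_value = mann_whitney_u(group_a, group_b)
print(u_stat, p_value < 0.05)
```

Because the test compares ranks rather than raw values, it does not require the k values to be normally distributed.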

Collagen fiber morphology analysis
The collagen fiber morphologies, widths, and lengths were calculated using curvelet-transform fiber extraction. The collagen fiber lengths and widths are shown in Fig. 5 and 6, respectively. After pairwise comparison, we found that the tissue types were statistically different from each

Computer-aided diagnosis
Based on the feature space (i.e., k, width, length, k/width, k/length, width/length, k/width/length) formed by the combinations of the three extracted features, we employed the k-nearest neighbors classifier and TPOT to discriminate the four tissue types rapidly and accurately. To determine the optimal feature subset for constructing the best classifier, every combination of features in this space was explored. In addition, owing to the randomness inherent in TPOT, we ran the optimization 10 times for each feature subset. The average accuracies on the training and test datasets for each feature subset are shown in Figs. 7 and 8, respectively. The highest classification performance was achieved when all three features (i.e., k/width/length) were used together. The highest average accuracies for the training and test datasets were 0.976 and 0.960, respectively. Therefore, from the 10 runs, the classifier provided by TPOT in the three-feature space was selected for computer-aided diagnosis of ovarian cancer. The hyperparameters of the optimized classifier are listed in Table 2. The performance of the optimal classifier was evaluated using receiver operating characteristic analysis, obtaining the results shown in Fig. 9. The AUROCCs for classification of malignant, benign, borderline, and normal ovarian tissues were 1.00, 0.99, 0.98, and 0.97, respectively.
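The seven feature subsets and the per-class area under the curve can be made concrete with a short sketch. The first part enumerates every non-empty combination of the three features. The second part computes a one-vs-rest area under the receiver operating characteristic curve as the probability that a positive sample scores higher than a negative one. The function name, the scores, and the labels are hypothetical, and this is not necessarily how the study computed its curves.

```python
from itertools import combinations

FEATURES = ("k", "width", "length")

# Every non-empty subset of the three features: 3 singles, 3 pairs, 1 triple.
feature_subsets = [
    subset
    for size in range(1, len(FEATURES) + 1)
    for subset in combinations(FEATURES, size)
]
subset_labels = ["/".join(subset) for subset in feature_subsets]
print(subset_labels)


def one_vs_rest_auc(scores, labels, positive):
    """Area under the ROC curve for one class against all others.

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; tied pairs count as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic classifier scores for the "borderline" class (illustration only).
scores = [0.9, 0.8, 0.3, 0.1]
labels = ["borderline", "borderline", "normal", "benign"]
print(one_vs_rest_auc(scores, labels, "borderline"))
```

Repeating the one-vs-rest computation for each of the four tissue types yields one area value per class, which is the form in which the results above are reported.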

Discussion
Regarding the growth pattern and malignancy, borderline ovarian cancer is intermediate between benign and malignant tumors [30]. However, the therapeutic regimens and prognoses of these tumors are different. Consequently, when borderline cancer is misinterpreted as being benign or malignant, the patient's condition may worsen. Thus, borderline ovarian cancer must be accurately distinguished from benign and malignant tumors. Currently, the definitive diagnosis of ovarian cancer is based on histopathological examination, which is time-consuming and laborious. Moreover, pathological interpretation by a specialist may introduce interobserver bias, which influences the diagnostic accuracy. Hence, rapid and accurate diagnosis is required to facilitate the identification of ovarian cancer. To this end, we propose a computer-aided diagnosis method based on SHG imaging and machine learning. Compared with conventional histopathological examination, the proposed method analyzes ovarian tissue at the cellular level with neither cumbersome processing nor special staining. In addition, malignant, benign, borderline, and normal ovarian tissues are objectively distinguished by a machine learning algorithm, avoiding interobserver bias. Experimental results show that combining SHG imaging and machine learning enables the rapid and accurate discrimination of borderline ovarian tissues from malignant, benign, and normal ovarian tissues.
Many clinical imaging modalities have been used to diagnose borderline ovarian cancer before surgery or treatment [31][32][33][34][35]. Transvaginal ultrasound is the primary screening method for ovarian diseases. However, various studies have reported no significant differences between malignant and borderline ovarian cancer images [35,36]. In addition, magnetic resonance imaging has been widely used to characterize ovarian cancer. Unfortunately, the corresponding features of borderline ovarian cancer are similar to those of benign tumors, and borderline tumors are therefore often misclassified [34]. Furthermore, computed tomography is unsuitable for ovarian disease screening owing to its low soft-tissue contrast [5]. Thus, the proposed method may advance ovarian cancer screening. Although the imaging depth of SHG microscopy is too limited for ovarian cancer screening on its own, combining it with laparoscopy may effectively circumvent this limitation.
Regarding the applicability of the proposed method in clinical settings, some limitations remain to be addressed. The number of patients enrolled in this study was relatively small, and more patient data from multiple centers should be evaluated. In addition, the SHG imaging instrument used here has limitations in monitoring dynamic changes of collagen in vivo in a clinical setting. Fortunately, with the advancement of SHG endomicroscopy, in vivo imaging within body cavities has become possible [37]. Thus, the proposed method could be combined with SHG endomicroscopy to implement clinical screening in vivo. Finally, the biosafety of the excitation laser used in this study should be systematically studied to determine its potential for tissue damage before clinical application.

Conclusion
We observed the remodeling of collagen structures in borderline, benign, and malignant ovarian tissues without staining via SHG microscopy. Then, we combined SHG imaging and machine learning to effectively identify malignant, benign, borderline, and normal ovarian tissues with AUROCCs of 1.00, 0.99, 0.98, and 0.97, respectively. The proposed method is promising for accurate ovarian cancer diagnosis and may increase the diagnostic efficiency.
Disclosures. The authors declare no conflicts of interest.
Data availability. Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.