Applying low coverage whole genome sequencing to detect malignant ovarian mass

This study evaluated whether low coverage whole genome sequencing (LCWGS) is suitable for detecting malignant pelvic masses and compared its diagnostic value with that of traditional tumor markers. We enrolled 63 patients with a pelvic mass suspicious for ovarian malignancy. Each patient underwent LCWGS and traditional tumor marker testing. The pelvic masses were finally confirmed by pathological examination. Copy number variants (CNVs) across the whole genome were detected, and the Stouffer's Z-score for each CNV was extracted. The risk of malignancy (RM) of each suspicious sample was calculated from the CNV counts and Z-scores and was then compared with the ovarian cancer markers CA125 and HE4 and with the risk of ovarian malignancy algorithm (ROMA). Receiver operating characteristic (ROC) curves were used to assess the diagnostic value of the variables. Pathological diagnosis identified 44 (70%) patients with malignancy and 19 patients with a benign mass. CA125, HE4, the CNV count, the mean of the Z-scores (Zmean), the maximum of the Z-scores (Zmax), the RM and the ROMA all differed significantly between patients with malignant and benign masses. The areas under the curve (AUCs) of CA125, HE4, CNV, Zmax and Zmean were 0.775, 0.866, 0.786, 0.685 and 0.725, respectively. ROMA and RM showed similar AUCs (0.876 and 0.837) but differed in sensitivity and specificity. In the validation cohort, the AUC of RM was higher than that of the traditional serum markers. In conclusion, we developed an LCWGS-based method for identifying pelvic masses suspicious for ovarian cancer. LCWGS gave accurate results and could complement existing diagnostic methods.


Introduction
According to the 2018 global cancer data report, ovarian tumors accounted for 3.4% of all female tumors in China, and deaths from malignant ovarian tumors accounted for 4.4% of all tumor deaths among female patients [1]. Ovarian cancer now has the second highest incidence and mortality among tumors of the female reproductive system, after cervical cancer [1,2]. Because the ovary is small and lies within the pelvic cavity, ovarian tumors lack typical symptoms in the early stage [3]. Patients often discover an ovarian tumor only after a large pelvic mass or vaginal bleeding has developed [4,5]. By this time, the tumor has usually progressed to a late stage and in most cases has spread to other pelvic organs, so the best window for treatment has been missed [6]. Therefore, early detection of ovarian tumors is critical for the clinical management and prognosis of patients. Multiple efforts have been made to evaluate traditional markers, including serum concentrations of CA125 and HE4, in the screening of ovarian cancers [7]. However, these markers did not meet the standards required to advocate population-based screening with regard to diagnostic sensitivity and/or specificity [8,9]. To improve the accuracy of diagnosis for ovarian cancer, additional cancer-specific diagnostic methods may be required.
In recent years, the rapid development of next generation sequencing (NGS) and its application in low coverage whole genome sequencing (LCWGS) have made the detection of tumor-specific copy number alterations (CNAs) in cell-free DNA feasible [10,11]. Evidence has shown that tumor-derived chromosome abnormalities can be detected in the plasma of patients prior to surgery [10,12].
Previous studies have reported that occult pelvic cancers can be detected by LCWGS testing, although it may produce false positive results [13]. However, the diagnostic accuracy of the LCWGS platform and analytic pipeline for ovarian cancer remains unknown. The aim of this study was to investigate whether a clinical LCWGS platform could detect ovarian cancers in patients with pelvic masses based on abnormal plasma DNA copy number variants (CNVs), and to compare its diagnostic accuracy with that of traditional screening markers, including CA125 and HE4, and the risk of ovarian malignancy algorithm (ROMA) score [14].

Subjects and samples
Sixty-three patients with a pelvic mass suspicious for ovarian malignancy who were referred to the gynecology department of the First Affiliated Hospital of Sun Yat-sen University from January 2018 to July 2019 were recruited for this study. In addition, a cohort of 39 healthy female individuals was recruited. Blood samples were collected in EDTA anticoagulant tubes and sent to the laboratory within 2 h. Another 24 cases from Sun Yat-Sen University Cancer Center, enrolled from June 2021 to July 2021, formed the validation cohort used to validate our results. The study was approved by the ethics committee of the First Affiliated Hospital of Sun Yat-sen University (S/55904). All participants provided written informed consent.

Sample processing and LCWGS
The blood samples were first centrifuged at 1600 g for ten minutes at 4 °C, and the supernatant was then centrifuged again at 16,000 g for ten minutes at 4 °C. The plasma was stored at −80 °C until analysis. Isolation, purification, library construction and sequencing of cell-free DNA from the blood were performed using a Fetal Aneuploidies Trisomy Detection Kit (Daan Gene Corp, China) on an Ion Proton next-generation sequencer (Life Technologies), which was certified by the China Food and Drug Administration. All procedures were performed according to the manufacturer's protocol.

Bio-informatics analysis
Raw sequencing reads were mapped to the human reference genome hg19 using BWA (v0.7.1). Duplicate and low-quality reads were removed with Picard Tools (v1.11) and Samtools (v0.1.18), respectively. Torrent Suite software (v3.6) and a NIPT-plus plugin (provided by Daan Gene Corp) were used to calculate Stouffer's Z-scores for whole chromosomes and for CNVs ≥ 5.0 Mb. CNVs with |Z-score| ≥ 3 were marked as high risk. Both the CNV count and the |Z-scores| (≥ 3) were extracted from each sample for further analysis.
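The internals of the NIPT-plus plugin are not described here, so the following is only a minimal sketch of the two generic ideas named in this step: an unweighted Stouffer combination of per-bin z-scores into one region-level Z, and the |Z| ≥ 3 high-risk filter. The function names, the per-bin inputs and the region labels are hypothetical and are not taken from the plugin.

```python
import math


def stouffer_z(bin_z_scores):
    """Combine per-bin z-scores into one region-level Z (unweighted Stouffer).

    Assumption: equal weights; the plugin's actual weighting is not documented here.
    """
    k = len(bin_z_scores)
    if k == 0:
        raise ValueError("need at least one bin z-score")
    return sum(bin_z_scores) / math.sqrt(k)


def high_risk_regions(region_z, threshold=3.0):
    """Keep only regions whose absolute Z-score reaches the threshold."""
    return {name: z for name, z in region_z.items() if abs(z) >= threshold}


# Hypothetical region labels and values, for illustration only.
flagged = high_risk_regions({"region_a": 3.2, "region_b": -4.0, "region_c": 1.0})
cnv_count = len(flagged)  # number of high-risk CNVs in this sample
```

The CNV count and the list of flagged |Z| values are the two quantities carried forward into the malignancy risk analysis.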

Analysis of malignant risk
To analyze the risk of malignancy, data from the 39 healthy females were used to form a baseline. First, we calculated the mean CNV count and the mean |Z-score| (≥ 3) of the healthy cohort; the risk of malignancy (RM) of each suspicious sample was then calculated as
\[ \mathrm{RM} = \left(\mathrm{CNV}_{\text{suspicious}} - \overline{\mathrm{CNV}}_{\text{healthy}}\right) \times \left(|Z|_{\text{suspicious}} - \overline{|Z|}_{\text{healthy}}\right) \]
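As an illustration of the RM calculation, the sketch below computes the product of the two baseline-corrected terms. It assumes that the per-sample |Z| term is the mean of the flagged |Z-scores| (the Discussion states that RM is calculated from CNV and Zmean) and that a sample with no flagged CNVs contributes a mean of 0; both choices, and all function names, are assumptions rather than the authors' implementation.

```python
from statistics import mean


def summarize(sample_z, threshold=3.0):
    """Return (CNV count, mean |Z|) over CNVs with |Z| >= threshold."""
    flagged = [abs(z) for z in sample_z if abs(z) >= threshold]
    # Assumption: a sample without flagged CNVs gets a mean |Z| of 0.
    return len(flagged), (mean(flagged) if flagged else 0.0)


def risk_of_malignancy(sample_z, healthy_cohort_z):
    """RM = (CNV count - healthy mean count) * (mean |Z| - healthy mean |Z|)."""
    baseline = [summarize(z) for z in healthy_cohort_z]
    cnv_baseline = mean(count for count, _ in baseline)
    z_baseline = mean(z_mean for _, z_mean in baseline)
    cnv_count, z_mean = summarize(sample_z)
    return (cnv_count - cnv_baseline) * (z_mean - z_baseline)


# Hypothetical Z-scores for two healthy controls and one suspicious sample.
healthy = [[3.0], [3.0, 3.0, 3.0]]
rm = risk_of_malignancy([5.0, -7.0, 1.0, 6.0], healthy)
```

In this toy input the healthy baseline is 2 CNVs with a mean |Z| of 3, and the suspicious sample has 3 flagged CNVs with a mean |Z| of 6.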

Tumor marker detection and ROMA scores
HE4 and CA125 were tested in stored plasma using the ARCHITECT HE4 and CA125 assays (Abbott Diagnostics, Abbott Park, IL, USA) according to the manufacturer's instructions.
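The ROMA formula itself is not restated in this section. The sketch below uses the coefficients commonly published for ROMA (Moore et al., 2009), with HE4 in pmol/L and CA125 in U/mL; whether exactly these coefficients were applied in this study is an assumption, and they should be checked against the assay documentation before any use. The function name is hypothetical.

```python
import math


def roma_score(he4, ca125, postmenopausal):
    """Return the ROMA predicted probability as a percentage.

    Assumption: commonly published coefficients; HE4 in pmol/L, CA125 in U/mL.
    """
    if he4 <= 0 or ca125 <= 0:
        raise ValueError("marker concentrations must be positive")
    if postmenopausal:
        predictive_index = -8.09 + 1.04 * math.log(he4) + 0.732 * math.log(ca125)
    else:
        predictive_index = -12.0 + 2.38 * math.log(he4) + 0.0626 * math.log(ca125)
    # Logistic transform of the predictive index, scaled to a percentage.
    return 100.0 / (1.0 + math.exp(-predictive_index))


# Hypothetical marker values, for illustration only.
score = roma_score(he4=120.0, ca125=80.0, postmenopausal=True)
```

The score rises with either marker, and the menopausal status selects which set of coefficients is applied.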

Pathology diagnosis of pelvic mass
All diagnoses were confirmed by pathological examination performed by pathologists who were blinded to the results of clinical laboratory testing. Tumor staging was performed according to the International Federation of Gynecology and Obstetrics (FIGO) criteria (2010).

Statistical analysis
Statistical analysis was carried out with an online statistics tool (http://dxonline.deepwise.com/) and R software (version 4.0.1) with the pROC and Rattle packages (5-7). Receiver operating characteristic (ROC) curves were used to evaluate diagnostic value. A two-tailed P value of less than 0.05 was considered statistically significant.
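The study used the pROC package in R for the ROC analysis. For readers who want to see what the AUC measures, the following plain Python sketch computes the empirical AUC as the probability that a randomly chosen malignant sample scores higher than a randomly chosen benign sample, with ties counted as one half. It illustrates the statistic only and does not reproduce the pROC workflow; the function name and inputs are hypothetical.

```python
def empirical_auc(malignant_scores, benign_scores):
    """Empirical AUC: fraction of (malignant, benign) pairs ranked correctly."""
    if not malignant_scores or not benign_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(malignant_scores) * len(benign_scores))


# Hypothetical index values, for illustration only.
auc = empirical_auc([2.1, 3.4, 0.9], [0.5, 1.0])
```

An AUC of 0.5 corresponds to a marker that ranks the two groups no better than chance, and 1.0 to perfect separation.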

Clinical and pathology data of subjects
This study included 63 patients with a pelvic mass suspicious for ovarian malignancy. Pathological diagnosis identified 34 (54%) with high-grade malignancy, 10 (16%) with low-grade malignancy and 19 (30%) with a benign mass. The median age of premenopausal patients was 35 years (range, 16-53 years), and the median age of postmenopausal patients was 62 years (range, 46-83 years). The median age of patients with malignancies was 51 years (range, 21-70 years) and that of patients with benign diseases was 30 years (range, 18-52 years). The age distribution differed significantly between these 2 groups (P < 0.01). The FIGO stages of the ovarian cancer patients were stage I in 13 (30%), stage II in 6 (14%), stage III in 18 (41%) and stage IV in 7 (16%). The clinical and pathological data of the subjects are listed in Table 1.

LCWGS on CNVs
LCWGS used a whole genome low coverage strategy to analyze the CNVs. For each sample, more than 5 M reads (5.9 ± 0.68 M for all samples) were obtained. The coverage of each sample was about 0.35×. Representative LCWGS results for ovarian cancer and benign disease are shown in Fig. 1. The results from a patient with FIGO stage III serous cystadenocarcinoma showed multiple regions of CNV (Fig. 1A), whereas the results from a patient with teratoma showed no CNV (Fig. 1B). In this study, only 7 patients with malignancy showed trisomy or monosomy as indicated by LCWGS. To further investigate the diagnostic performance of LCWGS, the CNV count, the maximum Z-score (Zmax) of all CNVs, the mean Z-score (Zmean) and the RM were calculated for each sample. The LCWGS-based indexes differed significantly between patients with malignant and benign tumors. All CNVs are provided in the supplementary data (Additional file 1: Supplement Table 1 and Additional file 2: Supplement Table 2). However, it is difficult to identify specific CNVs at a resolution of 5 Mb or to display all the results in one figure. We therefore selected 10 samples to generate a heat map showing the difference in CNVs on each chromosome between benign and malignant patients (Fig. 1C). Patients with malignancy showed higher LCWGS-based indexes than patients with benign disease. In addition, these indexes were closely related to FIGO stage (Fig. 2). The positive rates of RM in stage I, stage II, stage III and stage IV were 76%, 83%, 94% and 100%, respectively.

Correlation between traditional tumor markers and LCWGS index
Spearman correlation was used to investigate the relationship between the tumor markers and the LCWGS indexes. As shown in Fig. 3 and Table 3, all indexes were significantly correlated (P < 0.01). However, the correlation between the traditional tumor markers and the LCWGS indexes was weak (r values ranged from 0.38 to 0.77). This weak correlation suggests that RM and ROMA could be used as complementary tools in the diagnosis of malignant pelvic masses.
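As a reminder of what the Spearman coefficient measures, the sketch below ranks both variables (averaging the ranks of tied values) and then computes the Pearson correlation of the ranks. This is a generic plain Python illustration, not the tool used in the study, and the function names and example values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y.

    Undefined (division by zero) if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical paired values (e.g. a serum marker and an LCWGS index).
rho = spearman_rho([35.0, 120.0, 800.0, 60.0], [0.2, 1.1, 4.0, 2.5])
```

Because only the ranks enter the calculation, the coefficient is unaffected by the skewed distributions typical of serum marker concentrations.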

Comparison of the diagnostic value of LCWGS and traditional tumor markers
First, we evaluated the diagnostic value of each single index in the research subjects. The AUCs of CA125 and HE4 were 0.775 and 0.866, respectively; HE4 showed better diagnostic accuracy than the other markers. The integrated indexes were then evaluated. The AUCs of ROMA and RM were 0.876 and 0.837, respectively, and the AUC of RM combined with CA125 and HE4 was 0.888. Both ROMA and RM showed higher diagnostic accuracy than the single indexes. However, no significant difference was found between ROMA and RM (DeLong test: P = 0.476), which indicated that ROMA and RM had similar diagnostic value in distinguishing ovarian cancers from benign diseases (Table 4). [...] childbirth and menopause status were found between the two groups. In the validation cohort, the AUCs of ROMA and RM were 0.978 and 0.867, respectively. RM showed better diagnostic value than ROMA. All data from the validation study are listed in Supplement Table 2.

Discussion
Ovarian cancer has the second highest incidence and mortality among tumors of the female reproductive system, after cervical cancer, and its early clinical presentation, such as bloating or abdominal pain, is difficult to differentiate from digestive tract diseases [15,16]. When ovarian cancer develops and spreads to the abdominal cavity, an abdominal mass may appear [17]. Therefore, distinguishing between benign and malignant abdominal masses is very important for the early diagnosis of ovarian cancer. Oncogenesis involves many types of genomic variation, such as point mutation, copy number variation and gene fusion [18]. Tumors differ from genetic diseases in that their genomic variation is frequently acquired [19]. The development of ovarian cancer is a complex process involving changes in DNA, RNA and proteins [20,21]. Abnormal DNA can be released from cancer tissue and detected in blood samples in the form of cell-free DNA [22]. Therefore, the detection of CNVs would be a promising method for the identification of malignant abdominal masses.
In this study, we evaluated whether CNVs detected by the LCWGS platform could accurately predict the presence of malignancy. In our study cohort, the number of patients with malignant disease (44 cases) was higher than the number with benign disease (19 cases). In addition, the patients with malignant disease were older than those with benign disease. The difference in age distribution between malignant and benign patients could affect tumor marker levels, whereas the impact of age on CNVs was small. Our results showed that chromosome variation could be detected in the cell-free DNA of patients with malignancy. However, only a few patients with a malignant mass showed trisomy or monosomy. Although chromosome instability is common in tumor cells, the concentration of tumor-derived cell-free DNA is low, so detection of trisomy or monosomy alone might lack sensitivity for clinical diagnosis [23]. We therefore set our detection target to CNVs at a resolution of 5 Mb. With this strategy, more chromosome instabilities could be found in the subjects, but specificity might be reduced. To address this problem, we extracted more indexes from the LCWGS results and used a healthy cohort to calibrate them. Our results indicate that the LCWGS-based indexes differed significantly between patients with malignant and benign diseases and were closely related to FIGO stage, which would be valuable in the diagnosis of malignant masses. The diagnostic value of the LCWGS-based indexes was evaluated by ROC curves. Although CNV counts, Zmax and Zmean were useful for the diagnosis of malignant masses, their AUCs were below 0.80. An integrated RM index, calculated from CNV and Zmean and calibrated against a healthy cohort, showed better diagnostic performance, with an AUC of 0.837. With a cut-off value of 1.25, RM was highly sensitive in detecting malignant masses at all stages.
CA125 and HE4 are the most widely used markers in ovarian cancer diagnosis [24]. In our study, CA125 and HE4 differed significantly between malignant masses and benign disease, which is consistent with previous reports. In 2009, Moore proposed ROMA as a new algorithm that combines HE4 and CA125 levels with menopausal status, which was defined as 6 months of menopause without menstruation or clinical symptoms. The ROMA corresponds to the predicted probability (PP), expressed as a percentage [14]. The sensitivity of ROMA for ovarian cancer diagnosis varies from 75 to 97%; however, the detection of early-stage malignancy remains a problem [25-27]. We compared the diagnostic value of RM and ROMA: although ROMA showed a higher AUC than RM, the difference was not statistically significant. The sensitivity of RM (0.895) was superior to that of ROMA (0.684), while the specificity of RM (0.773) was inferior to that of ROMA (0.909). CA125 and HE4 were correlated with the LCWGS-based indexes, but the correlation was weak. Therefore, RM and ROMA could be used as complementary tools in the diagnosis of malignant pelvic masses.
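The trade-off between sensitivity and specificity at a fixed cut-off can be sketched as follows. The cut-off of 1.25 for RM is taken from the text; treating a score at or above the cut-off as positive is an assumption, and the scores and labels below are invented purely for illustration.

```python
def sensitivity_specificity(scores, is_malignant, cutoff):
    """Return (sensitivity, specificity) for the rule 'score >= cutoff is positive'.

    Assumption: scores equal to the cut-off are called positive.
    """
    pairs = list(zip(scores, is_malignant))
    tp = sum(1 for s, m in pairs if m and s >= cutoff)
    fn = sum(1 for s, m in pairs if m and s < cutoff)
    tn = sum(1 for s, m in pairs if not m and s < cutoff)
    fp = sum(1 for s, m in pairs if not m and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical RM values and pathology labels, not data from this study.
rm_scores = [0.5, 2.0, 3.0, 1.0, 1.3]
labels = [True, True, True, False, False]
sens, spec = sensitivity_specificity(rm_scores, labels, cutoff=1.25)
```

Lowering the cut-off raises sensitivity at the expense of specificity, which is why two indexes with similar AUCs can behave quite differently at their chosen thresholds.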
To validate our results, another 24 patients from Sun Yat-Sen University Cancer Center were recruited using the same inclusion criteria and tested by LCWGS. The LCWGS strategy remained a useful tool for discriminating malignant from benign diseases and showed better diagnostic performance than ROMA. In the validation study, the patients with malignant disease were at an advanced stage, which would explain why the AUC of RM was higher than that in the training study.
The low specificity of RM may originate from the bioinformatics pipeline used for LCWGS, in which all CNVs across the whole genome were used for further analysis. Ovarian cancers show specific chromosomal gains or losses in tissue, as demonstrated by other studies; however, there are no widely accepted ovarian cancer-specific CNVs in cell-free DNA [28]. Further studies should focus on ovarian cancer-specific CNVs to improve diagnostic specificity. In addition, increasing the sequencing depth would help to increase the diagnostic value. Further studies could try to determine the sequencing depth that best balances cost and diagnostic performance.
A limitation of this study is the small number of patients. A larger sample size is needed to validate our findings and to conduct further studies on different FIGO stages of ovarian cancer and on pre- and postmenopausal patients.