Ovarian cancer is the second mostly encountered gynecologic cancer worldwide with high mortality and low rate of five-year survival due to the difficulty of early diagnosis and high rates of recurrence and metastasis [1]. If ovarian cancer could be diagnosed at the early stage, 90% patients could survive for more than five years with appropriate treatment [2]. Despite numerous attempts, the common molecular events for early diagnosis of ovarian cancer remain to be established. Gene expression profile analysis is a powerful research strategy, which integrates data in genetics, molecular transcription, and functional genomics to reveal dysregulated genes between patients and healthy donors. Microarray provides increasing body of gene-wide transcriptional data regarding ovarian cancer. For example, Moreno et al. proposed that cellular proliferation, cell cycle, DNA damage, and apoptosis were up-regulated in ovarian cancer by using microarrays [3]. Bowen et al. identified alterations in expression levels of gene products functioning in several signaling pathways including Wnt, Notch, TGFβ/BMP and canonical cell cycle in ovarian cancer by microarray analysis [4]. However, results vary between studies due to the diversity in cohort selection, specimen source, and experimental designs. Combining different microarray datasets is advantageous to enhance statistical power to detect the dysregulated genes that might be functional in ovarian cancer development and progression.
Microarray data integration-based meta-analyses depend on efficient in silico tools. With the advances of ever-growing theories and bio-informatics tools, we can now employ in silico tools to efficiently combine multiple microarray datasets ignoring different populations, experimental designs, and different specimen sources [5–7]. NetworkAnalyst is a useful web-based tool, which functions in many aspects such as preliminary data processing, sample annotation, batch effect adjustment, dataset integration, and results visualization [8–12]. o maximally overcome the impact caused by the differences in study design and platform usage among different datasets, “Combining Effect Size (ES)” analysis and Random Effect Modeling (REM) were applied to achieve more consistent and accurate results by taking into consideration of both direction and magnitude of gene expression changes. In this study, 16 eligible microarray datasets for ovarian cancer were selected from publicly available dataset repositories. Totally, 972 differentially expressed genes (DEGs) with P-value < 0.001 were identified, including 541 up- and 431 down-regulated genes, respectively. Interestingly, by comparing the list of meta-based DEGs to that of individual microarray dataset, 92 additional DEGs were demonstrated as gained DEGs. Expression profiles for the CD24 molecule (CD24), Sex determining region Y-box 17 (SOX17), WAP four-disulfide core domain 2 (WFDC2), the epithelial cell adhesion molecule (EPCAM), innate immunity activator (INAVA), and Aldehyde oxidase 1 (AOX1) were validated in clinical cancer samples of ovarian cancer, which are among top five dysregulated DEGs revealed by our meta-analysis. Functional analysis uncovered that WFDC2, INAVA, and AOX1 affect cell proliferation, migration, and apoptosis respectively. Our results provided with novel molecular mechanisms for the development and progression of ovarian cancer and propose potential targets for the prevention and treatment for clinics [13].
Materials Identification and Selection of Eligible Gene Expression Datasets for Meta-analysis
Microarray-based gene expression profiling studies for ovarian cancer were identified in the PubMed database (http://www.pubmed.gov), Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/gds/) and ArrayExpress dataset of the European Molecular Biology Laboratory-European Bioinformatics Institute (http://www.ebi.ac.uk/arrayexpress/). The following keywords were used: ("Ovarian neoplasms"[Mesh]) and (microarray or gene profile or gene profiling). Eligible studies and datasets should follow these inclusive criteria: (1) case and control studies of human; (2) analysis of gene expression profiling; (3) comparable experimental conditions and untreated; (4) available complete raw and processed microarray data. Studies were excluded if they were: (1) letters, abstracts, meta-analysis, review articles and human case report; (2) cell lines were used in experimental design; (3) RT-PCR only for profiling studies; (4) studies without healthy control. All the datasets and references, which are conformed to the above-mentioned criteria, were manually screened. The complete workflow was shown in Fig. 1 for eligible dataset selection. The latest search was performed on Mar. 15, 2019.
Data Extraction and Processing
Full text and supplementary materials of selected articles were extracted from each identified study: GEO series accession number, number of patients and healthy donors, specimen sources, platform of microarray and references. The series matrix files were downloaded from GEO datasets for all studies, and common Entrez IDs were used to substitute all the gene probes in accordance with the corresponding microarray platforms, with the exception of GSE17308, in which the gene probe was substituted by Genbank ID. Before integrative meta-analysis, individual dataset was normalized by R-mediated mean, log2 transformation and quantile normalization. Expression data of patients with ovarian cancer and healthy donors were defined as class 2 and class 1 respectively, according to the guidelines of NetworkAnalyst [6].
Batch Effect Adjustment and Individual Data Analysis
Batch effect correction option in NetworkAnalyst was used to reduce potential study-specific batch effects, by which the normalized individual datasets were subjected to the well-established ComBat procedures [11]. Emperical Bayes methods were used to stabilize the expression ratios for genes with very high or very low ratios, stabilize gene variances by shrinking variances across all other genes, and protect their inference from artifacts in the data. To compare the sample clustering patterns with and without applying the ComBat procedures, the results were visually examined using the principal component analysis (PCA).
Meta-analysis
We conducted the meta-analysis using NetworkAnalyst, a web interface for integrative statistical analysis and visualization. In the option of “multiple gene expression data”, all the datasets were uploaded to the “multiple gene expression data” input area and analyzed in a streamlined manner, including data processing for Entrez ID/Genbank ID, annotation check, and PCA plot examining, confirmation of data normalization, individual DEG analysis and data summary. For NetworkAnalyst based DEGs discrimination, the cut-off of P-value was adjusted to 0.001. Furthermore, all datasets were subjected to integrity check to ensure that the merged data could be carried out by ES combination, which would generate more biologically consistent meta-based DEGs by incorporation of both the magnitude and direction of gene fold change. While performing Cochran’s Q test on the integration data, the random effects model was suggested to be used when Q values deviates significantly from a chi-squared distribution; otherwise, the fixed-effects model was selected. The Q values deviates significantly from a chi-squared distribution of the current study, so this study used REM which assumes that each study contains a random effect size that could incorporate unknown cross-study heterogeneities, as demonstrated by Cochran’s Q tests [10, 12]. “Define custom signature” tool from NetworkAnalyst was used to produce the heatmap visualization for both top 25 up-and down-regulated genes.
Patients and Specimens
Thirty epithelial ovarian carcinoma samples and thirty unpaired normal ovary samples were collected at the Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences. No local or systemic neo-adjuvant radiotherapy, or/and chemotherapy and targeted therapy had been received. Written informed consent was obtained from each patient. Our study was approved by the Ethics Committee of Shandong First Medical University and Shandong Academy of Medical Sciences.
Cell Culture and siRNA Transfection
OVCAR-3 cell line was obtained from Procell Life Science & Technology Co., Ltd (Wuhan, China). HOSEpiC, and CAOV-3 cell was purchased from BeNa culture collection (Beijing, China). CAOV-3 cells were maintained in Dulbecco's Modified Eagle Medium (DMEM) (Bioind, Kibbuiz, Israel). OVCAR-3 cells were cultured in Roswell Park Memorial Institute (RPMI) 1640 Medium (Bioind, Kibbuiz, Israel). All culture medium was supplemented with10% Fetal Bovine Serum (FBS) (Bioind, Kibbuiz, Israel) and 1% penicillin/streptomycin. Cells were cultured in a humidified incubator at 37 °C and 5% CO2. siRNAs for CD24, SOX17, WFDC2, EPCAM, INAVA, basonuclin 1 (BNC1), AOX1, gasdermin E (GSDME), receptor accessory protein 1 (REEP1) and heart and neural crest derivatives expressed 2 (HAND2) were transfected into OVCAR-3 and CAOV-3 cells by HiperFect at the concentration of 50 nM according to the manufacturer's instructions. The sequences of siRNAs used for CD24, SOX17, WFDC2, EPCAM, INAVA, BNC1, AOX1, GSDME, REEP1, and HAND2 knockdown are listed in Table SI.
Real-time quantitative RT-PCR (qRT-PCR)
Total RNA was extracted by using TRIzol Reagent (Invitrogen, Carlsbad, USA) according to the manufacturer’s instructions. RNA was reverse transcribed with the mRNA 1st Strand cDNA Synthesis Kit (Vazyme, Nanjing, China) for mRNA. qRT-PCR assay was performed on an Applied Biosystems 7500 instrument (Applied Biosystems, Foster, USA) by using SYBR Green (Invitrogen, Carlsbad, USA). Relative RNA quantification was calculated via the comparative 2−ΔΔCt method. The relative expression of indicated genes was normalized to that of GAPDH in each sample.
Cell Apoptosis Assay
For apoptosis assay, cells were seeded into 12-well plates and allowed to adhere overnight. Then, the cells were digested with EDTA-free trypsin and left to dissociate into single cells. After being washed with PBS, the harvested cells were resuspended at a density of 1 × 106 cells/ml in 1 × binding buffer and double stained with FITC Annexin V and Propidium Iodide using FITC Annexin V Apoptosis Detection Kit I (BD Biosciences, USA). A minimum of 1 × 104 single-cell events were acquired by flow cytometry (BD FACSVerse, USA), and cell apoptosis was analysed by using FlowJo software.
Cell Proliferation Assay
OVCAR-3 and CAOV-3 cells were seeded at 2000 cells per well in 96-well plates after transfection. CCK8 (Cell Counting Kit-8) assay was used to measure cell number and viability, and the absorbance was determined using a spectrophotometric plate reader at 450 nm (SPECTRAMAX 190, USA). The data were represented as mean ± SD. Each experiment was repeated three times.
Cell Migration Assays
Cell migration was assessed by transwell experiments. For cell migration assay, indicated cells were trypsinized, re-suspended in serum-free DMEM medium, and placed in the upper chamber (4 × 104 cells/well). Then, DMEM medium supplemented with 10% FBS was added to the lower chamber. The plates were incubated for 48 h. Cells in the upper chamber were completely removed with a cotton swab. Cells migrating into the lower chamber were washed with PBS, fixed in 4% paraformaldehyde, and stained with 0.5% crystal violet. Finally, the cells were counted under a microscope in five random fields.
Statistical Analysis
The web-server NetworkAnalyst analyzed the data and performed meta-analysis by REM-based combined ES statistics. This study adjusted P-value < 0.001 to discriminate the DEGs. All experiments were performed at least three independent times, unless otherwise stated. Values are presented as the mean ± SD. The 2-tailed Student's t-test was used for comparisons between two groups. Graphpad 8.0 was used for statistical analysis of data and data plotting. P-value ≤ 0.05 was considered significant.