Meta-analysis of differentially expressed genes in osteosarcoma based on gene expression data

Background To uncover the genes involved in the development of osteosarcoma (OS), we performed a meta-analysis of OS microarray data to identify differentially expressed genes (DEGs) and biological functions associated with gene expression changes between OS and normal control (NC) tissues. Methods We used publicly available GEO datasets of OS to perform a meta-analysis. We performed Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and Protein-Protein interaction (PPI) networks analysis. Results Eight GEO datasets, including 240 samples of OS and 35 samples of controls, were available for the meta-analysis. We identified 979 DEGs across the studies between OS and NC tissues (472 up-regulated and 507 down-regulated). We found GO terms for molecular functions significantly enriched in protein binding (GO: 0005515, P = 3.83E-60) and calcium ion binding (GO: 0005509, P = 3.79E-13), while for biological processes, the enriched GO terms were cell adhesion (GO:0007155, P = 2.26E-19) and negative regulation of apoptotic process (GO: 0043066, P = 3.24E-15), and for cellular component, the enriched GO terms were cytoplasm (GO: 0005737, P = 9.18E-63) and extracellular region (GO: 0005576, P = 2.28E-47). The most significant pathway in our KEGG analysis was Focal adhesion (P = 5.70E-15). Furthermore, ECM-receptor interaction (P = 1.27E-13) and Cell cycle (P = 4.53E-11) are found to be highly enriched. PPI network analysis indicated that the significant hub proteins containing PTBP2 (Degree = 33), RGS4 (Degree = 15) and FXYD6 (Degree = 13). Conclusions Our meta-analysis detected DEGs and biological functions associated with gene expression changes between OS and NC tissues, guiding further identification and treatment for OS.


Background
Osteosarcoma (OS), the most common non-haematological primary malignant tumor of bone, occurs most commonly in the metaphyseal regions of long bones mainly in adolescents and young adults, but also in patients over 40 years of age [1]. Though the survival rate has been improved after the introduction of neoadjuvant chemotherapy, the need for advances in treatments is still very urgent [2,3]. Therefore, an in-depth understanding of the pathobiology of OS is needed to develop rational treatment options for OS. Cytogenetic analyses have revealed that most conventional OShave complex karyotypes with numerous and highly variable genomic aberrations [4]. Many genes become dysregulated due to genomic aberrations, and DNA copy number and DNA methylationand and gene expression data combined to identify oncogenes and tumor suppressor genes in OS [5,6].
As the high-throughput technologies have been used in many fields, detection of expression level across the whole genome is a useful way to find unusual genomic alteration in OS patients with microarray. Recently, researchers have used this technique to more comprehensively increase knowledge about the cellular and molecular changes in OS [7][8][9][10][11][12][13]. Although these studies have shown lists of differently expressed genes (DEGs), there tends to be inconsistencies among studies due to limitations of small sample sizes and varying results obtained by different groups, accomplished by different laboratory protocols, microarray platforms and analysis techniques [14]. Recent studies have shown that the systematic integration of gene expression data from multiple sources, so-called meta-analyses, can increase statistical power for detecting differentially expressed genes while allowing for an assessment of heterogeneity, and may lead to more robust, reproducible and accurate predictions [15,16]. Similar meta-analysis has never been conducted for OS, and we first perform a meta-analysis of gene expression data sets from various OS studies to overcome the limitations of individual studies, resolve inconsistencies, and reduce the likelihood that random errors are responsible for false-positive or false-negative associations, and lay a foundation for uncovering the pathology of OS and further generating new therapies for OS.

Identification of eligible OS gene expression datasets
OS expression profiling studies were identified by searching PubMed database. The following key words and their combinations were used: "osteosarcoma, gene expression, microarray, genetics". In addition, the Gene Expression Omnibus database (GEO, http://www.ncbi.nlm.nih.gov/geo) was also searched to ensure the relevant studies were not missed [17]. We only retained the original experimental articles that analyzed gene expression profiling between OS and normal control (NC) tissues. Non-human studies, review articles and integrated analysis of expression profiles were excluded ( Figure 1). We conducted this meta-analysis in accordance with the guidelines provided in the PRISMA statement (Additional file 1: The PRISMA Checklist S1). Data were extracted from the original studies by 2 independent reviewers. Any discrepancies between reviewers were resolved by consensus or a third reviewer. The following information was extracted from each identified study: GEO accession number, sample type, platform, number of cases and controls, references, and gene expression data.

Data preprocessing
Normalization is important for comparison of microarray data sets. The heterogeneity of different datasets caused by different platforms, different gene nomenclature and different control tissues may make it difficult to directly compare the data sets from various sources. The improperly normalization used in the comparisons of microarray data sets may run a high risk of skewing comparison results and reduce the credibility of measurements of individual gene expression change. In this case, a global normalization method to minimize the inconsistency should be considered. For this purpose, we used the Z-score transformation approach to calculate the expression intensities of each probe in gene expression profiles. Z-scores were calculated according to the following formula: Where x i represents raw intensity data for each gene; x represents average gene intensity within a single experiment and δ represents standard deviation (SD) of all measured intensities.

Statistical analysis
The significance analysis of microarray (SAM) software was then used to identify the DEGs between pathological and control samples. This procedure identifies DEGs by carrying out gene specific t-statistics, with a "relative difference" score for each gene. The D value was defined as the average expression change from different expression states to the standard deviation of measurements for that gene. Genes exhibiting at least two-fold changes corresponding to a false discovery rate (FDR) less than 0.05 were selected as the significantly DEGs [18].

Functional classification of DEGs
In order to interpret the biological significance of the DEGs, we performed Gene Ontology (GO) enrichment analysis to investigate their functional distribution in OS. The online based software GENECODIS (http://genecodis. cnb.csic.es) was used to perform this analysis [19]. In addition, we also performed the pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.

PPI network construction
The protein-protein interactions (PPIs) research could reveal the functions of proteins at the molecular level and help discover the rules of cellular activities including growth, development, metabolism, differentiation and apoptosis [20]. The identification of protein interact ions in a genome-wide scale is an important step for the interpretation of the cellular control mechanisms [21]. In this analysis, we used Biological General Repository for Interaction Datasets (BioGRID) (http://thebiogrid.org/) to construct PPI network and visualized the distribution characteristics of the top 10 up-and down-regulated DEGs in the network in Cytoscape [22].

Short overview of the studies included
In this work, we collected a total of 8 expression profiling studies according to the inclusion criteria, among which it included240 samples of OS and 35 samples of controls. Selected details of the individual studies are summarized in Table 1. 4 studies utilized bone tissues of OS samples, 3 studies utilized OS cell lines and 1 study utilized bone tissues and cell lines simultaneously.

Detecting genes associated with OS
To identify the genetic markers involved in the development and progression of OS, we firstly unified the probe ID, for a microarray platform that represent a named gene, and the HUGO symbol of that gene, to the Entrez gene ID. The expression value was logarithmically transformed (base 2). This lead to a total of 14722 genes obtained. The expression values for each gene were then transformed to the z-score for the purpose of global normalization. By using the assembled expression compendium, we investigated the global shifts of gene expression between OS and NC. Then, we used SAM method to identify DEGs between pathological and control samples. With a FDR of 0.05 and by applying a minimal fold change of 1.4, a total of 979 genes were found to show altered expression in samples of OS compared with normal control. Among those DEGs, 472 genes were up-regulated and 507 genes were downregulated. A list of the top 10 most significantly up-or down-regulated DEGs was presented in Table 2. The up-regulated gene with the lowest P-value (P = 5.08E-15) was CPE (carboxypeptidase E), previously reported to correlate with tumor growth and metastasis [23], which is a carboxypeptidase that cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin [24]. The down-regulated gene with the lowest P-value (P = 1.86E-48) was NPR3 (natriuretic peptide receptor 3) that acts as a decoy/clearance receptor functioning to limit the effects of natriuretic peptides. The full list of these genes was provided as Additional file 2: Table S1.

Functional annotation
To gain insights into the biological roles of the DEGs from OS, we performed a GO categories enrichment analysis. Gene ontology provides a common descriptive framework and functional annotation and classification for analyze the gene sets data. GO categories are organized into three groups: biological process, cellular component, and molecular function. The biological process and molecular functions are thus examined separately in our analysis by web-based software GENECODIS. Genes that showed a nominal significance level of P < 0.01 were selected and were tested against the background set of all genes with GO annotations. We found GO terms for molecular functions significantly enriched in protein binding (GO: 0005515, P = 3.83E-60) and calcium ion binding (GO: 0005509, P = 3.79E-13), while for biological processes, the enriched GO terms were cell adhesion (GO: 0007155, P = 2.26E-19) and negative regulation of apoptotic process (GO: 0043066, P = 3.24E-15), and for cellular component, the enriched GO terms were cytoplasm (GO: 0005737, P = 9.18E-63) and extracellular region (GO: 0005576, P = 2.28E-47) (Figure 2). To further evaluate the biological significance for the DEGs, we also performed the KEGG pathway enrichment analysis. Hypergeometric test with P value < 0.05 was used as the criteria for pathway detection ( Table 3). The most significant pathway in our KEGG analysis was Focal adhesion (P = 5.70E-15). Furthermore, ECM-receptor interaction (P = 1.27E-13) and Cell cycle (P = 4.53E-11) were found to be highly enriched.

Protein-Protein interaction (PPI) network construction
Nodes represent proteins, edges represent interactions between two proteins. The higher the node shape, the greater deagree of connection. The PPI networks we established for the top 10 up-regulated and down-regulated DEGs by Cytoscape software included 129 nodes and 182 edges. The significant hub proteins containing PTBP2 (polypyrimidine tract binding protein 2, Degree = 33), RGS4 (regulator of G-protein signaling 4, Degree = 15) and FXYD6 (FXYD domain containing ion transport regulator 6, Degree = 13) (Figure 3), we also annotated the edges connecting the top 10 up-regulated and down-regulated DEGs directly or indirectly in Additional file 3: Table S2, and numbered in Figure 3.

Discussion
Osteosarcoma (OS) is an aggressive cancer demonstrating both high metastatic rate and chemotherapeutic resistance. A comprehensive analysis of the mechanism underlying OS development is of crucial importance for management policy. In this paper, we chose a meta-analysis approach that combines differently expressed genes (DEGs) from microarray datasets to highlight genes that were consistently expressed differentially with statistical significance, and performed GO enrichment analysis and KEGG pathway analysis, and construct the protein-protein interaction (PPI) networks. We performed a meta-analysis using 8 publicly available GEO data sets to identify common biological mechanisms involved in the pathogenesis of OS. OS is a kind of rare tumor, where material from clinical samples is scarce, therefore data from both bone tissues and cell lines were included in our meta-analysis. In total, 979 genes across the studies were consistently expressed differentially in OS (472 up-regulated and 507 down-regulated). The up-regulated gene with the lowest P-value (P = 5.08E-15) was CPE (carboxypeptidase E), which is a carboxypeptidase that cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin [24], but at the present the role and association with OS have not yet been reported. In line with previous findings, We found that some genes have been closely related to the development of OS among the top ten up-regulated DEGs, such as  HEY1, FXYD6, EFNA1. HEY1, one of target genes of NOTCH1, was reported to be up-regulated in OS from p53 mutant mice, suggesting that activation of Notch signaling contributes to the pathogenesis of OS [25]. Another study also found that HEY1 and other downstream target genes of Notch signaling including HES1, NOTCH1 and NOTCH2, were elevated in canine osteosarcoma by gene expression microarray analysis and reverse transcriptase -quantitative PCR (RT-qPCR) [26]. Olstad OK et al. applied directional tag PCR subtractive hybridization to construct a cDNA library generated from three different human osteosarcoma (OS) target cell lines (OHS, SaOS-2 and KPDXM), and identified FXYD6 was enriched in OS cell lines [27]. EFNA1 was significantly elevated in OS samples by using genome-wide microarrays, and in vitro study on the functional role of EphA2 and EFNA1 showed that EFNA1 ligand binding induced increased tyrosine phosphorylation, receptor degradation and downstream mitogen-activated protein kinase (MAPK) activation [28]. The down-regulated DEGs with the lowest P-value (P = 1.86E-48) was NPR3 (natriuretic peptide receptor 3) that acts as a decoy/clearance receptor functioning to limit the effects of natriuretic peptides. NPR3, which is an important anabolic regulator of endochondral bone growth, is enriched in bone marrow-derived mesenchymal stem cells, and there is no relevant report to OS at present. In our meta-analysis Gas6 was identified to be one of the top ten down-regulated DEGs. In OS cell lines, rhGas6 could activate Axl to protect the tumor cells from apoptosis caused by serum starvation, and promote tumor cells' migration and invasion in vitro [29].
In order to uncover the biological roles of the DEGs from OS, we performed a GO categories enrichment analysis. We found GO terms for molecular functions significantly enriched in protein binding and calcium ion binding, while for biological processes, the enriched GO terms were cell adhesion and negative regulation of apoptotic process, and for cellular component, the enriched GO terms were cytoplasm and extracellular region. To further evaluate the biological significance for the DEGs, we also performed the KEGG pathway enrichment analysis. Focal adhesion, ECM-receptor interaction and Cell cycle in our KEGG analysis were found to be highly enriched. Many signal transduction pathways involved in OS development were stimulated by bone morphogenetic proteins (BMPs), transforming growth factors (TGFs), Notch family proteins and Wnt family proteins, and components of each of these pathways have been implicated in OS. Interestingly, we noted that the most significant pathway in our KEGG analysis was Focal adhesion. Focal adhesions are associated with cell migration dynamics. However in the human cells focal adhesion would initially appear to be contradictory to their migratory phenotype. It has been proved previously that knockdown of paxillin in highly metastatic OS sub-lines M112 and 132 would inhibit cell migration [30].
Furthermore the results from PPI network analysis of the top 10 up-regulated and down-regulated DEGs indicated the significant hub proteins containing PTBP2, RGS4 and FXYD6. PTBP2, a member of PTB (polypyrimidine tract binding protein) family of RNA-binding proteins which plays a critical role in development through the regulation of post-transcriptional events, is expressed in the nervous system including the brain, the neural retina and the spinal cord and the intermediate mesoderm [31]. PTBP2 regulates the generation of neuronal precursors in the embryonic brain by repressing adult-specific splicing [32], but the function involved in OS development has not been discovered. Our result of PPI suggested that PTBP2 may play an important role in the development.
The present study has some limitations. First, heterogeneity and confounding factors may have distorted the analysis. Clinical samples might be heterogeneous with respect to clinical activity, severity, or gender. Although we conducted global normalization for different data sets, the heterogeneity of various microarray platforms used in different studies can't remove. Second, there are differences in gene expression between tissues such as bones, cell lines and lung that were not taken into account. However, our meta-analysis integrated data from different studies which may enable us to detect genes that we would otherwise have missed in an analysis. Despite these limitations, our discover have important implications for the molecular mechanisms of OS,but further experimental research is still need to confirm our study.

Conclusions
In conclusion, by this meta-analysis based on gene expression data of osteosarcoma we have shown the underlying molecular differences between NC tissues and osteosarcoma, including DEGs and their biological function which may contribute to the successful identification of therapeutic targets for osteosarcoma. Further functional studies may provide additional insights into the role of the differentially regulated genes in the pathophysiology of osteosarcoma.