8 Data Mining Pubmed Identifies Core Signalings and miRNA Regulatory Module in Glioma



Introduction
Glioblastoma multiforme (GBM) is the most common form of malignant brain cancer and remains a serious clinical and scientific problem. The current standard of therapy for GBM patients, comprising surgery, radiotherapy and chemotherapy with temozolomide, yields a median survival of only 14.6 months (Stupp et al., 2005). New interventions are increasingly being tested, particularly inhibitors of neo-angiogenesis and of growth factor receptors, and high-throughput profiling studies are leading to the discovery of novel genetic alterations and signaling pathways. The Cancer Genome Atlas (TCGA) Network recently catalogued recurrent genomic abnormalities in GBM, proposed a molecular classification of GBM into Proneural, Neural, Classical, and Mesenchymal subtypes, and integrated multidimensional genomic data to establish patterns of somatic mutations and DNA copy number (Verhaak et al., 2010). In recent years, microRNAs (miRNAs), small noncoding RNA molecules, have been implicated in the progression of various human cancers and used as molecular markers of cancer. In glioma, miRNAs have been shown to play critical roles in gliomagenesis and have been proposed as novel targets for antiglioma therapies (Shi et al., 2008; Shi et al., 2010; Zhang et al., 2009b; Zhang et al., 2010c; Zhou et al., 2010a; Zhou et al., 2010b). The molecular regulation of glioma is thus complex and still unclear, and remains under investigation.

Biomedical literature is growing at a double-exponential pace, with approximately 20 million publications in MEDLINE. To date, there are more than 50,000 glioma-related publications in MEDLINE (PubMed query: glioma). A massive wealth of information is therefore embedded in the literature, waiting to be discovered and extracted. Literature mining is a promising strategy for using this untapped information for knowledge discovery and has been applied successfully to various biological problems, including the discovery and characterization of molecular interactions (protein-protein, gene-protein, gene-drug, protein sorting and molecular binding) (Friedman et al., 2001; …).

Table 3. Set of GO terms with highly enriched genes.

To further explore the pathways involving these genes, we searched the KEGG database for their pathway information. Sixteen pathways with P-value < 0.01 were retained (Table 4). The most highly enriched pathways were the p53 signaling pathway, including 27 genes, and the Toll-like receptor signaling pathway, including 33 genes.

Interaction network of glioma-related genes
To uncover potential interaction networks or synergistic effects among these glioma-related genes, we used each gene set as a query and searched for interaction partners in the STRING database. STRING integrates public databases containing information on direct and indirect functional protein-protein associations by benchmarking them against a common reference set, the KEGG pathway database. In total, 204 genes had interactions in STRING. We next connected these genes into a network to identify biologically informative linker genes that were statistically enriched for connections to members of the glioma-related gene list. As summarized in Figure 1A, three queries, PIK3CA, PIK3CB and JAK2, served as "hubs" (labeled with red circles); high connectivity is an indicator of essentiality in a network. Further analysis found that PIK3CA, PIK3CB and JAK2 were associated with signal transduction, the MAPK pathway, growth factors, cell apoptosis, cell proliferation, cell adhesion and cell migration (Figure 1B). Given that PIK3CA and PIK3CB encode the PI3K catalytic subunits p110α and p110β, respectively, these data suggest that PI3K and JAK2 signaling provide excellent biomarkers for glioma aggressiveness.
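The hub idea rests on connectivity: a gene with many interaction partners is a candidate hub. The following is a minimal Python sketch of degree-based hub ranking, assuming the STRING results have been exported as a list of undirected gene pairs. The toy edges are hypothetical (not the study's network), and the fixed degree cutoff is an illustrative assumption; the study itself called hubs by an enrichment P-value threshold.

```python
def node_degrees(edges):
    """Count distinct interaction partners of each gene in an undirected edge list."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {gene: len(nbrs) for gene, nbrs in partners.items()}


def candidate_hubs(edges, min_degree):
    """Return genes with degree >= min_degree, highest degree first (ties alphabetical)."""
    degrees = node_degrees(edges)
    hubs = [g for g, d in degrees.items() if d >= min_degree]
    return sorted(hubs, key=lambda g: (-degrees[g], g))


# Hypothetical toy network, for illustration only.
edges = [
    ("PIK3CA", "AKT1"), ("PIK3CA", "PTEN"), ("PIK3CA", "EGFR"),
    ("JAK2", "STAT3"), ("JAK2", "STAT5B"), ("JAK2", "EGFR"),
    ("AKT1", "MDM2"),
]
print(candidate_hubs(edges, min_degree=3))  # ['JAK2', 'PIK3CA']
```

Using sets of partners means a duplicated edge does not inflate a gene's degree.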

Glioma-related miRNA pathway
Because each miRNA target prediction program uses a different computational algorithm, combining these methods will probably produce a more reliable set of predicted targets. Thus, a union target gene list for the 14 glioma-related miRNAs was generated from three target prediction programs (PicTar, TargetScan and miRanda). To further explore the signaling pathways involving these target genes, pathway analysis was performed, and over-represented KEGG pathways were identified after multiple testing adjustment (P-value < 0.05). Table 5 shows that the p53 signaling pathway, apoptosis, focal adhesion, MAPK signaling pathway, Toll-like receptor signaling pathway and cell cycle pathways were significantly over-represented. All six of these pathways were also among the pathways enriched in glioma-related genes, implying that glioma-related genes and miRNAs act preferentially on a common set of signaling pathways.
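The text does not name the multiple testing correction that was applied. As a hedged illustration, the sketch below uses the Benjamini-Hochberg false discovery rate procedure, one common choice, to adjust raw pathway P-values and keep pathways with adjusted P < 0.05. The raw P-values are invented placeholders, not the values behind Table 5.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest raw P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values per KEGG pathway (placeholders, not study data).
raw = {
    "p53 signaling pathway": 0.001,
    "Apoptosis": 0.004,
    "Cell cycle": 0.02,
    "Ribosome": 0.40,
}
names = list(raw)
adj = benjamini_hochberg([raw[n] for n in names])
significant = [n for n, q in zip(names, adj) if q < 0.05]
print(significant)  # ['p53 signaling pathway', 'Apoptosis', 'Cell cycle']
```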

Table 5. Set of signaling pathways with highly enriched microRNA targets.
To construct the network between glioma-related miRNAs and signaling pathways, we performed an integrated analysis of the targets of glioma-related miRNAs, which yielded six miRNA-pathway networks. The networks were visualized with Medusa software. Blue quadrangles represent glioma-related miRNAs, and red circles represent miRNA targets that overlap with glioma-related genes.

Discussion
The overall aim of our data mining approach, including the strategy for constructing interaction networks, is to explore the biological mechanisms involved in glioma progression. In this study, we obtained 670 genes and 14 miRNAs associated with glioma and generated interaction networks from abstract-based text mining. Importantly, our analysis identified PI3K and JAK2 hub signaling and a miRNA regulatory module in glioma.

Core signalings in glioma
By integrating PubMed text mining, homology prediction, gene neighborhood, protein-protein interaction, gene fusion and other data sources, we constructed a knowledge-driven network of glioma-related genes. Further analysis revealed that the PI3K and JAK2 hubs, which have an influential role in network stability, occupy key positions in this network. These hubs affect a wide range of biological functions and pathways, including signal transduction, the MAPK pathway, growth factors, cell apoptosis, cell proliferation, cell adhesion and cell migration. Furthermore, integrated GO and pathway analysis revealed that uncontrolled proliferation and invasive growth are essential characteristics of glioma. PI3Ks are heterodimers composed of a regulatory subunit (p85) and a catalytic subunit (p110). Activated receptor tyrosine kinases recruit the PI3K complex to the membrane via the p85 regulatory subunit, thereby activating the catalytic subunit p110, which then phosphorylates phosphatidylinositol-4,5-bisphosphate (PIP2) to phosphatidylinositol-3,4,5-trisphosphate (PIP3). PIP3 recruits AKT to the plasma membrane, where AKT is phosphorylated at Thr308 and Ser473 (Cheng et al., 2009). A high frequency of mutations in PIK3CA, the gene encoding the p110α subunit of PI3K, has been found in glioblastoma (Gallia et al., 2006; Kita et al., 2007). Our recent data showed that PI3K activity increases markedly with tumor grade and correlates positively with AKT2 expression (Wang et al., 2010). Activation of the PI3K/AKT signaling cascade promotes cell survival and proliferation and inhibits apoptosis by regulating downstream targets. AKT contributes to glioma cell migration and invasion by regulating cytoskeleton formation, adhesion and MMP2/9 expression (Pu et al., 2004; Zhang et al., 2009a; Zhang et al., 2010d). AKT promotes cell cycle progression by suppressing the cyclin-dependent kinase inhibitors p21 and p27 and increasing cyclin D1 (Guillard et al., 2009; Koul et al., 2010; Pu et al., 2006). AKT inhibits apoptosis by inactivating the caspase pathway and activating BCL2, NF-κB and mTOR signaling (Jiang et al., 2009; Ruano et al., 2008; Zhang et al., 2010d). Moreover, pro-survival signaling by PI3K contributes to resistance to established antiglioma therapies.
Several studies have shown that PI3K inhibition sensitizes glioma cells to radiation and chemotherapy (Opel et al., 2008; Prevo et al., 2008). Additionally, our recent study showed that co-suppression of PI3K and AKT significantly inhibits glioma cell proliferation and invasion (Fu et al., 2009). In the current study, we found that PI3K is a molecular hub in the knowledge-driven network of glioma-related genes and is associated with a wide variety of cellular functions and signaling pathways. There is therefore an urgent need to develop novel therapies targeting PI3K/AKT signaling in glioma.
Network analysis also identified JAK2 as a new candidate hub gene in glioblastoma. The Janus kinases (JAKs), which in mammals comprise four members (JAK1, JAK2, JAK3 and tyrosine kinase 2, Tyk2), are non-receptor tyrosine kinases involved in upstream intracellular signaling pathways that become activated after extracellular ligands bind to a variety of cytokine and growth-factor receptors (Pesu et al., 2008). JAK2 phosphorylates members of the signal transducer and activator of transcription (STAT) protein family, which then translocate to the nucleus and bind specific DNA sequences in the promoters of multiple responsive genes (Ghoreschi et al., 2009; Rane and Reddy, 2000). The STAT family has been reported to be involved in the development of glioma. Of note, STAT3 is aberrantly activated in human glioblastoma tissues, and this activation is implicated in controlling critical cellular events thought to be involved in gliomagenesis, such as cell cycle progression, apoptosis and angiogenesis (Brantley and Benveniste, 2008). Recently, analysis of a glioma-specific regulatory network revealed a transcriptional module that activates expression of mesenchymal genes in malignant glioma, with STAT3 as one of the key transcription factors required for mesenchymal transformation in human glioma cells (Carro et al., 2010). Additionally, nuclear phospho-STAT5 staining is increased in glioma tissues, and cytoplasmic STAT5b staining is markedly increased in glioblastoma multiforme compared with normal brain (Kondyli et al., 2010; Liang et al., 2009). Reducing STAT5b inhibits glioma cell growth, cell cycle progression, invasion and migration through regulation of genes such as Bcl-2, p21, p27 and VEGF (Liang et al., 2009). Another STAT family member, STAT1, is upregulated in the majority of glioblastomas (Haybaeck et al., 2007). Little evidence exists on the mechanism by which JAK2, an upstream regulator of the STAT family, is involved in gliomagenesis. However, our data mining analysis shows that JAK2 occupies a core regulatory node in the knowledge-driven network of glioma-related genes. These data indicate that elucidating the role of JAK2 in glioma would help clarify how glioma develops, and that inhibition of JAK2/STAT signaling could serve as a new therapeutic strategy for glioma. The JAK/STAT pathway plays a central role in principal cell fate decisions, regulating innate immunity, adaptive immunity, cell proliferation, differentiation, and apoptosis.
In addition, the gene CTNNB1 (encoding β-catenin), at the lower right corner of Fig. 1, warrants further investigation. β-catenin and Tcf-4 are the core components of the canonical Wnt/β-catenin/Tcf pathway, which is a crucial factor in the development of many cancers (MacDonald et al., 2009; Ying and Tao, 2009). β-catenin accumulates in the nucleus, where it interacts with transcriptional coregulators including Tcf-4 and Lef-1 to form a β-catenin/Tcf/Lef complex. This complex regulates the transcription of multiple genes involved in cellular proliferation, differentiation, survival and apoptosis, including Fra-1, c-myc and cyclin D (Wang et al., 2002; Yochum et al., 2008). Several recent reports have shown that aberrant activation of the Wnt/β-catenin/Tcf signaling pathway is an important contributing factor in gliomas (Liu et al., 2010; Pu et al., 2009; Sareddy et al., 2009). β-catenin and Tcf-4 were up-regulated in glioma tissues compared with normal brain tissues. Knockdown of β-catenin by siRNA in human glioma cells inhibited cell proliferation and invasive ability, induced apoptotic cell death and delayed tumor growth (Pu et al., 2009). However, little direct evidence yet exists on the mechanism by which β-catenin and Tcf-4 are involved in gliomagenesis.
Our data do not fully confirm the recent results of The Cancer Genome Atlas (TCGA) Network (Verhaak et al., 2010). TCGA catalogs recurrent genomic abnormalities in GBM and describes a gene expression-based molecular classification of GBM into Proneural, Neural, Classical, and Mesenchymal subtypes. Aberrations and gene expression of EGFR, NF1, and PDGFRA/IDH1 define the Classical, Mesenchymal, and Proneural subtypes, respectively. Despite the differences between the two studies, our data illustrate an alternative approach to exploring the mechanisms involved in glioma using existing data.

MiRNA regulatory module in glioma
miRNAs are a new class of small, non-coding RNAs that are located in noncoding regions or introns of the genome and that regulate gene expression by binding to the 3'-untranslated region (3'-UTR) of specific mRNAs. Extensive studies have indicated that miRNAs can function as oncogenic or tumor suppressor miRNAs, playing crucial roles in carcinogenesis. Expression profiling of glioma has revealed miRNA signatures that not only distinguish glioma from normal tissues but can also differentiate histotypes or molecular subtypes with altered genetic pathways (Ciafre et al., 2005; Lavon et al., 2010). Our data mining analysis showed that the targets of the 14 glioma-related miRNAs are enriched in six pathways, in line with the pathway analysis of glioma-related genes, indicating that glioma-related genes and miRNAs act on a common set of signaling pathways. Moreover, we found that the regulatory control mediated by miRNAs differs from pathway to pathway, and that the targets of a specific miRNA are significantly enriched in multiple pathways. The p53 signaling pathway network involves 12 miRNAs and 19 genes. Among these miRNA-target relationships, MDM2, CDK6, CDKN2A and CCNE1 have been identified as direct targets of miR-221, miR-34a, miR-125b and miR-15b, respectively (Kim et al., 2010; Pogue et al., 2010; Sun et al., 2008; Xia et al., 2009). Our recent data showed that miR-221 and miR-222 directly modulate PTEN expression by targeting the PTEN 3'-UTR (Zhang et al., 2010a). In addition, we have shown that BBC3, also named p53 upregulated modulator of apoptosis (PUMA), is a new target of miR-221, consistent with the bioinformatics analysis (Zhang et al., 2010b). Further, a recent publication revealed that miR-21 can impair p53-mediated apoptosis in response to chemotherapeutic (doxorubicin)-induced DNA damage, thereby contributing to drug resistance in glioblastoma cells (Papagiannakopoulos et al., 2008). Modulation of these p53-related targets by miR-21 may therefore explain the previous observation that the p53 signaling pathway is up-regulated in response to miR-21 knockdown (Frankel et al., 2008). These results prompt us to further elucidate the intricate interactions between miRNAs and signaling pathways.
In conclusion, using data mining analysis, we constructed a knowledge-driven network of glioma-related genes, showed that PI3K and JAK2 hub signaling are key steps leading to oncogenesis in glioma, and proposed a miRNA regulatory module in glioma. These data demonstrate the power of data mining strategies as tools for biological discovery, and suggest that applying this strategy to consolidate existing data for other diseases may yield important discoveries in disease pathogenesis.

Experimental procedures
Natural language processing (NLP) system
MEDLINE/PubMed was used as the information source for text mining. MEDLINE abstracts were retrieved through the National Center for Biotechnology Information (NCBI) PubMed portal. We queried PubMed with: glioma[title] AND ("1980/01/01"[PDAT] : "2010/04/01"[PDAT]). All abstracts were downloaded as HTML text without images and then converted into XML documents. Sentence tokenization was performed with LingPipe tools, and subsequent analysis used the sentence as the basic unit. Gene mentions were tagged using ABNER (Settles, 2005). To resolve the plethora of gene aliases, all gene mentions were normalized to Entrez Gene (http://www.ncbi.nlm.nih.gov/Entrez/) official gene symbols. Only sentences mentioning both glioma and at least one gene were selected.
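The sentence-level filtering step can be illustrated with a deliberately simplified Python sketch. It is not the LingPipe/ABNER pipeline used in the study: a regular expression stands in for the sentence tokenizer, and a small hand-made alias table (hypothetical entries) stands in for ABNER tagging plus Entrez Gene normalization.

```python
import re

# Hypothetical alias table: lower-cased surface form -> official gene symbol.
ALIASES = {
    "p53": "TP53", "tp53": "TP53",
    "pten": "PTEN", "mmac1": "PTEN",
    "egfr": "EGFR", "erbb1": "EGFR",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def genes_in(sentence):
    """Map whole-word alias tokens (hyphens kept inside tokens) to official symbols."""
    tokens = re.findall(r"[a-z0-9-]+", sentence.lower())
    return {ALIASES[t] for t in tokens if t in ALIASES}


def glioma_gene_sentences(abstract):
    """Yield (sentence, genes) for sentences containing 'glioma' and at least one gene."""
    for sentence in split_sentences(abstract):
        if "glioma" not in sentence.lower():  # substring match, so 'gliomas' also counts
            continue
        genes = genes_in(sentence)
        if genes:
            yield sentence, genes


abstract = ("PTEN (MMAC1) is frequently lost in glioma. EGFR is amplified in many tumors. "
            "Glioma cells with mutant p53 resist apoptosis.")
for sentence, genes in glioma_gene_sentences(abstract):
    print(sorted(genes), sentence)
```

The second sentence is dropped because it never mentions glioma, and both PTEN aliases collapse to one symbol.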
To test the null hypothesis that the relationship between glioma and a gene is random, a hypergeometric test was employed. Let N be the total number of PubMed abstracts, and let m and n be the numbers of mentions in PubMed of glioma and of a related gene, respectively.
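For reference, a standard form of the hypergeometric co-occurrence test is written out below under an assumption: k, which is not defined in the text, denotes the number of abstracts in which glioma and the gene are mentioned together, and m and n are read as abstract counts. The P-value is then the probability of observing at least k co-mentions by chance:

```latex
P(X \ge k) = \sum_{i=k}^{\min(m,n)} \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}
```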
Glioma-gene relations with P-value < 0.05 were then summarized and stored in a relational database for further analysis.
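A minimal Python sketch of the co-occurrence test and the P < 0.05 filter, assuming the upper-tail hypergeometric form: the probability of at least k abstracts mentioning both terms, given N abstracts in total, m mentioning glioma and n mentioning the gene. The parameter k and the toy counts are assumptions for illustration, not study data.

```python
from math import comb


def cooccurrence_pvalue(N, m, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    N: total abstracts; m: abstracts mentioning glioma;
    n: abstracts mentioning the gene; k: abstracts mentioning both.
    """
    if not (0 <= m <= N and 0 <= n <= N):
        raise ValueError("m and n must lie between 0 and N")
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the tail.
    tail = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, min(m, n) + 1))
    return tail / comb(N, n)


# Hypothetical toy counts, not study data.
p = cooccurrence_pvalue(N=20, m=5, n=4, k=3)
print(round(p, 4), p < 0.05)  # 0.032 True
```

Exact integer binomials keep the sketch dependency-free; for MEDLINE-scale counts, `scipy.stats.hypergeom.sf(k - 1, N, m, n)` computes the same tail probability more quickly.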

Gene ontology analysis
Gene ontology (GO) analysis was performed with the GSEABase package of Bioconductor (http://www.bioconductor.org/). A gene set enrichment analysis based on GO categories was performed on the glioma-related genes.

Pathway analysis
Expression Analysis Systematic Explorer (EASE) (Hosack et al., 2003) was used to analyze KEGG pathways. A KEGG pathway is over-represented if a larger fraction of the genes within that pathway is present in the query gene list than of all genes in the genome. The glioma-gene relationships retrieved by our NLP system were filtered by pathway enrichment analysis. The links between glioma and related genes were then visualized in Cytoscape software (Cline et al., 2007) (http://www.cytoscape.org/). Genes were grouped according to pathways, and genes involved in multiple pathways were assigned to the single pathway with the smallest enrichment P-value.
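The rule that a gene in several pathways is assigned to the one with the smallest enrichment P-value can be sketched in Python as follows. The pathway names are KEGG names, but their P-values and member genes here are hypothetical placeholders; ties on P-value are broken alphabetically, which is an assumption.

```python
def assign_genes_to_pathways(pathways):
    """Map each gene to the single pathway with the smallest enrichment P-value.

    pathways: dict of pathway name -> (p_value, iterable of gene symbols).
    Ties on P-value are broken alphabetically by pathway name.
    """
    assignment = {}
    for name, (p_value, genes) in pathways.items():
        for gene in genes:
            current = assignment.get(gene)
            if current is None or (p_value, name) < (pathways[current][0], current):
                assignment[gene] = name
    return assignment


def group_by_pathway(assignment):
    """Invert gene -> pathway into pathway -> sorted list of genes."""
    groups = {}
    for gene, name in sorted(assignment.items()):
        groups.setdefault(name, []).append(gene)
    return groups


# Hypothetical P-values and memberships, for illustration only.
pathways = {
    "p53 signaling pathway": (1e-6, ["TP53", "MDM2", "CDKN2A", "CDK6"]),
    "Cell cycle": (1e-4, ["CDK6", "CDKN2A", "CCND1"]),
    "Glioma": (1e-3, ["EGFR", "CDK6", "PIK3CA"]),
}
print(group_by_pathway(assign_genes_to_pathways(pathways)))
```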

Gene network analysis
By integrating PubMed text mining, homology prediction, gene neighborhood, protein-protein interaction, gene fusion and other data sources through the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING), we created a knowledge-driven network of glioma-related genes (von Mering et al., 2005). Linker genes below a P-value threshold of 0.01 were identified as "hubs". The search results were saved in data files describing links between pairs of genes and then processed in Medusa software.
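The Methods do not state how a linker gene's P-value was computed. One common formulation, given here only as an assumption, asks whether a gene's network neighbors include more members of the glioma-related list than expected by chance (an upper-tail hypergeometric probability), and calls genes with P < 0.01 hubs. The network below is a hypothetical toy: "S1" to "S5" stand for seed genes and "N1" to "N20" for other network genes.

```python
from math import comb


def hypergeom_sf(total, successes, draws, observed):
    """P(X >= observed) for a hypergeometric variable."""
    hits = range(observed, min(successes, draws) + 1)
    tail = sum(comb(successes, i) * comb(total - successes, draws - i) for i in hits)
    return tail / comb(total, draws)


def build_adjacency(edges):
    """Undirected adjacency sets from a list of gene pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def linker_pvalues(adjacency, seed_genes):
    """Score each gene by enrichment of seed genes among its neighbors."""
    background = len(adjacency) - 1  # every other gene is a possible neighbor
    seeds_in_net = {g for g in seed_genes if g in adjacency}
    scores = {}
    for gene, nbrs in adjacency.items():
        # A seed gene cannot be its own neighbor, so exclude it from the seed count.
        seeds_here = len(seeds_in_net) - (1 if gene in seeds_in_net else 0)
        k = len(nbrs & seeds_in_net)
        scores[gene] = hypergeom_sf(background, seeds_here, len(nbrs), k)
    return scores


# Hypothetical toy network, for illustration only.
seeds = {f"S{i}" for i in range(1, 6)}
edges = [("JAK2", s) for s in sorted(seeds)]
edges += [(f"N{i}", f"N{i + 1}") for i in range(1, 20)]
scores = linker_pvalues(build_adjacency(edges), seeds)
hubs = sorted(g for g, p in scores.items() if p < 0.01)
print(hubs)  # ['JAK2']
```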

MiRNA-target network analysis
We calculated the overlap between the computationally predicted target genes of glioma-related miRNAs and the glioma-related genes derived from NLP analysis. A bipartite network of miRNAs and their corresponding target genes was then constructed and displayed separately for each pathway.
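The union, overlap and per-pathway bipartite steps can be sketched in Python as below. The keys "program_A" to "program_C" are hypothetical stand-ins for PicTar, TargetScan and miRanda, the predicted target sets are placeholders rather than real program output, and mapping each gene to a single pathway is an assumption carried over from the pathway grouping rule above.

```python
def union_targets(predictions):
    """Merge per-program predictions into one miRNA -> set-of-targets mapping.

    predictions: dict program name -> dict miRNA -> iterable of target genes.
    """
    merged = {}
    for per_program in predictions.values():
        for mirna, targets in per_program.items():
            merged.setdefault(mirna, set()).update(targets)
    return merged


def mirna_pathway_edges(merged, glioma_genes, pathway_of):
    """Bipartite miRNA-gene edges restricted to glioma-related targets, grouped by pathway.

    pathway_of: dict gene -> pathway name; genes without a pathway are skipped.
    """
    glioma_genes = set(glioma_genes)
    networks = {}
    for mirna in sorted(merged):
        for gene in sorted(merged[mirna] & glioma_genes):
            pathway = pathway_of.get(gene)
            if pathway is not None:
                networks.setdefault(pathway, []).append((mirna, gene))
    return networks


# Hypothetical placeholder predictions, not real program output.
predictions = {
    "program_A": {"miR-221": ["MDM2", "PTEN"], "miR-34a": ["CDK6"]},
    "program_B": {"miR-221": ["BBC3"], "miR-15b": ["CCNE1", "GENE_X"]},
    "program_C": {"miR-34a": ["CDK6", "GENE_Y"]},
}
glioma_genes = {"MDM2", "PTEN", "BBC3", "CDK6", "CCNE1"}
pathway_of = {"MDM2": "p53 signaling pathway", "PTEN": "p53 signaling pathway",
              "BBC3": "p53 signaling pathway", "CCNE1": "p53 signaling pathway",
              "CDK6": "Cell cycle"}
merged = union_targets(predictions)
print(mirna_pathway_edges(merged, glioma_genes, pathway_of))
```

Targets predicted by any one program are kept (the union), and only those also found by text mining enter the network.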

Fig. 1. Visualization of the glioma-related gene interaction network. (A) Connectivity analysis was performed using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) to generate the knowledge-driven network of glioma-related genes, as described in Methods. The analysis revealed PIK3CA, PIK3CB and JAK2 as hub genes (P-value < 0.001) with an influential role in network stability. (B) The PI3K and JAK2 hubs occupy key positions in the knowledge-driven network of glioma-related genes and affect a wide range of biological functions and pathways, including signal transduction, the MAPK pathway, growth factors, cell apoptosis, cell proliferation, cell adhesion and cell migration. Purple lines correspond to activation, blue lines to inhibition, and yellow lines to association. Red circles indicate hub genes (PIK3CA, PIK3CB and JAK2).

Table 4. Set of signaling pathways with highly enriched genes.
www.intechopen.com