A Graphic Method for Identification of Novel Glioma Related Genes

Glioma, as the most common and lethal intracranial tumor, is a serious disease that causes many deaths every year. Good comprehension of the mechanism underlying this disease is very helpful to design effective treatments. However, up to now, the knowledge of this disease is still limited. It is an important step to understand the mechanism underlying this disease by uncovering its related genes. In this study, a graphic method was proposed to identify novel glioma related genes based on known glioma related genes. A weighted graph was constructed according to the protein-protein interaction information retrieved from STRING and the well-known shortest path algorithm was employed to discover novel genes. The following analysis suggests that some of them are related to the biological process of glioma, proving that our method was effective in identifying novel glioma related genes. We hope that the proposed method would be applied to study other diseases and provide useful information to medical workers, thereby designing effective treatments of different diseases.


Introduction
Glioma is the most common and lethal intracranial tumor. It always revealed itself as malignant glioma which is usually divided into astrocytoma, oligodendroglioma, and oligoastrocytoma. Besides the classification based on histopathological features, glioma could also be graded on a WHO consensus-derived scale of I to IV by means of the degree of malignancy [1]. Clinically, most of the gliomas are highgrade gliomas (HGG). Glioblastoma (GBM), one of the HGG, accounts for more than half of gliomas [1]. Although the knowledge of glioma, especially HGG, has increased dramatically in recent years, many questions are still waiting for further elucidation. On the other hand, the overall 5year survival rate of GBM remains less than 5% despite the advances in surgery, radiation, and chemotherapy [2].
In the previous reports, glioma always manifested itself with disordered pathways which regulated proliferation, survival, invasion, and angiogenesis. Among these biological processes, RB and p53 pathways are more inclined to be dysfunctional in GBM, and the disrepair could lead to the destruction of cell cycle by regulating the G1-to-S-phase transition [3,4]. Furthermore, other pathways such as MAPK, PI3K/PTEN/AKT, and NF-B pathway are also overactivated in glioma and contribute to the uncontrollable cellular proliferation [5][6][7]. As we know, the tumorigenesis of glioma is a complicated process which involved intricate pathways beyond the above ones. To widely understand the mechanism underlying this disease, identification of its related genes and uncovering the relationship of them and the biological process of glioma are very important. However, it is timeconsuming and expensive to identify novel glioma related 2 BioMed Research International genes by conventional experiments. On the other hand, encouraged by the successful application of computational methods to deal with various biological problems such as drug design [8][9][10][11][12][13] and analysis of complicated biological pathway [14][15][16][17], computational methods may address this problem and provide some useful information for investigators.
In this study, we proposed a graphic method and attempted to apply this method to discover novel glioma related genes. The current known glioma related genes collected from various sources were the firsthand information. Based on these genes, some new discovered genes were obtained by the well-known shortest path algorithm. Furthermore, a permutation test was conducted to exclude false positives among them. The analysis of the final remaining genes suggests that some of them had direct or indirect relationship to the biological process of glioma, indicating that this method was effective and may give new insight to study other diseases.

Materials and Methods
2.1. Materials. The current known glioma related genes were retrieved from the following sources. (1) All the 11 data sheets listed on the web page of COSMIC (http://cancer.sanger.ac .uk/cancergenome/projects/census/) were downloaded, from which we obtained 18 glioma related genes; (2) search for human diseases in UniProt (http://www.uniprot.org/) with keywords "human glioma oncogene" and "human glioma suppressor gene, " thereby obtaining 49 and 32 genes (only reviewed genes were selected), respectively; (3) select "Literature Search" in TSGene (http://bioinfo.mc.vanderbilt.edu/ TSGene/search.cgi) and input "glioma" as keyword, obtaining 7 genes. After collecting all of the genes mentioned above, we finally obtained 77 glioma related genes, which were available in Supplementary Material I available online at http://dx.doi.org/10.1155/2014/891945.

Construction of a Weighted Graph from Protein-Protein
Interactions. Protein-protein interaction (PPI) is useful information for investigating various biological problems [18][19][20][21][22]. Many computational methods were proposed based on the fact that proteins that can interact with each other always share similar functions. Since the known glioma related genes must have some common features related to glioma, it is reasonable to discover novel glioma related genes based on protein-protein interaction and known glioma related genes. The data concerning proteinprotein interactions was downloaded from STRING (Search Tool for the Retrieval of Interacting Genes/Proteins, http://string.embl.de/) [23], a large database containing known and predicted protein interactions which are derived from genomic context, high-throughput experiments, (Conserved) Coexpression and Previous knowledge. In the obtained file, we extracted all protein-protein interactions of human. Each obtained interaction consists of two proteins and a score with range between 150 and 999, which can quantify the likelihood that an interaction may occur. For later formulation, let ( 1 , 2 ) denote the score of the interaction between two proteins 1 and 2 . The constructed graph took proteins occurring at least one protein-protein interaction of human as nodes, while two nodes were adjacent if and only if the score of the interaction between the corresponding proteins was greater than zero. The obtained graph consisted of 18,600 nodes and 1,640,707 edges. Furthermore, to correctly reflect the strength of the interaction, each edge with nodes V 1 and V 2 was labeled by a weight, which can be computed by where 1 and 2 were two corresponding proteins of nodes V 1 and V 2 , respectively.

Selection of Candidate Genes.
It is obvious that the glioma related genes must have some common features which are related to glioma. On the other hand, as mentioned in Section 2.2, two proteins that can interact with each other, that is, they are adjacent in the constructed weighted graph, always share common features. The idea of our method was based on these facts. To clearly elaborate the idea of the method, we constructed a simple weighted graph which is shown in Figure 1, because the original graph was too large to exhibit in the paper. It is easy to observe from Figure 1 that the shortest path connecting and contains , and as the inner nodes. Based on the weights of edges on this path, we can obtain that genes and can share common functions with high probability, because the confidence score of the interaction between and is very high, which is 1000 − 20 = 980. The similar results also hold for and , and , and . If genes and are two known glioma related genes, genes , , and are actual glioma related genes with high likelihood. In view of this, shortest paths between any pair of known glioma related genes obtained by Dijkstra's algorithm, the most famous shortest path algorithm proposed by Dijkstra in 1956 [24], are useful information for further investigation.
After obtaining the shortest paths connecting any pair of known glioma related genes, it can be seen that some nodes/ genes occurred in many paths, while most of nodes/genes in the graph were not contained in any path. Thus, for each node/gene in the graph, we counted the number of paths containing the node/gene, termed as betweenness which is defined as the number of shortest paths containing the node/gene as an inner node. The concept of betweenness has been employed in some studies of natural and manmade networks [25][26][27][28][29]. In fact, the betweenness of some node/gene reflects the direct and indirect relationship of the gene and known glioma related genes. Thus, the likelihood of genes with high betweenness to be related to glioma was higher than those with low betweenness. In view of this, we selected genes with betweenness greater than 0 as the candidate genes, which may be the novel glioma related genes with high probability. It is necessary to point out that the known glioma related genes were not included in the set of candidate genes.

Filtering Candidate Genes by Permutation Test.
As described in Section 2.3, some candidate genes can be obtained by researching the shortest paths connecting any two known glioma related genes. However, some of them may be false positives, because some nodes/genes may easily receive a high betweenness due to their location in the weighted graph even if we randomly select genes in STRING as the known glioma related genes. To exclude these false discoveries, a permutation test should be executed as follows.
(I) 1,000 node/gene sets, denoted by 1 , 2 , . . . , 1000 , were randomly selected in the weighted graph such that each of them had the same size of known glioma related gene set.
(II) For each candidate gene discovered in Section 2.3, calculate its betweenness on each set (1 ≤ ≤ 1000).
(III) Calculate the permutation FDR of each candidate gene by where was set to be 1 if the betweenness of on was larger than that of on the known glioma related gene set; otherwise, it was set to be 0.
Obviously, small permutation FDR of one candidate gene implies that it is the true positive with high probability.

Candidate Genes.
For the 77 genes mentioned in Section 2.1, we searched the shortest path connecting any two of them by Dijkstra algorithm. After calculating the betweenness of each node/gene in the weighted graph, 215 candidate genes with betweenness larger than zero were obtained. These 215 genes and their betweenness were available in Supplementary Material II. To exclude the false positives, the permutation test was conducted as described in Section 2.4. By (2), we can calculate the permutation FDR of each candidate gene. These values were also provided in Supplementary Material II. Since the likelihood of gene with small permutation FDR to be the actual glioma related gene is high, we set the threshold to be 0.05, that is, selecting genes with permutation FDRs lower than 0.05 among 215 candidate genes, thereby obtaining 67 genes, listed in Table 1. These genes were deemed to have strong relationship with glioma and further discussions were based on these genes.

Analysis of Enriched KEGG Pathways of Candidate
Genes. As mentioned in Section 3.1, 67 candidate genes were obtained. To analyze the relationship of them and glioma, we employed DAVID (Database for Annotation, Visualization and Integrated Discovery) [30], a functional annotation tool to understand biological meaning behind large list of genes. 67 candidate genes comprised the input gene list of DAVID, thereby obtaining 9 KEGG pathways that were enriched by these 67 candidate genes. The detailed output of DAVID for KEGG pathway enrichment analysis was available as Supplementary Material III.
The top 5 pathways have the value less than 0.05 which were discussed below. The most enriched pathway is hsa04360: axon guidance ("count" = 8). Among the 8 genes, 7 ephrins-related genes are enriched whose corresponding proteins include 4 members (EFNA3, EFNB1, EFNB2, and EFNB3) of the ephrins family and 3 members (EPHA1, EPHA4, and EPHA7) of the ephrins receptor subfamily. Eph receptor tyrosine kinases (Ephs) and ephrins (EPH) could navigate cells by controlling cell-cell adhesion and segregation [31]. In other words, with the function of axon guidance Ephs/EPH could regulate the invasion, neoangiogenesis, and metastasis of gliomas [32,33]. Ding and his colleagues have identified several somatic mutations of Ephs especially EphA7 in lung cancer [34]. Although the close connection between Ephs/EPH and cancer has been reported, its pathogenic mechanism in gliomas is still unknown. The second pathway is hsa04510: focal adhesion ("count" = 7). As we know, infiltration of tumor cells and angiogenesis are critical for the growth of tumor. Zagzag et al. reported that focal adhesion kinase (FAK), highly associated with these biological processes, plays an important role in tumorigenesis of gliomas via enhancing the ability of infiltration and angiogenesis [35]. The FAK-related genes, like CTNNB1, VEGFA, KDR, and FLT4 enriched in this pathway, are always mutated or aberrantly expressed in various types of cancers [36,37]. The third pathway is hsa04530: tight junction ("count" = 5). In the brain, the expression of the tight junction proteins is important for blood-brain tumor-barrier (BTB) permeability. Hence destruction of the tight junction could facilitate the development of gliomas by increasing BTB permeability [38,39]. The fourth pathway is hsa04520: adherens junction ("count" = 4). Adherens junction is reported to be disordered in the glioblastoma and to affect the invasive behavior of GBM [40,41]. The last significantly enriched pathway is hsa05200: pathways in cancer ("count" = 7). The result shows that a common mechanism is shared by the gliomas and other types of cancers. Although these significant enriched pathways have been reported to be related to gliomas more or less, our results might expand the avenues to explore new mechanisms in the tumorigenesis of gliomas.

Analysis of Enriched GO Terms Candidate Genes.
In addition to KEGG pathway enrichment analysis, DAVID also provided the GO terms enrichment analysis of the It can be seen that 227 GO terms were enriched by these 67 genes. Top 10 Go terms sorted by value are investigated and discussed as below. Among the top 10, 4 GO terms are biological process (BP) which included GO: 0007169: transmembrane receptor protein tyrosine kinase signaling pathway, GO: 0007167: enzyme linked receptor protein signaling pathway, GO: 0042127: regulation of cell proliferation, and GO: 0000904: cell morphogenesis involved in differentiation. From the results, we found that all these processes are connected with receptor-dependent signaling pathways. The cancer genome atlas (TCGA) group has revealed that the receptor tyrosine kinase (RTK) pathway was deregulated in 88% of the patients with glioblastoma [42]. After deregulation of the RTK, its downstream genes could function uncontrollably in the cellular proliferation and morphogenesis which are very pivotal for the growth of gliomas. In the top 10 GO terms, we also find 4 molecular function (MF) GO terms: GO: 0004714: transmembrane receptor protein tyrosine kinase activity, GO: 0005003: ephrin receptor activity, GO: 0046875: ephrin receptor binding, and GO: 0004713: protein tyrosine kinase activity. The MF classification also suggests the importance of RTK signaling pathways especially the Ephs/EPH pathway in the tumorigenesis of gliomas. As the previous reports, RTK pathways could regulate cell proliferation and migration which were indispensable for the development of gliomas [42,43]. Besides BP and MF GO terms, 2 cellular component (CC) GO terms are also enriched in the top 10 GO terms: GO: 0044459: plasma membrane part and GO: 0005887: integral to plasma membrane. As we know, the transformation of cell membrane is necessary for the migration and invasion process during tumorigenesis of gliomas. Our results pave the way for understanding potential pathogenic mechanism of gliomas.

Analysis of Some Candidate Genes.
Among the 67 genes, several genes are intriguing which may play pivotal role in tumorigenesis of glioma. This section gave the detailed discussion of some candidate genes.
VEGFA, also known as the vascular endothelial growth factor (VEGF), is a member of a large family of growth factors that also includes VEGFB, VEGFC, VEGFD, and placental growth factor (PLGF). VEGF is the only mitogen that specifically acts on endothelial cells and also a tumor angiogenesis factor in human glioma in vivo [44]. Knizetova et al. have demonstrated that the autocrine VEGF signaling is mediated via VEGFR2 (KDR), another gene in our list. They found that blockade of VEGFR2 would abrogate the VEFG-mediated enhancement of astrocytoma cell growth and viability [37]. In the in vivo level, Millauer et al. found that disrepair of VEGFR2/VEGF system in angiogenesis could prevent tumor growth in nude mice [45]. Another VEGF receptor found in our list is VEGFR3 (also known as FLT4). In contrast to VEGFR1/2, VEGFR3 does not bind VEGFA and mainly functions in lymphangiogenesis as a receptor of VEGFC and VEGFD [46,47]. Jenny et al. reported that VEFGR3 was expressed in some tumor types such as haemangioblastoma and glioblastoma, despite their lack of lymphatic vessels [48]. Although the roles of VEGF signaling pathway in the tumorigenesis of glioma have been well studied, new findings have been explored in succession recently. CTNNB1, with more famous name of catenin beta 1, encodes -catenin protein which plays important roles in cellular morphogenesis, differentiation, and proliferation via regulating the Wnt signaling [49]. Yano et al. induced glioma in rat using N-ethyl-N-nitrosourea display aberrant nuclear accumulation of -catenin in contrast to normal brains [50]. Moreover, Pu et al. found that the downregulation of -catenin by siRNA could suppress malignant glioma cell growth [36]. To elucidate the connection betweencatenin and glioma, Liu and his colleagues performed a systemic research. They found a higher expression level of -catenin in astrocytic glioma patients with high grade in comparison with the normal controls. Furthermore, they also illustrated that the overexpression of -catenin may be an important contributing factor to glioma progression by means of facilitating proliferation and inhibiting apoptosis [51]. After Wnt pathway is activated, -catenin accumulates and enters the nucleus where it can act as a coactivator for TCF/LEF-mediated transcription [52]. As the downstream of beta-catenin, LEF1, another gene in our list, also plays a crucial role in the Wnt signaling pathway. LEF1, with full name of lymphoid enhancer-binding factor 1, tends to be mutated in the tumors. Liu et al. have investigated that MiR-218 could reduce the invasiveness of glioblastoma cells by targeting LEF1 [53].
SRC, whose corresponding protein is a tyrosine-protein kinase, could play a pivotal role in the regulation of embryonic development and cell growth [54]. Besides functioning in the embryonic development, SRC could also regulate the tumorigenesis of various types of cancers like breast cancer, colon cancer, and brain cancer [55,56]. Src protein always maintains an inactive state until its Y530 residue is dephosphorylated by protein tyrosine phosphatase- [57]. Src could also be activated by direct binding of its SH2 and SH3 domains to intracellular proteins or activated tyrosine kinase growth factor receptors [58]. Stettner et al. have found elevated SRC activity in GBM compared with normal brain [59]. On the other hand, Lund et al. found that the infiltration of glioma reduced in Src-deficient mice [60]. It is reported that the increased SRC activity in GBM may be due to increased activation of cell surface growth factor receptors and integrins that activate SRC-family kinases (SFKs) rather than the amplification or mutation of SFK genes [42,61].

Conclusion
In biomedicine and genomics, identification of disease genes is an important topic. This contribution proposed a graphic method to identify novel disease genes and the method was applied to glioma, one kind of cancers. The findings indicate that this method is quite effective. It is hopeful that the contribution can provide help for medical workers to discover effective treatments of glioma and give new insight to study various diseases.