Algebraic and topological indices of molecular pathway networks in human cancers.

Protein-protein interaction networks associated with diseases have gained prominence as an area of research. We investigate algebraic and topological indices for protein-protein interaction networks of 11 human cancers derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We find a strong correlation between relative automorphism group sizes and topological network complexities on the one hand and five-year survival probabilities on the other hand. Moreover, we identify several protein families (e.g. PIK, ITG, AKT families) that are repeated motifs in many of the cancer pathways. Interestingly, these sources of symmetry are often central rather than peripheral. Our results can aid in the identification of promising targets for anti-cancer drugs. Beyond that, we provide a unifying framework to study protein-protein interaction networks of families of related diseases (e.g. neurodegenerative diseases, viral diseases, substance abuse disorders).


Introduction
Biological networks have been an active area of research for some years, see e.g. [12,16,17] and the references therein. In earlier work [6] we reported that molecular signaling network complexity is correlated with cancer patient survival. In that work we reported a statistical mechanics measure of network complexity. Here we focus on the relative sizes of automorphism groups and the dimensions of the cycle spaces (cyclomatic numbers).
Complex real-world networks contain feedback loops to enable the network "communication" to continue in the face of node failure [2]. In the case of protein-protein interaction (PPI) networks this means that inhibition of a specific node may or may not have any effect. It is well known that targeting hub nodes in networks often causes the network to break up into multiple components, and this could be lethal, because many protein hubs in PPI networks for cancer are also important proteins in metabolic networks. As we argued in [6], targeting nodes with high betweenness has greater potential for improved cancer treatment. Selective targeting of nodes in a PPI network for cancer treatment is nevertheless fraught with difficulties.
In this letter we apply two more algebraic and topological indices to study cancer PPI networks and show their correlation with five-year patient survival. We identify several repeated motifs of proteins that are "interchangeable" in a sense to be specified below. In the long run we anticipate that the methods described here will aid the identification of potential drug targets.

Results and Discussion
A network is an undirected graph G = (V, E) with vertex set V and edge set E. The vertices are proteins and two vertices are connected by an edge if there is a known interaction of the two partners, either by direct binding or by enzymatic catalysis. Beyond cancer pathways, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database also contains pathways related to immune diseases (e.g. asthma), neurodegenerative diseases (Alzheimer's disease, Parkinson's disease), substance dependence, cardiovascular diseases, viral diseases and many others [13]. The KEGG networks are assembled from the literature by searches for experimental confirmation of the relevant interactions. Each interaction is always confirmed by two or more different experimental techniques such as pull-down mass spectrometry, yeast two-hybrid and various biochemical tests. Naturally, networks constructed from experimental results are likely to contain errors, which are however impossible to quantify.
An automorphism is a permutation φ : V → V that preserves the adjacency relation, that is, $(u, v) \in E \iff (\varphi(u), \varphi(v)) \in E$ for all $u, v \in V$. With the operation of composition, the automorphisms form a group Aut(G).
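The definition of an automorphism can be checked directly on a small graph. The following is a minimal sketch using only the Python standard library; the function name `is_automorphism` and the three-vertex toy path graph are illustrative assumptions and are not taken from the KEGG networks.

```python
from itertools import combinations


def is_automorphism(vertices, edges, phi):
    """Return True if the mapping phi is an automorphism of the
    undirected graph given by (vertices, edges)."""
    # phi must be a permutation of the vertex set.
    if set(phi.keys()) != set(vertices) or set(phi.values()) != set(vertices):
        return False
    edge_set = {frozenset(e) for e in edges}
    # Adjacency must be preserved in both directions:
    # {u, v} is an edge exactly when {phi(u), phi(v)} is an edge.
    for u, v in combinations(vertices, 2):
        before = frozenset((u, v)) in edge_set
        after = frozenset((phi[u], phi[v])) in edge_set
        if before != after:
            return False
    return True


# Hypothetical toy example: the path graph a - b - c.
V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c")]
swap_ends = {"a": "c", "b": "b", "c": "a"}  # mirrors the path
swap_a_b = {"a": "b", "b": "a", "c": "c"}   # moves an end to the middle

print(is_automorphism(V, E, swap_ends))
print(is_automorphism(V, E, swap_a_b))
```

Mirroring the path preserves adjacency, whereas exchanging an end vertex with the middle vertex does not, so only the first mapping is an automorphism.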
The relation on the set of vertices defined by $u \sim v$ if and only if there exists $\varphi \in \mathrm{Aut}(G)$ with $\varphi(u) = v$ is an equivalence relation, and its equivalence classes are called the (group) orbits. MacArthur et al. [18] list 20 examples of real-world networks and their rich symmetry groups. This is in contrast to large random graphs, such as graphs from the Erdős-Rényi model G(n, p). Here n is the number of vertices, and edges are independently present with probability 0 < p < 1. Such graphs have only the trivial automorphism with probability approaching one in the limit n → ∞ [5, Chapter IX]. The difference is not surprising if one realizes that real networks display a modular structure, with vertices organized in communities that are tightly connected internally and loosely connected to each other [10]. This results in the presence of symmetric subgraphs such as trees and cliques. Figure 1 shows as an example the protein-protein interaction network of pancreatic cancer as retrieved from the KEGG database. We find that the automorphism group of this network is the direct product of symmetric groups $\mathrm{Aut}(G) = S_2^9 \times S_3^6 \times S_5^2 \times S_8$; see Table 1 for a complete list of automorphism groups. Remarkably, symmetries arise not only from tree subgraphs at the "ends" of the network, but also from central nodes of high degree (highlighted in yellow in Figure 1). Thus any flow of information that passes through one node in such an orbit may pass through any other node in the same orbit. The presence of such modular patterns indicates a high level of redundancy, which confers robustness on the associated biological system (tumor cells). We suggest that, to interrupt the flow through such a network most efficiently, the nodes adjacent to large central orbits are the best targets, for example for pharmacological agents that inhibit a specific protein-protein interaction pair. Similar suggestions have been made in [7,8,22]. The use of automorphism groups for this purpose has, to the best of our knowledge, not yet been proposed.
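To illustrate how a central orbit arises, the sketch below enumerates all automorphisms of a small toy graph by brute force and groups the vertices into orbits. It is not the method used for the KEGG networks (those were analyzed with saucy and GAP, see Methods); exhaustive enumeration is only feasible for a handful of vertices. The vertex names `in`, `src`, `p1`, `p2`, `p3`, `dst` are hypothetical.

```python
from itertools import permutations


def automorphisms(vertices, edges):
    """Enumerate all automorphisms by brute force (small graphs only)."""
    vertices = list(vertices)
    edge_set = {frozenset(e) for e in edges}
    found = []
    for image in permutations(vertices):
        phi = dict(zip(vertices, image))
        # A bijection on a finite simple graph that maps every edge to an
        # edge also maps non-edges to non-edges, so this test suffices.
        if all(frozenset((phi[u], phi[v])) in edge_set for u, v in edges):
            found.append(phi)
    return found


def orbits(vertices, edges):
    """Return the list of orbits and the order of the automorphism group."""
    autos = automorphisms(vertices, edges)
    seen, result = set(), []
    for v in vertices:
        if v in seen:
            continue
        orbit = {phi[v] for phi in autos}  # all images of v under Aut(G)
        seen |= orbit
        result.append(orbit)
    return result, len(autos)


# Hypothetical toy network: three interchangeable proteins p1, p2, p3
# sit between an upstream node "src" and a downstream node "dst".
V = ["in", "src", "p1", "p2", "p3", "dst"]
E = [("in", "src"),
     ("src", "p1"), ("src", "p2"), ("src", "p3"),
     ("p1", "dst"), ("p2", "dst"), ("p3", "dst")]

orbs, size = orbits(V, E)
print(size)
print(orbs)
```

In this toy graph the three middle nodes form a single orbit and can be permuted freely, so blocking any one of them leaves two equivalent routes open. Removing `src` or `dst`, the nodes adjacent to the orbit, disconnects the flow.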
Automorphism groups are often used to measure the complexity of a network [23]. In order to make automorphism group sizes of graphs with n = |V| vertices comparable, we follow the suggestion in [23] and compute the ratio $\beta_G = \left( |\mathrm{Aut}(G)| / n! \right)^{1/n}$. This relates the size of Aut(G) to the size of the automorphism group $S_n$ of the complete graph on n vertices.
A second graph invariant is of a more topological nature. A cycle is a sequence of adjacent vertices that starts and ends at the same vertex. The set of all cycles C(G) can be made a vector space over the field $\mathbb{Z}_2$ by taking the symmetric difference as addition, the identity as negation, and the empty cycle as zero. The dimension of this vector space is called the cyclomatic number $\mu_G$, or the circuit rank. Loosely speaking, it is a count of the "independent" loops, see Figure 2. It is shown in [4,15] that for a graph with n vertices, m edges and c connected components, $\mu_G$ is given by $\mu_G = m - n + c$.
In Figure 3 we plot these two indices against the five-year survival probability p, obtained from the Surveillance, Epidemiology and End Results (SEER) database [20], for 11 types of cancer. Interaction networks with larger values of $\beta_G$, or equivalently greater symmetry, are associated with better chances of survival. A large value of $\mu_G$ indicates high topological complexity and correlates with a decreased chance of survival. We find that both coefficients of determination are $R^2 = 0.52$, with a corresponding p-value of 0.011 (the equality is coincidental). There are widespread differences in detection stage, metastasis status, treatment and general health of the patient, which are unfortunately not accessible from the SEER database. Nevertheless, given this large amount of natural uncertainty in the data, this indicates a strong correlation of averages. It would be invaluable for future research to classify database entries according to some of the parameters mentioned above.
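Both indices are straightforward to compute once the graph and the order of its automorphism group are known. The sketch below uses only the Python standard library and assumes the normalization $\beta_G = (|\mathrm{Aut}(G)|/n!)^{1/n}$; the function names and the toy graph are illustrative assumptions. Logarithms are used for $\beta_G$ because n! overflows floating point for networks with a few hundred vertices.

```python
import math


def connected_components(vertices, edges):
    """Count connected components with a simple union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in vertices})


def cyclomatic_number(vertices, edges):
    """mu_G = m - n + c for a simple undirected graph."""
    return len(edges) - len(vertices) + connected_components(vertices, edges)


def beta(aut_order, n):
    """(|Aut(G)| / n!)^(1/n), evaluated via logarithms (assumed form)."""
    return math.exp((math.log(aut_order) - math.lgamma(n + 1)) / n)


# Hypothetical toy graph: a square with one diagonal plus a separate edge.
V = [1, 2, 3, 4, 5, 6]
E = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3), (5, 6)]

mu = cyclomatic_number(V, E)   # n = 6, m = 6, c = 2
print(mu)

# For the complete graph on 4 vertices, |Aut(G)| = 4! and beta equals 1.
print(beta(24, 4))
```

The square with a diagonal contributes two independent cycles and the isolated edge contributes none, in agreement with m − n + c = 6 − 6 + 2.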
Since both the relative automorphism group size $\beta_G$ and the cyclomatic number $\mu_G$ are correlated with the five-year survival probability p, it is to be expected that these two quantities are correlated with each other for the protein-protein interaction networks of the cancers that are the object of our study, see Figure 4, left panel. However, it is easy to construct examples of graphs that show no correlation between $\beta_G$ and $\mu_G$, see Figure 4, right panel.
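One simple way to see that the two indices are independent in general is to compare standard graph families whose automorphism groups are known in closed form. The sketch below is an illustration only and does not reproduce the graphs of Figure 4; it again assumes the normalization $\beta_G = (|\mathrm{Aut}(G)|/n!)^{1/n}$, and the helper names are hypothetical.

```python
import math


def log_factorial(k):
    """log(k!) computed via the log-gamma function."""
    return math.lgamma(k + 1)


def beta_from_log(log_aut_order, n):
    """(|Aut(G)| / n!)^(1/n) from log|Aut(G)| (assumed normalization)."""
    return math.exp((log_aut_order - log_factorial(n)) / n)


def families(n):
    """(name, log|Aut(G)|, mu_G) for four standard graphs on n >= 4 vertices."""
    return [
        ("path P_n", math.log(2), 0),                  # Aut = Z_2, tree
        ("star K_{1,n-1}", log_factorial(n - 1), 0),   # Aut = S_{n-1}, tree
        ("cycle C_n", math.log(2 * n), 1),             # Aut = dihedral group
        ("complete K_n", log_factorial(n),             # Aut = S_n
         n * (n - 1) // 2 - n + 1),
    ]


n = 10
rows = [(name, beta_from_log(log_aut, n), mu) for name, log_aut, mu in families(n)]
for name, b, mu in rows:
    print(f"{name:15s} beta = {b:.3f}  mu = {mu}")
```

The path and the star share the cyclomatic number zero but differ widely in symmetry, while the complete graph attains both the largest symmetry and the largest cyclomatic number. The relationship seen in the cancer networks is therefore a property of those networks and not a mathematical necessity.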
Further study of the automorphism groups reveals repeated motifs in several interaction networks. The eight proteins from the PIK3C{A,B,D,G} and PIK3R{1,2,3,5} family form a single orbit in seven of the networks (AML, CML, colorectal, endometrial, pancreatic, renal and SCL cancers) and are split into two orbits in two more networks (glioma and NSCL). The three proteins AKT{1,2,3} are orbit equivalent in eight networks (CML, colorectal, endometrial, glioma, NSCL, pancreatic, renal and SCL cancers), that is, whenever they appear in the network to begin with. These players have long been known to be of crucial importance to the initiation and progression of cancer, mainly due to the various biological and biochemical assays performed on cancer cells. However, our conclusions stem directly from a group-theoretic analysis of the PPI networks and they are network specific.
Since its initial discovery as a proto-oncogene, the serine/threonine kinase AKT has become a major focus of attention because of its critical regulatory role in diverse cellular processes, including cancer progression and insulin metabolism. The AKT cascade is activated by receptor tyrosine kinases, integrins, B and T cell receptors, cytokine receptors, G-protein-coupled receptors and other stimuli that induce the production of phosphatidylinositol (3,4,5)-trisphosphate (PtdIns(3,4,5)P3) by phosphoinositide 3-kinase (PI3K). These lipids serve as plasma membrane docking sites for proteins that harbor pleckstrin-homology (PH) domains, including AKT and its upstream activator PDK1. The tumor suppressor PTEN is recognized as a major inhibitor of AKT and is frequently lost in human tumors. There are three highly related isoforms of AKT (AKT1, AKT2, and AKT3), which represent the major signaling arm of PI3K. For example, germline mutations of AKT have been identified in pathological conditions of cancer and insulin metabolism.
AKT regulates cell growth through its effects on the TSC1/TSC2 complex and mTOR pathways, as well as cell cycle and cell proliferation through its direct action on the CDK inhibitors p21 and p27, and its indirect effect on the levels of cyclin D1 and p53. AKT is a major mediator of cell survival through direct inhibition of pro-apoptotic signals such as the pro-apoptotic regulator BAD and the FOXO and Myc family of transcription factors. AKT has been demonstrated to interact with Smad molecules to regulate TGF-β signaling. These findings make AKT an important therapeutic target for the treatment of cancer.
Interestingly, the network of small cell lung cancer contains a very large orbit of 18 equivalent nodes of degree six. This orbit consists of laminins, collagens and a fibronectin, which are major proteins of the basal lamina. All nodes are connected to six members of the integrin family of transmembrane receptors.

Conclusion
We have shown that the relative sizes of the automorphism groups and the cyclomatic numbers of cancer pathway networks from the KEGG database are both correlated with the five-year survival of cancer patients. Determination of the specific reasons for these great discrepancies in survival rates remains a topic for future research. Interestingly, cancers with more symmetric interaction networks are associated with better survival rates. This may be due to a greater robustness to failure, which, somewhat counterintuitively, is a positive feature in this context.
This indicates that the complexity of a biochemical network involved in a deregulated cell cycle, as exemplified by cancer cells, is of crucial importance to its robustness. This is manifested by various redundancies in the PPI network that make the search for a therapeutic "silver bullet" an impossible task. We suggest that selective removal of nodes from the network (clinically equivalent to protein inhibition) and reinterpolation on the fitted linear curves helps to identify potential drug targets. We have shown that the PI3K and AKT families of proteins appear to be the most suitable targets for pharmacological inhibition in the largest number of the cancer types studied. It is encouraging that there are several AKT pathway inhibitors in clinical development, e.g. perifosine (KRX-0401, Aeterna Zentaris/Keryx), MK-2206 (Merck), and GSK-2141795 (GlaxoSmithKline) [3]. Similarly, Bayer, GlaxoSmithKline (GSK), Novartis, Merck & Co., Roche and Sanofi are just a few of the companies that have placed great importance on the development of a spectrum of agents targeting the PI3K pathway. Drug candidates including pan-PI3K inhibitors, PI3K isoform-specific inhibitors, AKT inhibitors and mTOR inhibitors are currently being tested alone and in combination in an array of cancer indications [11]. While the motivation for this focus has been stated as "the pathway is almost invariably on in cancer", our methodology identifies this pathway as the most crucial one by mathematical analysis of the network. Moreover, we are able to identify those types of cancer where the pathway should be the main target and those types where targeting it may not produce the expected clinical outcomes.

Methods
The cancer pathways were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) [13] with the help of the open source software packages KEGGgraph [24] and Cytoscape [21]. The automorphism groups of the networks were found with saucy [14] and GAP [9] (see Tables 2-12 for the complete group lists). Bases of the cycle spaces were found using the Python package NetworkX.
Figure 1: The protein-protein interaction network of pancreatic cancer. The network was retrieved from the KEGG database [13] and its automorphism group determined with saucy [14], namely $\mathrm{Aut}(G) = S_2^9 \times S_3^6 \times S_5^2 \times S_8$. Highlighted in yellow are three central orbits of nodes of degrees 3, 4 and 8, respectively. Two of these are the PI3K and the AKT families, respectively.
Table 1: Automorphism groups of all cancers. Column n contains the number of vertices, column c contains the number of connected components of the protein-protein interaction network. The abbreviations are the same as in Figure 3.