The role of protein interaction networks in systems biomedicine

The challenging task of studying and modeling the complex dynamics of biological systems in order to describe human diseases has attracted great interest in recent years. Major biological processes are mediated through protein interactions; hence, understanding human diseases requires understanding the complex networks that these interactions form. Applying protein interaction networks to disease datasets enables the identification of disease-associated genes and proteins, the study of network properties, the identification of disease-related subnetworks, and network-based disease gene classification. Although various protein interaction network analysis strategies have been employed, grand challenges remain. A global understanding of protein interaction networks, achieved by integrating high-throughput functional genomics data from different biological levels, will allow researchers to examine disease pathways and identify strategies to control them. As a result, more personalized, more accurate, and more rapid diagnostic techniques based on disease genes, as well as novel, more personalized strategies, are likely to be devised in the future. This mini-review summarizes the current use of protein interaction networks in medical research, as well as the challenges to be overcome.


Introduction
We have come a long way from the "one-gene/one-enzyme/one-function" concept originally framed by Beadle and Tatum [1]. They provided a basic explanation of how genes work at the molecular level; however, we now know that the picture is more complex. Biological processes inside our bodies are governed by the well-defined organization of proteins into complexes, which act as molecular machines performing different functions [2]. Major biological processes such as immunity (antigen-antibody interactions), metabolism (enzyme-substrate interactions), signaling (interactions of messenger molecules, hormones, and neurotransmitters with their cognate receptors), and gene expression (DNA-protein interactions), as well as the building of supramolecular assemblies (collagens, elastic fibers, actin filaments) and molecular machines (molecular motors, ribosomes, proteasomes), are mediated through protein interactions. Studying the interactome, the whole set of physical molecular interactions between biological entities in cells and organisms, is essential for understanding how gene functions and regulation are integrated at the level of the organism [3].
The notion that a disease is rarely the consequence of an abnormality in a single gene, but usually results from complex interactions and perturbations involving large sets of genes and their relationships with various cellular components, has led to the development of network-based approaches to understanding human disease [4]. Theoretical advances in network science, together with parallel advances in high-throughput efforts to map biological networks, have provided a conceptual framework for interpreting large interactome maps. Protein-protein interaction (PPI) networks are increasingly serving as tools to decipher the molecular basis of diseases. Furthermore, genome sequencing and advances in proteomics have led to the identification of proteins of unknown function. Interaction networks might give clues to the functions of these newly discovered proteins, or to new functions of already characterized proteins [5][6][7][8][9].
The most promising applications of PPI networks to disease datasets are concentrated in four major areas: (i) the identification of genes and proteins associated with diseases, (ii) the study of network properties and their relation to disease states, (iii) the identification of disease-related subnetworks, and (iv) network-based disease classification [10]. Fig. 1 shows a schematic representation of how a systems biomedicine approach can be used to understand the relationship between a disease and the PPI network.
A global understanding of networks will allow researchers to examine disease pathways and identify strategies to control them. Integrating functional genomic and proteomic data to enable dynamic network analysis should further improve the success of medical research.

The Role of Networks in Medicine
Networks provide a systems-level understanding of the mechanisms underlying diseases by serving as a model for data integration and analysis. They have been used to gain insight into disease mechanisms [11,12], study comorbidities [13,14], analyze therapeutic drugs and their targets [15][16][17], and discover novel network-based biomarkers [18,19].
Network science deals with complexity by "simplifying" complex systems into components (nodes) and the interactions (edges) between them (Fig. 2). These simplifications help researchers make useful discoveries. Networks can be constructed purely from gene expression information, as in transcriptional regulatory networks [20] and co-expression networks [21], or built upon prior knowledge of protein-protein interactions [22]. The nodes in a network representation are metabolites or macromolecules such as proteins, RNA molecules, and gene sequences, while the edges are physical, biochemical, or functional interactions. The resulting "interactome" network can serve as a scaffold from which global or local graph-theoretic properties are extracted, leading to a better understanding of biological processes. Since cellular networks comprise many types of interaction and regulation, networks that reflect this complexity will provide better insight into the problem at hand.
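To make the node/edge abstraction concrete, the sketch below shows one simple way a co-expression network might be built: genes become nodes, and an edge is drawn when the absolute Pearson correlation of two expression profiles reaches a cut-off. The gene names, expression values, and the 0.9 threshold are hypothetical; published co-expression methods often use more robust similarity measures and significance testing.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile: correlation undefined")
    return sxy / sqrt(sxx * syy)


def coexpression_network(expr, threshold=0.9):
    """Return undirected edges (g1, g2) whose |r| >= threshold.

    expr maps a gene name to its expression values across the same samples.
    """
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def degree(edges):
    """Number of edges incident to each node (a basic local graph property)."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


# Hypothetical expression values for four genes across five samples.
expr = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 4, 6, 8, 10],
    "GENE_C": [5, 3, 4, 1, 2],
    "GENE_D": [1, 3, 2, 5, 4],
}
net = coexpression_network(expr)
print(sorted(net))  # -> [('GENE_A', 'GENE_B'), ('GENE_C', 'GENE_D')]
print(degree(net))
```

Here GENE_C and GENE_D are linked by a perfect negative correlation, which is why the absolute value is thresholded; whether anti-correlated genes should be connected is a modeling choice.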
Regulatory, metabolic, signaling, and protein-protein interaction networks cannot be considered in isolation or as independent entities; rather, we have to take their intricately interwoven structure into account. Proteins might act alone or in combination: as transcription factors and regulators of protein abundance, as enzymes that catalyze and coordinate basic cellular metabolic processes, and as responders to external and internal stimuli that activate other proteins in signaling cascades. All of these processes in turn provide cues that may lead to the formation or dissolution of protein interactions and complexes. PPI networks in particular have become a valuable resource in this context [23,24]. An example of a PPI network, for psoriasis, is shown in Fig. 3.
The current estimates suggest that the human interactome comprises approximately 130,000-650,000 protein interactions [25,26]; however, only a subset of these has been experimentally identified [24].
There are currently two widely used high-throughput methodologies for large-scale mapping of PPIs. The yeast two-hybrid (Y2H) system [27] maps binary interactions. Protein complexes are mapped by affinity purification or immunopurification followed by mass spectrometry (AP/MS), which identifies the constituents of these complexes as well as other protein-protein interactions [28]. It should be noted that these high-throughput methods are prone to high rates of false positives and false negatives: some interactions identified by the experiment do not take place in the cell, while some truly interacting protein pairs cannot be detected by current experimental technology [29]. Sprinzak et al. [30] have shown that the false-positive (FP) rate of high-throughput yeast two-hybrid assays is ∼50%. The false-negative (FN) rate is also very high [31]. Another downside of these technologies is that they are expensive and time-consuming [32].
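Error rates such as these are usually estimated by comparing a screen against a reference set of trusted interactions. The sketch below shows one simple way to do this; the protein names, the reference set, and the rule of scoring only pairs whose proteins both appear in the reference are illustrative assumptions, not the procedure of any particular study. Counting every non-reference pair as false assumes the reference is complete for those proteins, which tends to overestimate the false fraction.

```python
def as_pairs(edges):
    """Undirected interactions as frozensets, so (A, B) == (B, A)."""
    return {frozenset(e) for e in edges}


def benchmark_screen(screen_edges, reference_edges, covered):
    """Estimate precision and recall of a screen against a reference set.

    covered: proteins represented in the reference (an assumption of this
    sketch); pairs involving other proteins are ignored because the
    reference says nothing about them.
    """
    screen = {p for p in as_pairs(screen_edges) if p <= covered}
    reference = {p for p in as_pairs(reference_edges) if p <= covered}
    tp = len(screen & reference)   # reported and in the reference
    fp = len(screen - reference)   # reported but not in the reference
    fn = len(reference - screen)   # in the reference but missed
    precision = tp / (tp + fp) if screen else float("nan")
    recall = tp / (tp + fn) if reference else float("nan")
    return precision, recall


# Hypothetical data: 'E' is not covered by the reference, so (A, E) is skipped.
covered = {"A", "B", "C", "D"}
reference = [("A", "B"), ("C", "D"), ("A", "C")]
screen = [("B", "A"), ("A", "D"), ("C", "D"), ("A", "E")]
precision, recall = benchmark_screen(screen, reference, covered)
print(f"false fraction of reported pairs ~ {1 - precision:.2f}; "
      f"missed fraction ~ {1 - recall:.2f}")
```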
The reconstruction of interaction networks has been performed via three distinct approaches: (1) manual curation of existing data available in the literature, usually obtained from one or just a few types of physical or biochemical interactions [33], (2) computational predictions based on available "orthogonal" information apart from physical or biochemical interactions, such as sequence similarities, gene-order conservation, co-presence and co-absence of genes in completely sequenced genomes, and protein structural information [34], and (3) systematic, high-throughput whole-genome or proteome mapping strategies [35].
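As an illustration of approach (2), the co-presence/co-absence idea is often implemented with phylogenetic profiles: each protein is encoded as a presence/absence vector across a set of sequenced genomes, and proteins with identical or near-identical profiles are predicted to be functionally linked. The sketch assumes hypothetical proteins and five genomes and uses Hamming distance as the similarity measure; other measures, such as mutual information, are also used in practice.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def predict_links(profiles, max_distance=0):
    """Predict functional links between proteins with similar profiles.

    Returns (protein1, protein2, distance) tuples ordered by protein name.
    """
    links = []
    for a, b in combinations(sorted(profiles), 2):
        d = hamming(profiles[a], profiles[b])
        if d <= max_distance:
            links.append((a, b, d))
    return links


# Hypothetical profiles: 1 = ortholog present in that genome, 0 = absent.
profiles = {
    "P1": (1, 1, 0, 1, 0),
    "P2": (1, 1, 0, 1, 0),
    "P3": (0, 1, 1, 0, 1),
    "P4": (1, 1, 0, 0, 0),
}
print(predict_links(profiles))                  # identical profiles only
print(predict_links(profiles, max_distance=1))  # tolerate one mismatch
```

Raising `max_distance` trades precision for coverage, which mirrors the FP/FN trade-off discussed for experimental screens.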
Studies employing these reconstructions have led to further advances in network biology, such as studies of global relationships between human disorders, their associated genes, and interactome networks [36,37], predictions of new human disease-associated genes [38], analyses of network perturbations by pathogens [39], and the emergence of node-removal versus edge-specific, or "edgetic", models [40] to explain genotype-phenotype relationships [41]. Global topological analysis of PPI networks is increasingly being replaced by more detailed analyses of binary interactions, their determinants, characteristics, and effects [42,43]. Initially, such analyses will inform us about the functional roles of particular interactions; later, they will hopefully enable us to predict PPIs and their roles in silico.
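The contrast between node-removal and edgetic models can be illustrated on a toy network: deleting a protein removes all of its interactions at once, whereas an edgetic perturbation (for example, a mutation that disrupts a single binding interface) removes only specific interactions and may leave the network connected. The network and protein names below are hypothetical.

```python
def connected_components(nodes, edges):
    """Connected components of an undirected graph.

    Edges are two-element frozensets (self-loops are not supported here).
    """
    adj = {n: set() for n in nodes}
    for edge in edges:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(component)
    return components


def remove_node(nodes, edges, target):
    """Node-removal model: the protein and all its interactions disappear."""
    return nodes - {target}, {e for e in edges if target not in e}


def remove_edge(nodes, edges, a, b):
    """Edgetic model: only the a-b interaction is lost."""
    return nodes, edges - {frozenset((a, b))}


# Hypothetical network: hub H binds A, B and C; A also binds B.
nodes = {"H", "A", "B", "C"}
edges = {frozenset(p) for p in [("H", "A"), ("H", "B"), ("H", "C"), ("A", "B")]}

after_node = remove_node(nodes, edges, "H")
after_edge = remove_edge(nodes, edges, "H", "A")
print(len(edges) - len(after_node[1]), connected_components(*after_node))
print(len(edges) - len(after_edge[1]), connected_components(*after_edge))
```

Removing H loses three interactions and splits off C, while removing only the H-A edge loses one interaction and leaves every protein reachable, which is the kind of phenotypic difference edgetic models aim to capture.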
PPI networks also contain small, interwoven modules. These modules carry functional information about complex biological networks, and the interactions between their proteins carry information about the biological processes in which the interacting partners take part. Using network approaches to study biological problems can provide an intuitive picture and useful insights into the complicated relationships within these systems [2]. Because PPI networks are large and complex, efficient and biologically meaningful algorithms for their modular analysis are needed. Graphlets (small induced subgraphs of large networks) can be applied to analyze the modular characteristics of PPI networks [44][45][46].
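Graphlet-based methods typically enumerate all graphlets with up to five nodes and record the positions each protein occupies in them. As a deliberately small sketch, the function below counts only the two connected 3-node induced graphlets, the open path and the triangle; the toy networks are hypothetical.

```python
from itertools import combinations
from math import comb


def three_node_graphlets(edges):
    """Count the two connected 3-node induced graphlets: paths and triangles.

    Every connected node triple induces either a path (2 edges) or a
    triangle (3 edges). Summing C(deg(v), 2) over all nodes counts each
    path once (at its middle node) and each triangle three times.
    """
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    wedges = sum(comb(len(nbrs), 2) for nbrs in adj.values())
    # Each triangle is seen once from each of its three corners.
    triangle_corners = sum(
        1
        for nbrs in adj.values()
        for u, w in combinations(nbrs, 2)
        if w in adj[u]
    )
    triangles = triangle_corners // 3
    return {"path": wedges - 3 * triangles, "triangle": triangles}


# Hypothetical network: triangle A-B-C with a pendant protein D on C.
print(three_node_graphlets([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]))
```

The two induced paths here are A-C-D and B-C-D; the triple A-B-C is not counted as a path because it induces a triangle.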
We also have to keep in mind that molecular networks respond dynamically to both internal states and external signals. Ultimately, health or disease states emerge from an individual's integration of these internal and external signals [47]. PPI networks are also dynamic: they change over seconds as cellular processes require protein complexes to assemble and disassemble, and they evolve over millions of years as interactions and genes are gained and lost. It is thus crucial to understand how these dynamic and evolutionary changes operate in order to grasp how cellular machineries function and how they have been shaped during evolution. In recent years, such properties of PPIs and networks have become amenable to study, thanks to the ever-increasing body of data on interactions, atomic structures, and mRNA expression levels [48].

Strategies to Identify Disease Genes
High-throughput gene expression profiling technologies have allowed researchers to characterize molecular differences between healthy and disease states, leading to the identification of an increasing number of disease-related genes. PPI network-based approaches have proved useful in illuminating relevant disease-related biological processes and identifying candidate disease genes. A detailed interaction map of a given disease can help elucidate its mechanism and suggest potential biomarkers or drug targets [49].
The analysis of disease networks has also permitted the discovery of novel classes of therapeutic targets not amenable to classical drug-like compounds but reachable through novel bioactive compounds [50].
Disease genes tend to be highly expressed, to have tissue-specific expression patterns, and to show a higher mutation rate over evolutionary time [51]. With this information at hand, different studies have been performed on omics platforms at various levels to identify, predict, or prioritize disease genes.
Wu et al. [52] used gene classification strategies to detect whether a gene is disease-associated. They explored gene classification with topological features, using k-shell decomposition and the average distance to the network center (defined by essential genes) to analyze the hierarchical structure of their PPI network. Taking essential genes as the center of the network, disease genes appeared closer to the center than other genes. They concluded that, compared with other genes, both disease genes and non-essential disease genes are topologically more important. Nguyen and Ho [53] improved disease gene prediction by combining known disease genes with their neighbors and integrating multiple data sources in a semi-supervised learning scheme. Gene networks can also be used to prioritize positional candidate genes in heritable disorders with multiple associated genes. Franke et al. [54] proposed methods to integrate gene networks with various genetic studies, which can be useful in identifying disease genes. Köhler et al. [55] investigated the hypothesis that global network-similarity measures capture relationships between disease proteins better than algorithms based on direct interactions or shortest paths between disease genes. They presented a candidate-gene prioritization method based on random walks, which they used to calculate a score reflecting the global similarity of candidate genes to known members of a disease-gene family. Schlicker et al. [56] introduced an approach for ranking candidate genes for a particular disease based on functional comparisons using the Gene Ontology. Lee et al. [57] analyzed a large-scale human gene functional interaction network (dubbed HumanNet).
They showed that candidate disease genes can be effectively identified by guilt by association (GBA) in cross-validated tests, using label propagation algorithms related to Google's PageRank. They addressed the poor performance of GBA for genome-wide association studies by explicitly modeling the uncertainty of the associations and incorporating the uncertainty of the seed set into the GBA framework. Re and Valentini [58] proposed a semi-supervised method to rank genes with respect to cancer modules, using networks constructed from different sources of functional information, not limited to gene expression data. Aerts et al. [59] described a bioinformatics approach that generates distinct prioritizations from multiple heterogeneous data sources, which are then integrated, or fused, into a global ranking using order statistics. By examining the performance of seven recently developed computational methods, Navlakha and Kingsford [60] found that random-walk approaches individually outperform clustering and neighborhood approaches.
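Random-walk prioritization, as used by Köhler et al. and favored in the comparison by Navlakha and Kingsford, can be sketched as a random walk with restart, which is closely related to personalized PageRank: a walker starts from known disease genes (seeds), repeatedly moves to a random neighbor, and with a fixed probability jumps back to a seed; candidates are ranked by their steady-state visiting probability. The restart probability of 0.7, the convergence tolerance, and the toy network are assumptions for illustration, not parameters taken from the cited studies.

```python
def random_walk_with_restart(edges, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at seeds.

    At each step the walker either jumps back to a uniformly chosen seed
    (probability `restart`) or moves to a uniformly chosen neighbor of its
    current node (probability 1 - restart).
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seeds = set(seeds) & set(adj)
    if not seeds:
        raise ValueError("no seed gene is present in the network")
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in adj}
        for n, nbrs in adj.items():
            share = (1.0 - restart) * p[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in adj) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical network: a known disease gene SEED with two branches.
edges = [("SEED", "A"), ("A", "B"), ("B", "C"), ("SEED", "D")]
scores = random_walk_with_restart(edges, {"SEED"})
ranking = sorted((n for n in scores if n != "SEED"), key=scores.get, reverse=True)
print(ranking)  # -> ['A', 'D', 'B', 'C']
```

Unlike a shortest-path score, A edges out D even though both are direct neighbors of the seed, because A also receives probability flowing back from B; this reflects the "global" similarity that Köhler et al. argued for.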

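The order-statistics fusion described for Aerts et al. can be sketched with the recursive Q statistic commonly used for rank aggregation: each source ranks all candidates, each gene's positions are converted to rank ratios (rank divided by the number of ranked genes), and Q gives the probability, under a null of independent uniform ranks, of obtaining sorted ratios no larger than the observed ones, so lower Q means higher overall priority. The gene names and rankings are hypothetical, and this direct recursion is intended only for a handful of sources.

```python
from math import factorial


def q_statistic(rank_ratios):
    """Order-statistics Q for one gene, given its rank ratios in N sources.

    Recursion (1-based notation), with r_1 <= ... <= r_N the sorted ratios:
        V_0 = 1
        V_k = sum_{i=1..k} (-1)**(i-1) * V_{k-i} / i! * r_{N-k+1}**i
        Q   = N! * V_N
    """
    r = sorted(rank_ratios)
    n = len(r)
    v = [1.0] + [0.0] * n
    for k in range(1, n + 1):
        rk = r[n - k]  # r_{N-k+1} in 1-based notation
        v[k] = sum((-1) ** (i - 1) * v[k - i] / factorial(i) * rk ** i
                   for i in range(1, k + 1))
    return factorial(n) * v[n]


def fuse_rankings(rankings):
    """Fuse per-source rankings (best first) into one global ranking."""
    ratios = {}
    for ranking in rankings:
        for position, gene in enumerate(ranking, start=1):
            ratios.setdefault(gene, []).append(position / len(ranking))
    scores = {gene: q_statistic(rs) for gene, rs in ratios.items()}
    return sorted(scores, key=scores.get), scores


# Hypothetical prioritizations of three candidate genes by three data sources.
rankings = [["G1", "G2", "G3"], ["G2", "G1", "G3"], ["G1", "G3", "G2"]]
order, scores = fuse_rankings(rankings)
print(order)  # -> ['G1', 'G2', 'G3']
```

For two sources with ratios 0.5 and 0.5, Q equals 0.25, the probability that both of two uniform ranks fall in the top half, which is a quick sanity check of the recursion.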
Challenges and Limitations of PPI Networks
PPI network-based approaches still face challenges, many of which stem from a body of information that grows larger by the day.
Providing maps of PPI networks using systematic and standardized approaches and assays that are as unbiased as possible is a grand challenge of network biology.
There is an impressive amount of data on sequence alterations and biomolecular profiles (mRNA expression, miRNA and noncoding RNA profiling, proteomics, and metabolomic measurements) for many human diseases, which can be accessed from specialized databases and publications. However, we still have not succeeded in translating this wealth of information into actionable knowledge about disease pathogenesis for the development of better strategies for disease prevention, diagnosis, and treatment. Progress is limited by the difficulties in assessing the functional consequences of disease-associated sequence variants and understanding how phenotype is affected by the combined effect of environmental and genomic variation [24].
Though high-throughput gene expression profiling has permitted the characterization of molecular differences between healthy and disease states, a clear limitation of these approaches is that they often deal with data about single players (i.e. changes in the expression of individual genes). Novel strategies should integrate systemic information to contextualize the differential expression patterns observed [51].
Another limitation of network biology is the coverage and quality of interactome data. The incompleteness of the human PPI network limits any study of the network properties of disease genes. In addition, condition-specific interactomes, which better represent the interactions of proteins in a given tissue or under certain conditions, would increase the significance of such analyses. Differential network mapping should offer more insight into the network rewiring that occurs during disease. This approach will enable monitoring of changes in the roles of individual nodes (hubs, bottlenecks, etc.) and in the global topology of the PPI network, as well as of how all these alterations correlate with cell function. Fully characterizing network rewiring in disease would yield a deeper understanding of how sequence variation shapes cellular networks and leads to the observed phenotypic changes [24].
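A minimal sketch of differential network mapping, under the simplifying assumption that an interaction is active in a condition only when both partners are expressed there: the same reference interactome is filtered once per condition, and the interactions lost or gained, together with per-protein degree changes, describe the rewiring. The protein names and expressed sets are hypothetical; real condition-specific interactomes would require tissue- or condition-resolved data.

```python
def degree(edges):
    """Number of interactions per protein (edges are frozensets)."""
    deg = {}
    for edge in edges:
        for node in edge:
            deg[node] = deg.get(node, 0) + 1
    return deg


def condition_network(edges, expressed):
    """Keep an interaction only if both partners are expressed (a crude proxy)."""
    return {e for e in edges if e <= expressed}


def differential_network(edges, expressed_a, expressed_b):
    """Compare condition A (e.g. healthy) with condition B (e.g. disease).

    Returns (interactions lost in B, interactions gained in B,
    per-protein degree change from A to B).
    """
    net_a = condition_network(edges, expressed_a)
    net_b = condition_network(edges, expressed_b)
    deg_a, deg_b = degree(net_a), degree(net_b)
    delta = {n: deg_b.get(n, 0) - deg_a.get(n, 0) for n in set(deg_a) | set(deg_b)}
    return net_a - net_b, net_b - net_a, delta


# Hypothetical reference interactome and expressed proteins per condition.
edges = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]}
lost, gained, delta = differential_network(edges, {"A", "B", "C"}, {"A", "C", "D"})
print(sorted(sorted(e) for e in lost), sorted(sorted(e) for e in gained))
print(dict(sorted(delta.items())))
```

In this toy case protein C keeps the same degree but swaps partners, a reminder that node-level summaries such as degree can hide rewiring that the edge-level comparison reveals.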
Although the amount of protein interactome data obtained by high-throughput protein interaction techniques is increasing rapidly, these data contain a significant proportion of false positives and false negatives, creating a need to prioritize the reported interactions for further validation. Computational techniques for assessing and ranking the reliability of binary PPIs are therefore highly desirable [61]. Although many methods have emerged for ranking the reliability of protein interactions reported by high-throughput assays, they mostly succeed in detecting false positives. However, these assays are also known to produce a large number of false negatives, so methods that detect false negatives are urgently needed. Identifying false negatives is equivalent to predicting new protein interactions [62].
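Many reliability-ranking methods exploit network topology: a true interaction tends to connect proteins that share many partners, whereas a spurious one often links proteins with little neighborhood overlap. The sketch below uses the Jaccard overlap of two proteins' neighborhoods as a simple illustrative score, not a specific published method. Low-scoring observed edges become candidate false positives, and high-scoring unobserved pairs become candidate false negatives, i.e. predicted new interactions. The network and the score threshold are hypothetical.

```python
from itertools import combinations


def neighbors(edges):
    """Adjacency sets of an undirected interaction list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def jaccard(adj, u, v):
    """Neighborhood overlap of u and v, excluding u and v themselves."""
    nu = adj.get(u, set()) - {v}
    nv = adj.get(v, set()) - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def score_edges(edges):
    """Observed interactions, least reliable (candidate false positives) first."""
    adj = neighbors(edges)
    return sorted((jaccard(adj, a, b), a, b) for a, b in edges)


def candidate_missing(edges, min_score=0.5):
    """Unobserved pairs with high overlap (candidate false negatives)."""
    adj = neighbors(edges)
    observed = {frozenset(e) for e in edges}
    out = []
    for u, v in combinations(sorted(adj), 2):
        if frozenset((u, v)) not in observed:
            s = jaccard(adj, u, v)
            if s >= min_score:
                out.append((s, u, v))
    return sorted(out, reverse=True)


# Hypothetical screen: A-E looks spurious, and C-D looks like a missed interaction.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("A", "E")]
print(score_edges(edges)[0])
print(candidate_missing(edges, min_score=0.75))
```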
Another challenge in integrating physical and genetic maps is reconciling the variety of interaction types (i.e. genetic and physical interactions) currently available [63]. Complex diseases stem from the interplay of genetic and environmental factors, which complicates the study of human diseases, since it is difficult to create a controlled environment that enables scientists to study environmental effects on disease development [64].
Researchers increasingly use model organisms such as fruit flies, mice, and zebrafish to study human diseases, because these organisms are easy to grow, dissect, and genetically manipulate in the laboratory. However, transferring phenotypic information from animal models to humans remains a challenge [65,66].

Concluding Remarks and Future Work
We are still far from unraveling the molecular mechanisms of most diseases; thus, developing effective methods to uncover disease genes remains a great challenge.
First and foremost, a more reliable PPI network for human genes must be constructed. This might be achieved by assembling a large PPI dataset that includes all or most of the major and minor PPI databases and ranking the interactions according to a well-established scoring system, which should also reduce the numbers of false positives and false negatives.
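One simple scoring scheme of this kind, assuming each database supplies a confidence in [0, 1] for a protein pair and that the sources provide independent evidence, is a noisy-OR combination, S = 1 − ∏(1 − s_i): a pair supported by several moderately confident sources ranks above a pair supported by a single one. The database names and scores below are hypothetical, and the independence assumption is optimistic because many databases share underlying experiments.

```python
def combined_score(evidence):
    """Noisy-OR combination of per-source confidences for one protein pair.

    evidence maps a (hypothetical) source name to a confidence in [0, 1].
    Assumes the sources are independent.
    """
    p_all_wrong = 1.0
    for source, score in evidence.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score from {source} outside [0, 1]: {score}")
        p_all_wrong *= 1.0 - score
    return 1.0 - p_all_wrong


def rank_interactions(db_scores):
    """Rank protein pairs by combined score, highest first."""
    combined = {pair: combined_score(ev) for pair, ev in db_scores.items()}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-database confidences for three protein pairs.
db_scores = {
    ("P1", "P2"): {"db_x": 0.6, "db_y": 0.5},
    ("P1", "P3"): {"db_x": 0.7},
    ("P2", "P4"): {"db_y": 0.2},
}
for pair, score in rank_interactions(db_scores):
    print(pair, round(score, 2))
```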
In conjunction with the use of PPI networks, studies are needed that evaluate the correlations among the genes associated with a specific disease and cross-compare those genes with the genes of other human diseases, in order to gain a deeper understanding of the relationships among disease genes within and between diseases.
Understanding the molecular pathways unique to a specific disease, and elucidating how molecular pathways differ between diseases, will help us recognize their similarities and differences, leading to a better understanding of the relationships between human diseases.
Constructing a 'linkage network', in which diseases are linked to each other by one or more shared genes, is another promising way to relate human diseases through a simplified network.
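Such a linkage network amounts to projecting a bipartite disease–gene graph onto the diseases. A minimal sketch, assuming hypothetical disease names and gene sets and a user-chosen minimum number of shared genes:

```python
from itertools import combinations


def disease_linkage_network(disease_genes, min_shared=1):
    """Link two diseases when they share at least `min_shared` genes.

    disease_genes maps a disease name to its set of associated genes; the
    result maps each linked disease pair to the genes they share.
    """
    links = {}
    for d1, d2 in combinations(sorted(disease_genes), 2):
        shared = disease_genes[d1] & disease_genes[d2]
        if len(shared) >= min_shared:
            links[(d1, d2)] = shared
    return links


# Hypothetical disease-gene associations.
disease_genes = {
    "disease_1": {"G1", "G2", "G3"},
    "disease_2": {"G2", "G3", "G4"},
    "disease_3": {"G4", "G5"},
    "disease_4": {"G6"},
}
for pair, shared in disease_linkage_network(disease_genes).items():
    print(pair, sorted(shared))
```

Keeping the shared genes on each link, rather than only a count, makes it possible to trace which genes connect two diseases; raising `min_shared` keeps only the more strongly related pairs.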
It is also important to identify the 'triggers', the key disease genes that initiate a specific disease, and to construct a 'triggers network' to further unveil disease mechanisms. This will help optimize the primary prevention of preventable human diseases and lead to the discovery of biomarkers or biomarker groups for the early diagnosis of, and intervention in, human diseases. Studies evaluating individualized therapies that could lead to personalized medicine are also needed.
The impact of external factors on the progression of human diseases, and the risk factors for their initiation and advancement, also need to be analyzed for a better evaluation of each disease.
The never-ending flow of information is likely to change how we look at human diseases and disease networks for many years to come. The prospects for predictive and personalized medicine are enormous, and network-based approaches may ultimately help deliver new cures and medicines.