A Guide to Conquer the Biological Network Era Using Graph Theory

Networks are one of the most common ways to represent biological systems as complex sets of binary interactions or relations between different bioentities. In this article, we discuss the basic graph theory concepts and the various graph types, as well as the available data structures for storing and reading graphs. In addition, we describe several network properties and we highlight some of the widely used network topological features. We briefly mention the network patterns, motifs and models, and we further comment on the types of biological and biomedical networks along with their corresponding computer- and human-readable file formats. Finally, we discuss a variety of algorithms and metrics for network analyses regarding graph drawing, clustering, visualization, link prediction, perturbation, and network alignment as well as the current state-of-the-art tools. We expect this review to reach a very broad spectrum of readers varying from experts to beginners while encouraging them to enhance the field further.


INTRODUCTION
While most recent review articles focus on biomedical and biological networks and their applications (McGillivray et al., 2018;Sonawane et al., 2019;Yue et al., 2019), in certain case studies, familiarity with the graph theory concepts behind these networks is often missing. The aim of this review is to tackle questions raised by today's increasing demands and aid researchers in understanding the graph theory behind the biomedical networks as well as concepts such as visualization, annotation, management, clustering, integration, etc. To do this, we start with an introduction about graphs (in discrete mathematics) and their different types and we further describe the various data structures and file formats for storage and representation. In addition, we discuss several topological features and network properties, as well as concepts such as graph clustering, clustering comparison, network alignment, motif detection, and edge prediction. We further comment on the various layout and graph drawing techniques as well as on methods regarding network alignment and link predictions and we highlight the state-of-the-art tools for analyzing such networks. Finally, we try to bring graph theory into a biomedical context by providing a thorough description about the different types of biomedical networks and the sources used for their construction. We hope this review becomes a useful handbook for readers regardless of their scientific background and help non-experts in handling and interpreting networks more easily.
In general, networks or graphs (mathematical way of representing a network) are used to capture relationships between entities or objects. In a typical representation, a graph is composed of a set of vertices/nodes/points, connected with edges/ lines/links/arrows/arcs. Examples of networks which we interact with in everyday life include the electricity grid, road maps, the world wide web, the internet, airline connections, citation and language networks, telecommunication channels, social networks, economic networks, and many others. Graph theory has been the established mathematical field for the study and the analysis of such networks and is applicable to a wide variety of disciplines, ranging from mathematics, physics, computer science, engineering, and sociology to biology and medicine (Junker and Schreiber, 2008;Pavlopoulos et al., 2011a). In the biomedical field for example, many biological networks consist of molecules such as DNA, RNA, proteins and metabolites, and graphs can be used to capture the interactions between these molecules. Therefore, it is essential to know the various network types which can be used, in order to be able to communicate and visualize such interactions.
Starting with the basic notions, in mathematics, a set A = {a 1 , a 2 , a 3 , ...a n } is a collection of objects a 1 , a 2 , a 3 , ...a n , whereas a graph G = (V, E) is composed of a set of vertices V and a set of edges E. A subgraph G ′ = (V ′ , E ′ ) of the graph G = (V, E) is a graph where V ′ is a subset of V and E ′ a subset of E. While one graph can have multiple representations, two different graphs may be isomorphic if they contain the same number of vertices connected in the same way. Examples are shown in Figures 1A-C. There are various graph categories. The most known are undirected, directed, weighted, bipartite, multi-edge, hypergraphs, and trees.
A graph is undirected if there is a single connection defined as E = {(i, j)|, i, j ∈ V} between vertices i and j. In such case, vertices i and j are called direct neighbors (e.g., gene coexpression network).
A graph is called directed if an edge between vertices i and j is represented by an arrow, thus indicating a direction from vertice i to vertice j or vice versa. A directed graph is defined as an ordered triple G = (V, E, f ) where f is a function that maps each element in set E to an ordered pair of vertices in V (e.g., pathway).
Notably, in biology there are a number of directed relationships which can be graphically shown as different arrow types toward a semantic approach (e.g., food web). For example, "inhibits, " "enhances, " "regulates" etc. Standards for arrow usage are described in the Systems Biology Graphical Notation (SBGN) visual language (Le Novère et al., 2009).
A weighted graph is defined as a graph where E is a set of edges between the vertices i and j (E = {(i, j) | i, j ∈ V}) associated with a weight function w : E → R, where R denotes the set of all real numbers. Most of the times, the weight w ij of the edge between nodes i and j represents the relevance of the connection (e.g., sequence similarity network).
A bipartite graph is an undirected graph G = (V, E) in which vertices in V can be partitioned into two sets V ′ and V ′′ such that (i, j) ∈ E implies either (i ∈ V ′ and j ∈ V ′′ ) or (j ∈ V ′ and i ∈ V ′′ ) (e.g., gene-disease networks). In other words, any vertex from set V ′ can be connected to any other vertex from set V ′′ but no edges between vertices within the same set (V ′ or V ′′ ) are allowed.
A graph is called multi-edge if it contains multiple edges or otherwise parallel edges that are incident to the same two vertices (e.g., knowledge/integration networks). A simple graph for example, has no multiple edges.
A hypergraph consists of a set of vertices V and a set of hyperedges E where an edge can join any number of vertices (e.g., biochemical networks).
A tree is an undirected graph in which any two vertices are connected by exactly one path, or equivalently a connected acyclic undirected graph (e.g., ontologies, phylogenies).

Examples of the various graph types are shown in Figures 1D-L.
A graph is connected if there is a path from any point to any other point in the graph. In a complete graph, every pair of distinct vertices is connected by a unique edge.
A cluster ( Figure 1M) is a graph formed from the disjoint union of complete graphs and a clique ( Figure 1N) in an undirected graph is a subset of vertices such that every pair of vertices in the clique is connected.

DATA STRUCTURES AND REPRESENTATIONS
A network can be stored as (i) adjacency matrix, (ii) adjacency list, or (iii) sparse matrix. In graph theory, an adjacency matrix A is a square matrix of size N × N (where N is the number of vertices) used to represent a graph. In the case of a simple graph, the adjacency matrix is a (Sabidussi, 1966;Yue et al., 2019)-matrix with zeros on its diagonal (A[i,j] = 1 for connection presence, A[i,j] = 0 for connection absence) or a (0,w ij )-matrix for a weighted graph where w ij is the edge weight between two nodes (A[i,j] = w ij ). In both undirected simple and weighted graphs, the adjacency matrix is symmetric (equal to its transpose-rows and columns are the same). In the case of directed graphs, the matrix is not symmetric, thus differentiating its upper triangular part from its lower triangular part (ij is not the same as ji). An overview of adjacency matrices and their representations are shown in Figures 2A-C. Bipartite graphs, as opposed to generic networks, have their own characteristics . One major property is that any bipartite graph can be presented as two biadjacency matrices (or otherwise projections). While in an original bipartite graph, vertices which belong to a set are not connected to each other, in its biadjacency form they are connected through nodes that belong to the other set (indirect connections). This concept is described in Figures 2D,E, whereas an extensive review about their biomedical application can be found elsewhere .
Adjacency matrices are memory inefficient for storing larger sparse networks as they require O(V 2 ) memory. Notably, the O notation in graph theory is a theoretical measure to classify algorithms according to how their running time or space requirements grow as the input size grows (Knuth, 1997). Let's assume that in a gene co-expression network, one wants to store an all-vs.-all matrix with all pairwise human gene similarities (V = ∼20,000 genes). This would require 400,000,000 bytes to be stored in memory (381 MB RAM) or 4×20,000 2 1,024 3 = 1.49 GB for float/integer numbers (for 4 byte integers and floats). To partially overcome this barrier, a simple approach would be to take advantage of the adjacency matrix symmetry by only storing the upper triangular part in an array B in a linear form (Figures 2F,G). The mapping between element coordinates in the two forms is given by the formula where N is the number of vertices ( Figure 2G). The linear representation B requires V(V−1) 2 memory which is half the size compared to the memory needed for a complete adjacency matrix A.
For sparse networks, adjacency lists are proposed as an alternative data structure. An adjacency list is an array A of separate lists. Each element of the array A i is a list, which contains all the vertices that are adjacent to vertex i. If the graph G is weighted, then each item in the adjacency list is either a two-item array or an object, giving the vertex number, and the edge weight ( Figure 2H). Adjacency lists require much less space O(V + E) compared to the space required by the adjacency matrix O(V 2 ). Moreover, finding all vertices adjacent to a given vertex in an adjacency matrix representation, requires O(V) time, whereas in an adjacency list such operation is as fast as reading the corresponding list (smaller length).
An alternative to the adjacency list, is the use of a sparse matrix data structure. In such case only the non-zero elements are kept along with their coordinates and everything else is discarded as non-informative. An example of such a data structure is shown in Figure 2I where the first row keeps the i coordinate for each element in A[i, j], the second row the In the projected network colored as green, node V 1 for example is connected to node V 2 through node node V 4 . (F) The upper triangular part of the adjacency matrix. (G) The upper triangular part of the adjacency matrix in a linear form. Element A[2,3] = 0.9 in the adjacency matrix is element B[10] = 0.9 in the linear form. (H) The graph presented as an adjacency list. Each vertex is accompanied by a list containing all other vertices adjacent to it. (I) A data structure for efficiently storing sparse matrices with many zeros. The first two rows indicate the coordinates in an adjacency matrix, whereas the third column contains the connection weight.
j coordinate in A[i, j] and the third row the weight w ij . In the case of unweighted simple graphs (referring to the default value which equals to 0, indicating that no link exists), the third row can be completely skipped, remembering that w ij is always one.

GENERAL NETWORK PROPERTIES
As degree deg i , we define the total number of edges adjacent to a vertex. In the case of a directed graph we distinguish between the "indegree" (deg in i ) and "outdegree" (deg out i ). The indegree refers to the number of arcs, incident from the vertex, whereas the outdegree to the number of arcs incident to the vertex. In a social network for example, the indegree would represent the followers, whereas the outdegree the people one follows. The total degree in a directed graph is the sum of the indegree and outdegree showing all connections (both followers and followed people). The average degree of the network is Figure 3A). Looking at all nodes in a network, in order to study the degree distribution p(k), we consider the probability that a randomly selected vertex has degree equal to k. The same information can also be found as cumulative degree distribution p c (k) which shows the a-posterior probability of a randomly selected vertex to have degree larger than k. Notably, the degree distribution is one of the most important topological features and is characteristic to different network types. In the simplest case, p(k) can be estimated by a histogram of degrees. An example is shown in Figure 3B. Networks, whose degree distribution follow a power law, are called scale-free networks. The closeness centrality in blue, the betweenness centrality in red and the eccentricity centrality in orange. The graph consists of 6 nodes and 5 edges. Closeness centrality calculation example: Node V 1 accesses nodes V 2 , V 4 , V 5 , V 6 with step 1 and node V 3 with step 2. Therefore, its closeness centrality is calculated as 5 4×1+2×1 = 5 6 = 0.833. Betweenness centrality calculation example: Since all nodes are accessible through any other node, there are N(N − 1) = 6 × 5 = 30 shortest paths but only 12 of them pass through node V 2 . These are Therefore the C bet(V2) = 12 30 = 0.4. Eccentricity calculation example: Node V 1 accesses nodes V 2 , V 4 , V 5 , V 6 with one step and node V 3 with two steps. Therefore, its eccentricity will be max (2, 1) = 2.
Density is the ratio between the number of edges in a graph and the number of possible edges in the same graph. In a fully connected graph (e.g., protein complex), the number of possible edges (pairwise connections) are E max = V(V−1) 2 . Therefore, the density can be calculated as density = E E max = 2E V(V−1) . If a graph has E ≃ V k , 2 > k > 1, then this graph is considered as dense, whereas when a graph has E ≃ V or E ≃ V k , k ≤ 1, it is considered as sparse.
The Clustering coefficient is a measure which shows whether a network or a node has the tendency to form clusters or tightly connected communities (e.g., protein clusters in a proteinprotein interaction network). The clustering coefficient of a node is defined as the number of edges between its neighbors divided by the number of possible connections between these neighbors. The clustering coefficient of a node i is defined as C i = 2e where k is the number of neighbors (degree) and e the number of edges between these k neighbors. The average clustering of a network is given by C avg = C i V . The clustering coefficient takes values 0 ≤ C i ≤ 1, thus the closer to 1, the higher the tendency for clusters to be formed. An example is shown in Figures 3C,D. The matching index M ij can be used to identify two nodes in a network which might be functionally similar without necessarily being connected to each other. The matching index is a measure to quantify such similarity between any two nodes within a network and, according to the above, two nodes can be found to be functionally similar if they share common neighbors. The matching index between vertices i and j is calculated as M ij = Σ distinct common neighbors Σ total number of neighbors and can be extended beyond the direct neighbors of a vertex. In addition, it can be applied to multiedge networks.
The distance dist ij between two nodes (e.g., metabolites in a metabolic network) is defined as the length of the shortest path between them. As shortest path we define the minimal number of edges that need to be traversed to reach node j from node i. In the case where two shortest paths of identical length exist, any of them could be used. Whenever there is no connection between two nodes i and j, then their distance is defined as infinite dist ij = ∞. In addition, the diameter, diam m = max(dist ij ), is the maximal distance between any pair of vertices. The average path length is defined as the average distance between all node pairs and is defined as dist avg = 1 N(N−1) Frontiers in Bioengineering and Biotechnology | www.frontiersin.org

NETWORK CENTRALITIES
Very often, in network analysis, we ask questions such as: which is the most important node, which node behaves as a hub, which node is the bridge between two different communities, which node is important for the network's robustness (tolerance to failures and perturbations), etc. In order to address these questions, various network centralities can be used. The degree centrality (Bonacich, 1987) is a measure to highlight highly connected nodes (e.g., central transcription factors). A network with a star-like topology for example, contains hubs which are central nodes with many neighbors around them. The degree centrality of a node i is calculated as C i = deg(i) where deg(i) is the node's degree. Similarly, the closeness centrality (Sabidussi, 1966) is a measure to detect important nodes which can communicate quickly with other nodes in a network. For a graph G = (V, E) it is defined as C clo = 1 dist ij or as C clo = N−1 dist ij in its normalized form. In biochemical networks, it is often used to find top metabolites [e.g., metabolites in E. coli as part of the glycolysis and citrate acid cycle pathways (Ma and Zeng, 2003;Koschützki and Schreiber, 2008)]. The betweenness centrality (Freeman, 1977) shows the nodes which form such bridges so that two communities can communicate with each other. It is calculated as C bet (i) = σ xy (i) σ xy where σ xy is the total number of shortest paths from node x to node y and σ xy (i) is the number of those paths that pass through node i. It has been shown that proteins with high betweenness centrality in a protein-protein interaction (PPI) network play an important role to the modularization of the network (Koschützki and Schreiber, 2008). The eccentricity centrality (Hage and Harary, 1995) shows how easily accessible a vertex is from any other vertex in the network. The eccentricity is the maximum graph distance between vertex i and any other vertex j in graph G. For a disconnected graph, all vertices are defined to have infinite eccentricity. The eccentricity centrality is calculated as C ecc = 1 max(dist ij ) . Eccentricity centrality has been used to detect essential proteins in a PPI (Jalili et al., 2016). Notably, the maximum eccentricity is called the graph diameter, whereas the minimum graph eccentricity is called the graph radius. Finally, there are many other specialized centralities that serve different purposes. The eigenvector centrality for example, detects vertices that are connected to important vertices, whereas the subgraph centrality accounts for the participation of a node in all subgraphs of the network. Examples are shown in Figure 3E.

MOTIFS
Network motifs are repeated graphlets (small subgraphs of a larger network that appear at any frequency) in a specific network capturing particular patterns of interactions between vertices. They are often associated with particular functions (Stone et al., 2019) and are used for many applications in biological networks (Kim et al., 2011). Motifs are structures which occur at higher frequencies compared to random networks and are found in both directed and undirected networks. Motif analysis is often applied on biological networks such as biochemical, ecological, neurobiology, or gene expression networks to unravel building blocks associated with certain biological processes.
For example, motifs can be found in ecological food webs as well as genetic networks or the World Wide Web (Milo, 2002;Shen-Orr et al., 2002). Feed-forward-loop (FFL) and bifan motifs (Figure 4) are typical patterns found in various types of biological networks . Notably, motifs have been used to distinguish different protein-protein interaction networks (Przulj et al., 2004) and in contrast to the transcriptional regulatory networks, it has been shown that they are evolutionary conserved in PPI networks (Conant and Wagner, 2003).
To measure the statistical significance of a network motif, a Z-score or a P-value can be used. The Z-score is calculated as the difference of the frequency f (m) of a motif m in a network and its mean frequency f r (m) in a large number of Similarly, the P-value shows the probability P(m) of a motif m to appear in a randomized network equally or more times than in the network of interest. Motifs are considered to be statistically significant if they have Z(m) > 2.0. Motif detection can become computationally expensive and tools like Pajek (Mrvar and Batagelj, 2016), Mfinder (Kashtan et al., 2004), MAVisto (Schreiber and Schwöbbermeyer, 2005), NetMatch (Ferro et al., 2007), SANA (Mamano and Hayes, 2017), and FANMOD (Wernicke and Rasche, 2006) are offered for this purpose (Kavurucu, 2015).

MODELS
In order to better understand a network's topology and come to the conclusion of whether observed features are networkspecific or not, several models such as the Erdos-Rényi (Bollobás, 2001), Watts-Strogatz (Watts and Strogatz, 1998), and Barabási-Albert (Barabasi and Albert, 1999) have been introduced ( Figure 5).
The Erdos-Rényi model: It is one of the most popular models in graph theory and was mainly introduced to describe the properties of a random graph. According to this model, V number of vertices are randomly connected with probability p = 2E V(V−1) . In general, in such a graph, each pair of vertices can be connected with approximately an equal probability p ≤ 1, whereas the degree distribution is given by a binomial distribution. The probability of a vertex to have degree deg Notably, for a network where V → ∞ the distribution becomes approximately Poissonian. A typical characteristic of a random network is its homogeneity as most vertices have a similar number of connections. For small p, the network seems as disconnected, whereas for p ≈ and shows that the probability of two nodes with a common neighbor to be connected is the same as the probability of two randomly paired vertices. In the case of biological networks, straightforward comparisons show if they have a certain topology or differ from any other random network. Thus, Erdos-Rényi is not a good model for biological networks with respect to degree distribution. The Watts-Strogatz model: This model was introduced to describe random networks that follow a small world topology meaning that most nodes can be reached by any other node in a small number of steps. While random networks can often capture this property too, they fail to account for highly connected regions like in most empirical networks (e.g., social networks). Therefore, Watts and Strogatz proposed a model for networks described by local structures (high clustering coefficient) as well as small average path lengths. Metabolic networks [e.g., fatmetabolism communication in Yeast (Al-Anzi et al., 2015)], in which metabolites are linked to each other with small steps, is a typical example (Jeong et al., 2000). In a Watts-Strogatz network, if all vertices are placed on a circular ring, each vertex would be connected to its V 2 neighbors. In the real world, this indicates the form of small communities where people know other people from their close environment as well as friends of friends from nearby areas. Coexistence of high local clustering and short average path length are two main characteristics of this type of networks.
The Barabási-Albert model: This model describes random scale-free networks. These are networks whose degree distribution follows a power law taking into account their inhomogeneous degree distribution or otherwise networks with nodes which do not have a typical number of neighbors. According to this model, networks can evolve overtime and new edges do not appear randomly, whereas new nodes follow the existing degree distribution. At time point t = 0 for example, let's assume a network consisting of V 0 vertices and zero edges. A new vertex will connect with e ≤ V 0 edges to the existing vertices, whereas after t time points, the network is expected to consist of V = V 0 + et edges. Notably, for t ≫ 1, the Barabasi-Albert model will exhibit a scale-free distribution p(k) ∼ k −γ , γ = 3. Like in a social network, individuals who already have many friends are likely to acquire more friends overtime compared to individuals with a limited number of friends. When comparing the Erdos-Rényi and Watts-Strogatz networks of the same size and density, the Barabasi-Albert networks were found to have shorter average path lengths. Characteristic examples of BA networks are the Protein-Protein interaction networks Yook et al., 2004).
Like in many real-life examples, most biological networks are robust and tolerant against random removal of nodes as biological functions must remain maintained. However, compared to random networks with homogeneous degree distribution, scale-free networks are very vulnerable to targeted attacks but very robust against random removal of vertices. In general, nodes with low degree appear more frequently compared to nodes with high degree and play a minor role in the overall network topology, whereas aimed removal of nodes with higher degree distribution can affect a network's topology significantly.

BIOLOGICAL AND BIOMEDICAL NETWORKS
In biomedical research, graphs can capture the associations between any type of biological entity such as proteins, genes, small molecules, metabolites, ligands, diseases, drugs, or even database records (Figure 6). Some biological networks model the functions of celland tissue-specific molecular interactions at a cellular organizational level, varying from cells to a complete organ. These are:

Protein-Protein Interaction Networks (PPIs)
This type of networks holds information about how different proteins operate with each other to enable a biological process within a cell. The interactions in a PPI network can be physical or predicted. Notably, a whole interactome can capture all PPIs happening in a cell or an organism. In vivo and in vitro methods for detecting PPIs include: Xray crystallography, NMR, tandem affinity purification (TAP), affinity chromatography, coimmunoprecipitation, protein arrays, protein fragment complementation, phage display and yeast two-hybrid (Y2H) (Rao et al., 2014). Widely used repositories (Lehne and Schlitt, 2009;Szklarczyk and Jensen, 2015) which host PPIs for various organisms are the BioGRID (Stark et al., 2006), MINT (Chatr-aryamontri et al., 2007), BIND , DIP (Xenarios et al., 2000), IntAct (Hermjakob et al., 2004a), and HPRD (Peri et al., 2003) database. Concerning topology, the PPI networks follow a small-world property and are scale-free networks. Central hubs often represent evolutionarily conserved proteins, whereas cliques (fully connected subgraphs) have been found to have a high functional significance (Spirin and Mirny, 2003).

Sequence Similarity Networks (SSNs)
These networks consist of nodes representing proteins or genes and edges capturing the sequence similarity between amino acid or nucleotide sequences. Widely used tools (Ekre and Mante, 2016) for obtaining a sequence similarity between two sequences are the BLAST (Altschul et al., 1990), LAST (Kiełbasa et al., 2011), and FASTA3 suite (Pearson, 2000), which contains SSEARCH, GGSEARCH, GLSEARCH executables of Smith-Waterman (Smith and Waterman, 1981) and Needleman-Wunsch (Needleman and Wunsch, 1970) implementations for local and global sequence alignment. These networks are weighted, have a small-world and scale-free topology and often contain hubs. Often, clustering algorithms are applied on such networks for the detection of protein families. Like in PPIs, proteins that lie together in such networks are more likely to have similar functions or be involved in similar biological processes (Sharan et al., 2007). While it is not straightforward to come to a conclusion about their density, when coping with fragmented sequences (e.g., alignments of predicted proteins from metagenomes), the networks are rather sparse.

Gene Regulatory Networks
They are collections of regulatory relationships between transcription factors (TFs) and TF-binding sites or between genes and their regulators. Normally, these networks are directed, dynamic, and can be visualized as bipartite graphs. In such networks, most nodes have only a few interactions and only a few hubs come with a higher connectivity degree. In any case, such networks follow a power law degree distribution (scale-free) p(k) ∼ k −γ , γ ≈ 2 (Vázquez et al., 2004). Among a variety of databases hosting information about gene regulation, widely used repositories are the KEGG (Kanehisa and Goto, 2000), GTRD (Yevshin et al., 2019), TRANSFAC (Matys et al., 2003), TRRUST (Han et al., 2018).

Signal Transduction Networks
These networks capture cell signaling or otherwise the transmission of molecular signals as well as a series of molecular events within a cell or from the exterior to its interior (Fabregat et al., 2018). A signal transduction network normally consists of several thousand nodes and edges representing a series of reactions. These networks are mostly directed and sparse. They follow a power law degree distribution as well as small-world properties. While such data can be found in well-known pathway databases (KEGG, Reactome), specialized repositories such as the MiST (signal transduction in microbes) (Ulrich and Zhulin, 2007), NetPath (Kandasamy et al., 2010), or Human-gpDB (Satagopam et al., 2010) also exist.

Metabolic Networks
They are networks consisting of metabolites (nodes) and their interactions in an organism. Metabolites can be either smaller molecules such as amino acids or larger macromolecules like polysaccharides. These networks are usually directed graphs and can be represented as Petri nets (Reisig, 1985;Chaouiya, 2007). They are scale-free, they carry small-world properties (Jeong et al., 2000) and can often be organized using hierarchies (Gagneur et al., 2003). In order to gain insights into their decomposition, heuristic modularity optimization over all possible divisions to find the best one is required (Newman and Girvan, 2004). KEGG and Reactome databases are two of the most widely used repositories for this type of network.

Gene Co-expression Networks
They are undirected weighted networks where two nodes (genes) are connected if there is a significant co-expression between them. Such networks are usually constructed using data from high-throughput technologies such as Microarrays, RNA-Seq or scRNA-seq. For each pairwise connection, a metric like for example, the Pearson Correlation Coefficient (PCC) (Kirch, 2008) can be used to calculate an edge's weight. Often, a threshold or a Z-score are applied on the whole network in order to accept correlations above a certain cutoff. Otherwise the network would look like a fully connected clique. After the threshold and depending on the total clustering coefficient, the network can be clustered to detect functional modules. One typical example is the ribosomal genes which tend to group together due to similar expression patterns. Expression data for such analyses can be found in widely used repositories such as GEO (Barrett et al., 2013) or ArrayExpress (Parkinson et al., 2007). Notably, Arena-Idb (Bonnici et al., 2018) repository can be used for human non-coding RNAs interactions.

Expression Quantitative Trait Loci (eQTL) Network
Data obtained from genotyping and/or transcriptomic experiments are used as locus (eQTLs) in explaining a fraction of the genetic variance of a gene expression phenotype (Nica and Dermitzakis, 2013). For this purpose, eQTL networks are suitable for summarizing this information (Platig et al., 2016;Fagny et al., 2017;Sonawane et al., 2019). Genome-wide association studies (GWASs) are used for association between common genetic variants and phenotypic traits based on many variants of relatively small effect size. Those singlenucleotide polymorphisms (SNPs) are measured by expression quantitative trait locus (eQTL) analysis and are represented by eQTL networks with significant associations as edges. Findings provide unique insight into the genotype-phenotype relationship [e.g., Enhanced tissue-specific heritability of type 2 diabetes (T2D) was identified by eQTL networks (Torres et al., 2014;Fagny et al., 2017)].

lncRNA-Protein Interaction Networks
These networks reveal the functions of lncRNAs coming from their interactions with proteins (Yue et al., 2019). Most common experiments for studying these interactions include the RNP immunoprecipitation-microarray (RIP-Chip) (RIP-Chip: the isolation and identification of mRNAs, microRNAs, and protein components of ribonucleoprotein complexes from cell extracts), high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) (HITS-CLIP yields genome-wide insights into brain alternative RNA processing.), photoactivatable ribonucleosideenhanced crosslinking and immunoprecipitation (PAR-CLIP) (Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP.) and RNAcompete (Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins). Regarding the computational methods for predicting these interactions, network-based methods are the most applicable. Multiple protein-protein similarity networks (PPSNs) (Fusing multiple protein-protein similarity networks to effectively predict lncRNA-protein interactions.), LPIHN (Predicting Long Noncoding RNA and Protein Interactions Using Heterogeneous Network Model) and PLPIHS (Prediction of lncRNA-protein interactions using HeteSim scores based on heterogeneous networks) could be applied to generate lncRNA-protein interaction networks (Zhang et al., 2019).
Additionally, some biological networks are distinguished by comprising information about evolution and interactions of species. These are: Phylogenetic networks: They are networks trying to capture the evolutionary relationships between organisms in time (Huson et al., 2010;Thomas and Portier, 2013). Reconstructed phylogenies are mainly represented as trees even if it is debatable whether a tree is the right scheme as it fails at capturing events like the union of different lineages.
As an extension to trees, phylogenetic networks might contain loops. The tree of life is a global effort to capture the evolution of all organisms in a single snapshot and describe the relationships between them. Notably, widely used methods for tree reconstruction are the Neighbor-Joining (NJ) (Saitou and Nei, 1987), UPGMA, and maximum likelihood parsimony (Golding and Felsenstein, 1990), whereas widely used applications for such analyses are the PAUP (Yang, 1996), PHYLIP (Baum, 1989), and MEGA (Kumar et al., 2016). Ecological networks: These networks mainly represent food webs or interactions among species in an ecosystem. These interactions can be trophic or symbiotic (Ings et al., 2009), mutualistic (bidirectional) or competitive (host-parasite). A fundamental aim of ecological network analysis is to uncover the mechanisms which influence the stability of fragile ecosystems. In general, binary food webs can be simple directed or undirected k-partite or simple graphs, whereas quantitative food chains can be shown as weighted graphs. Most food webs follow an exponential degree distribution, whereas it is well-accepted that such webs display an average low connectance. An in-depth analysis of the topological features of this type of networks is extensively discussed elsewhere (Danon et al., 2011). Epidemiological networks: They are networks used in public health to study disease transmission (e.g., sexually transmitted diseases-STDs) (Danon et al., 2011). Path traversal analysis can reveal transmission routes while the network's structure can provide insights into the epidemiological dynamics. While epidemiological networks often simulate social networks, they can be shown as bipartite graphs .

Species interaction networks:
There are between-species interaction networks describing pairwise interactions between species, trying to understand what factors (e.g., diversity) lead to stability (Romanuk et al., 2010) and withinspecies interaction networks quantifying associations between individuals, offering information in species, and/or population level (Croft et al., 2004). Food webs: All organisms are connected to each other through feeding interactions and the networks presenting these interactions are very-well known for the effort to answer the long-standing question in ecology about the stability of these interactions (Milner-Gulland, 2012). Interactions of ecological entities captured in networks can be obtained from literature articles, observation in the field, molecular experiments (e.g., analysis of environmental DNA), or models based on incomplete data (Delmas et al., 2019).
Moreover, biomedical graphs are of great importance for both researchers and clinicians (Yue et al., 2019). These are: Disease networks: They are formed by diseases and their causative genes, while the connections between them can be constructed based on repositories such as the Online Mendelian Inheritance in Man (OMIM) associations. These networks are generated when diseases share at least one causative gene, and therefore are considered to be linked. Disease networks are typically shown as bipartite networks (Goh et al., 2007).

Drug-disease associations:
These networks hold information about known and/or predicted drug-disease associations. The information could be extracted from a database or from published literature (Gottlieb et al., 2011;Sonawane et al., 2019;Yue et al., 2019). Disease-symptom graphs: These graphs connect diseases with their symptoms and visualize the potential evolution of the diseases, assisting clinicians to follow the more efficient medical treatment rapidly (Sonawane et al., 2019). These graphs are generated based on medical records using rudimentary concept extraction of cause and effect.
Finally, data integration approaches can be used to generate biological networks consisting of nodes which represent text, database records, or literature articles.
Literature co-occurrence networks: These networks show connections between bioentities that are found to co-occur in any text corpus (Pavlopoulos et al., 2014). Name-entityrecognition (NER) taggers such as the EXTRACT (Pafilis et al., 2016), can be used to initially identify genes/proteins, chemical compounds, environments, tissues, diseases, phenotypes, and Gene Ontology terms in a text and map the identified terms to their corresponding ontology/taxonomy entries in public databases. This way, any text corpus like Wikipedia, PubMed (∼29 million abstracts) or PubMed Central (PMC, ∼6 million full-text articles) can be parsed and analyzed for both abstractbased or sentence-based co-occurrences. Knowledge networks: These networks are mostly multiedge graphs as they combine heterogeneous information and metadata from various sources like public repositories or biological and literature databases. Typical examples are the STRING (Franceschini et al., 2013), STITCH (Szklarczyk et al., 2016), and PICKLE (Gioutlakis et al., 2017) databases. STRING contains known and predicted protein-protein interactions for various organisms, whereas STITCH contains known and predicted interactions between chemical compounds and proteins. In the STRING database, two proteins can be, for example, connected in multiple ways. They can be homologous, or co-occur in an abstract, or have neighboring positions in a genome or be products of a fusion event or co-express in an experiment. Similarly, PICKLE integrates publicly available PPI databases via genetic information ontology. Finally, bioDBnet (Mudunuri et al., 2009) is a network of the major biological databases.
Overall, biological networks follow the new era of hybrid heterogeneous networks, trying to put together different types of information (Navlakha and Kingsford, 2010;Moreau and Tranchevent, 2012;Ni et al., 2016). It is worth mentioning, that a great collection of biological networks that are produced by researchers and are published in various articles can be found in https://cytoscape-publications.tumblr.com. This repository can be used as an excellent teaching material as well as a great resource for inspiration and case studies when building software applications.

FUNCTIONAL ANNOTATION AND OVERREPRESENTATION ANALYSIS
A common task in computational biology field is the annotation and interpretation of gene lists (e.g., genes or proteins which are found to be tightly connected in a network). For this task, functional annotation and/or overrepresentation analysis can be used (Tipney and Hunter, 2010;Hung et al., 2012). Enrichment analysis determines over-represented classes of genes or proteins in a large group of samples in order to reveal existing associations with disease phenotypes (Huang et al., 2009a). Similarly, functional enrichment analysis applies statistical tests to match genes of interest with certain biological functions (Bindea et al., 2009). PANTHER (Mi et al., 2013), Gorilla (Eden et al., 2009) and DAVID (Huang et al., 2009b) applications for example, accept a gene list as an input and report related hits to molecular functions, biological processes [e.g., Gene Ontology (Gene Ontology Consortium, 2004)] and KEGG (Kanehisa and Goto, 2000) and Reactome (Fabregat et al., 2018) pathways. Another similar tool is the ClueGO (Bindea et al., 2009) which is offered as a Cytoscape plugin. For researchers interested in non-coding RNA annotation and identification, Transcriptator (Tripathi et al., 2015) can be used. Pathway enrichment analysis can be also performed by additional tools such as pathfindR (Ulgen et al., 2019), g:Profiler (Raudvere et al., 2019), and EnrichmentMap (Merico et al., 2010;Reimand et al., 2019). Gene Set Enrichment Analysis [GSEA (Mootha et al., 2003;Subramanian et al., 2005)] and NGSEA  can be used for overrepresentation analysis, whereas differential expression analysis for the determination of the upand down-regulated genes is offered by DESeq2 (Michael, 2017) or metaseqR (Moulos and Hatzis, 2015).

FILE FORMATS
A network can be described and stored in multiple humanand computer-readable ways. Apart from the simple file formats such as the tab-delimited, CSV, SIF, Excel and adjacency matrix, several others like the BioPAX (Demir et al., 2010), SBML (Hucka et al., 2003), PSI-MI (Hermjakob et al., 2004b), CML (Murray-Rust et al., 2001), and CellML (Lloyd et al., 2004) have been introduced for biological data and semantics. For example, SBML, which stands for Systems Biology Markup Language, is an XML-like format for storing and parsing biochemical networks as well as for describing biological processes. BioPAX stands for Biological Pathway Exchange and is made for the representation of biological pathways at the molecular and cellular level. The PSI-MI format is used for the data exchange related to molecular interactions and CellML is used for describing mathematical models. GraphML (Brandes et al., 2017) is an XML-like file format and consists of unordered sections related to a network's node and edge elements. Each node has a distinct identifier, whereas each edge is described by a source and a target node. Additional attributes such, an edge weight or a label can also be included in the schema. The JavaScript Object Notation (JSON) format is a generic and widely-used non-biological file format and is popular for web-based applications or web-server asynchronous communication and data exchange. However, it is worth mentioning that Cytoscape.js (Franz et al., 2016) accepts JSON formats for network visualization. Finally, the Nexus and the Newick file formats are standard ways for representing trees. While NDEx (Pillich et al., 2017) is an open-source framework for the sharing of networks of many types and formats, file-format-specific parsers are available [e.g., Bioconductor (Gentleman et al., 2004) rBiopaxParser (Kramer et al., 2013), rsbml, RPsiXML and others]. Examples of such file formats are shown in Figure 7.

GRAPH LAYOUTS AND EDGE BUNDLING
For graph analysis and interpretation, it is important to be able to depict a graph whose structure, symmetries, and other main features become clear in a visually and aesthetically appealing way. This is especially true for graphs of large size, where many nodes and edges can have multiple clusters and interconnected areas.
Graph drawing combines methods from mathematics and computer science to derive two-and three-dimensional representations of graphs, employing a number of strategies (Figure 8). Among the most successful layouts are the forcebased layout approaches, where the nodes of the graphs are metaphorically modeled as point particles with attractive (spring) forces acting between nodes connected by an edge and repelling (electrical) forces acting between all pairs of nodes. The optimal layout is determined by the positions of nodes/particles that minimize the total energy of the system. Typically, such a state is found by simulating the forces of the many-particle physical system and arriving at a minimum energy state iteratively. In addition, in a spectral layout method, the coordinates are taken to be the eigenvectors of a matrix such as the Laplacian, derived from the adjacency matrix of the graph. Orthogonal layout methods allow the edges of the graph to run horizontally or vertically, parallel to the coordinate axes of the layout, while tree layout algorithms use tree-like structures and are suitable for visualizing ontologies or hierarchies. Finally, circular layout methods place the vertices of the graph on a circle, choosing carefully the ordering of the vertices around the circle to reduce crossings and place adjacent vertices close to each other.
While graph drawing is a mature field with many proposed alternatives, the approaches that produce the most compelling visualizations (e.g., force directed based algorithms) can often become CPU and memory greedy and struggle with visualizing networks of more than a few thousands of nodes and edges. Empirical performance statistics can be found elsewhere (Pavlopoulos et al., 2017). Many layout algorithms are embedded in standard visualization tools: Gephi (Bastian et al., 2009) visualization tool comes with a great variety of algorithms such as OpenOrd (Martin et al., 2011) and Yifan-Hu (Yifan, 2005) forcedirected algorithms. OpenOrd can layout networks consisting of over a million nodes in less than half an hour but aesthetics depends on the network's topology. The Yifan-Hu layout can give aesthetically comparable representations to the ones produced by the widely used but time-consuming Fruchterman-Reingold (Fruchterman and Reingold, 1991), with much faster performance. Other algorithms included in Gephi are the circular, contraction, dual circle, random, MDS, Geo, Isometric, GraphViz, and Force atlas layouts. Similarly, Cytoscape (Shannon et al., 2003) visualization tool comes with a rich variety of simple (grid, random, and circular) and more sophisticated (force-directed, hierarchical) layout algorithms. Finally, for more customized layouts, one can utilize the igraph library (Gabor and Nepusz, 2006). yWorks provides the professional software manufacturer with state-of-the-art diagramming components.
For even more aesthetic layouts, edge bundling methods can be utilized to provide significant clutter reduction and make visible high-level edge patterns clearer (Zhou et al., 2013). These methods are gaining ground over the years and are mostly divided in hierarchical or force directed. An overview of these methods is extensively described elsewhere (Zhou, 2016). Edge bundling methods are still computationally expensive and their main philosophy is to group edges together like a bundle of cables. Cytoscape and Tulip (Auber et al., 2017) are two of the most widely used visualization tools which have such methods incorporated. Basic node layout as well as edgebundling examples are shown in Figure 8.
In general, force-directed layouts are very suitable for scalefree networks like PPIs or highly modular networks with distinct communities or high clustering coefficient. It would not make sense for example to apply a force-directed layout in a fully connected graph. Similarly, hierarchical layouts are more suitable for trees or tree-like graphs such as the Gene Ontology. Finally, it is worth mentioning that there is tradeoff against time, particularly because algorithms (e.g., layouts) grow time exponentially as the network increases.

NETWORK VISUALIZATION
Several techniques have been introduced for the visualization of networks varying from very simple (e.g., adjacency matrices) to more complex (e.g., force directed layouts in 2D or 3D). However, the selection of the appropriate visualization, highly depends on the type of network which needs to be visualized. For example, in a multi-Omics approach, one would like to see different types of information (e.g., proteomics, transcriptomics, metabolomics, genomics) in a well-structured view. For this purpose, a multilayered visualization would be much more preferable compared to a generic force directed layout. This way, nodes of different type are placed onto different layers, while connections are allowed both within a layer as well as across layers. In the case of multi-edge graphs, two bioentities can be connected in multiple ways. Two genes, for example, might be homologous, or neighbors in a genome or co-express in an experiment. STRING database is one of the most widely used databases which utilizes multi-edge graph visualization. In such networks, layouts can be applied taking into consideration only one connection type or any combination of them.
While force-directed or hierarchical visualizations are very common, they often fail in coping the so-called hairball effect (dense networks where all nodes are almost connected to any other node-no structure). To partially address this issue, circos and hive plots have been introduced. Hive plot views use "radially oriented" linear axes as a coordinate system. Nodes are placed on these axes and edges are drawn as curved links. While hive plots are general, they have been used in biology to successfully visualize cancer, gene-disease, and gene regulatory networks (Krzywinski et al., 2012). Similarly, Circos application (Krzywinski et al., 2009) enables a circular composition to show connections between nodes or positions, which are difficult to visually organize when the underlying layout is linear. Such plots are very widely used in biology to represent phenomena like genomic variations. Arc diagrams in which nodes are displayed along a single axis and links are represented with arcs, can be used for a similar purpose. Finally, bipartite graphs which are widely used in epidemiology and gene-disease networks need special visualization to show mutual relationships between the elements of their two collections. While several other visualization approaches can be applied on hierarchical graphs (e.g., Gene Ontology) and biochemical networks (e.g., pathways or petri nets), the most basic concepts are schematically shown in Figure 9.

GRAPH-BASED CLUSTERING
Clustering is the process of grouping a set of objects so that objects belonging in the same group (cluster) have similar properties. For this, many state-of-the-art algorithms take into account the network's topology and try to cluster the network accordingly. For example, many approaches try to find densely connected areas in a network, others try to "break" the bridges (edges with high betweenness centrality) between distinct communities and others look for easiest flow paths or are based on node distances.
Despite the great variety of graph-based clustering algorithms available today (Xu and Wunsch, 2005;Brohée and van Helden, 2006;Moschopoulos et al., 2011), only few can cope with largescale networks consisting of millions of nodes and edges. SPICi (Jiang and Singh, 2010) is one of the fastest algorithms and accepts as input a list of connections. It supports both dense and sparse matrices and tries to find local densely connected neighbors using heuristics. It has running time complexity O(VlogV+E) time and needs O(E) memory. It is not suitable for networks with many hubs and low clustering coefficient. Louvain (Blondel et al., 2008) on the other hand, is an oldfashioned but rather fast and greedy algorithm with O(VlogV) time performance. Molecular Complex Detection (MCODE)  is a widely-used algorithm in biology and very suitable for finding protein complexes in PPI networks. It has O(VEd 3 ) time complexity where d is the vertex size of the average vertex neighborhood in the input graph. Affinitypropagation (Frey and Dueck, 2007) detects ways that nodes in a network can exchange "messages" between each other very fast. It is a high-quality algorithm and comes with O(V 2 ) time complexity. This might be a decent performance for mediumscale biological networks like gene co-expression or PPIs but not sufficient for larger networks like the literature-based or the knowledge-based ones. Markov Clustering (MCL) (Enright et al., 2002) is one of the mostly cited algorithms in the field and was initially introduced to detect protein families from sequence similarity networks. It uses random walks to detect highly-connected subgraphs using a mathematical bootstrapping procedure and is able to cluster a few million nodes in less than an hour. However, it is memory greedy, a bottleneck which has been solved with its parallel version HipMCL (Azad et al., 2018), a scalable distributed-memory implementation. HipMCL uses MPI (Forum, 1994), and OpenMP (Dagum and Menon, 1998) and can cluster a network consisting of 300 million nodes and ∼17 billion edges in only ∼6 h using ∼136,000 cores.
While it is not in the scope of this review to go into each algorithm's detail, we highly encourage readers to either try each of them individually in their command line versions or through the clusterMaker2 (Morris et al., 2011) Cytoscape plugin. In Figure 10 for example, a Yeast PPI network (Gavin et al., 2006) has been clustered with clusterMaker's MCL algorithm, whereas a randomly selected cluster has been annotated with Cytoscape's BiNGO plugin (Maere et al., 2005).

HIERARCHICAL CLUSTERING
Hierarchical clustering is a non-graph-based way of data clustering which accepts a distance matrix containing all pairwise distances between the nodes as input and outputs a dendrogram showing the hierarchical relationship between the clusters. The standard hierarchical algorithm has O(n 3 ) time complexity of and requires O(n 2 ) memory, thus making this method inappropriate for large data sets. Hierarchical clustering is divided in three main categories. These are Single linkage which calculates the smallest distance between objects in each iteration step, Complete linkage which calculates the longest distance between objects in each iteration step and Average linkage which uses the average distance between all pairs of objects in every iteration step. For more details, a survey explaining how hierarchical clustering algorithms work and what are their variations can be found elsewhere (Langfelder et al., 2008).
Notably, all calculations are based on a distance matrix (fully connected graph) which can be generated by a correlation matrix as D ij = 1 − PCC ij . D is the distance matrix and PCC a Pearson Correlation Matrix (e.g., gene co-expression networks). Figure 11 shows an example of how five genes can be hierarchically clustered according to their expression values/patterns measured in three hypothetical conditions or time points. The final output is a heatmap accompanied by a dendrogram showing how genes are grouped together. Notably, in cases where it is not straightforward which cutoff to apply on the tree in order to define the number of clusters, statistical methods to automate such task, are available (Langfelder et al., 2008).

CLUSTERING COMPARISON
Different clustering algorithms or runs of the same algorithm using different parameters can often lead to dissimilar results. Therefore, it is essential to be able to compare different clustering results between each other. This is especially useful when one wants to compare the results of a clustering algorithm against an "optimal" or desired clustering for example, to study an algorithm's accuracy.
For this purpose, several clustering comparison metrics have been introduced. Generally speaking, these metrics can be divided into three categories: (i) counting pairs, (ii) set overlaps, or (iii) mutual information (Wagner and Wagner, 2007). Some well-known clustering comparison metrics which are based on counting pairs are the Chi Squared Coefficient (Mirkin, 2001), Rand Index (Rand, 1971), Fowlkes-Mallows Index (Fowlkes and Mallows, 1983), Mirkin Metric (or Equivalence Mismatch Distance), Jaccard Index and the Partition Difference (Li et al., 2004). Metrics based on set overlaps include the F-Measure (Fung et al., 2003), Meila-Heckerman & Maximum-Match-Measure (Marina Meil and David, 2001), and the Van Dongen-Measure (Dongen, 2000). Finally, clustering comparison metrics of the mutual information category include the Normalized Mutual Information by Strehl & Ghosh (Alexander and Joydeep, 2003), Normalized Mutual Information by Fred & Jain (Ana and Jain, 2003) and the Variation of Information methods (Meila, 2000).
The metrics of the first category count the number of object pairs that (a) were clustered together in both clusterings, (b) were clustered differently in both clusterings and (c) were clustered together in only one of the two clusterings. Rand Index is such a metric, ranging from 0 to 1 and is defined as: RandIndex(C 1 , C 2 ) = 2(n ctb +n cdb ) n(n−1) where C 1 and C 2 are two different clusterings of a data set with n objects, n ctb is the total number of object pairs that were clustered together in both clustering and n cdb the number of pairs that were clustered differently in both clusterings. Intuitively, the Rand Index calculates the fraction of same-clustered (together or separately) pairs against the number of all possible pairs and equals to 1 when all pairs are clustered in the same manner in both clusterings and to 0 when there is no pair clustered in the same manner in any of the two clusterings.
Metrics based on set overlaps try to map clusters between clusterings in accordance to their maximum overlap. The Meila-Heckerman measure compares the results of a clustering against the optimal clustering. This makes the method asymmetric, which means it cannot be used while comparing two clusterings without one being the optimal. The Maximum-Match-Measure is the symmetric version of this metric which iteratively looks for the largest element of the confusion matrix of the intersection values between all clusters of the two clusterings, meaning the cluster pair with the largest overlap. The column and row of the confusion matrix which contain the largest element are then crossed out and the sum of the results of all iterations are aggregated and divided by the total number of elements. The formula for the Maximum-Match-Measure is as follows: MM(C 1 , C 2 ) = 1 n min{k,l} i = 1 max{conf ′ } where the algorithm finishes in min{k, l} steps, k and l are the respective numbers of clusters for clusterings C 1 and C 2 and conf ′ is the confusion matrix described above with i − 1 columns and i − 1 rows being removed at each iteration. Maximum-Match-Measure ranges from 0 to 1.
An asymmetric and widely used clustering comparison metric of the set-overlap category is the F-Measure. The F-Measure indicates how close a clustering C 2 is to an optimal clustering C 1 by making use of the harmonic mean of precision and recall between each cluster, with precision pC 1i C 2j = conf ij nC 2j and recall rC 1i C 2j = conf ij nC 1i , i ǫ [1, k] and j ǫ [1, l]. The F-Measure between two clusters is calculated as F(C 1i , C 2j ) = 2 * pC 1i C 2j * rC 1i C 2j pC 1i C 2j + rC 1i C 2j and the overall F-Measure between two clusterings is defined as the weighted sum of the maximum F-Measures for the clusters in C 2 , F(C 1 , C 2 ) = k i = 1 nC 1i n max l j=1 {F(C 1i , C 2j )} and ranges in [0, 1]. Metrics of the mutual information clustering comparison category are based on the entropy of information and on the probability of finding an element in a specific cluster. The entropy of a clustering is defined as H(C) = − k i = 1 P(i)log 2 p(i), where P(i) = nC i n is the probability that a random FIGURE 12 | Clustering comparisons. (A) Rand Index between C 1 and C 2 . C 11 and C 12 are clusters 1 and 2 of the C 1 clustering, respectively. One pair [1, 2] is clustered together in both clusterings, three pairs [1,5], [2, 5], and [3, 4] are clustered differently in both clusterings and the rest six pairs [1,3], [1,4], [2,3], [2,4], [3,4], and [4,5] have been placed together in only one of the two clusterings. The Rand Index between the two clusterings is calculated as RandIndex(C 1 , C 2 ) = 2(1+3) 5(4) = 0.4. (B) Maximum-Match-Measure between C 1 and C 2 . C 1 has four clusters while C 2 has three. At the first iteration the cluster-intersections' confusion matrix element conf ′ 11 = 4 is chosen and column 1 and row 1 are crossed out. At the second iteration the maximum element of the remaining confusion matrix is conf ′ 22 = 3 and column 2 and row 2 are crossed out. At the third and final iteration conf ′ 33 = 2 is chosen. The metric is calculated as: MM(C 1 , C 2 ) = 1 12 3 i = 1 max{conf ′ } = 1 12 (4 + 3 + 2) = 0.75. On the same schema if C 1 is chosen as the optimal clustering the F-measure for C 2 can be calculated. First, the precision and recall measures are calculated for clusters C 11 and C 21 as pC 11 C 21 = conf11 nC21 = 4 4 = 1 and rC 11 C 21 = conf11 nC11 = 4 5 . Then, the F-Measure can be calculated between these two clusters as F(C 11 , C 21 ) = 2 * pC11C21 * rC11 C21 pC11C21 + rC11 C21 = 2 * 1 * 4 5 1 + 4 5 = 8 9 . By calculating the respective values for the rest of the cluster pairs, the matrix (C) is created. The overall F-Measure of C 2 against C 1 is F(C 1 , C 2 ) = k i = 1 nC1i n max l j=1 {F(C 1i , C 2j )} = 5 12 * 8 9 + 3 12 * 3 4 + 2 12 * 4 5 + 2 12 * 2 5 ≃ 0.76. (D) Variation of Information matrix of the P(i, j) probabilities of an element being in the intersection of clusters. Based on the two clustering schemas of (A) the entropy of C 1 is H(C 1 ) = − 2 i = 1 P(i)log 2 p(i) = −( 3 5 log 2 ( 3 5 ) + 2 5 log 2 ( 2 5 )) ≃ 0.97 and following the same procedure H(C 2 ) ≃ 0.97. The mutual information between the two clusterings is calculated as I(C 1 , C 2 ) = k i = 1 Frontiers in Bioengineering and Biotechnology | www.frontiersin.org element picked is a member of cluster C i , and the mutual information between two clusterings C 1 and C 2 as I(C 1 , C 2 ) = k i = 1 l j = 1 P(i, j)log 2 P(i,j) PC 1 (i)PC 2 (j) , where P(i, j) = conf ij n is the probability that an element belongs in cluster C 1i and also in C 2j . The Variation of Information is a mutual information clustering comparison metric and is calculated as VI(C 1 , C 2 ) = H(C 1 ) + H(C 2 ) − 2I(C 1 , C 2 ). Intuitively, the Variation of Information metric describes the amount of information we lose from the first clustering as well as the information we still have to gain from the second clustering. The Variation of Information metric is not bounded by a constant value but by a log(n)upper bound (in the case of two trivial clusterings).
Examples of Rand Index, Maximum-Match-Measure, F-Measure and Variation of Information are shown in Figure 12.

NETWORK ALIGNMENT
In today's multi-Omics era, integration of heterogeneous information (e.g., transcriptomics, proteomics, metabolomics, etc.) in a multi-layered network structure is becoming a trend. Additionally, methods to directly compare networks and their topological features are gaining ground. To address these issues, network alignment, or alternatively graph isomorphism approaches can be used. Notably, graph alignment is not a trivial task as it is computationally expensive and has been characterized as NP-complete (Zampelli et al., 2010). The concept behind network alignment is to highlight conserved or missing nodes and edges across two (pairwise) or more (multiple) networks. In the biomedical field for example, an alignment could potentially be used for the discovery of conserved traits between different species (Sharan et al., 2005), the detection of common pathway interactions between two different disease states or the detection of deleted gene expression connections upon drug treatment. Like in a sequence alignment, a network alignment can also be either local or global.
Established implementations in the field include the NetworkBLAST aligner (Kalaev et al., 2008) for protein network alignment between two species or across multiple networks from different organisms, the MaWISh (Maximum Weight Induced Subgraph) (Koyutürk et al., 2006) for PPI alignments in order to underlie evolutionary relationships and the H-GRAAL  for metabolic networks of different species.
Until now, several graph alignment strategies have been introduced and various methods have been implemented. Some of the strategies are: modular graph kernels and divide and conquer strategies (Towfic et al., 2009), constraint programming (Zampelli et al., 2010), linear representation of networks (Kalaev et al., 2008), scoring functions (Flannick et al., 2008), connectedcomponents (Tian and Samatova, 2008), heuristic searches (Kuchaiev et al., 2010), and graphlet degree vectors. Notably, recent developments allow the alignment of networks with multiple edge types .
While it is not in the scope of this article to cover all existing methods, for demonstration purposes, we present an example based on a simplified version of the GRAAL alignment method. The GRAAL network aligner takes into consideration the topology of a network and uses facets from both local and global alignment methods to produce a global alignment. According to the GRAAL algorithm, each node of the smaller network is aligned to exactly one node of the larger one. Let's assume that there are two graphs G1 = (V1, E1) and G2 = (V2, E2). GRAAL introduces the concept of graphlets, which give a node a more detailed representation of its degree based on its local neighborhood of connections. All possible 2-and 3-node graphlet compositions are shown in Figure 13A. The aggregation of the number of 2-node graphlets attached to a node, represents the node's degree. A node can only appear in one of the annotated orbits 0-3 in the respective graphlets in Figure 13A. Figure 13B, demonstrates a network alignment example. During GRAAL's first step, the lowest possible alignment cost for aligning each node from G1 to each node of G2 is calculated. Each row in Figure 13C represents the graphlet-signature for each node (for each graph, respectively), based on all possible orbits depicted in Figure 13A. The cost of aligning two nodes takes into consideration both their degrees and graphlet-signature similarity, whereas the lower cost is assigned to high degree nodes, as well as to graphlet similarities which do not replicate lower-degree graphlets. The starting seed node-pair with the lowest cost is chosen and the alignment is expanded outwards from these two nodes. In this example, nodes V2 G and V1 A have the highest degrees and the most similar graphlet-signatures, thus the pair (V2 G , V1 A ) is chosen as the first aligned seed. The next highest degree node connected to V2 G is node V2 I . Node V2 I is then randomly matched to one of the nodes V1 B , V1 C , V1 D , V1 E (same degrees and similarity distances). For demonstration purposes let it be V1 B . Continuing on this graph's path, node V2 J is matched to V1 C . Moving on, node V2 H is aligned to node V1 F and finally node V2 K is randomly aligned to either V1 D or V1 E . Let it be V1 E . V1 D remains unaligned. The final GRAAL alignment is shown in Figure 13D.

LINK PREDICTION
Besides network alignment, predicting link changes in a single network has recently drawn attention in the biomedical field.
Link prediction might concern the creation of future edges or the identification of missing links (e.g., incomplete data). While link prediction techniques are widely used by social media, in biological networks, they have also been used to identify potential drug side effects, protein-protein interactions, disease phenotypes based on molecular information and phylogenetic relations. For example, its application on bipartite graphs has unraveled new drug-target interactions (Kunegis et al., 2013). Its application on heterogeneous biological networks, has led to the identification of key pathway and protein interactions responsible for disease pathogenesis as well as candidate multiple sclerosis-associated genes (Himmelstein and Baranzini, 2015). Its combination with multi-way-based spectral has led to link prediction of protein-protein interaction networks.
The algorithm chosen for link prediction is often tied to the data type of the network. Due to the fact that each network type comes with its own growth pattern, relative assumptions must be made (Kunegis et al., 2013). Starting from an adjacency matrix A, through eigenvalue decomposition we can write A as U U T . U is an n×n orthogonal matrix and an n×n diagonal matrix. The values ii are the eigenvalues of A, and the columns of U are its eigenvectors. The spectral evolution model states that in dynamic networks, eigenvalues change over time while eigenvectors remain constant. Some of the most common link prediction algorithm categories are listed below. Triangle closing or triadic closure (Leskovec et al., 2008) is a method for predicting edges which will appear between nodes with common neighbors and widely used in social network analysis. Path counting (Lü et al., 2009) is the extension of triangle closing, giving two nodes with further level neighbors an additional, lower link prediction score. Graph kernels (Smola and Kondor, 2003;Ito et al., 2005) are functions which describe the similarity between two nodes and are often used for link prediction.
Here, we demonstrate a link prediction example based on the triangle closing model. The adjacency matrix A of an unweighted and undirected network is shown in Figure 14A. The algebraic representation of the triangle closing model can be expressed as a square A 2 of the adjacency matrix, where each (i,j) pair contains the number of common neighbors between i and j at a given time t. Thus, A 2 ij = k A ik A jk . The A 2 matrix is shown in Figure 14B. Let the algorithm allow only one new edge creation at each iteration. At the time point t+1, the highest value of A 2 is 3 (between nodes V1 and V7). The new edge V1-V7 is created and the A 2 table is updated accordingly for the next iteration. If this edge already existed, the next highest value of A 2 would be checked. In case of duplicate highest values, the algorithm would have chosen one of the corresponding edges to create randomly. The network of this example and the new edge created at time point t+1 are depicted in Figure 14C.
Here, we present some tools which allow the computational link prediction on biological networks. HETIONET (Himmelstein and Baranzini, 2015) is an integrative biomedical knowledge network assembled from 29 different databases of genes, compounds, diseases, and more. Through HETIONET's website (https://het.io/), researchers can browse an interactive biological network of 47,031 nodes (11 types) and 2,250,197 relationships (24 types) and formulate their own edge predictions. Linkpred (Guns, 2014) calculates the likelihood of potential edge-creation in a future snapshot of a network. There are 18 predictor functions (local and global) to choose from. LPmade is another link prediction software which specializes in link prediction via commonly used unsupervised link prediction methods such as Adamic/Adar, common neighbors, Jaccard's coefficient, Katz, preferential attachment, PropFlow, rooted PageRank, SimRank, and weighted rooted PageRank.

NETWORK PERTURBATION
In biology, direct comparisons between a disease and a healthy state are very common, thus making the study of molecular changes essential. Therefore, at a network level, changes between such states are considered as biological network perturbations. In network medicine, a network's topology can be used as the backbone to further predict side effects in a system even at a 65-80% success rate (Santolini and Barabási, 2018). In the same study for example, a topology-based methodology was applied on a chemotaxis network of bacteria in order to predict the dynamics of perturbations such as gene knockout and overexpression with 90% accuracy. Furthermore, gene editing techniques such as CRISPR/Cas9 also benefit from network perturbation studies as the combination of singlecell sequencing methods with CRISPR/Cas9 offers detailed information of gene-knockout effects at a cellular level (Holding et al., 2019). Similar to gene knockout, RNA interference (RNAi) is a protein silencing method, where RNA molecules inhibit gene expression by targeting their mRNA. Nested effect models (NEMs) constitute probabilistic graphical models that describe the directed hierarchical dependencies on a perturbation network. In a recent study (Siebourg-Polster et al., 2015), it has been shown that an extended version of NEMs (NEMix), was proposed and used on signaling pathways' networks. In a use case scenario of a human rhinovirus (HRV) infection signaling network constructed from RNAi screening data, the proposed method inferred highly accurate signaling networks, fully aligned to the ones in KEGG database.
Concerning libraries, igraph (Gabor and Nepusz, 2006) is an open source library for network analysis and can be used by both Python and R languages. It offers a very rich plethora of functions dedicated to network analysis while it emphasizes on efficiency, portability, and ease of use. VisNetwork is a JavaScriptbased R package for network visualization and ggplot2 an R data visualization package suitable for interactive charts and plots.
Graphviz is an open source graph visualization software for representing structural information such as diagrams of abstract graphs and networks. NetworkD3 is a D3 JavaScript library and plotly a library suitable for data analytics. In addition, Three.js is a cross-browser JavaScript library for animated 3D computer graphics in a web browser and ndtv-d3 a library suitable for timelines and animated movies of objects. NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks and Graph-tool (Peixoto, 2017) a Python module for manipulation and statistical analysis of graphs. Finally, as biological network analyses become more and more popular, data exchange is crucial. For this purpose, the NDEx Project (Pillich et al., 2017) provides an open-source framework where scientists and organizations can share, store, manipulate, and publish biological network knowledge.

DISCUSSION
The adoption of mature high-throughput -Omics approaches to analyze biological samples (e.g., genomics, transcriptomics, proteomics, metabolomics etc.) has led to the production of data at the scale of tera-to peta-byte in size. Subsequently, due to this trend, biological networks follow an exponential growth, thus making their exploration, visualization, analysis, and storage a very difficult task. Therefore, traditional algorithms and data structures often fail to address scalability issues, thus making the adoption of modern technologies a necessity. As current tools are often limited in coping with large-scale datasets, Big Data approaches as well as parallel processing could be used for storing, querying, and processing large data volumes. Ideally, network analysis and visualization software could support algorithms which can run on distributed memory or multiple CPU and GPU systems for increased performance. In addition, global state-of-the-art data structures adjusted to such systems would be of great benefit.
Another bottleneck in systems biology is the visualization and representation of large-scale networks. As networks increase in size and complexity, more efficient algorithms for visualization are necessary. Notably, an alternative way to overcome 2D/3D space limitations is the adoption of virtual reality (VR) technologies. This way, biological networks could be for example explored or browsed using virtual universes. Typical examples for visualizing living systems such as a whole cell using such technology are the Visible Cell (Gagescu, 2001) or CELLmicrocosmos (Sommer, 2019). However, even after a decade of its existence, graphical limits, and cost of VR devices are still restrictive factors to be considered.

AUTHOR CONTRIBUTIONS
MK and EK wrote most of the manuscript. DP-E helped with the structure and with the biological content. GP conceived the concept and supervised the whole process.

FUNDING
Supported by the Operational Program Competitiveness, Entrepreneurship and Innovation, NSRF 2014-2020, Action code: MIS 5002562, co-financed by Greece and the European Union (European Regional Development Fund). EK has been partially supported by the Action Strengthening Human Resources, Education and Lifelong Learning, 2014-2020, co-funded by the European Social Fund (ESF) and the Greek State.