A structural analysis of the hypoxia response network

Background The hypoxia-inducible factor-1 (HIF-1) signaling pathway is an important topic in high-altitude medicine. Network analysis is a novel method for integrating information on different aspects and levels of biological networks. However, this method has not been used in research on the HIF-1 signaling pathway network. To introduce this method into HIF-1-related research fields and verify its feasibility and effectiveness, we used a network analytical method to explore the structural attributes of the HIF-1 signaling pathway network. Methods First, we analyzed the overall network of the HIF-1 signaling pathway using information retrieved from the Kyoto Encyclopedia of Genes and Genomes (KEGG). We performed topology analysis, centrality analysis, and subgroup analysis of the network. Then, we analyzed the core network based on the overall network analysis. We analyzed the properties of the topology, the bow-tie structure, and the structural complexity of the core network. Results We obtained topological structure diagrams and quantitative indicators of the overall and core networks of the HIF-1 signaling pathway. For the structure diagrams, we generated topology diagrams of the network and the bow-tie structure of the core network. As quantitative indicators, we identified topology, centrality, subgroups, the bow-tie structure, and structural complexity. The topology indicators were the number of nodes, the number of lines, the network diameter, and the network density. The centrality indicators were the degree, closeness, and betweenness. The cohesive subgroup indicator was the components of the network. The bow-tie structure indicators included the core, input, and tendril-like structures. The structural complexity indicators included a power-law fitting model and its scale parameter. Conclusions The core network could be extracted based on the subgroup analysis of the overall network of the HIF-1 signaling pathway. The critical elements of the network could be identified in the centrality analysis. The results of the study show the feasibility and effectiveness of the network analytical method used to explore the network properties of the HIF-1 signaling pathway and provide support for further research.


BACKGROUND
Hypoxia affects the work abilities and health of soldiers at high altitudes (Truesdell & Wilson, 2006;Qiang et al., 2014). Research on the hypoxia response network (HRN) is a necessary field of high-altitude medicine that aims to promote the health and work abilities of soldiers at high altitudes. The hypoxia-inducible factor-1 (HIF-1) signaling pathway is one of the crucial components of the HRN (Harris, 2002;Hockel & Vaupel, 2001). Most studies on the HRN have been carried out based on classical biomedical methods, and these studies have made remarkable achievements. With the development of systems biology, researchers have tried to explore the HRN from a systematic view to compensate for the shortcomings of traditional research methods. Kohn et al. (2004) proposed a theoretical HRN model based on ordinary differential equation behavior. They analyzed the structural composition of HRN in detail with the aim of identifying the core subsystem responsible for HRN conversion. The network has been decomposed into multiple primary paths using the extreme path analysis method (Palsson, 2006). It can be well matched with the consensus of existing research, showing that the path switching or branching effect may be the cause of the intense response to the oxygen concentration. Heiner & Sriram (2010) subsequently constructed the general Petri net structural model of the HRN and analyzed the relative modules and properties of the network algorithmically. Additionally, network analysis methods and tools have been used to analyze biological networks, providing a novel perspective for studying traditional biological problems (Barabasi & Oltvai, 2004). Ding et al. (2009) studied the structural and functional properties of the giant strong component of the B. thuringiensis metabolic network. Zhang & Zhang (2019) performed a proteinprotein interaction network analysis of insecticide resistance molecular mechanisms. Ma & Zeng (2003) studied the connectivity of the metabolic networks of 65 biological species, which did not include the core network of HIF-1 signaling pathways. They found that these biological metabolic networks are similar to the Internet in their macrostructure, which also presents a bow-tie structure (Ma & Zeng, 2003). Network analysis methods focus on the properties of the connections of things and investigate these connections as a whole (Boccaletti et al., 2006). However, this method has not been used to analyze the HRN.
This study describes an analytical method for the hypoxia network involving the analysis of the quantitative network indicators of the HIF-1 signaling pathway from the perspective of complex networks. It aims to explore the structural properties of this network and verify the feasibility of the network analytical method. We analyzed the structural attributes of this network after constructing it from the corresponding biomedical database and then checked the complex properties of the network.

METHODS Datasets
This study focused on the structure of the hypoxia-inducible factor-1 (HIF-1) (Semenza, 1999;Pugh & Ratcliffe, 2003) signaling pathway of the HRN and retrieved related information from the entries in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa & Goto, 2000). KEGG provides data resources related to the high-level functions and operations of cells, organs, ecosystems, and other life systems based on molecular-level information, especially information used for parsing molecular datasets generated by gene sequencing and other high-throughput experimental techniques (Kanehisa et al., 2017). There are two essential elements of KEGG discussed in this paper: the KEGG PATHWAY and KEGG ORTHOLOGY databases. The KEGG PATHWAY database is a collection of knowledge-based hand-drawn maps of pathways based on molecular interaction networks covering metabolic, genetic information processing, environmental information processing, cellular processes, biological systems, human diseases, and drug discovery. KEGG PATHWAY provides a concrete explanation of the meaning of the markers of the HIF-1 signaling pathway. The KEGG ORTHOLOGY (KO) database is a set of ortholog assemblages manually defined to denote the nodes (boxes) in the KEGG PATHWAY maps. The distinctive identifier that determines each KO entry is referred to as the K number ('K' followed by a five-digit number).
We analyzed the data in the KEGG Markup Language (KGML) file. These data contain information that corresponds to the HIF-1 signaling pathway in KEGG PATHWAY. The KGML, which enables the automated mapping of KEGG pathways, is an extensible markup language (XML) representation of the KEGG pathway. It is conducive to the computer-aided analysis and model building of gene/protein and chemical networks. There are two types of graphic elements in the KGML-based metabolic pathways. One is a rectangle, which represents an enzyme, connected by a ''relationship.'' The other is a circle, which represents a compound, connected by a ''reaction.'' In non-metabolic pathways, there are only rectangular elements, which indicates that there are only proteins joined by ''relationships'' in these pathways. The HIF-1 signaling pathway addressed in this study belongs to the metabolic pathways.

Overall analytical methods
The methods of this study are based on network analysis. The network of the HIF-1 signaling pathway is a metabolic network. It is a directed unprivileged network; i.e., the direction of a line between two nodes needs to be considered rather than the weight in the network. The corresponding analytical methods applied in this study are as follows: Process the text in the KGML file of the HIF-1 signaling pathway. Refine the network topology information and transform it into a network file that is identifiable for network analysis. Analyze the network indicators and draw the network topology maps. Analyze and test the structural complexity of the network. The generation of the network file is based on regular expression using the text-editing software Notepad++ and functions using the spreadsheet software Excel. The process of network analysis is based on network analysis models using Pajek. These methods and tools are comprehensively utilized to realize the qualitative and quantitative integration analysis of the network structural properties of the HIF-1 signaling pathway.
Biological network analysis tools are used to identify, analyze, visualize, or simulate nodes (reactants) and edges (reactions) from various input data types, including mathematical models of biological networks. There are four commonly used tools, namely Gephi, Networkx, IGraph, and Pajek. All of them are free for use, and they can handle large graph size. Gephi and Pajek are GUI based network analysis tools, whereas Networkx and IGraph are used in a programming language. We have chosen Pajek as the analysis tool based on the following reasons: First, Pajek supports a multi-relational network graph. There are different reactants in the HIF-1 signaling pathway, such as orthologous enzymes, compounds, and groups. And the kind of their relationships is diverse, including activation reactions, expression reactions, and inhibition reactions. Therefore, the network of the HIF-1 signaling pathway is a multi-relational network which can be analyzed by Pajek. Second, Pajek support partition feature enables the identification and extraction of the HIF-1 signaling pathway core network. Third, Pajek supports more network graph layout algorithms, and its algorithms for analyzing the corresponding network features are better. Forth, Pajek is a standalone software that is easy to learn and easy to use (Akhtar, 2014).

Topology
The network topology indicators include the node number, the line number, the line value, the network density, and the network diameter. The node is an entity that forms the network. The lines between the nodes represent the connections between network entities. The line value is the weight of a line, which indicates the strength of the relationship between network entities. The diameter of the network is the longest path of the shortest distance between pairs of nodes in the network, which is the number of maximum steps required to connect any pair of nodes in the network. The network density is equal to the numeric ratio of the actual connections to the possible connections. It indicates the degree of closeness of the relationships between the nodes in the network.

Centrality
Centrality is an essential concept of network analysis. A highly centralized network supports the convenient transmission of information. The central node has a critical influence on the transmission of information in the network. The metrics for centrality used in this paper include the centrality of the degree, the closeness, and the betweenness.

Degree centrality
The degree centrality of a node is defined as the number of connections of a node. It provides the most intuitive conceptual form of centrality indicator. It can be classified as the overall, input, or output degree centrality. These values correspond to the total number of connections, the number of input connections, and the number of output connections of the node, respectively. The regular formula of the degree centrality is as follows (Wasserman & Faust, 1994): where DC i represents the overall degree, Din i represents the input degree, and Dout i represents the output degree.

Closeness centrality
A node's closeness centrality is the value obtained by dividing the number of all other nodes by the sum of the geodesic distance between the node and all other nodes. The geodesic distance is the number of connections included in the shortest path between the two nodes. The farther away a node is from other nodes, the lower the closeness centrality of the node is, and vice versa. Similar to the degree centrality, the closeness centrality of the nodes involves the overall closeness centrality, the input closeness centrality, and the output closeness centrality. The closeness centrality of a node reflects the nearness of a node to other nodes. The closer a node is to other nodes, the easier it is for information to reach the node, and the higher its closeness centrality is. The regular formula of the closeness centrality is as follows (Wasserman & Faust, 1994): where i = j, N is the number of nodes, and d ij is the shortest pathway between nodes i and j.

Betweenness centrality
A node's betweenness centrality is the ratio of the shortest pathways passing through this node to all of the shortest pathways between any two nodes in the network. The degree centrality and closeness centrality are based on the reachability of a node in the network.
In view of betweenness centrality, if this indicator of a node is high, its importance as an intermediary node in the network is higher, and its bridging ability is active. The normal formula for the betweenness centrality is as follows (Wasserman & Faust, 1994): where i = j = k, g jk represents the number of the shortest pathways between nodes j and k; g jk (i) represents the number of the shortest pathways containing i, and N represents the number of nodes.

Cohesive subgroups
Some entities in the network relate to each other so tightly that they form a small local group known as a cohesive subgroup. The number of cohesive subgroups in the network as well as the local and global associations between subgroups are analyzed via cohesive subgroup analysis. This analysis includes component analysis, K-core analysis, and island analysis based on different analytical indicators and perspectives (De Nooy, Mrvar & Batagelj, 2018). The component analysis is mainly applied in this study. The component is an essential indicator of the cohesive subgroup analysis, which refers to the largest connected subnetwork in the network; i.e., there is a way to reach other nodes between any nodes in the subnetwork. Biometabolism networks often contain components that are not connected, and the most significant component is often the one that needs attention.

Bow-tie structure
The bow-tie property is an essential component of these properties. Broder et al. (2000) revealed that there is a bow-tie structure in the topology structure of the Internet at the macro level.  Figure 1 shows the bow-tie structure. The nodes of the input section can reach the nodes of the core section of a strong connection, while the reverse is not true. The nodes of the core section of the strong connection can reach each other nodes. The nodes of the core section of the strong connection can reach the nodes of the output section, while the reverse is not true. The nodes of the pipe section can reach the nodes of the output section via the input section. The tendril section connects with the input section and the output section. The nodes of the input section can reach the nodes in the same section or the nodes of the output section. There are no connections between the nodes of the non-connected component and other sections. Ma & Zeng (2003) divided the macrostructure of the biological metabolic network into four parts: the giant strong component (GSC), substrate subset (S), product subset (P), and isolated subset (IS). These components correspond to the strongly connected core, input, output, and disconnected components of the Internet bow-tie structure, respectively, while the exact opposite situation is not observed. Any two metabolites within GSC can be generated from each other by a series of reactions. Any metabolite in S can be converted into the corresponding metabolite in GSC, but not vice versa. Any metabolite in P can be transformed via a series of the corresponding metabolites in GSC, while the exact opposite situation is not observed. The metabolites in IS cannot be converted from those in GSC, nor can they be converted into the corresponding metabolites in GSC.

Structural complexity
Qian et al. provided a strict definition of complex networks (Xuesen, Jingyuan & Ruwei, 1993). A network with some or all of the following characteristics is known as a complex network. These characteristics are self-organization, self-similarity, attractor, small world, and scale-free. We start with a scale-free perspective to verify the sophisticated attributes of the network. The scale-free property refers to the invariance of the network's scale, meaning that the degree of the network nodes obeys the law of a power exponential distribution (i.e., the power-law distribution). It is also known as the Pareto distribution rate or the Zipf rate. This distribution law describes the distribution characteristics of the degree of network nodes. A small number of nodes in the network tend to present a large number of connections, but the number of connections of most nodes is very limited. This paper uses the Pareto distribution rate: where x is the probability corresponding to the actual measurement scale of the network, and a is the scale parameter of the power rate distribution. The law of the degree distribution of the network nodes needs to be fitted to the power exponential distribution law to verify this characteristic of the core network of the HIF-1 signaling pathway. That is, the fitting value,â, of the scale parameter, a, is obtained. Then, the characteristic is judged by the fitting effect.

The power-law fitting of the degree distribution
The general method of power-law fitting is to use the least-squares linear fitting based on double-logarithmic coordinates and to use the R 2 test to measure the fitting effect. However, Goldstein et al. believe that the scale result obtained by this method shows a significant error in relation to the corresponding value of the actual measurement network scale (Goldstein, Morris & Yen, 2004). Therefore, Clauset and Barabási et al. proposed a maximum likelihood estimation method for power rate fitting and tested the fitting results using KS statistics (Barabási, Albert & Jeong, 1999;Clauset, Shalizi & Newman, 2009). At present, no research shows that all the data corresponding to the actual measurement scale of the studied network obey the power-law distribution, but there is a critical value. The data corresponding to these scales obey the power-law distribution only when the value corresponding to the actual measurement scale, x, is higher than x min . This study used Clause's universal method, applicable to both discrete and continuous data, to estimate x min . For actual network-scale data, the following formula is used to estimate the scale parameter, a, of the power-law distribution: where x min can be accurately obtained by calculating the maximum difference, L, between the actual measurement-scale data and the corresponding data of the fitted model: where S(x) is the corresponding data of the actual measurement, P(x) is the corresponding data of the fitted power-law distribution model, and x min , which minimizes L, is the optimal value.

Similarity test
This study used the K-S (Kolmogorov-Smirnov) method to test the ''distance'' L between the actual measurement data and the fitted power-law distribution model. The model constructed from actual measurement data is denoted as M . This model produces n sets of data. There is a set of data whose ''distance'', L, from the corresponding fitted model, M , is greater than the ''distance'', L, between the actual measured data and the fitted model, M .
The number of such datasets is m. m n is represented as p and is known as the p value. If the p value is large (close to 1), it can be considered that the statistical fluctuations alone cause the difference between the actual measurement data and the fitted model. If the p value is small, there is room for adjustment in the rationality of the fitted model. If p ≤ 0.1, it can be considered that the actual measured data do not obey the power-law distribution (Ping, 2014). Figure 2 shows the overall topology of the HIF-1 signaling pathway network based on the KGML file from KEGG. Quantitative analysis of the network indicators is performed, and the results are as follows: The numbers of nodes and lines are 85 and 61, respectively. The network has a ring and ringless density of 0.00844291 and 0.00854342, respectively. The average node degree (i.e., the average number of connections) is 1.43529412. The network diameter is 7, and the average distance between the nodes is 2.26761. There are 59 orthologous enzymes, 16 compounds, eight pathways, and two unknown groups in this metabolic network that have no references in KEGG. The numbers of activation reactions, expression reactions, and inhibition reactions are 29, 26, and 6, respectively.

Topology analysis
The core network of the HIF-1 signaling pathway is identified based on cohesive subgroup analysis. The network is extracted for the convenience of further study, as shown in Fig. 3. The quantitative indicators of the network are as follows: the number of nodes is 34, and the number of lines is 34. The network has a ring density of 0.02941176 and a ringless density of 0.03030303. The average node degree is 2. The network diameter is 7, and the average distance between the nodes is 2.57018. There are 24 orthologous enzymes and 10 compounds in this metabolic network. The number of unknown groups that have no references in KEGG is 1; the number of activation reactions is 27; the number of expression reactions is 2; and the number of inhibition reactions is 5.

Degree centrality
The degree centrality of the overall network of the HIF-1 signaling pathway is analyzed and includes the overall centrality, the input centrality, and the output centrality. The top 10 nodes are listed in Table 1. Four nodes appeared in the top 10 list in terms of the three-degree centrality indicators, numbered 20, 60, 61 and 78. They were serine/threonine-protein kinase mTOR(node 20), toll-like receptor 4(node 60), transcription factor p65, nuclear factor NF-kappa-B p105 subunit(node 61), and mitogen-activated protein kinase 1/3(node 78), respectively.

Closeness centrality
The analysis of the closeness centrality of the overall network of the HIF-1 signaling pathway is performed to calculate the total closeness centrality, the input closeness centrality, and the output closeness centrality. Table 2 lists the top 10 nodes in terms of the closeness centrality. There are two nodes with both the overall and the input closeness centrality ranked in the top 10, numbered 29 and 82. They are pyruvate dehydrogenase kinase isoform 1 and apoptosis regulator Bcl-2, respectively. The number of the node with the overall and output closeness centrality in the top 10 is 85, which is an undefined complex of gene products in KEGG. The node with the input closeness centrality and output closeness centrality ranked in the top 10 is numbered 20, which is serine/threonine-protein kinase mTOR.

Figure 3
The core network bow-tie structure of the HIF-1 signaling pathway. According to the analysis results, the bow-tie structure of the core network of the HIF-1 signaling pathway includes three types of components. The first is a strongly connected core, corresponding to the yellow node, ko: K08268, in the figure, and the corresponding reactant in the KEGG database is HIF-1 α. The second is the input components, including 20 green nodes, such as ko: K09592, corresponding to 20 reactants, such as PHD (EGLN, HPH), in the KEGG database. The third is a tendril-like structure including 13 pink nodes, such as ko: K04526, ko: K04357, and ko: K05459, which correspond to 13 reactants, such as GF (INS, EGF, IGF1), in the KEGG database.   Table 3 shows the betweenness analysis results of the overall network of the HIF-1 signaling pathway. Two nodes appeared in the top 10 list in terms of the betweenness centrality and degree centrality, numbered 20 and 78, which are serine/threonine-protein kinase mTOR and mitogen-activated protein kinase 1/3, respectively.

Cohesive subgroup analysis
First, the distribution of the subset of the overall network is determined. Then, the core network based on the distribution is determined. The minimum component size is 1; i.e., the smallest subnet can be an isolated node without connections. Table 4 shows the analysis results.
The data in the table show that the overall network of the HIF-1 signaling pathway includes 25 subnets. The minimum size of the subnet is 1; i.e., there is only one node in the subnet. The maximum size is 34; i.e., 34 nodes are in the subnet.
We colored different subnets differently for the convenience of analysis in this paper. The number of the subnet to which the node belongs is identified. The subnet distribution map of the overall network of the HIF-1 signaling pathway is obtained based on Pajek, as shown in Fig. 2. Figure 2 depicts the subnet distribution of the overall network of the HIF-1 signaling pathway graphically. For example, ''[6.00] 20'' means that node 20 belongs to the number 6 subnet. Table 4 and Fig. 2 show that the number 6 subnet contains most nodes of all the subnets and has meaningful reaction relationships. Therefore, subnet 6 is the core network of the HIF-1 signaling pathway.

Bow-tie structure analysis
The Bow-tie analysis module of Pajek software was used to analyze the macrostructure of the core network of the HIF-1 signaling pathway. Table 5 shows the distribution of the structure.
We colored the different structures in the core network topology of the HIF-1 signaling pathway, as shown in Fig. 3. Table 5 The quantity distribution of the nodes in the core network of the HIF-1 signaling pathway. The specification of the vertices of the core component of the HIF-1 signaling pathway is shown in Table 6.

Structural complexity analysis
We analyzed the distribution of network node degrees. That is, the scale of the core network of the HIF-1 signaling pathway is measured. The degree distribution of the core network of the HIF-1 signaling pathway is shown in Table 7.
Based on the above methods, the power-law fitting results of the core network scale of the HIF-1 signaling pathway in this study are shown in Fig. 4.
Based on Clauset's method, the power-law fitting model for the core network of the HIF-1 signaling pathway can be calculated as follows: The scale parameter of the power-law distribution of the core network of the HIF-1 signaling pathway is 3.28, and x min = 2. That is, the nodes with a degree of 2 or more in the network obey the Pareto distribution rate. The similarity detection index of this fitted model is much larger than 0.1 and closer to 1. This indicates that the fitting effect is good.

DISCUSSION
The overall network centrality analysis of the HIF-1 signaling pathway suggests that K07203 appears in the top 10 list in terms of the betweenness centrality, input closeness centrality, and output closeness centrality. It corresponds to the mTOR enzyme, which is known as an essential element in the field of HRN research. These results suggest that the mTOR enzyme has a crucial influence on HIF-1 signaling, which conforms to the known findings, indicating the effectiveness of the network analytical method.
With regard to the component analysis of the overall network of the HIF-1 signaling pathway, the largest connected subnet of the overall network of the HIF-1 signaling pathway is the number 6 subnet; i.e., the number 6 subnet is the core network of the HIF-1 signaling pathway, which is consistent with the findings of current biomedical research.
With respect to the bow-tie structure analysis, the work scope of Ma et al. does not involve the macrostructure analysis of the core network of the HIF-1 signaling pathway (Ma & Zeng, 2003). Therefore, this study can be regarded as a supplement to their work   and as a verification of their conclusions. The core network of the HIF-1 signaling pathway has a bow-tie structure. It includes macrostructures such as a strongly connected core, an input, and a tendril-like structure. It is worth noting that there are no output components in the structure; that is, HIF-1α has no metabolites. The reason is that the corresponding orthologous enzymes are not listed in the KGML file of KEGG. In KEGG's hand-painted HIF-1 signaling pathway diagram, HIF-1α has two output pathways: aerobic degradation and hypoxic DNA expression. However, there are no orthologous enzyme products. Therefore, the diagram is consistent with the analysis results obtained by this study.
network of the HIF-1 signaling pathway and the biomedical meaning of the network indicators based on the construction of the network using the data retrieved from the KEGG database. The centrality indicators of the overall network, the remarkable nodes, and the biomedical meaning of these nodes could be identified through centrality analysis. Based on component analysis, the cohesive subgroups of the overall network were obtained; the different subnets were indicated; and the core HRN was identified. In the analysis of the core network, we refined the core network of the HIF-1 signaling pathway via topology analysis of the network information based on the overall network analysis. We obtained a topological structure diagram and the quantitative indicators of this network. Then, we determined the bow-tie structure existing in this network through the analysis of the bow-tie structure. This structure is composed of three types of components: the core structure, the input structure, and the tendril-like structure. We can verify and further add to the existing research conclusions by analyzing each part's node distribution. Additionally, a power-law fitting model of this network was constructed through the verification of its complex attributes. The results showed that the power-law fitting model of the core network of the HIF-1 signaling pathway presented a good fitting effect. This proved that the core network of the HIF-1 signaling pathway has scale-free characteristics and is a relatively typical complex network. We are aware that our research may have two limitations. The first is some reactants in the network undefined in KEGG, which prevent in-depth recognition of HRN. The second is the network analysis performed in this research confined to the HIF-signaling pathway, while other essential pathways of HRN are not analyzed. These limitations highlight the direction for follow-up in-depth research. Although this network analysis is preliminary work, it enriches and improves upon the existing research conclusions. It provides a basis for in-depth research on HRNs at different levels using network analytical methods. In particular, these results can be applied in parallel with traditional studies of the HRN, which will promote and complement each other. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.