A Network Pharmacology Approach to Uncover the Molecular Mechanisms of Herbal Formula Ban-Xia-Xie-Xin-Tang

Ban-Xia-Xie-Xin-Tang (BXXXT) is a classical formula from Shang-Han-Lun which is one of the earliest books of TCM clinical practice. In this work, we investigated the therapeutic mechanisms of BXXXT for the treatment of multiple diseases using a network pharmacology approach. Here three BXXXT representative diseases (colitis, diabetes mellitus, and gastric cancer) were discussed, and we focus on in silico methods that integrate drug-likeness screening, target prioritizing, and multilayer network extending. A total of 140 core targets and 72 representative compounds were finally identified to elucidate the pharmacology of BXXXT formula. After constructing multilayer networks, a good overlap between BXXXT nodes and disease nodes was observed at each level, and the network-based proximity analysis shows that the relevance between the formula targets and disease genes was significant according to the shortest path distance (SPD) and a random walk with restart (RWR) based scores for each disease. We found that there were 22 key pathways significantly associated with BXXXT, and the therapeutic effects of BXXXT were likely addressed by regulating a combination of targets in a modular pattern. Furthermore, the synergistic effects among BXXXT herbs were highlighted by elucidating the molecular mechanisms of individual herbs, and the traditional theory of “Jun-Chen-Zuo-Shi” of TCM formula was effectively interpreted from a network perspective. The proposed approach provides an effective strategy to uncover the mechanisms of action and combinatorial rules of BXXXT formula in a holistic manner.


Introduction
Traditional Chinese medicine (TCM) has been widely used in China for thousands of years. Understanding the mechanisms of TCM helps to improve its clinical application. Therefore, there has been considerable interest in the development of experimental technologies to discover relationships between biological processes and TCM treatment, such as highthroughput biological profiling [1]. Recent advances in this field result in large accumulated data, and more powerful tools and approaches should be needed to achieve a comprehensive analysis by integrating systematic information to contextualize the holistic characteristic of TCM rather than focus on highly selective agents targeting specific proteins. Recently, a network pharmacology approach has been proposed, and the overall analysis strategy has been shifting away from "one drug-one target-one disease" to the idea of binding multiple targets [2][3][4][5][6]. Li [7] presented a novel concept of "network target" that regards the biological network as a target through which the best drug intervention can be designed and developed. His group introduced this network-based methodology to investigate the basic biological knowledge underling TCM [8,9]. Network pharmacology provides a deeper insight or scientific evidence for TCM knowledge and helps us elucidate action mechanisms at a biological molecular level. Recent advances in TCM network pharmacology research have demonstrated the systematic mechanisms of TCM treatment for complex diseases [10][11][12][13][14]. Several classical herbal formulas [15][16][17][18] were identified as scientifically proven in a systematic manner, as well as their combinatorial rules. For example, Liang et al. [18] applied a network analysis to point out the mechanism of Liu-Wei-Di-Huang pill for the treatments of different diseases. The key biological processes of Qing-Luo-Yin against rheumatoid 2 Evidence-Based Complementary and Alternative Medicine arthritis were revealed by the integrative TCM network pharmacology platform [11]. Li et al. [19] performed a system-level investigation into the mechanisms of compound Danshen formula for the treatment of cardiovascular disease by integrating oral bioavailability screening and molecular docking techniques.
Ban-Xia-Xie-Xin-Tang (BXXXT) [20] is a classical formula from Shang-Han-Lun which is one of the earliest books of TCM clinical practice. BXXXT [20,21] have illustrated that BXXXT possesses anti-inflammatory activities, and it can be used for the treatments of various digestive inflammations such as colitis, esophagitis, and gastritis [20,22]. On the other hand, multiple and diverse indications that extend the applications of original context were reported for the use of BXXXT. These extensive indications include functional dyspepsia [21], diabetes mellitus (DM) [23], gastric cancer (GC) [24], and cardiovascular disease [25,26]. Although some targets and pathways were examined in the past few decades [27][28][29], the biological mechanism of BXXXT in a holistic manner remains unknown because of its complex nature. To better understand its molecular basis of "one formula, different diseases", in this work, a network pharmacology approach was proposed to uncover the mechanisms of BXXXT for the treatments of three representative diseases including colitis, DM, and GC. Herein we focus on in silico methods, and the main contributions of our work include the following. (1) Chemical characteristics of BXXXT compounds were analyzed, and the active compounds of the formula were recognized by using a druglikeness measure. Then an overrepresentation analysis was performed to determine the core targets of the formula, and the core ingredients were identified after a target score function was defined. (2) Multilayer networks were constructed based on the core targets and disease (colitis, DM, and GC) genes, respectively. Comparisons of network statistics were performed to better understand the biological profiles. Furthermore, two network-based proximity measures were used to evaluate relationships between ingredient targets and disease genes. (3) Biological functional annotation analysis was performed on ingredient targets and disease genes, respectively. Then common biological entities and processes between diseases and the formula could be achieved to facilitate the analysis of molecular mechanisms of BXXXT for the treatments of multiple diseases. Our results show that there were good overlaps between formula-related network and disease-related network at each layer level, and BXXXT treats multiple diseases probably by regulating some potential shared biological processes. This work provides a new insight into the biological mechanisms of BXXXT and promotes the drug development based on this formula.

Materials and Methods
. . Data Preparation . . . Ingredients of BXXXT and eir Targets. There are seven herbs in BXXXT, and the compounds contained in these herbs can be considered as ingredients of BXXXT. Ingredients of BXXXT were collected from four data sources: (1) Chinese academy of sciences Chemistry Database [19], (2) HIT database [30,31], (3) TCMID database [30,32], and (4) traditional Chinese medicine information database [33]. All compounds were processed as follows: (1) they were represented by SMILES format and were prepared by Prepare Ligand Module in Discovery Studio 2.5 (DS2.5) and (2) duplicates were removed. A total of 560 unique compounds were obtained. In addition to the above data sources, STITCH database [34] was used to retrieve compound targets. This leads to a data set of 4680 interactions between 194 compounds and 2213 targets. All compounds and their targets are available in Supporting Information Table S1.
. . . Disease Associated Genes. In this work, colitis, DM, and GC were considered as BXXXT diseases, and their associated genes were collected from the Online Mendelian Inheritance in Man (OMIM) database [35] and therapeutic target database (TTD) [36], respectively. As a result, we obtained 28, 119, and 30 genes associated with colitis, DM, and GC, respectively. All disease associated genes are available in Supporting Information Table S2.
. . . FDA-Approved Drugs and eir Targets. The FDAapproved drugs for the treatment of colitis, DM, and GC and their targets were extracted from the DrugBank database [37]. The results are as follows: (1) 9 drugs and 21 targets for colitis, (2) 40 drugs and 43 targets for DM, and (3) 8 drugs and 19 targets for GC. The detailed information of these drugs and targets are available in Supporting Information Table S3. . . . Protein-Protein Interaction Data. In this work, proteinprotein interaction (PPI) data were imported from Human Integrated Protein-Protein Interaction rEference (HIPPIE) database [38,39]. HIPPIE has integrated interactions from some studies [38,39] and multiple public PPI databases such as HPRD [40], BioGRID [41], IntAct [42], MINT [43], DIP [44], and BIND [45]. Furthermore, confidence scores of interactions are presented in HIPPIE to signify the reliability of their experimental evidences. In total, HIPPIE (version 1.8) consists of 239685 interactions. To obtain high-confidence interactions, we have removed the bottom scored interactions (score = 0). This leads to a PPI data concerning 16503 genes and 230434 interactions.
. . Chemical Characteristics of BXXXT. Oral administration is the main route of administration for TCM. However, it is limited by the drug's ADME (absorption, distribution, metabolism, and elimination) characteristics [12,46,47]. The poor ADME properties are largely accountable for drug failure in exerting pharmacodynamic effect on target site in Evidence-Based Complementary and Alternative Medicine 3 vivo [12]. Drug-likeness is usually used to signify whether the compound has acceptable ADME properties. The assessment of drug-likeness of compounds helps to identify the bioactive ingredients in TCM formulas [12,48]. In this work, the quantitative estimate of drug-likeness (QED) presented by Bickerton [49] was used to prescreen pharmaceutically active compounds in BXXXT. This measure of drug-likeness is calculated by integrating eight physicochemical properties of molecules [49]: (1) molecular mass, (2) number of hydrogen bond donors (HBDs), (3) octanol-water partition coefficient (ALOGP), (4) number of hydrogen bond acceptors (HBAs), (5) number of rotatable bonds (ROTBs), (6) number of aromatic rings (AROMs), (7) molecular polar surface area (PSA), and (8) number of structural alerts (ALERTS). Our previous work suggested that QED values can be used for the assessment of some ADME characteristics straightforwardly [46].
. . Target Profiling of BXXXT Ingredients. The core targets of BXXXT can be determined by the analysis of target profiling of BXXX ingredients. The current work is strongly motivated by the previously published approach [18] for the analysis of core targets. The target that interacts with many compounds can be regarded as a core target in the pharmacological effects of the formula. Then, the probability of a target being core target was evaluated as follows using a binomial statistical model: where is the number of compounds interacting with the investigated target, is the total number of compounds, and denotes the average number of compounds per target. Then ( ≥ ), after being adjusted by the false discovery rate method, measures the probability of a target interacting with more than compounds in the profile of compounds by random chance. These statistics help to estimate how likely that the result is stochastic. The target with a low value ( value < 0.05), which means that the observed number of interacting compounds is significantly larger than the expected one, can be considered as a core target for the formula.
Then the score for a specific target (GS) was assessed as follows: where of the core target was calculated by use of a numerator that equaled the negative logarithm of ( ≥ ) and a denominator that equaled the rank of ( ≥ ); otherwise it equaled zero. In order to identify the core ingredients of BXXXT, the score of a compound (CS) was calculated by averaging its corresponding target scores: where represents the number of interacting targets for compound and is the score of target in the target profile of compound .

. . Network Construction and Analysis
. . . Network Construction. In this work, we consider the core targets of the formula as formula targets. In order to elucidate the biological profiles of BXXXT at different levels, multilayer networks were built based on formula targets and disease (colitis, DM, and GC) genes, respectively. These multilayer networks mainly include three levels as follows.
(1) Core net (CN) level: here we consider the formula targets or disease genes as query genes. The query genes were mapped to PPI network. Then the CN of the formula and disease was generated from a subnet containing only the query genes and all the edges among them, respectively.
(2) Shortest path extending net (SPEN) level: it is assumed that the intermediate genes which are along the shortest paths between all pairs of query genes in PPI network may have a high probability of being important roles for the coregulation of query genes [50]. The SPEN was generated by adding the intermediate genes, and the extension steps were performed as follows: first, the lengths of pairwise shortest paths between query genes were calculated. To reduce the high-false positives on the long paths, those paths whose lengths were no greater than 3 were picked out for investigation. Second, the query genes were updated by adding the genes along the satisfying paths. Finally, The SPEN of the formula and disease was generated by the same way that the CN was built.
(3) Neighbor extending net (NEN) level: the query genes were updated by adding their neighbors in PPI network. Then the NEN of the formula and disease was generated by the same way that the CN was built.
As shown in Figure 1, for example, there is a simple PPI network consisting of eight genes, named Genes A∼H, respectively. Gene A and Gene B are query genes. Then the CN contains only Gene A and Gene B. The length of shortest path between A and B is 2 (≤3), and Gene E on this path is selected for extending the network. The SPEN contains Gene A, Gene B, and Gene E, and all the edges among them. Gene A's neighbors (E and D) and Gene B's neighbors (E and C), together with the query genes, are used for the NEN construction. After multilayer network construction, descriptive analysis of network characteristics was performed for the formula networks and disease networks, respectively. Besides, a similarity score function was defined as follows to measure the degree of node overlap between networks.   the number of common nodes between networks, and min is a function to calculate the minimum number of nodes in networks.
. . . Functional Profile Analysis. For functional analysis, enrichment analysis was implemented to identify whether a gene ontology (GO) term or a pathway was significantly associated with the formula targets and disease genes, respectively. A hypergeometric test was used to estimate the association of annotation terms to the query genes, and the probability of getting at least genes from the reference list by chance can be calculated as follows [51].
where is the total number of genes from the reference terms, represents the number of genes annotated by a specific reference term (GO or pathway), is the number of query genes, and denotes the number of common genes between query genes and the reference set. was adjusted by the false discovery rate method, and the low adjusted value (<0.01) indicate the significant association. In this work, GO enrichment analysis for molecular function (MF), cellular component (CC), and biological process (BP) was performed, and pathway enrichment analysis was based on KEGG pathway database [52].
. . . Network-Based Proximity Measures. Generally speaking, the network-based relevance inference methods make use of information from the network topological structure [53,54]. It is important to consider the relevance between the formula targets and disease genes for the elucidation of the associations between the formula and disease. The basic assumption is that if the formula is effective for the treatment of the disease, the formula targets are likely to be close to disease genes in the PPI network [55]. In this study, two networkbased proximity measure approaches were used to capture such relatedness: the shortest path distance (SPD) [56] and a random walk with restart (RWR) [57] based scoring method.
The shortest path between genes is a path with the minimal number of genes, and the distance between genes is the number of edges in a shortest path linking them. The average SPD between the formula targets and disease genes in the PPI network was calculated. The RWR method is a diffusion based approach [57]. In this approach, a set of random walkers were simulated. They started at disease genes and moved to their neighbors randomly at each step. Then the probability of the random walkers hitting the formula targets at step could be calculated iteratively as follows.
where 0 , , and +1 represent the probability of finding the random walkers at the formula targets at initial state, step , and step +1, respectively. is the transition matrix of the PPI network, and is the restart probability with which the random walkers can return to the disease genes. Here, 0 represents the initial probability of the random walkers hitting the formula targets, and 0 was denoted by a vector with equal probabilities assigned to disease genes in the PPI network. In this work, was set to 0.7, and the initial probabilities of disease genes were all set to 1 and others were all set to zero. A stable state will be reached if the difference between and +1 falls below 10 −10 after some steps. Then ∞ can be considered as a proximity score for measuring the closeness between formula targets and disease genes.
. . . Network Module Analysis. A network module can be viewed as a group of densely connected nodes in the interactome network. Identification of gene modules can provide better understanding of the underlying mechanisms. With the increasing application of network technique, several methods have been devised and implemented to detect communities of a complex network [58][59][60][61][62][63][64][65][66]. In this work, five different network module identification methods were explored and compared: (1) the fast greedy (FG) modularity  based algorithm [59], (2) molecular complex detection (MCODE) [63], (3) neighbor-sharing score with hierarchical agglomerative clustering for module identification (NeMo) [64], (4) Mofinder [65], and (5) IPCA [66]. A scoring scheme was developed to measure the goodness of network partition. This module score was defined as : where is the modularity of the divided network and was calculated as (8) [59] in which is the number of edges, represents the network adjacency matrix, is the node degree of gene, and denotes an indicator showing whether the two genes belong to the same module. According to the definition above, measures the degree of density for the module.
is a biological homogeneity index that measures the functional homogeneity of genes within a module, and was calculated using (9) [67] in which is the number of modules, is the total number of the genes in module annotated in KEGG database, and denotes an indicator showing whether the two genes belong to the same KEGG pathway. More details about this metric have been described elsewhere [67]. In our scoring scheme, both metrics fall in the range of 0 to 1, and a large value indicates a good network partition method. Then the best network module identification method was determined and implemented, and the functional modules were further analyzed to explore the molecular mechanisms.

Results and Discussion
. . Chemical Characteristics of BXXXT Compounds. A total of 560 unique compounds were obtained from seven herbs in BXXXT. In order to characterize the structure features and assess the drug-likeness of compounds in BXXXT, a combined dataset was constructed by merging 1805 FDAapproved drugs from the DrugBank database [37] and BXXXT compounds. Then eight drug-likeness descriptors were calculated for each compound, and a principal component analysis (PCA) was employed to visually characterize the spatial distributions of the two classes of compounds. Figure 2 shows the PCA score plot. The total variance explained by the first three principal components was 73.37%. The BXXXT and DrugBank compounds were coloured in red and blue, respectively. As seen in Figure 2, there is no  clearly defined separation between the BXXXT and Drug-Bank compounds on the major source of variation, which means that the BXXXT compounds share the most of the chemical space with the DrugBank drugs. The distribution of the weighted composite of the other eight drug-likeness descriptors by the desirability function (QED) is shown in Figure 3. There are more than 80 percent of DrugBank drugs with the QED value greater than 0.25, and a good overlap is observed in the distributions between BXXXT compounds and DrugBank drugs. The distributions of QED of both classes are slightly skewed toward higher values. The comparison analysis of the property distributions suggests that most BXXXT compounds may be drug-like. In this work, a cutoff value of 0.25 was used to filter out the potential non-drug-like BXXXT compounds. Consequently, 402 BXXXT compounds with QED values greater than 0.25 and their corresponding targets were remained for subsequent analysis.
. . e Formula Target Profiling. After removing compounds without any interactions, a compound-target dataset of 3582 interactions between 144 compounds and 1850 targets was obtained. Then each gene was scored by (2), and a total of 140 protein targets with GS value greater than zero were selected as the formula targets (Table 1). Similarly, each compound was ranked according to its CS value calculated by (3), and the top 72 compounds that covered all of the formula targets were considered as representative compounds in BXXXT. All representative compounds and their CS values are available in Supporting Information Table S4. Table 2 summarizes the numbers of the formula targets and representative compounds of each herb in BXXXT. We can see that the numbers of representative compounds are all below 20, while the numbers of targets vary widely from 37 to 126.
To compare the mechanisms of BXXXT and approved drugs for the treatment of according disease, a simple overlap analysis was performed on their target profiling, respectively. As a result, there were 16 common target genes shared between BXXXT and approved drugs, and the number of common targets for colitis, DM, and GC was 8, 6, and 3, respectively. As can be seen from Table 3, most of common targets were shared by colitis drugs, and three colitis drugs (mesalazine, sulfasalazine, and balsalazide) all had more than 3 common targets with BXXXT. These findings suggest that BXXXT may have the similar mechanisms of action of these drugs for the treatment of colitis.
. . Multilayer Network Analysis. In our investigation, multilayer networks were constructed based on the formula targets and disease genes, respectively. The general network properties of these networks are summarized in Table 4. We can see that the CNs were more sparse than SPENs and NENs because no extension methods were applied, while, for each level network, the formula network shows very different from the corresponding disease networks in terms of most of network parameters. According to our extension technique, the set of CN genes is a subset of that of SPEN genes, and the set of SPEN genes is a subset of that of NEN genes. SPENs were most dense by adding the close genes into CNs. The degree of node overlap between networks was measured for each level ( Table 5). As can be seen in Table 5, the similarity scores are all greater than zero, which means that BXXXT targets the disease genes directly for each level. There was a good overlap between BXXXT nodes and disease nodes. As expected, higher similarity scores were observed at SPEN level and NEN level. The largest similarity score (0.8273) was observed Evidence-Based Complementary and Alternative Medicine 7  (gene symbol)  ABCA1, ACHE, ADORA1, PARP1, ADRA2C, AHR, AKT1, ALB, AKR1B1, ALOX5, ALOX15, BIRC5, BAX, BCHE, CCND1, BCL2, BDNF,  CA1, CA2, CA3, CA4, CA5A, CA6, CA7, CA9, CA12, CASP3, CASP8, CASP9, CAT, CDK1, CDK2, COMT, MAPK14, CSN1S1, CYP1A1,  CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP19A1, DECR1, NQO1, DRD2, DRD3, DRD4, EGFR,  ESR1, ESR2, F3, FASN, FLT3, FOS, GCG, GSK3B, GSTP1, HIF1A, HMGCR, HMOX1, HRH1, HRH2, HTR1A, HTR7, ICAM1, IFNG, IL1B,  IL4, IL6, CXCL8, INS, INSR, EIF6      between BXXXT and colitis at NEN level, which means that 82.73% of colitis-NEN genes were also included in BXXXT-NEN. Then the diseases were ranked according to their similarity scores as follows: colitis > GC > DM. These findings suggest that the therapeutic effects of BXXXT may be associated with the ability of the formula compounds targeting on both the disease genes directly and the genes closely connected to disease genes indirectly, and the most highly associated disease is colitis.
. . Functional Annotations . . . GO Enrichment Analysis. GO enrichment analysis was carried out for the identification of common features of the formula targets and disease genes in terms of GO annotation, respectively. The detailed results of enrichment analysis for three ontologies (MF, BP, and CC) are available in Supporting Information Tables S5-S7. The number of GO terms that are significantly associated with the formula targets for MF, BP, and CC is 84, 1515, and 21 (adjusted P value < 0.01), respectively. By comparison with the results of disease genes, we found that the formula shared many significantly associated GO terms with the diseases for each ontology. The number of common significantly associated MF GO terms between the formula and colitis, DM, and GC is 2, 3, and 3, respectively. Similarly, the numbers of common significantly associated BP GO terms and CC GO terms between the formula and colitis, DM, and GC are 292 and 2, 431 and 6, and 119 and 2, respectively. Figures 4-6 show the top 10 GO terms associated with the formula and diseases for three ontologies, respectively. For MF ontology, the top 2 GO terms (GO:0005125 and GO:0005126) associated with colitis are cytokine-related terms, and they are also significantly associated with BXXXT. Although there were a small number of significant terms for CC ontology and several common terms between BXXXT and diseases were still observed. Good overlaps among all categories were found for BP ontology, which suggested that BXXXT treats multiple diseases probably by regulating some potential shared biological processes among diseases.  Furthermore, the cumulative distribution of the percentages of common enriched-GO terms was used as a measure for comparing the degree of association between BXXXT and different diseases [70]. The motivation of using this measure is that a disease that has a higher probability of association with BXXXT will share more common terms in the top-k enriched-GO terms of BXXXT. The significantly enriched-GO terms were combined and sorted in ascending order according to adjusted P values for each group, and the percentage of common terms from top (30) enriched terms in BXXXT for each disease is presented in Figure 7. We observe that the cumulative curve of colitis achieved the largest percentage value at each value of . The area under the cumulative curve (AUCC) was calculated using trapezoidal integration. The AUCC value was 20.43, 14.57, and 13.61 for colitis, DM, and GC, respectively. These findings highlight the strong association between BXXXT targets and colitis genes in terms of GO.
. . . KEGG Pathway Enrichment Analysis. The formula targets were mapped onto KEGG pathways, and a pathway enrichment analysis was performed to identify the biological pathways regulated by BXXXT. In this section, in order to elucidate the basic biological process, we ignored the KEGG pathway sections of human disease and drug development. As a result, 22 key pathways with adjusted P value < 0.01 were found to be significantly associated with BXXXT (Table 6).
BXXXT acts on a large fraction of pathways in signal transduction and endocrine system. The top significantly affected pathway is tumor necrosis factor (TNF) signaling pathway (Figure 8), and the formula targets were coloured in red. As can be seen in Figure 8, a total of 20 genes are regulated by BXXXT. As a critical cytokine, TNF is associated with various physiological and pathological processes such as coagulation, cell proliferation and apoptosis, and proinflammation [71]. BXXXT has effects on two TNF receptors (TNFR1 and TNFR2). Through complex signaling cascades and networks, these effects lead to the regulations of nuclear factor-kappa B (NF-Kappa B), activation protein-1 (AP-1) via the jun nh2terminal kinase (JNK), and caspase family members (CASP8 and CASP3). Some of our predictions have been supported by recent experimental evidence [29]. Chen's experiment [29] found that BXXXT significantly regulated the level of TNF-, IL-1 , IL-17, IL-23, COX-2, p-p65, MPO, SOD, and Nrf2 in colorectal tissue of dextran sulfate sodium-induced chronic ulcerative colitis mice. TNF mediates many genes that are involved in other pathways including NF-kappa B signaling pathway [29], PI3K-Akt signaling pathway, and apoptosis pathway leading to cell survival and death [72]. Table 6 shows that these crossed pathways are also significant (adjusted P value < 0.01), implying that BXXXT exhibits multifunctional biological activities by regulating multiple pathways.
Moreover, the associations between these key pathways and disease genes were investigated in the same way, and the results are also presented in Table 6. We can find that the  number of significantly overrepesented pathways for colitis, DM, and GC was 5, 9, and 10, respectively. This means that each key pathway is enriched by both the formula targets and disease genes. BXXXT can be used to treat multiple diseases probably by regulating some common biological pathways. The results of recent experiment [73] showed that the therapeutic mechanism of action of BXXXT was via insulin signaling pathway, which were concordant with our predictions (Table 6). Similarly, the cumulative distribution of the percentages of common enriched pathways (Figure 9) shows that the curve of colitis achieved the largest percentage value in top 12 enriched pathways, and the AUCC value was 9.52, 3.40, and 8.08 for colitis, DM, and GC, respectively. These findings further confirmed the strong association between BXXXT targets and colitis genes in terms of KEGG pathway.
. . Network-Based Proximity Analysis. To measure relevance between the formula targets and disease genes, SPD and RWR based scores were calculated. The permutation test was used to assess the significance of relevance scores. For each disease, we kept the original formula targets and randomly selected the same number of genes as that of the disease in PPI network, and a shuffled version score was calculated. This procedure was repeated independently 2000 times, and the P value of the original relevance score was derived thereafter. Table 7 summarizes the results of network-based proximity analysis. We can see that the average SPD score is significantly smaller between the formula targets and disease genes than between the formula targets and randomly selected genes (P value < 0.01) for each disease, while the average RWR score is significantly larger. The results highlight the specificity of BXXXT for treating these diseases.
. . Network Module Analysis. In this study, we compared five different network module detection methods based on BXXXT-SPEN, and Table 8 presents the results. We found that Mofinder achieved the highest MS score and the moderate number of modules. Then the modules resulting from Mofinder were saved and further analyzed. These modules  could be considered as primarily pharmacological units of BXXXT. The association between each module and each disease genes was scored using the RWR method. The detailed results are available in Supporting Information Table  S8. The top associated module where there were comembers of the formula targets for each disease is plotted in Figure 10. As shown in Figure 10, the top associated module with colitis (M76), DM (M81), and GC (M46) was coded in red, yellow, and green, respectively. The formula targets were shaped by squares and the intermediate genes were shaped by circles. We found that M76 mainly consists of the interleukin family cytokines. These cytokine genes including IL4, IL13RA2, IL13RA1, IL4R, and IL2RG have been shown to play key roles in mediating the inflammation process [74]. M81 mainly consists of the adenosine related genes, and these proteins were found to be associated with metabolic diseases by many studies [75][76][77][78][79][80][81][82][83][84][85][86]. As an intermediate gene, the wellknown target (DPP4) was included in M81 by our extending approach. A relatively new class of diabetes drugs have been successfully developed by blocking DPP4 such as sitagliptin, vildagliptin, and saxagliptin [87]. In M46, protein FLT3, RASA1, and PTPRJ are considered as key signaling nodes to coordinately regulate various cellular processes including cellular proliferation, differentiation, mitotic cycle, and oncogenic transformation [88][89][90][91][92][93][94][95]. KDR acts as a major growth factor for endothelial cells, allowing regulation of endothelial proliferation, migration, and survival [96], which is associated with a variety of tumor types [97,98]. KEGG pathway enrichment analysis was implemented to identify the pathways regulated by the above modules and disease genes, respectively. Figure 11 shows that a total of 10 pathways were found to be significantly associated with the module genes (adjusted P value < 0.01). The most relevant functions and pathways for M76, M81, and M46 were related to Jak-STAT signaling pathway and cytokine-cytokine receptor interaction, cAMP signaling pathway, and Ras signaling pathway, respectively. Most of these pathways were also significantly associated with the disease genes. Such good overlaps   suggest that the involved pathways partly explain the molecular mechanisms of BXXXT in treating multiple diseases.
. . Combinatorial Rules of BXXXT. Considering that BXXXT was developed for treating digestive inflammations according to the theory of original context, colitis is regarded as a major indication for BXXXT. In this section, to better elucidate combinatorial rules of BXXXT for treating colitis, the mechanism of action of each herb in the formula was analyzed in the same way. First, targets of each herb were assessed using (2), and top 30 ranked targets according to their GSs were selected as the core targets of herb, named herb targets. Second, KEGG pathway enrichment analysis was performed for herb targets to identify the biological pathways of each herb. Third, to understand the role of herbs in BXXXT for treating major disease, the relevance between herb targets and colitis genes was analyzed based on RWR scores. The herb targets of BXXXT are available in Supporting Information    Table S9. A simple overlap analysis between herb targets and colitis genes shows that the most common target (TNF) is shared by four herbs, which indicates the synergistic effect of the formula compositions. The significantly overrepresented pathways with adjusted P value < 0.01 for each herb were identified ( Figure 12). The number of significantly enriched pathways for BX, GJ, HL, HQ, RS, DZ, and RGC was 18,22,9,34,7,4 of significantly enriched pathways (adjusted P value < 0.01) among BX, GJ, HQ, and RGC, and a total of 31 pathways are regulated by multiple herbs. These findings suggest that the synergistic strength of multiple herbs in BXXXT may be associated with common regulated pathways. RWR based score was calculated to measure relevance between colitis genes and herb targets for each herb. It should be noted that targets with greater GS values could be more important. Therefore, in this procedure, herb targets were sorted in descending order according to their GSs. Then each target's RWR score was sequentially calculated, and Figure 13 presents the cumulative distribution curve of RWR score of top N targets for each herb. We can see that BX, GJ, and HL achieved higher scores, while RS and DZ showed smaller relevance with the disease. TCM prescriptions are usually based on the principle of "Jun-Chen-Zuo-Shi". Herbs in the formula play different roles during treatment. According to the theory of original context, "Jun" herb treats the disease directly, while others can treat the disease indirectly. In BXXXT, BX acts as "Jun" herb, GJ, HL, and HQ are "Chen" herbs, DZ and RS are "Zuo" herbs, and "RGC" serves as "Shi" herb. After calculating AUCC of RWR score curve for each herb (Figure 14), we found that AUCC value was increased in the following order: BX>GJ>HL>RGC>HQ>DZ>RS, which shows better concordance with the theory of original context except RGC. It is particularly surprising that RGC achieved higher relevance with colitis. This is probably due to the fact that this herb contains a variety of triterpene saponins that have shown a wide range of corticosteroid-like activities, and many studies have reported their anti-inflammatory actions [99][100][101][102][103][104].

Limitations and Future Work
In this study, we presented a computational approach for identification of the molecular mechanisms of BXXXT. Although it has achieved encouraging results, there are some limitations. First, our results largely rely on available data sources including herbal chemical identifications, chemicalprotein interactions, and protein-protein interactions. Due to the lack of standardization and full assessment, these retrieves may contain many false positive and false negative interactions. On the other hand, the exact spectrum of compounds of TCM herbs is not defined, resulting in bias and incomplete inferences. Therefore, an updated and better validated data source may achieve more reliable and robust network models. Second, only three diseases (colitis, DM, and GC) were discussed because of their highly representative features of BXXXT treatment in clinical practices. However, TCM treatment is based on TCM syndromes that may be diverse and complicated even in the same disease or show the same characteristics in the different diseases. Although our results suggested that some common biological processes among diseases may be the potential mechanisms of BXXXT for the treatments of multiple diseases, these associations need to be further investigated in order to elucidate the biological basis of TCM syndromes. Third, common functional terms (GO  terms and KEGG pathways) were identified by enrichment analysis to elucidate the relationship between the formula and disease. However, the exact associations were not investigated. It is necessary to address these association patterns in future work to achieve more accurate and more meaningful inferences. Finally, cautious interpretations should be made as our approach presented here is based on in silico analysis. Therefore, further experimental validation on the prediction of network pharmacology is needed to support the presented hypothesis in the future work.

Conclusions
The current work applied a network pharmacology approach to the case of BXXXT formula, in order to elucidate its therapeutic mechanisms in treating multiple diseases. To maintain a reasonable level of reliability, a network-level investigation that integrated drug-likeness screening, target prioritizing, and multilayer network extending was conducted. We focused on 140 core formula targets, and three representative BXXXT diseases (colitis, DM, and GC) were discussed. Our main findings are as follows.  (1) After constructing multilayer networks, a good overlap between BXXXT nodes and disease nodes was observed at each level. The degree of similarity between BXXXT and colitis achieved the highest score. Moreover, the networkbased proximity analysis shows that the relevance between the formula targets and disease genes was significant according to SPD and RWR based scores for each disease. These results suggest that the formula targets are significantly close to disease genes in the PPI network, and the therapeutic effects of BXXXT may be addressed by targeting on both the disease genes directly and the genes closely connected to disease genes indirectly.
(2) The pathway enrichment analysis shows that there were 22 key pathways significantly associated with BXXXT, and the top significantly affected pathway was TNF signaling pathway. The analysis of the cumulative distribution of percentages of common enriched terms (GO terms and KEGG pathways) further confirmed the good association between BXXXT targets and disease genes.
(3) Our network module analysis has taken into account density and functional homogeneity simultaneously. The therapeutic effects of BXXXT were likely addressed by regulating a combination of targets in a modular pattern.
(4) The synergistic effects among BXXXT herbs were highlighted by elucidating that multiple herbs act on the same targets and the same pathways. Besides, the traditional roles of individual herbs in BXXXT formula were effectively interpreted based on the molecular level. It provides a new way to shed lights on the theory of "Jun-Chen-Zuo-Shi" of TCM from a network perspective.
In summary, the proposed approach is an effective strategy to understand the mechanisms of action and combinatorial rules of BXXXT formula. Also, the results of this work may facilitate generating hypothesis to drug development based on BXXXT and enable further research in a more timesaving and cost-effective manner.

Data Availability
Supporting data and materials are available: BXXXT compounds and their targets (Table S1), disease associated genes (Table S2), and DrugBank drugs and their targets (Table S3), all representative compounds in BXXXT and their CS values (Table S4), enrichment analysis for MF ontology (Table S5), enrichment analysis for BP ontology (Table S6), enrichment analysis for CC ontology (Table S7), results of network module analysis (Table S8), and the targets of BXXXT herbs (Table S9).