Abstract
Biological systems can be modeled and described by biological networks, which are typical complex networks with wide real-world applications. Many problems arising in biological systems can be reduced to the identification of important nodes. For example, biomedical researchers frequently need to identify important genes that potentially lead to disease phenotypes in animals and to explore crucial genes that are responsible for stress responsiveness in plants. To identify important nodes in biological systems, one needs to know either the network structure or behavioral data of the nodes (such as gene expression data). If the network topology is known, various centrality measures can be developed to solve the problem; if only behavioral data of the nodes are given, sophisticated statistical methods can be employed. This paper reviews some recent works on the statistical identification of important nodes in biological systems from three aspects: 1) in general complex networks, based on complex network theory and epidemic dynamic models; 2) in biological networks, based on network motifs; and 3) in plants, based on RNA-seq data. The identification of important nodes in a complex system can be seen as a mapping from the system to the ranking score vector of its nodes; such a mapping does not necessarily have an explicit form. The three aspects reflect three typical approaches to ranking nodes in biological systems and can be integrated into one general framework. This paper also proposes some challenges and future work on related topics. The associated investigations have potential real-world applications in the control of biological systems, network medicine, and the cultivation of new crop varieties.
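As a minimal illustration of the mapping from a system to a ranking score vector mentioned above, the following Python sketch ranks the nodes of a small undirected network by degree. Degree is used here only as the simplest possible centrality measure; it is an assumption made for illustration and is not claimed to be one of the measures reviewed in the paper. The function name `degree_ranking` and the toy edge list with placeholder gene names are likewise hypothetical.

```python
from collections import defaultdict


def degree_ranking(edges):
    """Map an undirected network, given as an edge list, to ranked node scores.

    The score is plain node degree, chosen purely for illustration.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Sort by descending score; break ties by node name so the order is deterministic.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network: the gene names are placeholders, not real data.
toy_edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
]

ranking = degree_ranking(toy_edges)
for node, score in ranking:
    print(node, score)
```

In this sketch the mapping has an explicit form (count the neighbors of each node). When only behavioral data such as gene expression are available, the scores would instead come from a statistical model, and the mapping need not be explicit.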
Additional information
This paper was supported by the National Natural Science Foundation of China under Grant No. 61773153, the Natural Science Foundation of Henan under Grant No. 202300410045, the Supporting Plan for Scientific and Technological Innovative Talents in Universities of Henan Province under Grant No. 20HASTIT025, and the Training Plan of Young Key Teachers in Colleges and Universities of Henan Province under Grant No. 2018GGJS021. It was also partly supported by the Supporting Grant of the Bioinformatics Center of Henan University under Grant No. 2018YLJC03.
This paper was recommended for publication by Editor GUO Jin.
Cite this article
Wang, P. Statistical Identification of Important Nodes in Biological Systems. J Syst Sci Complex 34, 1454–1470 (2021). https://doi.org/10.1007/s11424-020-0013-0
DOI: https://doi.org/10.1007/s11424-020-0013-0