Essential proteins discovery based on dominance relationship and neighborhood similarity centrality

  • Research
  • Health Information Science and Systems

Abstract

Essential proteins play a vital role in the development and reproduction of cells, and identifying them helps to explain the basic requirements for cell survival. Because biological experimental methods for discovering essential proteins are time-consuming, costly, and inefficient, computational methods have gained increasing attention. Early computational methods identified essential proteins mainly through centrality measures on protein–protein interaction (PPI) networks, but their identification rate is limited by the many false positives in PPI networks. In this study, a purified PPI network is first introduced to reduce the impact of these false positives. Second, by analyzing the similarity relationship between a protein and its neighbors in the PPI network, a new centrality called neighborhood similarity centrality (NSC) is proposed. Third, the protein subcellular localization score and the ortholog score are calculated from subcellular localization data and orthologous data, respectively. Fourth, an analysis of a large number of multi-feature fusion methods reveals a special relationship among features, called the dominance relationship, and a novel model based on this relationship is proposed. Finally, NSC, the subcellular localization score, and the ortholog score are fused by the dominance relationship model to give a new method called NSO. To verify its performance, NSO is compared with seven representative methods (ION, NCCO, E_POC, SON, JDC, PeC, WDC) on yeast datasets. The experimental results show that NSO achieves a higher identification rate than the other methods.
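
The abstract does not give the formulas for NSC or for the dominance relationship model, so the following Python sketch only illustrates the two general ideas and should not be read as the authors' method. It assumes, purely for illustration, that the similarity between a protein and a neighbor is the Jaccard similarity of their closed neighborhoods, and that the "dominance relationship" can be approximated by Pareto-style dominance between feature vectors. All function names, the toy network, and the feature values are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map, ignoring self-loops."""
    graph: Graph = {}
    for u, v in edges:
        if u == v:
            continue
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighborhood_similarity(graph: Graph) -> Dict[str, float]:
    """Assumed stand-in for NSC: for each protein, sum the Jaccard
    similarity between its closed neighborhood and each neighbor's."""
    scores: Dict[str, float] = {}
    for u, nbrs in graph.items():
        closed_u = nbrs | {u}
        scores[u] = sum(jaccard(closed_u, graph[v] | {v}) for v in nbrs)
    return scores


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Pareto-style dominance: a is no worse than b on every feature
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_rank(features: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Rank proteins by how many other proteins they dominate
    (an assumed fusion rule; ties are broken by name)."""
    counts = {
        p: sum(dominates(f, g) for q, g in features.items() if q != p)
        for p, f in features.items()
    }
    return sorted(counts, key=lambda p: (-counts[p], p))


# Toy PPI network (hypothetical proteins A-D).
graph = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
nsc = neighborhood_similarity(graph)

# Hypothetical (NSC, localization score, ortholog score) triples.
features = {
    "A": (nsc["A"], 0.2, 0.5),
    "C": (nsc["C"], 0.9, 0.8),
    "D": (nsc["D"], 0.1, 0.9),
}
ranking = dominance_rank(features)
```

In this toy network, protein C shares most of its neighborhood with each of its three neighbors and so receives the largest similarity score. Counting dominated proteins is only one of several ways to turn a dominance relation into a ranking, and the fusion rule used by NSO may differ.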

Data availability

Data are publicly available.

Funding

This research is supported by the National Natural Science Foundation of China (Nos. 62141207, 62302107, 62366007, 61972185), the Guangxi Natural Science Foundation (No. 2022GXNSFAA035625), the Natural Science Foundation of Yunnan Province of China (No. 2019FA024), the Research Fund of Guangxi Key Lab of Multi-source Information Mining & Security (Nos. 20-A-01-03, 19-A-03-01), the Guangxi Normal University Science Research Project (Natural Science) (No. 2021JC008), the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing, and the Innovation Project of Guangxi Graduate Education (YCSW2023180).

Author information

Corresponding authors

Correspondence to Jingli Wu or Wei Peng.

Ethics declarations

Conflict of interest

The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical approval

Not applicable.

Informed consent

Not applicable.

About this article

Cite this article

Li, G., Luo, X., Hu, Z. et al. Essential proteins discovery based on dominance relationship and neighborhood similarity centrality. Health Inf Sci Syst 11, 55 (2023). https://doi.org/10.1007/s13755-023-00252-9

  • DOI: https://doi.org/10.1007/s13755-023-00252-9
