Random Forest Algorithm for Prediction of HIV Drug Resistance

Pattern Recognition Techniques Applied to Biomedical Problems

Abstract

The random forest algorithm is a popular choice for genomic data analysis and bioinformatics research. Its fundamental idea is to combine many decision trees into a single model, using the random subspace method to select the predictor variables. It is a nonparametric algorithm, efficient for both regression and classification problems, and it performs well predictively on many types of data. This chapter describes the general characteristics of the random forest algorithm and demonstrates, through a complete worked application, how it can be used to predict HIV-1 drug resistance. The random forest was compared with two other models, logistic regression and a classification tree; its results showed lower variability, indicating a more stable classifier.
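The idea of combining many decision trees, each grown with randomly chosen predictors, can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names, the toy rule "resistant if position 0 is mutated", and the synthetic binary mutation indicators are all assumptions made for illustration. The sketch also draws a fresh random subset of `mtry` features at every split (the variant described by Breiman), a close relative of the per-tree random subspace method.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, mtry, rng, depth=0, max_depth=5):
    """Grow one tree on 0/1 features, trying only `mtry` random features per split."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority  # leaf: predict the majority class
    best = None
    for j in rng.sample(range(len(X[0])), mtry):
        left = [i for i, row in enumerate(X) if row[j] == 0]
        right = [i for i, row in enumerate(X) if row[j] == 1]
        if not left or not right:
            continue  # feature is constant in this node, cannot split on it
        # Weighted Gini impurity of the two child nodes (lower is better).
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, j, left, right)
    if best is None:
        return majority
    _, j, left, right = best
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  mtry, rng, depth + 1, max_depth)
    return (j, grow(left), grow(right))  # internal node: (feature, child0, child1)

def predict_tree(tree, row):
    """Follow the splits until a leaf (a non-tuple label) is reached."""
    while isinstance(tree, tuple):
        j, child0, child1 = tree
        tree = child1 if row[j] == 1 else child0
    return tree

def fit_forest(X, y, n_trees=25, mtry=None, seed=0):
    """Fit each tree on a bootstrap sample of the rows (bagging)."""
    rng = random.Random(seed)
    if mtry is None:
        mtry = max(1, int(len(X[0]) ** 0.5))  # common default for classification
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Synthetic data (an assumption, not real genotypes): 6 binary "mutation"
# indicators per sequence; the label is 1 ("resistant") when position 0 is mutated.
data_rng = random.Random(42)
make_rows = lambda n: [[data_rng.randint(0, 1) for _ in range(6)] for _ in range(n)]
X_train, X_test = make_rows(200), make_rows(100)
y_train = [row[0] for row in X_train]
y_test = [row[0] for row in X_test]

forest = fit_forest(X_train, y_train)
accuracy = sum(predict_forest(forest, r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Averaging the votes of trees grown on different bootstrap samples and feature subsets is what tends to reduce the variability of the predictions relative to a single classification tree.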



Corresponding author

Correspondence to Letícia M. Raposo.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Raposo, L.M., Rosa, P.T.C.R., Nobre, F.F. (2020). Random Forest Algorithm for Prediction of HIV Drug Resistance. In: Ortiz-Posadas, M. (eds) Pattern Recognition Techniques Applied to Biomedical Problems. STEAM-H: Science, Technology, Engineering, Agriculture, Mathematics & Health. Springer, Cham. https://doi.org/10.1007/978-3-030-38021-2_6
