Dimensionality Reduction on Metagenomic Data with Recursive Feature Elimination

  • Conference paper
  • In: Complex, Intelligent and Software Intensive Systems (CISIS 2021)

Abstract

The Fourth Industrial Revolution has driven substantial progress across many fields, and healthcare is among those that have benefited, with a growing body of research aimed at improving people's lives. Inflammatory Bowel Disease (IBD) is a serious disease that can have life-threatening complications. In this work, we study IBD diagnosis from metagenomic data, with the aim of improving prediction for early detection. The problem has not been studied adequately, largely because data and information were scarce in the past. With the rapid development of technology, however, massive datasets are now available, and a single metagenomic sample can contain thousands of bacterial species. To identify which species matter most for the disease, this work investigates a dimensionality reduction approach that combines Recursive Feature Elimination (RFE) with Random Forest for prediction tasks on metagenomic data. We aim to determine which bacterial species are associated with IBD and to evaluate whether a specific number of features chosen by RFE yields more reliable predictions. The proposed method gives promising results: it reaches an accuracy of 0.927 using thirty selected features, a significant improvement over random feature selection.
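The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, a synthetic dataset standing in for a species-abundance table, 5-fold cross-validation, and arbitrary hyperparameters. None of these come from the paper; only the pairing of RFE with Random Forest, the choice of thirty features, and the random-selection baseline do.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Assumption: synthetic data standing in for a (samples x species) abundance table.
X, y = make_classification(
    n_samples=200, n_features=300, n_informative=15,
    n_redundant=5, random_state=0,
)

N_SELECTED = 30  # the abstract reports results with thirty selected features


def rfe_rf_pipeline(n_features):
    """Build an RFE selector ranked by Random Forest, followed by a Random Forest classifier."""
    selector = RFE(
        estimator=RandomForestClassifier(n_estimators=50, random_state=0),
        n_features_to_select=n_features,
        step=0.1,  # remove features in batches rather than one at a time
    )
    classifier = RandomForestClassifier(n_estimators=100, random_state=0)
    # Keeping selection inside the pipeline means it is refit on each
    # training fold, so held-out samples never influence which features are kept.
    return Pipeline([("select", selector), ("classify", classifier)])


rfe_scores = cross_val_score(
    rfe_rf_pipeline(N_SELECTED), X, y, cv=5, scoring="accuracy"
)

# Baseline: the same number of features, drawn uniformly at random.
rng = np.random.default_rng(0)
random_columns = rng.choice(X.shape[1], size=N_SELECTED, replace=False)
random_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X[:, random_columns], y, cv=5, scoring="accuracy",
)

print(f"RFE + Random Forest, {N_SELECTED} features: {rfe_scores.mean():.3f}")
print(f"Random selection,    {N_SELECTED} features: {random_scores.mean():.3f}")
```

On real metagenomic data, `X` would hold per-sample species abundances and `y` the IBD labels. The scores printed here reflect only the synthetic data and say nothing about the 0.927 accuracy reported above.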

Author information

Corresponding author

Correspondence to Hai Thanh Nguyen.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Luong, H.H., Phan, N.T.L., Duong, T.T., Dang, T.M., Nguyen, T.D., Nguyen, H.T. (2021). Dimensionality Reduction on Metagenomic Data with Recursive Feature Elimination. In: Barolli, L., Yim, K., Enokido, T. (eds) Complex, Intelligent and Software Intensive Systems. CISIS 2021. Lecture Notes in Networks and Systems, vol 278. Springer, Cham. https://doi.org/10.1007/978-3-030-79725-6_7
