Abstract
The Fourth Industrial Revolution has driven significant advances in many fields, and these advances improve quality of life across society. In medicine and healthcare in particular, a great deal of creative and important research contributes to people's well-being. Inflammatory Bowel Disease (IBD) is a serious chronic disease that affects millions of people worldwide. In this work, we study IBD diagnosis from metagenomic data, with the aim of improving prediction for early detection. The problem has not been studied adequately, owing to the lack of data in the past. With the rapid development of technology, however, massive datasets are now available in which a single metagenomic sample can contain thousands of bacterial species. To evaluate which species are most relevant to the disease, this work investigates a dimensionality reduction approach that combines Recursive Feature Elimination (RFE) with Random Forest for prediction tasks on metagenomic data. We also seek to clarify the relationships among the bacteria associated with IBD. Our goal is to evaluate whether a more reliable prediction can be made using a specific number of features chosen by RFE. The proposed method gives promising results, reaching an accuracy of 0.927 with thirty selected features, a significant improvement over random feature selection.
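The approach outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, replaces the metagenomic abundance table with synthetic data from `make_classification`, and uses 5-fold cross-validated accuracy as the evaluation protocol. The sample count, total feature count, tree count, and elimination step are illustrative assumptions; only the target of thirty selected features and the comparison against a random feature subset come from the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a species-abundance table (assumption, not the paper's data):
# rows are samples, columns are features, y is a binary label (e.g. IBD vs. healthy).
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=15, n_redundant=5, random_state=0
)

N_FEATURES = 30  # number of features kept, as stated in the abstract


def make_forest():
    """Random Forest with an illustrative (assumed) configuration."""
    return RandomForestClassifier(n_estimators=50, random_state=0)


# RFE sits inside the pipeline so that feature selection is refit on each
# training fold and never sees the held-out samples.
rfe_model = Pipeline([
    # step=0.1 drops 10% of the original features per elimination round.
    ("select", RFE(make_forest(), n_features_to_select=N_FEATURES, step=0.1)),
    ("classify", make_forest()),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rfe_scores = cross_val_score(rfe_model, X, y, cv=cv, scoring="accuracy")

# Baseline: a Random Forest trained on the same number of randomly chosen features.
rng = np.random.default_rng(0)
random_cols = rng.choice(X.shape[1], size=N_FEATURES, replace=False)
random_scores = cross_val_score(
    make_forest(), X[:, random_cols], y, cv=cv, scoring="accuracy"
)

# Fit the selector once on all data to inspect which columns survive elimination.
selector = RFE(make_forest(), n_features_to_select=N_FEATURES, step=0.1).fit(X, y)
selected_cols = np.flatnonzero(selector.support_)

print(f"RFE + Random Forest accuracy:      {rfe_scores.mean():.3f}")
print(f"Random subset + Random Forest acc: {random_scores.mean():.3f}")
print(f"Selected feature indices: {selected_cols.tolist()}")
```

Placing the selector inside the pipeline is a design choice in this sketch, made to avoid leaking information from the test folds into feature selection; the abstract does not say how the original experiments handled this.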
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Luong, H.H., Phan, N.T.L., Duong, T.T., Dang, T.M., Nguyen, T.D., Nguyen, H.T. (2021). Dimensionality Reduction on Metagenomic Data with Recursive Feature Elimination. In: Barolli, L., Yim, K., Enokido, T. (eds) Complex, Intelligent and Software Intensive Systems. CISIS 2021. Lecture Notes in Networks and Systems, vol 278. Springer, Cham. https://doi.org/10.1007/978-3-030-79725-6_7
DOI: https://doi.org/10.1007/978-3-030-79725-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79724-9
Online ISBN: 978-3-030-79725-6
eBook Packages: Intelligent Technologies and Robotics (R0)