DOI: 10.1145/3647444.3647934
research-article

Evaluating Feature Selection Methods to Enhance Diabetes Prediction with Random Forest

Published: 13 May 2024

ABSTRACT

Diabetes is a significant and urgent global health problem, and accurate predictive models are needed to support early intervention and effective management. This study assesses how well different feature selection techniques improve the accuracy of diabetes prediction with the Random Forest algorithm. Three feature selection methods are evaluated for their impact on model performance: Recursive Feature Elimination (RFE), Mutual Information, and L1 Regularization (Lasso). The study uses a dataset obtained from Kaggle that includes attributes such as age, blood pressure, and laboratory measurements. Random Forest models combined with each feature selection technique are systematically compared on accuracy, precision, recall, and F1-Score. The results show that feature selection markedly improves prediction accuracy, with L1 Regularization (Lasso) achieving the highest accuracy, 98.45%. The features chosen by each method also indicate which variables have the greatest influence on diabetes prediction. These findings underline the role of feature selection in healthcare applications: carefully selecting relevant features when building predictive models on healthcare datasets supports more accurate diagnosis and more effective treatment of diabetes.
This study also points to the need for further investigation of feature selection methods and their application in healthcare analytics, with potential benefits for disease prediction and management.
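Two parts of the evaluation described above can be made concrete without any third-party library: the mutual-information score used to rank features, and the four reported metrics. The following is a minimal sketch, not the authors' implementation. It assumes features have already been discretized into bins (for example, glucose above or below some threshold); the function names and the toy data are hypothetical. RFE and L1-based selection depend on a fitted estimator and are usually taken from a library such as scikit-learn (`RFE`, `SelectFromModel`), so they are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y), in bits, between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of X
    py = Counter(ys)              # marginal counts of Y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(columns, labels, k):
    """Filter-style selection: keep the k features with the highest MI with the label."""
    scores = {name: mutual_information(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: binarized attributes and a diabetes label (1 = diabetic).
labels = [1, 1, 0, 0, 1, 0, 1, 0]
columns = {
    "glucose_high": [1, 1, 0, 0, 1, 0, 1, 0],  # perfectly informative
    "age_old": [1, 0, 1, 0, 1, 0, 1, 0],       # partially informative
    "noise": [1, 0, 0, 1, 1, 0, 0, 1],         # independent of the label
}

print(rank_features(columns, labels, k=2))  # ['glucose_high', 'age_old']

# Hypothetical predictions from a classifier trained on the selected features.
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
print(classification_metrics(labels, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Binning keeps the mutual-information formula exact and easy to check by hand. For continuous attributes such as age or blood pressure, scikit-learn's `mutual_info_classif` instead estimates mutual information with a nearest-neighbour method, so no binning step is required.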

  • Published in

    ICIMMI '23: Proceedings of the 5th International Conference on Information Management & Machine Intelligence (ACM Other conferences)
    November 2023
    1215 pages

    Copyright © 2023 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited