Evaluating Feature Selection Methods to Enhance Diabetes Prediction with Random Forest

Authors:
Pushpa Negi, Symbiosis Law School, Nagpur Campus, Symbiosis International (Deemed University), India (ORCID: 0000-0003-1962-2486)
AnishKumar Dhablia, Engineering Manager, Altimetrik India Pvt Ltd, India (ORCID: 0000-0002-9046-9747)
Hrishikesh Bhanudas Vanjari, E&TC Engineering Dept, BVCOE Lavale, SPPU Pune, India (ORCID: 0000-0003-0357-7027)
Jayashree Tamkhade, E&TC, VIIT, India (ORCID: 0000-0002-6169-9316)
Sharayu Ikhar, Department of Information Technology, Vishwakarma Institute of Information Technology, India (ORCID: 0009-0003-6633-6834)
Shrinivas T. Shirkande, Computer Science & Engineering, S.B.Patil College of Engineering Indapur, India (ORCID: 0009-0006-5515-229X)

ICIMMI '23: Proceedings of the 5th International Conference on Information Management & Machine Intelligence, November 2023, Article No.: 107, Pages 1–7
https://doi.org/10.1145/3647444.3647934

Published: 13 May 2024

ABSTRACT

Diabetes is a major global health concern, and accurate predictive models are needed to support early intervention and effective management. This study assesses how different feature selection techniques affect the accuracy of diabetes prediction with the Random Forest algorithm. Three feature selection methods are evaluated: Recursive Feature Elimination (RFE), Mutual Information, and L1 Regularization (Lasso). The study uses a dataset obtained from Kaggle that includes attributes such as age, blood pressure, and laboratory measurements. Random Forest models combined with each feature selection technique are compared on accuracy, precision, recall, and F1-score. The findings show significant improvements in prediction accuracy when feature selection is applied, with L1 Regularization (Lasso) achieving the highest accuracy at 98.45%. The features chosen by each method also indicate which variables have the greatest impact on predicting diabetes. These results underline the importance of selecting relevant features when developing predictive models for healthcare datasets, and they support more accurate diagnosis and treatment of diabetes. The study encourages further investigation of feature selection methods and their use in healthcare analytics, with potential benefits for disease prediction and management.
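The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data is a synthetic stand-in generated with `make_classification` (the Kaggle dataset is not loaded here), the number of selected features, the Lasso `alpha`, the forest size, and the train/test split are all assumed values, and the helper names `make_forest` and `evaluate` are hypothetical. It is not expected to reproduce the reported 98.45% accuracy.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import Lasso
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic binary-classification data stands in for the diabetes dataset.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def make_forest():
    """Return a fresh Random Forest with assumed hyperparameters."""
    return RandomForestClassifier(n_estimators=100, random_state=0)


# One selector per method named in the abstract (all settings are assumptions).
selectors = {
    "RFE": RFE(make_forest(), n_features_to_select=4),
    "Mutual Information": SelectKBest(partial(mutual_info_classif, random_state=0), k=4),
    # Lasso fitted on the 0/1 labels; features with non-zero coefficients are kept.
    "L1 (Lasso)": SelectFromModel(Lasso(alpha=0.02)),
}


def evaluate(selector):
    """Scale, select features, fit a Random Forest, and score it on the held-out split."""
    model = make_pipeline(StandardScaler(), selector, make_forest())
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "n_selected": int(model[1].get_support().sum()),
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }


results = {name: evaluate(selector) for name, selector in selectors.items()}
for name, scores in results.items():
    print(
        f"{name}: {scores['n_selected']} features, "
        f"accuracy={scores['accuracy']:.3f}, precision={scores['precision']:.3f}, "
        f"recall={scores['recall']:.3f}, f1={scores['f1']:.3f}"
    )
```

Placing each selector inside the pipeline means it is fitted on the training split only, so the held-out scores are not influenced by features chosen with knowledge of the test data. Whether the original study followed this practice is not stated in the abstract.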

Published in
ICIMMI '23: Proceedings of the 5th International Conference on Information Management & Machine Intelligence
November 2023
1215 pages
ISBN: 9798400709418
DOI: 10.1145/3647444
Editors: Dinesh Goyal, Anil Kumar, Dharm Singh, Marcin Paprzycki, Pooja Jain, B. B. Gupta, Uday Pratap Singh
Copyright © 2023 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Diabetes Prediction
Early Intervention
Feature Selection
Healthcare Analytics
Random Forest
Qualifiers
- research-article
- Research
- Refereed limited