ABSTRACT
Diabetes is a pressing global health problem, and accurate predictive models are needed to support early intervention and effective management. This study assesses how well different feature selection techniques improve the accuracy of diabetes prediction with the Random Forest algorithm. Three methods are evaluated: Recursive Feature Elimination (RFE), Mutual Information, and L1 Regularization (Lasso). The dataset, obtained from Kaggle, includes attributes such as age, blood pressure, and laboratory measurements. Random Forest models combined with each feature selection technique are compared systematically on accuracy, precision, recall, and F1-score. Feature selection significantly improved prediction accuracy, with L1 Regularization (Lasso) achieving the highest accuracy, 98.45%. The features chosen by each method also indicate which variables have the greatest impact on predicting diabetes. These results underline the importance of selecting relevant features when building predictive models on healthcare datasets, which can support more accurate and effective diagnosis and treatment of diabetes.
The study also motivates further investigation of feature selection methods in healthcare analytics, with potential benefits for disease prediction and management.
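The comparison described in the abstract can be illustrated with a minimal sketch in scikit-learn. This is not the authors' code. It assumes a synthetic stand-in dataset from `make_classification` rather than the Kaggle data, and the number of retained features (4), the Lasso penalty `alpha=0.01`, the forest size, and the 75/25 split are all illustrative choices, so the scores it prints will not match the reported 98.45%.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import Lasso
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 0

# Assumption: synthetic stand-in data, NOT the Kaggle dataset used in the study.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           n_redundant=2, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=SEED)


def forest():
    """Return a fresh Random Forest with illustrative (assumed) settings."""
    return RandomForestClassifier(n_estimators=50, random_state=SEED)


# The three feature selection methods named in the abstract.
selectors = {
    "RFE": RFE(forest(), n_features_to_select=4),
    "Mutual Information": SelectKBest(
        partial(mutual_info_classif, random_state=SEED), k=4),
    # Keeps the features whose Lasso coefficient is not shrunk to zero.
    "L1 (Lasso)": SelectFromModel(Lasso(alpha=0.01)),
}

results = {}
for name, selector in selectors.items():
    # Scale, select features on the training split only, then fit the forest.
    model = make_pipeline(StandardScaler(), selector, forest())
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "n_selected": int(model[1].get_support().sum()),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Placing the selector inside the pipeline means features are chosen from the training split alone, which avoids leaking test information into the selection step.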