Deep learning approach for diabetes prediction using PIMA Indian dataset

  • Research article
  • Journal of Diabetes & Metabolic Disorders

Abstract

Purpose

The International Diabetes Federation (IDF) has stated that 382 million people are living with diabetes worldwide. Over the last few years, the impact of diabetes has increased drastically, making it a global threat. At present, diabetes is consistently listed among the leading causes of death. The number of affected people is projected to reach 629 million by 2045, a 48% increase. However, diabetes is largely preventable and can be avoided through lifestyle changes, which can also lower the chances of developing heart disease and cancer. There is therefore a pressing need for a prognosis tool that helps doctors detect the disease early and recommend the lifestyle changes required to stop its progression.

Method

If untreated, diabetes can be fatal and contributes, directly or indirectly, to many other conditions such as heart attack, heart failure, and stroke. Early detection is therefore important so that timely action can be taken and progression of the disease prevented before further complications arise. Healthcare organizations accumulate huge amounts of data, including electronic health records, images, omics data, and text, but gaining knowledge and insight from these data remains a key challenge. Recent advances in machine learning can be applied to uncover hidden patterns that may help diagnose diabetes at an early stage. This paper presents a methodology for diabetes prediction that applies several machine learning algorithms to the PIMA dataset.
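
As an illustration of the kind of classifier named in this abstract, the following is a minimal sketch of a Gaussian Naive Bayes model evaluated on a hold-out split. It is not the authors' implementation: the data are synthetic, the two features (labelled glucose and BMI here) and their distributions are assumptions chosen only for the example, and all function names are hypothetical.

```python
import math
import random


def make_synthetic(n_per_class, seed=0):
    """Generate toy (glucose, bmi) rows with a 0/1 label. This is NOT the PIMA data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_per_class):
        rows.append(((rng.gauss(100, 15), rng.gauss(27, 4)), 0))  # assumed non-diabetic profile
        rows.append(((rng.gauss(150, 15), rng.gauss(34, 4)), 1))  # assumed diabetic profile
    rng.shuffle(rows)
    return rows


def fit_gaussian_nb(rows):
    """Estimate the prior, per-feature means and per-feature variances for each class."""
    model = {}
    for label in {y for _, y in rows}:
        feats = [x for x, y in rows if y == label]
        n = len(feats)
        means = [sum(col) / n for col in zip(*feats)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small constant avoids division by zero
            for col, m in zip(zip(*feats), means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, x):
    """Return the class label with the highest log-posterior for feature vector x."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=lambda label: log_posterior(*model[label]))


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


data = make_synthetic(200)
split = int(0.8 * len(data))  # 80/20 train/test split, an assumption for this sketch
train, test = data[:split], data[split:]
nb_model = fit_gaussian_nb(train)
holdout_accuracy = accuracy(nb_model, test)
print(f"Hold-out accuracy on synthetic data: {holdout_accuracy:.3f}")
```

The accuracy printed here reflects only how well separated the synthetic classes are; it says nothing about performance on the PIMA dataset.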

Results

The accuracy achieved by the four classifiers, Artificial Neural Network (ANN), Naive Bayes (NB), Decision Tree (DT), and Deep Learning (DL), lies within the range of 90–98%. Among them, DL provides the best results for diabetes onset prediction, with an accuracy of 98.07% on the PIMA dataset. The proposed system therefore offers an effective prognostic tool for healthcare officials, and the results can be used to develop a novel automatic prognosis tool for early detection of the disease.
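
For readers unfamiliar with how an accuracy figure such as those above is computed, the sketch below derives accuracy from a binary confusion matrix, together with sensitivity and specificity, since accuracy alone can hide differences between the two classes. The counts are hypothetical and are not taken from the paper; the function name is an assumption for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # share of true diabetic cases detected
        "specificity": tn / (tn + fp),   # share of non-diabetic cases correctly cleared
    }


# Hypothetical counts for a held-out test set (not results from the paper).
metrics = classification_metrics(tp=50, tn=90, fp=6, fn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```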

Conclusion

The outcome of the study confirms that DL provides the best results with the most promising extracted features. DL achieves an accuracy of 98.07%, which can support further development of the automatic prognosis tool. The accuracy of the DL approach may be further enhanced by including omics data when predicting the onset of the disease.


Author information

Corresponding author

Correspondence to Huma Naz.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Research involving human participants and/or animals

This study involved no direct human participation.

Informed consent

Informed consent was obtained from all individual participants involved in the study.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Naz, H., Ahuja, S. Deep learning approach for diabetes prediction using PIMA Indian dataset. J Diabetes Metab Disord 19, 391–403 (2020). https://doi.org/10.1007/s40200-020-00520-5

  • DOI: https://doi.org/10.1007/s40200-020-00520-5
