DOI: 10.1145/3583788.3583809
research-article

A Hybrid Machine Learning Method for Diabetes Detection based on Unsupervised Clustering

Published: 04 June 2023

ABSTRACT

Diabetes is a common disease whose incidence is increasing year by year. However, most cases are not easily detected at an early stage because the symptoms are not obvious. The objective of this study is to propose a machine learning method based on unsupervised clustering to improve the accuracy of diabetes detection. Because of the massive volume of unlabeled data and the problems of the traditional K-means clustering algorithm, we adopt the fuzzy c-means (FCM) clustering algorithm with an improved calculation of the parameter m. Our method combines principal component analysis (PCA), the improved FCM clustering algorithm, and a K-nearest neighbor (KNN) classifier with an optimized value of K. Over 10 repetitions of 10-fold cross-validation, the average accuracy of the proposed method reaches 99.31%, which is higher than that of the other machine learning models compared. These results indicate that our method is better suited to detecting diabetes. Further experiments on a new data set validate the applicability of our method to diabetes detection in a more practical setting.
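The clustering step named in the abstract can be illustrated with a minimal sketch of the standard fuzzy c-means update rules. This is not the authors' improved variant: the abstract does not state how the parameter m is calculated, so the sketch assumes a fixed fuzzifier m = 2. The function name `fuzzy_c_means`, its parameters, and the toy data are hypothetical, and the PCA and KNN stages are not shown.

```python
import math
import random


def fuzzy_c_means(points, c=2, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (assumed fixed fuzzifier m > 1).

    points: list of equal-length lists of floats.
    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Centers: means of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Memberships: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            # Clamp distances away from zero to avoid division by zero.
            dists = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            row = []
            for j in range(c):
                denom = sum(
                    (dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                    for k in range(c)
                )
                row.append(1.0 / denom)
            new_u.append(row)

        # Stop once no membership changes by more than tol.
        shift = max(
            abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb)
        )
        u = new_u
        if shift < tol:
            break

    return centers, u


# Toy data (hypothetical): two well-separated groups of 2-D points.
data = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
        [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
centers, memberships = fuzzy_c_means(data, c=2, m=2.0)

# Hard labels: the cluster with the largest membership for each point.
hard_labels = [row.index(max(row)) for row in memberships]
```

Unlike K-means, each point receives a graded membership in every cluster, and the fuzzifier m controls how soft those memberships are. Larger values of m spread membership more evenly across clusters, which is why the choice of m matters for the result.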


    • Published in

      ACM Other conferences
      ICMLSC '23: Proceedings of the 2023 7th International Conference on Machine Learning and Soft Computing
      January 2023
      219 pages
      ISBN: 9781450398633
      DOI: 10.1145/3583788

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited