A Hybrid Machine Learning Method for Diabetes Detection based on Unsupervised Clustering

Authors:
Junhong Liu (ORCID: 0000-0002-8915-0309), School of Computer and Artificial Intelligence, Southwest Jiaotong University, China
Bo Peng (ORCID: 0000-0002-8694-5106), School of Computer and Artificial Intelligence, Southwest Jiaotong University, China
Zezhao Yin (ORCID: 0000-0002-6697-7733), School of Computer and Artificial Intelligence, Southwest Jiaotong University, China

ICMLSC '23: Proceedings of the 2023 7th International Conference on Machine Learning and Soft Computing, January 2023, Pages 144–149. https://doi.org/10.1145/3583788.3583809

Published: 04 June 2023

ABSTRACT

Diabetes is a common disease whose incidence is increasing year by year. However, most cases are not easily detected at an early stage because the symptoms are not obvious. The objective of this study is to propose a machine learning method based on unsupervised clustering to improve the accuracy of diabetes detection. Because of the large volume of unlabeled data and the problems of the traditional K-means clustering algorithm, we adopt the fuzzy c-means (FCM) clustering algorithm with an improved calculation of the parameter m. Our method combines principal component analysis (PCA), the improved FCM clustering algorithm, and a K-nearest neighbor (KNN) classification algorithm with an optimized K value. Over 10 repetitions of 10-fold cross-validation, the proposed method reaches an average accuracy of 99.31%, which is higher than that of the other machine learning models compared. These results indicate that our method is better suited to detecting diabetes. Further experiments on a new data set validate the applicability of the method to diabetes detection in a more practical setting.
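As a rough illustration of the clustering stage named in the abstract, the following is a minimal sketch of the standard fuzzy c-means algorithm in plain Python. The abstract does not give the authors' improved calculation of m, so the fuzzifier is simply fixed at m = 2.0 here as an assumption. The function name `fuzzy_c_means`, its parameters, and the toy data are hypothetical and not taken from the paper. PCA and the KNN classifier are not shown.

```python
import random


def fuzzy_c_means(points, c=2, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; m > 1 is the fuzzifier (fixed here by assumption)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = []
            for j in range(c):
                sq = sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim))
                dist.append(max(sq ** 0.5, 1e-12))  # avoid division by zero
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for j in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


# Toy usage on made-up 2-D points (not the diabetes data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(points, c=2, m=2.0)
hard_labels = [row.index(max(row)) for row in memberships]
print(hard_labels)
```

Unlike K-means, each point receives a degree of membership in every cluster. A larger m makes the memberships softer, and values of m close to 1 approach hard K-means assignments, which is why the choice of m matters for the method described above.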

Index Terms

1. Computing methodologies
  1. Machine learning

Published in

ICMLSC '23: Proceedings of the 2023 7th International Conference on Machine Learning and Soft Computing
January 2023
219 pages
ISBN:9781450398633
DOI:10.1145/3583788

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Diabetes detection
Improved Fuzzy c-means(FCM)
K-nearest neighbor(KNN)
Machine learning
Principal component analysis(PCA)
Qualifiers
- research-article
- Research
- Refereed limited