ABSTRACT
Diabetes is a common disease whose incidence is increasing year by year. However, most cases are not easily detected at an early stage because the symptoms are not obvious. The objective of this study is to propose a machine-learning method based on unsupervised clustering to improve the accuracy of diabetes detection. Because the available data sets are large and unlabeled, and because of the problems with the traditional K-means clustering algorithm, we adopt the fuzzy c-means (FCM) clustering algorithm with an improved calculation of the parameter m. Our method combines principal component analysis (PCA), the improved FCM clustering algorithm, and a K-nearest neighbor (KNN) classification algorithm with an optimized value of K. Over 10 repetitions of 10-fold cross-validation, the proposed method reaches an average accuracy of 99.31%, which is higher than that of the other machine learning models compared. These results indicate that our method is well suited to detecting diabetes. Further experiments on a new data set support the applicability of the method to diabetes detection in a more practical setting.
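To make the clustering step concrete, the following is a minimal sketch of the standard fuzzy c-means updates in plain Python. It is not the authors' implementation: the abstract does not describe the improved calculation of m, so the fuzzifier `m` is simply a fixed argument here. The function name `fuzzy_c_means`, the toy points, and every default value are assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships).

    m > 1 is the fuzzifier. It is a fixed hyperparameter in this sketch;
    the paper's improved calculation of m is NOT reproduced (assumption).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = []
            for j in range(c):
                sq = sum((points[i][d] - centers[j][d]) ** 2
                         for d in range(dim))
                dist.append(max(sq ** 0.5, 1e-12))  # avoid division by zero
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** power for k in range(c))
                for j in range(c)
            ]
            shift = max(shift,
                        max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


# Toy data (hypothetical): two well-separated groups of 2-D points.
toy = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
       [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
centers, memberships = fuzzy_c_means(toy, c=2, m=2.0)

# Harden the soft memberships into cluster labels.
hard_labels = [row.index(max(row)) for row in memberships]
```

In a pipeline like the one the abstract describes, the inputs would presumably be PCA-reduced features rather than raw points, and the resulting cluster labels would feed a KNN classifier. That wiring is an assumption about the overall design and is not shown here.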
Index Terms
- A Hybrid Machine Learning Method for Diabetes Detection based on Unsupervised Clustering