Predicting Diabetes Disease Using Data Mining Classification Algorithms and Comparison of Algorithm Performances

Authors

DOI:

https://doi.org/10.22399/ijcesen.233

Keywords:

Data mining, Classification, Naïve Bayes

Abstract

Data is produced constantly and everywhere. In a globalizing world where internet access is spreading rapidly, examples of data sets are countless. We produce data when we shop at a bookstore, make a transfer from a bank account, travel with an airline, or get tested in a hospital. Test results of hospital patients, production reports of a manufacturing plant, and exam results of students are all examples of data sets. This pile of data grows daily, yet raw data carries no meaning on its own; it gains meaning when it is processed with various tools and transformed into information. Today, information has become the greatest source of power. In this study, diabetes was predicted for patients using ten different classification algorithms, classification being one family of data mining methods. The performances of the algorithms were then compared. The distinguishing aspect of this study is that it makes predictions with ten different classification methods and compares their performance, rather than relying on only one or a few.

Published

2024-04-30

How to Cite

ARSLANKAYA, S., & Saltan YAŞLI, G. (2024). Predicting Diabetes Disease Using Data Mining Classification Algorithms and Comparison of Algorithm Performances. International Journal of Computational and Experimental Science and Engineering, 10(2). https://doi.org/10.22399/ijcesen.233

Section

Research Article