Abstract
Artificial intelligence (AI) is now widely used in the preliminary diagnosis of many diseases. One of these is hepatitis C, which affects millions of people around the world and causes thousands of deaths. Data sets built from patients' electronic health records can support the diagnosis of this disease: they contain statistical information that, once processed, can reveal patterns that would otherwise go unnoticed by doctors. The data set used in this study covers 615 patients, of whom 540 are healthy and 75 are diseased. We first applied data mining techniques to correct this class imbalance and then trained several machine learning algorithms on the balanced data. Thanks to the balancing step, most of the algorithms achieved better results than those reported in the literature. The results are reported in terms of accuracy, precision, recall, F1 score, and the confusion matrix. In particular, the random forest and multi-layer perceptron algorithms, whose parameters we set ourselves according to the problem, reached 98.7% accuracy. Such models can enable the pre-diagnosis of patients at risk of hepatitis C, fibrosis, and cirrhosis using machine learning (ML).
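To make the role of the reported metrics concrete, the following is a minimal sketch, not the authors' code, of how accuracy, precision, recall, and F1 score follow from confusion matrix counts. The function names `confusion_counts` and `scores` are hypothetical, and the only figures taken from the abstract are the class sizes (540 healthy, 75 diseased). It uses a trivial classifier that labels every patient as healthy to show why accuracy alone can mislead on an imbalanced data set.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Class sizes from the abstract: 540 healthy (label 0), 75 diseased (label 1).
y_true = [0] * 540 + [1] * 75

# Hypothetical baseline for illustration: predict "healthy" for everyone.
y_majority = [0] * len(y_true)

baseline = scores(y_true, y_majority)
print(round(baseline["accuracy"], 3))  # 0.878
print(baseline["recall"])              # 0.0
```

The baseline scores about 87.8% accuracy while detecting none of the diseased patients, which is why recall, precision, and F1 score are reported alongside accuracy and why the class imbalance is addressed before training.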
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Suiçmez, Ç., Yılmaz, C., Kahraman, H.T., Cengiz, E., Suiçmez, A. (2023). Prediction of Hepatitis C Disease with Different Machine Learning and Data Mining Technique. In: Smart Applications with Advanced Machine Learning and Human-Centred Problem Design. ICAIAME 2021. Engineering Cyber-Physical Systems and Critical Infrastructures, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-031-09753-9_27
DOI: https://doi.org/10.1007/978-3-031-09753-9_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09752-2
Online ISBN: 978-3-031-09753-9
eBook Packages: Intelligent Technologies and Robotics (R0)