Prediction of Hepatitis C Disease with Different Machine Learning and Data Mining Technique

Suiçmez, Çağrı; Yılmaz, Cemal; Kahraman, Hamdi Tolga; Cengiz, Enes; Suiçmez, Alihan

doi:10.1007/978-3-031-09753-9_27


Part of the book series: Engineering Cyber-Physical Systems and Critical Infrastructures (ECPSCI, volume 1)

Included in the following conference series:

The International Conference on Artificial Intelligence and Applied Mathematics in Engineering

Abstract

Artificial intelligence (AI) is now widely used in the preliminary diagnosis of many diseases. One of these diseases is hepatitis C, which affects millions of people around the world and causes thousands of deaths. Data sets built from patients' electronic health records can support its diagnosis: they contain statistical information that doctors can process to reveal findings that would otherwise go unnoticed. The data set used in this study covers 615 patients, of whom 540 are healthy and 75 are diseased. We applied data mining techniques to eliminate this class imbalance and obtain better results, and then applied different machine learning techniques. Thanks to the applied data mining technique, most of the machine learning algorithms in our study achieved better results than those reported in the literature. The results are supported by accuracy, precision, recall, F1 score and confusion matrix values. In particular, the random forest and multi-layer perceptron algorithms, whose parameters we set ourselves according to the problem, reached 98.7% accuracy. This will enable the pre-diagnosis of patients at risk of hepatitis C, fibrosis and cirrhosis using machine learning (ML).
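To illustrate why the abstract reports precision, recall and F1 score alongside accuracy, the following minimal sketch computes these metrics from a binary confusion matrix. It is not the authors' code: the `ConfusionMatrix` class and the "always predict healthy" baseline are assumptions introduced here for illustration, and only the class counts (540 healthy, 75 diseased) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; 'positive' means diseased (hypothetical helper)."""
    tp: int  # diseased, predicted diseased
    fp: int  # healthy, predicted diseased
    fn: int  # diseased, predicted healthy
    tn: int  # healthy, predicted healthy

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Assumed baseline: a classifier that labels every one of the 615 patients
# as healthy. All 75 diseased patients become false negatives.
baseline = ConfusionMatrix(tp=0, fp=0, fn=75, tn=540)

if __name__ == "__main__":
    print(f"accuracy={baseline.accuracy():.3f}")  # 540/615, about 0.878
    print(f"recall={baseline.recall():.3f}")      # no diseased patient is found
    print(f"f1={baseline.f1():.3f}")
```

Under this assumption, the baseline scores roughly 87.8% accuracy while detecting no diseased patient at all, which is why class imbalance has to be handled and why recall and F1 score are reported next to accuracy.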

Author information

Authors and Affiliations

Electrical and Electronics Engineering, Faculty of Technology, Gazi University, Ankara, Turkey
Çağrı Suiçmez
Mingachevir State University, Mingachevir, Azerbaijan
Cemal Yılmaz
Software Engineering, Faculty of Technology, Karadeniz Technical University, Trabzon, Turkey
Hamdi Tolga Kahraman
Mechatronics Engineering, Faculty of Technology, Afyon Kocatepe University, Afyonkarahisar, Turkey
Enes Cengiz
Electrical and Electronic Engineering, Faculty of Engineering, Ondokuz Mayıs University, Samsun, Turkey
Alihan Suiçmez

Corresponding author

Correspondence to Çağrı Suiçmez.

About this paper

Cite this paper

Suiçmez, Ç., Yılmaz, C., Kahraman, H.T., Cengiz, E., Suiçmez, A. (2023). Prediction of Hepatitis C Disease with Different Machine Learning and Data Mining Technique. In: Smart Applications with Advanced Machine Learning and Human-Centred Problem Design. ICAIAME 2021. Engineering Cyber-Physical Systems and Critical Infrastructures, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-031-09753-9_27

DOI: https://doi.org/10.1007/978-3-031-09753-9_27
Published: 01 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09752-2
Online ISBN: 978-3-031-09753-9
eBook Packages: Intelligent Technologies and Robotics (R0)