Prediction of Hepatitis C Disease with Different Machine Learning and Data Mining Technique

  • Conference paper
Smart Applications with Advanced Machine Learning and Human-Centred Problem Design (ICAIAME 2021)

Abstract

Today, artificial intelligence (AI) is widely used in the preliminary diagnosis of many diseases. One of these is hepatitis C, which affects millions of people worldwide and causes thousands of deaths. Diagnosis can draw on data sets (patients' electronic health records) containing statistical information that doctors can process to uncover findings that would otherwise go unnoticed. The data set used in this study covers 615 patients: 540 healthy and 75 diseased individuals. We applied data mining techniques to remove the class imbalance in the data set and then applied different machine learning techniques to obtain better results. Thanks to the applied data mining technique, most of the machine learning algorithms in our study achieved better results than those reported in the literature. The results are reported using accuracy, precision, recall, F1 score and the confusion matrix. In particular, the random forest and multi-layer perceptron algorithms, whose parameters we set ourselves according to the problem, reached 98.7% accuracy. This will enable machine learning (ML) based pre-diagnosis of patients at risk of hepatitis C, fibrosis and cirrhosis.
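To illustrate why class imbalance matters for the metrics named in the abstract, the following minimal sketch computes them from confusion-matrix counts for a trivial baseline that labels every patient as healthy. Only the 540/75 class split comes from the abstract; the function names, the label encoding (0 = healthy, 1 = diseased) and the baseline itself are assumptions made for illustration and are not the authors' method or results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    # Undefined ratios (zero denominator) are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Class split taken from the abstract: 540 healthy (0) and 75 diseased (1).
y_true = [0] * 540 + [1] * 75

# Hypothetical baseline: predict "healthy" for every patient.
y_pred = [0] * len(y_true)

baseline = summarize(y_true, y_pred)
print(baseline)
```

Under these assumptions the baseline is correct for 540 of 615 patients (about 87.8% accuracy) while detecting none of the 75 diseased patients, so its recall and F1 score are zero. This is why accuracy alone is a weak indicator on this data set and why it is reported alongside precision, recall, F1 score and the confusion matrix.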

Author information

Corresponding author

Correspondence to Çağrı Suiçmez.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Suiçmez, Ç., Yılmaz, C., Kahraman, H.T., Cengiz, E., Suiçmez, A. (2023). Prediction of Hepatitis C Disease with Different Machine Learning and Data Mining Technique. In: Smart Applications with Advanced Machine Learning and Human-Centred Problem Design. ICAIAME 2021. Engineering Cyber-Physical Systems and Critical Infrastructures, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-031-09753-9_27
