Abstract

The exponential growth of hospital information systems (HIS) has led to the accumulation of vast amounts of medical data, which calls for effective analysis methods to improve the quality and efficiency of medical services. Machine learning has emerged as a valuable technology for the automated and accurate analysis of medical data, with potential applications in disease diagnosis and treatment. This study aims to advance classification methods and address data imbalance in hematological data. Specifically, we propose an efficient algorithm for disease classification from hemogram blood test samples that combines the random forest algorithm with the synthetic minority oversampling technique (SMOTE). Experiments on real hematological data from a local hospital show that the proposed method achieves an accuracy of up to 97.75% and an area under the curve (AUC) of up to 98.65%. The findings underscore the value of machine learning for diagnosis and treatment in clinical practice, especially when integrated into HIS.
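The oversampling step named in the abstract can be illustrated with a minimal sketch of the synthetic minority oversampling idea: each new minority sample is placed at a random point on the segment between an existing minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, the parameters `k` and `seed`, and the toy two-feature data are all assumptions made for illustration, and the hospital data from the study is not reproduced here.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy, made-up two-feature data (hypothetical, not hemogram measurements).
majority = [[float(x), float(y)] for x in range(10) for y in range(4)]
minority = [[20.0, 20.0], [21.0, 22.0], [22.0, 21.0], [23.0, 23.0]]

# Generate enough synthetic minority samples to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In a pipeline of the kind the abstract describes, the balanced training set would then be passed to a random forest classifier. Oversampling is normally applied to the training split only, so that the test data keeps its original class distribution.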

Acknowledgment

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number C2024-16-02.

Corresponding author

Correspondence to Phuoc-Hai Huynh.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Huynh, PH., Nguyen, NM., Tran, TN., Doan, TN. (2024). Improvements in the Imbalanced Hemogram Data Classification. In: Triwiyanto, T., Rizal, A., Caesarendra, W. (eds) Proceedings of the 4th International Conference on Electronics, Biomedical Engineering, and Health Informatics. ICEBEHI 2023. Lecture Notes in Electrical Engineering, vol 1182. Springer, Singapore. https://doi.org/10.1007/978-981-97-1463-6_23

  • DOI: https://doi.org/10.1007/978-981-97-1463-6_23

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-1462-9

  • Online ISBN: 978-981-97-1463-6

  • eBook Packages: Engineering, Engineering (R0)
