Improvements in the Imbalanced Hemogram Data Classification

Huynh, Phuoc-Hai; Nguyen, Ngoc-Minh; Tran, Trung-Nguyen; Doan, Thanh-Nghi

doi:10.1007/978-981-97-1463-6_23

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 1182)

Included in the following conference series:

International Conference on Electronics, Biomedical Engineering, and Health Informatics

Abstract

The rapid growth of hospital information systems (HIS) has led to the accumulation of vast amounts of medical data, which calls for effective analysis methods to improve the quality and efficiency of medical services. Machine learning has emerged as a valuable technology for the automated and accurate analysis of medical data, with potential applications in disease diagnosis and treatment. This study aims to advance classification methods and to address data imbalance in hematological data. Specifically, we propose an efficient algorithm for disease classification from hemogram blood test samples that combines the random forest algorithm with the synthetic minority oversampling technique (SMOTE). Experimental results on real hematological data from a local hospital demonstrate the superiority of the proposed method, which achieves an accuracy of up to 97.75% and an area under the curve (AUC) of up to 98.65%. The findings underscore the value of machine learning techniques for diagnosis and treatment in clinical practice, especially when integrated into HIS.
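The oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, pure-Python variant of SMOTE that picks base samples at random and interpolates toward one of their nearest minority-class neighbours. The function names `smote` and `balance`, the two features, and all data values are hypothetical and chosen only for illustration; a random forest would then be trained on the balanced output.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Simplified SMOTE-style sketch: each synthetic point lies on the line
    segment between a randomly chosen minority sample and one of its
    k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class of a binary problem to the majority size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_missing = max(0, (len(X) - len(minority)) - len(minority))
    new_points = smote(minority, n_missing, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Hypothetical toy data: [hemoglobin, white blood cell count]; label 1 is rare.
X = [[13.5, 7.0], [14.1, 6.5], [12.9, 8.1], [13.8, 7.4], [14.5, 6.9],
     [13.2, 7.7], [9.1, 12.0], [8.7, 13.5], [9.5, 11.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_balanced, y_balanced = balance(X, y, minority_label=1, k=2)
print(len(X_balanced), y_balanced.count(0), y_balanced.count(1))
```

In practice, oversampling is usually applied to the training split only, so that synthetic points do not leak into the data used to measure accuracy or AUC.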

Acknowledgment

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number C2024-16-02.

Author information

Authors and Affiliations

Faculty of Information Technology, An Giang University, An Giang, Vietnam
Phuoc-Hai Huynh, Ngoc-Minh Nguyen & Thanh-Nghi Doan
Vietnam National University, Ho Chi Minh City, Vietnam
Phuoc-Hai Huynh, Ngoc-Minh Nguyen & Thanh-Nghi Doan
An Giang Province Regional General Hospital, Chau Doc, Vietnam
Trung-Nguyen Tran

Authors

Phuoc-Hai Huynh
Ngoc-Minh Nguyen
Trung-Nguyen Tran
Thanh-Nghi Doan

Corresponding author

Correspondence to Phuoc-Hai Huynh.

Editor information

Editors and Affiliations

Medical Electronics Technology, Poltekkes Kemenkes Surabaya, Surabaya, Indonesia
Triwiyanto Triwiyanto
School of Electrical Engineering, Telkom University, Bandung, Indonesia
Achmad Rizal
Faculty of Integrated Technologies, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
Wahyu Caesarendra

About this paper

Cite this paper

Huynh, PH., Nguyen, NM., Tran, TN., Doan, TN. (2024). Improvements in the Imbalanced Hemogram Data Classification. In: Triwiyanto, T., Rizal, A., Caesarendra, W. (eds) Proceedings of the 4th International Conference on Electronics, Biomedical Engineering, and Health Informatics. ICEBEHI 2023. Lecture Notes in Electrical Engineering, vol 1182. Springer, Singapore. https://doi.org/10.1007/978-981-97-1463-6_23

DOI: https://doi.org/10.1007/978-981-97-1463-6_23
Published: 28 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-1462-9
Online ISBN: 978-981-97-1463-6
eBook Packages: Engineering, Engineering (R0)