Optimized Decision tree rules using divergence based grey wolf optimization for big data classification in health care

  • Special Issue

Evolutionary Intelligence

Abstract

Most organizations focus on large datasets for the automatic mining of necessary information from big medical data. The main challenges of big medical data are the complexity of its datasets and their steadily increasing volume. This paper proposes a big data classification model for heart disease in health care, which consists of three phases: (1) a MapReduce framework, (2) a support vector machine (SVM) and (3) an optimized decision tree (DT) classifier. First, the big data is supplied as input to the MapReduce framework, which reduces the data content through a set of major operations; this framework uses principal component analysis (PCA) to reduce the dimensionality of the data. The reduced data is then passed to the SVM, which outputs the classes. The SVM output is processed by a new contribution called ‘data transformation’, which paves the way for optimal rule generation in the DT classifier. An optimization algorithm is used in this step to optimize the weight and integer parameters of the data transformation; for this purpose, the paper introduces a new algorithm, divergence based grey wolf optimization (DGWO). Finally, the transformed data is fed to the DT, where the classification takes place. The proposed DGWO model is compared with conventional methods, namely the firefly algorithm, artificial bee colony algorithm, particle swarm optimization algorithm, genetic algorithm and grey wolf optimizer.
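The abstract does not spell out how the divergence-based modification works, so the following is only a minimal sketch of the standard grey wolf optimizer (Mirjalili et al., 2014) that DGWO builds on; it is not the authors' DGWO. It minimizes a generic objective over box bounds using the usual GWO rules: the coefficient `a` falls linearly from 2 to 0, each wolf is pulled toward the alpha, beta and delta leaders via A = 2a·r1 − a, C = 2·r2 and D = |C·X_p − X|, and the new position is the average of the three guided moves. The function name `grey_wolf_optimizer`, its parameters and the elitist leader update are illustrative assumptions. In the paper's setting the decision vector would hold the weight and integer parameters of the data transformation (an integer parameter would have to be rounded inside the objective, also an assumption), and the objective would score the resulting decision tree; that objective is not reproduced here.

```python
import random
from typing import Callable, List, Sequence, Tuple


def grey_wolf_optimizer(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_wolves: int = 20,
    n_iter: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over a box with the standard (not divergence-based) GWO rules."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x: Sequence[float]) -> List[float]:
        # Keep every coordinate inside its (low, high) bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Random initial pack inside the search box.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    # alpha, beta, delta: the three best positions found so far.
    leaders = sorted(wolves, key=objective)[:3]

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        moved = []
        for x in wolves:
            new_x = []
            for j in range(dim):
                total = 0.0
                for leader in leaders:
                    r1, r2 = rng.random(), rng.random()
                    coef_a = 2.0 * a * r1 - a                # A = 2a*r1 - a
                    coef_c = 2.0 * r2                        # C = 2*r2
                    dist = abs(coef_c * leader[j] - x[j])    # D = |C*X_p - X|
                    total += leader[j] - coef_a * dist       # X_k = X_p - A*D
                new_x.append(total / len(leaders))  # average of the three guided moves
            moved.append(clip(new_x))
        wolves = moved
        # Elitist update: keep the best three of the old leaders and the moved pack.
        leaders = sorted(leaders + wolves, key=objective)[:3]

    best = leaders[0]
    return best, objective(best)
```

For example, minimizing the sphere function `lambda x: sum(v * v for v in x)` over `[(-10.0, 10.0)] * 3` should drive the returned point close to the origin. Keeping the leaders elitist means the reported best value never gets worse between iterations, which mirrors how the original GWO only replaces alpha, beta or delta when a better wolf appears.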

Figs. 1–7

Abbreviations

SVM: Support vector machine
DT: Decision tree
PCA: Principal component analysis
DGWO: Divergence based grey wolf optimization
FF: Firefly
ABC: Artificial bee colony
PSO: Particle swarm optimization
GA: Genetic algorithm
GWO: Grey wolf optimizer
SVR: Support vector regression
KNN: k-nearest neighbors
FP: Frequent pattern
CAR: Classification association rules
FRBCS: Fuzzy rule-based classification system
E2LM: Elastic extreme learning machine
LDA: Linear discriminant analysis
nPCA: Noisy principal component analysis
NPV: Negative predictive value
MCC: Matthews correlation coefficient
FNR: False negative rate
FDR: False discovery rate
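The abbreviation list names several confusion-matrix metrics (NPV, MCC, FNR, FDR). As a hedged illustration that is not taken from the paper, the sketch below computes them with their standard definitions for a binary task such as heart disease present or absent; NPV is read here as negative predictive value, its usual meaning in classification. The function name `binary_metrics` and the choice to return 0.0 when a denominator is zero are assumptions.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard confusion-matrix metrics for a binary classifier."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),    # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "fnr": ratio(fn, fn + tp),          # false negative rate = 1 - sensitivity
        "fdr": ratio(fp, fp + tp),          # false discovery rate = 1 - precision
        "mcc": ratio(tp * tn - fp * fn, mcc_den),  # Matthews correlation coefficient
    }
```

For example, with `tp=40, fp=10, tn=45, fn=5` the accuracy is 85/100 = 0.85 and the NPV is 45/50 = 0.9.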

Author information

Corresponding author

Correspondence to Pravin S. Game.

About this article

Cite this article

Game, P.S., Vaze, V. & Emmanuel, M. Optimized Decision tree rules using divergence based grey wolf optimization for big data classification in health care. Evol. Intel. 15, 971–987 (2022). https://doi.org/10.1007/s12065-019-00267-w

  • DOI: https://doi.org/10.1007/s12065-019-00267-w
