Abstract
Many organizations now mine large datasets automatically to extract the information they need from big medical data. The main difficulty with big medical data is the complexity of its data sets and its steadily growing volume. This paper proposes a big data classification model for heart disease in health care that consists of three components: (1) a MapReduce framework, (2) a support vector machine (SVM), and (3) an optimized decision tree (DT) classifier. The big data is first supplied as input to the MapReduce framework, which reduces the data content; the framework uses principal component analysis (PCA) to reduce the dimensionality of the data. The reduced data is passed to the SVM, which outputs the classes. The SVM output is then processed by a new contribution called ‘data transformation’, which paves the way for optimal rule generation in the decision tree classifier. An optimization step tunes the weight and integer used in the data transformation, and for this the paper introduces a new algorithm, divergence based grey wolf optimization (DGWO). Finally, the transformed data is passed to the DT, where the classification takes place. The proposed DGWO model is compared with conventional methods: the firefly algorithm, artificial bee colony algorithm, particle swarm optimization algorithm, genetic algorithm, and grey wolf optimizer.
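To make the optimization step more concrete, the following is a minimal sketch of the standard grey wolf optimizer (the baseline GWO that DGWO extends), written in plain Python. It is not the paper's DGWO: the divergence-based modification is not described in the abstract and is not reproduced here. The function name `grey_wolf_optimize`, its parameters, and the toy sphere objective are all illustrative assumptions; in the paper's setting the objective would instead score a candidate weight and integer for the data transformation.

```python
import random


def grey_wolf_optimize(objective, dim, bounds, n_wolves=20, n_iters=100, seed=0):
    """Minimize `objective` over a box with the standard grey wolf optimizer.

    Illustrative sketch only: plain GWO, not the paper's DGWO variant.
    `bounds` is a (low, high) pair applied to every dimension.
    """
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    low, high = bounds
    wolves = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")

    for it in range(n_iters + 1):
        # Rank the pack; the three fittest wolves lead the hunt.
        ranked = sorted(wolves, key=objective)
        alpha, beta, delta = (list(w) for w in ranked[:3])
        alpha_fit = objective(alpha)
        if alpha_fit < best_fit:
            best_pos, best_fit = alpha, alpha_fit
        if it == n_iters:
            break

        # Control parameter `a` decreases linearly from 2 to 0,
        # shifting the pack from exploration to exploitation.
        a = 2.0 * (1.0 - it / n_iters)
        moved = []
        for wolf in wolves:
            new_pos = []
            for j in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - wolf[j])
                    estimate += leader[j] - A * D
                # Average the three leader-guided estimates and clamp to the box.
                new_pos.append(min(high, max(low, estimate / 3.0)))
            moved.append(new_pos)
        wolves = moved

    return best_pos, best_fit


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    position, fitness = grey_wolf_optimize(sphere, dim=3, bounds=(-10.0, 10.0))
    print(position, fitness)
```

The sketch keeps the best solution seen so far rather than relying on the final pack, so the returned fitness never gets worse as iterations are added. A divergence-based variant would presumably change how positions are updated or how leaders are weighted, but that detail would have to come from the full paper.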
Abbreviations
- SVM: Support vector machine
- DT: Decision tree
- PCA: Principal component analysis
- DGWO: Divergence based grey wolf optimization
- FF: Firefly
- ABC: Artificial bee colony
- PSO: Particle swarm optimization
- GA: Genetic algorithm
- GWO: Grey wolf optimizer
- SVR: Support vector regression
- KNN: k-Nearest neighbors
- FP: Frequent pattern
- CAR: Classification association rules
- FRBCS: Fuzzy rule-based classification system
- E2LM: Elastic extreme learning machine
- LDA: Linear discriminant analysis
- nPCA: Noisy principal component analysis
- NPV: Negative predictive value
- MCC: Matthews correlation coefficient
- FNR: False negative rate
- FDR: False discovery rate
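The evaluation abbreviations in this list (NPV, MCC, FNR, FDR) all derive from the four counts of a binary confusion matrix. The sketch below shows the standard definitions, reading NPV as negative predictive value, which is an assumption based on the classification context. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute NPV, FNR, FDR and MCC from binary confusion-matrix counts.

    Illustrative helper; returns NaN for any metric whose denominator is zero.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "NPV": ratio(tn, tn + fn),   # negative predictive value
        "FNR": ratio(fn, fn + tp),   # false negative rate
        "FDR": ratio(fp, fp + tp),   # false discovery rate
        "MCC": ratio(tp * tn - fp * fn, mcc_den),  # Matthews correlation coefficient
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(confusion_metrics(tp=40, fp=10, tn=45, fn=5))
```

MCC uses all four counts, so it stays informative when the classes are imbalanced, which is common in medical data.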
Cite this article
Game, P.S., Vaze, V. & Emmanuel, M. Optimized Decision tree rules using divergence based grey wolf optimization for big data classification in health care. Evol. Intel. 15, 971–987 (2022). https://doi.org/10.1007/s12065-019-00267-w
DOI: https://doi.org/10.1007/s12065-019-00267-w