Deep Learning for Health Informatics: A Secure Cellular Automata

Health informatics has gained greater focus over the last two decades as the role of data analytics has become vital. Many machine learning-based models have evolved to process the huge volume of data involved in this sector. Deep Learning (DL) augmented with Non-Linear Cellular Automata (NLCA) is becoming a powerful tool with great potential to process big data. This combination helps to develop a system that facilitates parallelization, rapid data storage, and computational power with improved security parameters. This paper provides a novel and robust mechanism, deep learning augmented with non-linear cellular automata, that offers greater security and adaptability for health informatics. The proposed mechanism is adaptable and can address many open problems in medical informatics, bioinformatics, and medical imaging. The security parameters considered in this model are confidentiality, authorization, and integrity. The method is evaluated for performance, and it reports an average accuracy of 89.32%. Precision, sensitivity, and specificity are considered to measure the accuracy of the model. For medical imaging, the classifier reports an accuracy of 94.78% with an error rate of less than 5.2%. The accuracy of the model in predicting diabetes also tends to increase with the number of epochs. After reaching 60 epochs, our proposed classifier reports the highest accuracy of 96.7% with an error rate of less than 10.6%.

Copyright, IJAR, 2020. All rights reserved.

Introduction:-
Cellular automata augmented with deep learning are one of the exciting trends in machine learning. Cellular automata (C.A.) with complemented and un-complemented rule transitions, together with convolutional neural networks (CNN), have a strong mathematical foundation and architecture to address challenges in health data. In medical imaging, deep learning can generate features that are more sophisticated and harder to describe explicitly. Implicit features could identify fibroids and polyps [1] and characterize irregularities in tissue morphology, such as tumors [2]. In translational bioinformatics, such features may also determine nucleotide sequences that could bind a DNA or RNA strand to a protein [3]. There has been a rapid surge of interest in deep learning in recent years, measured by the number of papers published in sub-fields of health informatics, including bioinformatics, medical imaging, pervasive sensing, and medical informatics.
Pradipta Maji [4], [5] has explored the use of C.A. in pattern classification with real-valued data. A genetic algorithm is used to implement Fuzzy Cellular Automata, which is a special class of C.A. Pradipta Maji et al. have proposed a theory and application of C.A. for pattern classification [6], in which a genetic algorithm is used to develop fuzzy MACA. The same authors [7] have additionally investigated the error-correcting capability of cellular automata based on associative memory. The optimal C.A. is evolved with the formulation of a simulated annealing program, which can be helpful in VLSI technology. We have reviewed various types of C.A. [8], [9] that can be applied for this technique.

Deep learning is productive when massive data is available for training, and over time these models have solved many complicated, dynamic, real-time problems with increasing accuracy. CNN is a specialized class of neural networks [3] that processes data with a known grid-like topology. CNN has many applications, and it is built on a mathematical operation called convolution. It applies linear operators, represented in matrix form, to extract the features of the samples. We propose a distinctive architecture that processes DNA sequences, operates directly on the characters, and uses simple pooling and convolution operations. We term this architecture CNN*, and it is augmented with cellular automata rules to identify these diseases. The main challenge in this research is mapping the characteristics of medical informatics to CNN* and then training and testing the classifier [18].
We have referred to various mechanisms in the literature that address the open problems in medical informatics. After a thorough literature review, we found that medical informatics, bioinformatics, and medical imaging are the most important areas in health informatics [16]. In medical informatics, the vital applications are heart disease and the analysis of human behavior from electronically stored health records. In bioinformatics, the critical applications we identified are promoter prediction and gene prediction from genomic data. In medical imaging, we found that skin cancer and diabetes prediction from clinical images are vital problems, as shown in table 1. An extensive literature survey was done on the problems cited above. After this step, we understood that DL with C.A. [17] could process both image and text inputs related to health informatics.
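The character-level convolution and pooling that CNN* applies to a DNA sequence can be illustrated with a minimal sketch. The one-hot encoding over A, C, G, T, the hand-written "TATA" filter, and the function names below are assumptions made for illustration only; they are not the encoding or the trained filters used in HI-DL-CA.

```python
NUCLEOTIDES = "ACGT"  # assumed alphabet; ambiguous bases such as N are not handled


def one_hot(sequence):
    """Map each DNA character to a 4-element indicator vector."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES]
            for base in sequence.upper()]


def conv1d(encoded, kernel):
    """Slide a (width x 4) filter over the encoded sequence without padding.

    As in most deep learning libraries, this is a cross-correlation:
    the filter is not flipped before it is applied.
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            row = encoded[start + offset]
            total += sum(x * w for x, w in zip(row, kernel[offset]))
        scores.append(total)
    return scores


def max_pool(values, size):
    """Keep the largest score in each non-overlapping window."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


# Illustrative filter that responds most strongly to the motif "TATA".
motif_filter = one_hot("TATA")
scores = conv1d(one_hot("GCTATAAGC"), motif_filter)
pooled = max_pool(scores, 2)
```

In this toy input the filter gives its highest score at the window that exactly matches the motif, and pooling keeps that peak while halving the length of the feature map. A trained network would learn its filter weights from data instead of using a fixed motif.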

Design of HI-DL-CA (Health Informatics-Deep Learning-Cellular Automata)
The general architecture of HI-DL-CA is shown in fig 1. The input for the classifier is a set of datasets taken from the Uniform Hospital Discharge Data Set (UHDDS) and from the data and tools of the National Center for Health Statistics. C.A. rules initially process the input as per the application requirement. When HI-DL-CA is trained to process genomes, the data is processed in groups of three, because codons consist of three nucleotides. The encoding, in general, is done by a non-linear C.A. method, which is depicted in fig 2. The input is then forwarded to the CNN (Convolutional Neural Network), illustrated in fig 3, to predict the output. The transitions continue until the automaton reaches the attractor basin, whose identified behavior depends on the application. Many rules, such as 108, 162, 252, 255, and 256, can be applied depending on the type of input and the implementation.
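The codon grouping and the rule transitions towards an attractor described above can be sketched as follows. This is a simplified illustration under stated assumptions: a one-dimensional, three-neighborhood automaton with periodic boundaries, the standard Wolfram rule numbering, and a hypothetical two-bit code per nucleotide. The actual encoding of fig 2 is not reproduced here. Under this numbering, rule 108 computes centre XOR (left AND right), which is a non-linear rule.

```python
# Hypothetical 2-bit code per nucleotide, chosen only for this sketch.
BASE_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def codons(sequence):
    """Split a DNA string into groups of three, dropping a trailing partial group."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def encode_codon(codon):
    """Turn one codon into a 6-cell binary configuration."""
    bits = []
    for base in codon:
        bits.extend(BASE_BITS[base])
    return tuple(bits)


def step(cells, rule):
    """Apply one synchronous update of an elementary C.A. rule (periodic boundary)."""
    n = len(cells)
    updated = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right
        updated.append((rule >> index) & 1)
    return tuple(updated)


def run_to_attractor(cells, rule, max_steps=1000):
    """Iterate until a configuration repeats.

    Returns the visited configurations and the position in that list
    where the repeating cycle (the attractor) begins.
    """
    seen = {}
    state = tuple(cells)
    trajectory = []
    for t in range(max_steps):
        if state in seen:
            return trajectory, seen[state]
        seen[state] = t
        trajectory.append(state)
        state = step(state, rule)
    raise RuntimeError("no repeated configuration within max_steps")


start = encode_codon(codons("ATGGCCTA")[0])  # first codon "ATG"
trajectory, cycle_start = run_to_attractor(start, rule=108)
```

Because the number of configurations is finite, every trajectory eventually enters a cycle. For the 6-cell configurations used here there are at most 64 states, so the default `max_steps` is more than sufficient.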
Confidentiality is the protection of information from unauthorized people. It can be guaranteed through proper encoding and encryption mechanisms, which are taken care of in the design. We have used AES (Advanced Encryption Standard) to achieve confidentiality. Integrity protects data from being tampered with by unauthorized and untrusted parties. Repeated hashing is implemented to provide integrity. Availability guarantees that authorized parties can access the system and information when required. This is addressed by building a robust architecture that can resist DDoS attacks.
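The integrity check based on repeated hashing can be sketched as below. The choice of SHA-256, the number of rounds, and the function names are assumptions made for illustration; the paper does not state which hash function or how many repetitions are used.

```python
import hashlib
import hmac


def repeated_hash(data: bytes, rounds: int = 3) -> bytes:
    """Hash the record, then hash the digest again, for the given number of rounds."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    digest = data
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest


def verify(data: bytes, expected: bytes, rounds: int = 3) -> bool:
    """Recompute the digest and compare it in constant time."""
    return hmac.compare_digest(repeated_hash(data, rounds), expected)


record = b"patient-id=0001;glucose=5.4"  # made-up record for the example
stored_digest = repeated_hash(record)

unchanged = verify(record, stored_digest)
tampered = verify(b"patient-id=0001;glucose=9.9", stored_digest)
```

An unkeyed digest of this kind only detects modification if the stored digest itself is protected from the party who alters the data.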
The working of the CNN is shown in fig 3. The initial convolution layers process general characteristics, and as the layers go deeper, they treat more complex features. C.A. strengthens the filters we used during training and testing. Batch normalization aims at improving the stability, speed, and performance of the CNN. It is mainly used to standardize the batch of inputs, and it reduces the number of epochs needed for training. Activation functions augmented with C.A. rules are used to induce non-linearity into the system, and these are located in the dense layers. The minimum, mean, and maximum values of the above parameters are extracted from the dataset and processed for the prediction. Each convolution uses a 4 × 4 kernel, followed by 3 × 3, followed by 2 × 2. After processing, the collected datasets are classified as per the application requirement. With the above discussion, we are confident that HI-DL-CA provides a secure mechanism for health informatics. The implementation of the proposed mechanism is discussed in the next section.
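Two of the steps above lend themselves to a short worked example: standardizing a batch of inputs, and the effect of the 4 × 4, 3 × 3, and 2 × 2 kernels on the size of the feature map. The sketch assumes stride 1, no padding, a 32 × 32 input, and batch normalization without learned scale and shift parameters. None of these settings are stated in the paper.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Standardize a batch of values to zero mean and roughly unit variance."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(variance + eps) for x in batch]


def valid_output_size(input_size, kernel_size, stride=1):
    """Spatial size after one convolution without padding."""
    if kernel_size > input_size:
        raise ValueError("kernel is larger than the input")
    return (input_size - kernel_size) // stride + 1


def stacked_output_size(input_size, kernels=(4, 3, 2)):
    """Apply the kernel sizes in sequence and return the final spatial size."""
    size = input_size
    for kernel_size in kernels:
        size = valid_output_size(size, kernel_size)
    return size


normalized = batch_normalize([1.0, 2.0, 3.0])
final_size = stacked_output_size(32)  # 32 -> 29 -> 27 -> 26
```

With these assumptions each side of the feature map shrinks by the kernel size minus one per layer, so a 32 × 32 input becomes 26 × 26 after the three convolutions.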

Implementation and Comparison of HI-DL-CA
The input for the classifier is a set of datasets taken from the Uniform Hospital Discharge Data Set (UHDDS) and from the data and tools of the National Center for Health Statistics, as discussed in the earlier sections. Genomic data, clinical images, and digitized health records of patients are extracted and processed to verify the validity of our developed system. One advantage of HI-DL-CA is that it is trained to process text, i.e., genomic input in the form of a DNA sequence or an amino acid sequence, while a second version can process images such as health records, X-rays, etc.

HI-DL-CA for Medical Informatics
As discussed earlier, we have identified two potential problems in medical informatics, i.e., heart disease and the analysis of human behavior from electronically stored health records. We have applied our classifier to these two problems. The classifier has processed images of people suffering from heart attacks as well as electronically stored health records. To evaluate the classifier, accuracy, sensitivity, specificity, and precision are considered.
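The four evaluation measures named above follow from the counts of a binary confusion matrix. The sketch below uses the standard definitions. The counts in the example are made up for illustration and are not results from this paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary classification measures from confusion-matrix counts.

    tp, tn, fp, fn: true positives, true negatives, false positives, false negatives.
    Raises ZeroDivisionError if a denominator is zero (for example, no positive cases).
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # share of actual positives detected
        "specificity": tn / (tn + fp),     # share of actual negatives detected
        "precision": tp / (tp + fp),       # share of predicted positives that are correct
    }


# Made-up counts for 100 hypothetical test cases.
metrics = classification_metrics(tp=45, tn=40, fp=10, fn=5)
```

For these counts the accuracy is 0.85, the sensitivity 0.90, and the specificity 0.80, which shows how a single accuracy figure can hide a difference between the two classes.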

The prediction accuracy and error of the model for heart attack are illustrated in fig 4. The accuracy of the model tends to increase with the number of epochs. After reaching 60 epochs, our proposed classifier reports the highest accuracy of 89.95% with an error rate of less than 6%. The accuracy of the model in predicting the nature of employees from health records also tends to increase with the number of epochs. After reaching 60 epochs, our proposed classifier reports the highest accuracy of 84.95% with an error rate of less than 12.3%. The performance of our classifier in predicting heart attack is compared with the existing literature in fig 5. We identified the four best mechanisms, Significant Patterns (S.P.) [6], Association Rule Mining (ARM) [7], Big Data Analytics (BDA) [8], and Fuzzy C-Means (FCM) [9], to compare the performance. We found that FCM reports an accuracy of 83.6%, the best among the existing literature, while HI-DL-CA reports an accuracy of 89.69%. The specificity, sensitivity, and precision of our approach compared with the standard approaches are reported in table 2.

HI-DL-CA for Bioinformatics
In continuation of the earlier discussion, HI-DL-CA was trained and tested to process genomic data to address problems in bioinformatics. For example, when protein-coding regions are to be identified, the input is a DNA sequence; when the protein structure is to be identified, the input is an amino acid sequence, and so on. The architecture of HI-DL-CA is versatile and robust enough to process these different kinds of input for accurate prediction.
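Since the type of input depends on the task, a pre-processing step has to know which alphabet a sequence uses before it is encoded. The following is a minimal sketch of such a check. The function name and the rule of treating any sequence made only of A, C, G, T as DNA are assumptions for illustration, not part of HI-DL-CA.

```python
DNA_ALPHABET = set("ACGT")
# One-letter codes of the 20 standard amino acids.
AMINO_ACID_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def detect_alphabet(sequence):
    """Return 'dna' or 'protein' depending on the characters in the sequence.

    A, C, G and T are also valid amino acid codes, so a sequence made only
    of these four letters is assumed to be DNA in this sketch.
    """
    letters = set(sequence.upper())
    if not letters:
        raise ValueError("empty sequence")
    if letters <= DNA_ALPHABET:
        return "dna"
    if letters <= AMINO_ACID_ALPHABET:
        return "protein"
    raise ValueError("sequence contains characters outside both alphabets")


kind_for_coding_regions = detect_alphabet("ATGGCCTAA")   # DNA input
kind_for_structure = detect_alphabet("MKVLAAGIV")        # amino acid input
```

In practice the task usually fixes the expected alphabet, so a check like this serves mainly to reject malformed input early.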
The prediction accuracy and error of the model for promoter prediction are illustrated in fig 6. The accuracy of the model tends to increase with the number of epochs. After reaching 60 epochs, our proposed classifier reports the highest accuracy of 92.36% with an error rate of less than 7.2%. The accuracy of the model in predicting genes also tends to increase with the number of epochs. After reaching 60 epochs, our proposed classifier reports the highest accuracy of 89.27% with an error rate of less than 14.6%.
The performance of our classifier in predicting promoters is compared with the existing literature in fig 7. We identified the three best mechanisms, Neural Networks (N.N.) [10], DNA energies (DNAE) [11], and Support Vector Machine (SVM) [12], to compare the performance. We found that SVM reports an accuracy of 87.89%, the best among the existing literature, while HI-DL-CA reports an accuracy of 92.36%. The specificity, sensitivity, and precision of our approach compared with the standard approaches are reported in table 3.

HI-DL-CA for Medical Imaging
As discussed in section 3.1, HI-DL-CA was trained and tested to process clinical images to address problems in medical imaging. We have identified skin cancer and diabetes as potential problems in medical imaging. The architecture of HI-DL-CA is versatile and robust enough to process any number of images for accurate prediction.
The prediction accuracy and error of the model for skin cancer prediction are illustrated in fig 8.

Conclusion:-
We have developed a robust, secure, and adaptable mechanism that provides high accuracy, specificity, precision, sensitivity, availability, integrity, and confidentiality for the majority of applications in health informatics. HI-DL-CA reports an average accuracy of 89.95% while addressing the problems in medical informatics. The proposed classifier reports average accuracies of 92.36% and 94.78% while solving the problems in bioinformatics and medical imaging, respectively. This framework could also be used to analyze X-rays of patients affected by COVID-19 and to predict variations in the death rate. The framework can be improved by considering more robust and secure parameters, which can attract more people to use these systems.

References:-