SASDL and RBATQ: Sparse Autoencoder With Swarm Based Deep Learning and Reinforcement Based Q-Learning for EEG Classification

Electroencephalography (EEG) signals provide vital information about the electrical activity of the brain. EEG is a powerful tool for analyzing neural activity, and neurological disorders such as epilepsy, schizophrenia, sleep-related disorders and Parkinson's disease can be investigated well with its help. Goal: In this paper, two versatile deep learning methods are proposed for the efficient classification of epilepsy and schizophrenia from EEG datasets. Methods: The main advantage of deep learning over other machine learning algorithms is its capability to perform feature engineering on its own. Swarm intelligence is also a highly useful technique for solving a wide range of real-world, complex, non-linear problems. Taking advantage of these factors, the first proposed method is a Sparse Autoencoder (SAE) with swarm-based deep learning, named SASDL, which uses the Particle Swarm Optimization (PSO), Cuckoo Search Optimization (CSO) and Bat Algorithm (BA) techniques. The second proposed method is reinforcement learning based on Bidirectional Long Short-Term Memory (BiLSTM), an attention mechanism, Tree-LSTM and Q-learning, named RBATQ. Results and Conclusions: Both novel deep learning techniques are tested on epilepsy and schizophrenia EEG datasets, the results are analyzed comprehensively, and a classification accuracy of more than 93% is obtained for all the datasets.


I. INTRODUCTION
THE physical activities of the nervous system are comprehensively reflected by EEG signals [1]. Any change in brain function caused by a neurological disorder can be detected in EEG signals. In medicine, the processing of EEG signals provides an objective basis for diagnosing certain disorders, thereby enabling clinicians to provide effective treatment for the particular brain disorder. Earlier, EEG waveforms were detected and analyzed manually; because this is labor-intensive and time-consuming, automated classification of EEG signals for diagnosing neurological disorders came into existence [2]. Classification of EEG signals is therefore a vital task for the identification, diagnosis and even prevention of brain-related diseases. In this paper, the classification of epilepsy EEG signals and schizophrenia EEG signals is dealt with in detail. Epilepsy is a chronic disease characterized by sudden and repeated seizures [3]. Owing to the various initiating locations and transmission modes of the abnormal electrical activity in the brain, different clinical manifestations occur, such as loss of consciousness, limb convulsions and behavioral problems [4]. EEG is the most prevalent technique for examining brain activity in epileptic patients. For epileptic patients, the EEG signals are split into interictal, preictal and ictal states [5]. An unusual pattern is exhibited in the EEG signals where the seizure occurs, and a distributed pattern is also sometimes exhibited there. Distinctive patterns are also shown by the EEG signals of the interictal and preictal states. These patterns are therefore highly useful for differentiating the epileptic states, so that the occurrence of a seizure can be known and its deadly effects on patients reduced [6].
Seizure detection and classification have been studied for the past two decades with machine learning and deep learning techniques, and good surveys can be found in [7], [8], so the earlier works are not repeated here. However, the most important machine learning and deep learning ideas of the past four years are discussed here for the benefit of readers. Transfer learning combined with semi-supervised learning for seizure classification from EEG signals was proposed by Jiang et al., where the average accuracy was higher than 95% in most cases [9]. A tunable Q wavelet transform based on multiscale entropy was proposed for automated classification of epileptic EEG signals, where an accuracy as high as 100% was achieved in a few cases [10]. A local mean decomposition (LMD)-based feature analysis with a Support Vector Machine (SVM) was used by Zhang and Chen, reaching a classification accuracy of 98.10% [11]. In 2018, deep learning approaches for automated classification of epilepsy from EEG signals were proposed in [12], [13], and a morphological component analysis based SVM classification was proposed in [14]; these three approaches produced classification accuracies of more than 95% for their respective problem settings. Well-known works from 2019 include a scalogram-based convolutional network [15], a matrix determinant-based approach [16], cross-bispectrum analysis for seizure detection [17] and a novel random forest model with grid search optimization [18]; almost all of them achieved classification accuracies of 90% to 100%, depending on the case study. In 2020, a novel convolutional neural network model [19], improved Radial Basis Function (RBF) analysis [20], a Power Spectral Density (PSD) based deep CNN [21], imagined EEG signal analysis through fully convolutional networks [22], empirical mode decomposition analysis along with its derivative [23] and a bat algorithm based SVM [24] were reported for seizure classification from EEG signals, with almost all works reporting classification accuracies of more than 90%. In 2021, a Jacobi polynomial transform based technique with Least Squares SVM (LS-SVM) [25], an adaptive synthetic sampling approach [26], a fractal-based seizure detection technique [27], a Principal Component Analysis (PCA) based Genetic Algorithm (GA) [28] and an analysis of the significance of channel selection techniques [29] were used for seizure classification from EEG signals, reporting classification accuracies of more than 90% for most cases. In 2022, sparse analysis with deep and transfer learning models was developed with an ensemble cum nature inclined classification for epilepsy, reporting classification accuracies of more than 90% [30].
As this paper also discusses schizophrenia classification from EEG signals, the recent literature on it is reviewed as follows. Schizophrenia is a serious mental disorder in which people interpret reality abnormally. It results in a combination of delusions, hallucinations and disordered thinking that severely impairs daily functioning [31]. Schizophrenia therefore involves a range of problems with cognition, emotion and behaviour. Its exact cause is not known, but a combination of brain chemistry, genetics and environmental factors may contribute to its development [32]. EEG signals are a great boon for analyzing this disorder, and some of the well-known recent works in this field are as follows: EEG series splitting reported an accuracy of 92.91% [33]; deep convolutional neural networks reported 98.07% for non-subject-based testing and 81.26% for subject-based testing [34]; spectral analysis reported 96.77% [35]; swarm computing techniques with classifiers reported 92.17% [31]; the Short-Time Fourier Transform (STFT) with a CNN reported 97.00% [36]; the Partial Least Squares technique reported 98.77% [37]; multivariate Empirical Mode Decomposition (EMD) reported 93.00% [38]; the continuous wavelet transform (CWT) with a CNN reported 98.60% [39]; a simple CNN reported 98.96% [40]; sparse depiction with nature inclined classification and deep cum transfer learning reported 98.72% [30]; and the Collatz pattern reported 99.47% [41]. The key contributions of this work are as follows, and no previous works using the two novel deep learning models developed here have been reported in the literature.
i) First, a sparse autoencoder with a swarm-based deep neural network using PSO was developed for classifying the epilepsy and schizophrenia datasets.
ii) Second, reinforcement learning based on Q-learning was implemented successfully to classify the epilepsy and schizophrenia datasets.
The organization of the work is as follows. Section II explains the development of the SASDL model, and Section III explains the RBATQ model. Section IV presents the results and discussion, and Section V gives the conclusion.

II. DEVELOPMENT OF SASDL MODEL
An autoencoder is developed to reduce the dimensionality of the input [42]. This form of unsupervised learning uses a feedforward neural network with both an encoding and a decoding stage: the network is trained on an input $x$ so that its reconstruction $\hat{x}$ is as similar to $x$ as possible. Many kinds of autoencoders are available in the literature, such as the sparse autoencoder, denoising autoencoder and stacked autoencoder [42]. When the data space is huge, reconstruction of the raw data can fail because the autoencoder may fall into merely replicating the input. The sparse autoencoder usually has lower output dimensions, which pushes it to reconstruct the raw data from the most useful features instead of replicating it. In this study, a sparse autoencoder is chosen to extract highly useful patterns of very low dimensionality. These feature vectors are further selected by PSO/CSO/BA and finally fed into a simple deep neural network comprising two hidden layers and a Softmax output layer. The input vector to PSO/CSO/BA is taken from the bottleneck of the sparse autoencoder. The bias units are the neurons denoted (+1), and they are added to the feedforward neural network through the cost function. This step is highly useful for obtaining the most preferable reconstruction of the input $x$ without overfitting. The cost function of the autoencoder comprises three terms. Assume a dataset with a total of $N$ training samples $(x_1, x_2, \ldots, x_N)$, where the $i$-th input is denoted by $x_i$. The developed SAE is trained so that its reconstruction $h_{W,b}(x_i)$ is very close to $x_i$. The squared error, the sparsity term and the weight decay are the three parts of the cost function. The weight decay helps to avoid overfitting. Over all $N$ training samples, the cost function is the mean squared error plus the weight decay term, weighted by $\lambda$, and the sparsity term, in which $\beta$ is the sparse penalty weight and KL denotes the Kullback-Leibler divergence. The value of $\lambda$ should be chosen carefully, because a low value leads to overfitting and a high value leads to underfitting. ReLU is chosen as the activation function, denoted $a$, and the average activation of hidden unit $j$ over the training samples is denoted $\hat{p}_j$, where $p$ represents the sparsity parameter. The sparsity term is computed to make $\hat{p}_j$ as close to $p$ as possible. Activation and deactivation of neurons in the hidden layer are controlled by this parameter.
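The three-part cost function described above is commonly written in the following form. This is a sketch of the standard sparse autoencoder objective, not an equation recovered from the source; the number of hidden units $m$, the layer index $l$ and the Frobenius norm on the weights are notational assumptions.

```latex
\begin{align}
J(W,b) &= \frac{1}{N}\sum_{i=1}^{N}\frac{1}{2}\,\bigl\lVert h_{W,b}(x_i) - x_i \bigr\rVert^{2}
        + \frac{\lambda}{2}\sum_{l}\bigl\lVert W^{(l)} \bigr\rVert_{F}^{2}
        + \beta\sum_{j=1}^{m}\mathrm{KL}\bigl(p \,\Vert\, \hat{p}_j\bigr), \\
\hat{p}_j &= \frac{1}{N}\sum_{i=1}^{N} a_j(x_i), \qquad
\mathrm{KL}\bigl(p \,\Vert\, \hat{p}_j\bigr)
        = p\log\frac{p}{\hat{p}_j} + (1-p)\log\frac{1-p}{1-\hat{p}_j}.
\end{align}
```

The first term is the mean squared reconstruction error, the second is the weight decay controlled by $\lambda$, and the third penalizes hidden units whose average activation $\hat{p}_j$ drifts away from the sparsity parameter $p$. Note that the KL term is only defined for $0 < \hat{p}_j < 1$, so with an unbounded activation such as ReLU the average activations would have to be kept in that range.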

A. PSO
PSO is a well-known population-based algorithm for solving optimization problems [43]. The particles constitute the total population, and each particle represents a candidate solution. The best solution is searched for by updating the velocity and position vectors of every particle. The velocity of particle $i$ in the $d$-th dimension is denoted $v_{id}$, and its position in the $d$-th dimension is denoted $x_{id}$. In the $d$-th dimension, $P_{id}$ represents the local best and $P_{gd}$ represents the global best. Random numbers between 0 and 1 are denoted $r_1$ and $r_2$. The inertia weight is $w$, and $c_1$, $c_2$ are the acceleration coefficients that balance exploitation and exploration. Owing to its versatility, as proven in the literature, PSO is chosen in this work and is shown in Algorithm 1.
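The velocity and position update described above can be sketched as follows. This is a minimal illustration of the canonical inertia-weight PSO update, assuming NumPy arrays for the swarm; the function name `pso_step` is hypothetical, and the default values of `w`, `c1` and `c2` are the ones reported later in Section II-D.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w=0.64, c1=1.524, c2=1.524, rng=None):
    """One canonical PSO update for a whole swarm (illustrative sketch).

    x, v, p_best : arrays of shape (n_particles, n_dims)
    g_best       : array of shape (n_dims,)
    """
    rng = np.random.default_rng() if rng is None else rng
    # r1, r2 ~ U(0, 1), drawn independently per particle and dimension
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    # inertia term + pull towards personal best + pull towards global best
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    x_new = x + v_new
    return x_new, v_new
```

A particle that already sits on both its personal best and the global best only keeps its inertia term, which is why the inertia weight governs how long the swarm keeps exploring.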

B. CSO
In CSO, the following three main rules express the cuckoo search process [44]. First, a cuckoo lays one egg at a time and drops it in a randomly chosen nest. Second, only the best nests, which hold high-quality eggs, are carried over to the next generation. Third, the number of available host nests is fixed, and a host bird discovers the egg laid by the cuckoo with a probability $p_a \in [0, 1]$. The host bird can then either eliminate the egg or abandon the nest completely. The cuckoo search algorithm is developed from these three rules and is shown in Algorithm 2. A Lévy flight is generally performed when a new solution $x^{(t+1)}$ is generated for a cuckoo $c$, as in (5), where $\alpha > 0$ denotes the step size. Generally, $\alpha = O(L/10)$ is used in most cases, where $L$ denotes the characteristic scale of the problem of interest. Equation (5) is a stochastic equation for a random walk. A random walk is modeled as a Markov chain whose next location depends on the current location and the transition probability. The product $\oplus$ denotes entry-wise multiplication. The random walk via Lévy flight explores the search space more efficiently because it has a longer step length. The Lévy flight provides a random walk whose random step length is drawn from a Lévy distribution.
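A Lévy-flight move of the kind described above can be sketched as follows. The source does not say how the Lévy steps are sampled, so this sketch assumes Mantegna's algorithm, a common choice; the names `levy_step` and `cuckoo_move` and the exponent `beta` are hypothetical, while the default step size `alpha = 1.5` is the value reported in Section II-D.

```python
import math
import numpy as np

def levy_step(shape, beta=1.5, rng=None):
    """Draw heavy-tailed step lengths with Mantegna's algorithm (assumed sampler)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size=shape)
    v = rng.normal(0.0, 1.0, size=shape)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_move(x, alpha=1.5, beta=1.5, rng=None):
    """New candidate: current position plus an entry-wise scaled Levy step."""
    return x + alpha * levy_step(x.shape, beta, rng)
```

Most steps drawn this way are small, but occasional very long jumps occur, which is the property that makes the Lévy flight explore the search space efficiently.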

C. BA
For a typical bat algorithm, the following idealized rules are used [45]. Bats use echolocation to sense distance, and they know the difference between prey and background barriers. Bats fly randomly with a velocity $v_i$ at a position $x_i$. The wavelength of their emitted pulses is adjusted quickly, and the rate of pulse emission $r \in [0, 1]$ can be adjusted according to the proximity of the target. Although the loudness can vary in many ways, it is assumed to vary from a large positive value $A_0$ to a minimum value $A_{min}$. For simplicity, the following approximations can be used. Generally, the frequency $f$ in a range $[f_{min}, f_{max}]$ corresponds to a wavelength range $[\lambda_{min}, \lambda_{max}]$. For easy implementation, any wavelength can be used depending on the specific problem. The range of the wavelength can be adjusted by adjusting the frequencies. Since $\lambda$ and $f$ are closely related, the frequency can be varied while the wavelength $\lambda$ is fixed. For simplicity, it is assumed that $f \in [0, f_{max}]$. It is well known that higher frequencies have shorter wavelengths and travel a shorter distance. The typical range is only a few meters for bats. The pulse rate lies in the range [0, 1], where 1 implies the highest pulse emission rate and 0 implies no pulses. The procedure is shown in Algorithm 3.
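The idealized rules above are usually turned into frequency, velocity and position updates. The source does not list these equations, so the sketch below assumes the standard bat algorithm formulation; the function names, the value `f_max = 2.0` and the constants `alpha` and `gamma` in the loudness and pulse-rate schedule are illustrative assumptions, not values from the source.

```python
import numpy as np

def bat_step(x, v, x_best, f_min=0.0, f_max=2.0, rng=None):
    """One standard bat-algorithm move for all bats (illustrative sketch).

    x, v   : arrays of shape (n_bats, n_dims)
    x_best : current global best position, shape (n_dims,)
    """
    rng = np.random.default_rng() if rng is None else rng
    beta = rng.random((x.shape[0], 1))        # one random draw per bat
    f = f_min + (f_max - f_min) * beta        # frequency in [f_min, f_max]
    v_new = v + (x - x_best) * f              # velocity scaled by frequency
    x_new = x + v_new
    return x_new, v_new, f

def update_loudness_and_rate(A, r0, t, alpha=0.9, gamma=0.9):
    """Loudness decays and pulse rate rises as a bat closes in on its target."""
    return alpha * A, r0 * (1.0 - np.exp(-gamma * t))
```

In this formulation the loudness and pulse rate are updated only when a new solution is accepted, so a quiet bat with a high pulse rate corresponds to one that has nearly located its target.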

D. Overall Framework of the Work
The overall framework for training and testing with SASDL is depicted in Figs. 1 and 2. Initially, the dataset is split into a training set and a test set. The training set is passed to the SAE, the bottleneck output of the SAE is fed to PSO/CSO/BA, and the respective output is fed to the DNN. PSO/CSO/BA is used to select the best particles with the help of Algorithms 1, 2 and 3, respectively. For PSO, the inertia weight is set to 0.64, the acceleration coefficients $c_1$ and $c_2$ are set to 1.524, the population size is set to 30 and the total number of generations is set to 30. All values were chosen after several trial-and-error experiments.
For CSO, new solutions are generated by Lévy walk to speed up the local search. The parameters used in our experiments are $n = 20$ nests, $\alpha = 1.5$ and $p_a = 0.45$.
For BA, choosing the parameters requires some trial-and-error experimentation. Through randomization, every bat should have different values of pulse emission rate and loudness. In this experiment, the initial loudness $A_i^0$ is set to 1 and the initial emission rate $r_i^0$ is set to 0.5, which lies within the range [0, 1] of the pulse emission rate. If the new solution improves, the loudness and emission rates are updated, implying that the bats are progressing towards the optimal solution. In our experiment, $n$ is chosen as 40 virtual bats.

III. DEVELOPMENT OF RBATQ MODEL
To analyze the decision process of reinforcement learning, three deep learning techniques are used in this paper: Bidirectional LSTM, an attention mechanism and Tree-LSTM. The Q-learning algorithm is used to obtain the control policy.

A. Reinforcement Learning (RL)
RL is the most commonly used framework for efficiently learning the control policies of an agent, and it works by active interaction with the environment [46].
State: The internal state $S$ of the environment contains three states: the initial state $s_1$, the transition state $s_2$ and the end state $s_e$. Representing the state directly from the signal is difficult, as there are no appropriate measures to assess it; deep learning techniques are therefore used to extract the features of the signal, which indicate the circumstances in the decision process. A bidirectional LSTM [47] is first used for feature extraction, and attention-based methods [48] are used to generate the initial state, represented as $s_1 = \mathrm{Att}(X; \theta_1)$. Tree-LSTM [49] is used to create the transition state, represented as $s_2 = \mathrm{Tree}(X; \theta_2)$. Here $X$ denotes the features of the input signal, and $\theta_1$ and $\theta_2$ are the state parameters.
Action: The environment has a collection of predefined actions denoted by $A$, namely $a_1$, $a_2$, $a_3$ and $a_4$. The initial decision chooses between $a_1$ and $a_2$, and the next decision chooses between $a_3$ and $a_4$. The reward obtained for each action is represented by $R = \{r_1, r_2, r_3, r_4\}$. In a state $s$, the agent takes an action $a$ and receives a reward $r$ from the environment. The transition of the decision procedure is chosen accordingly.
Transition and Reward function: An agent taking $a_1$ at $s_1$ is moved to $s_e$ through the state transition tuple $(s_1, a_1, r_1, s_e)$. The agent receives the reward $r_1$ if the judgement of $a_1$ is correct. If the judgement of $a_1$ is incorrect, $r_1$ can be set accordingly to steer the judgement of the initial decision. The remaining state transition tuples and their reward functions can be assigned in a similar manner.

B. BiLSTM Layer
Every LSTM unit in the BiLSTM layer comprises three multiplicative gates: the input gate $i_t$, the forget gate $f_t$ and the output gate $o_t$. These gates control the proportion of information that is passed on to the next time step. Each LSTM unit also keeps a memory cell $c_t$ that retains the preceding state, so that the features of the current input signal can be memorized well. The data sources for every LSTM unit are the feature vector $x_t$ at time $t$, the hidden state vectors $h_{t-1}$ and $h_{t+1}$ (before and after time $t$) and the cell vector $c_{t-1}$. In the forward pass, the weight matrices are denoted by $W$ and the bias vectors by $b$, with subscripts named as is common for BiLSTM [47], and the logistic function is denoted by $\sigma$. The backward pass over time is executed in the same fashion as the forward pass. At time $t$, the hidden state vectors of the two directions, $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, are computed simultaneously in the BiLSTM layer, so past and future features can both be used efficiently in a specific time frame. The hidden state vectors $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ are passed to a Softmax layer at time $t$, whose weight matrices are denoted by $W$ and whose bias vector is denoted by $b$. The attention mechanism is then applied to the BiLSTM.
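The gate computations and the two-direction pass described above can be sketched in NumPy as follows. This is a minimal illustration assuming the common LSTM formulation without peephole connections; the stacked parameter layout (`W`, `U`, `b` with gate order input, forget, output, candidate) and the function names are assumptions made for the sketch, not the source's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])              # input gate i_t
    f = sigmoid(z[H:2 * H])          # forget gate f_t
    o = sigmoid(z[2 * H:3 * H])      # output gate o_t
    g = np.tanh(z[3 * H:4 * H])      # candidate cell content
    c_t = f * c_prev + i * g         # memory cell update
    h_t = o * np.tanh(c_t)           # hidden state
    return h_t, c_t

def bilstm(X, params_fwd, params_bwd, H):
    """Run forward and backward passes over X (T, D); return (T, 2H) states."""
    T = X.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    fwd = []
    for t in range(T):                       # past -> future
        h, c = lstm_step(X[t], h, c, *params_fwd)
        fwd.append(h)
    h, c = np.zeros(H), np.zeros(H)
    bwd = [None] * T
    for t in reversed(range(T)):             # future -> past
        h, c = lstm_step(X[t], h, c, *params_bwd)
        bwd[t] = h
    # concatenate both directions at every time step
    return np.concatenate([np.stack(fwd), np.stack(bwd)], axis=1)
```

Each row of the result holds the forward state followed by the backward state for one time step, which is the vector that a subsequent Softmax or attention layer would consume.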

C. Tree LSTM
This concept was originally implemented in the field of natural language processing (NLP); in this paper, the idea is applied to biosignal processing for the first time. The Tree-LSTM is built starting from its leaf nodes and proceeds recursively up to the root [49]. A non-linear transformation is carried out on the hidden state vector of the antecedent element to generate $s_2$. It serves as an important transition state in the decision process and is denoted as $s_2 = \mathrm{Tree}(X; \theta_2)$, where $\theta_2$ denotes all the parameters of the Tree-LSTM. Once the transition state $s_2$ is generated, it is passed to a Softmax output layer to obtain $y_r$, which gives the probability of each category for a relation mention. The category with the highest probability is chosen, so that $a_3$ or $a_4$ can be determined easily.
In this Softmax layer, the weight matrix is denoted by $W$ and the bias vector by $b$. A Softmax layer is used at every dependency tree, so that the category of the root node is predicted when the given inputs $X$ are observed at its children nodes.
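The Softmax step applied to the transition state can be written compactly as follows. This is a generic Softmax formulation given as a hedged sketch; the class index $k$ and the symbol $\hat{k}$ for the selected category are assumed notation, and the source does not show the exact form used.

```latex
\begin{align}
y_r &= \mathrm{softmax}(W s_2 + b), \qquad
y_{r,k} = \frac{\exp\bigl((W s_2 + b)_k\bigr)}{\sum_{k'} \exp\bigl((W s_2 + b)_{k'}\bigr)}, \\
\hat{k} &= \arg\max_{k}\; y_{r,k}.
\end{align}
```

The selected category $\hat{k}$ then determines which of the two second-stage actions, $a_3$ or $a_4$, is taken.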

D. Q-Learning
Q-learning is a well-established reinforcement learning algorithm [50]. The agent can use it to learn an optimal state-action value function $Q(s, a)$. By consulting $Q(s, a)$, which is an estimate of the action's expected long-term reward, the agent takes an action $a$ in state $s$. By analyzing a sequence of actions, the cumulative reward can be maximized. It is difficult to obtain $Q(s, a)$ for every state-action pair, as the state space in the decision process is infinite. Therefore, $Q(s, a)$ is approximated by a network as a parameterized function $Q_\eta(s, a) = \mathrm{MLP}(\phi(X; \theta), a, \eta)$. Here $\phi(X; \theta)$ stands for $s_1 = \mathrm{Att}(X; \theta_1)$ and $s_2 = \mathrm{Tree}(X; \theta_2)$, where $\theta$ is obtained by pretraining the deep learning models. The parameter of the neural network is $\eta$, and it is learnt by stochastic gradient descent steps with RMSprop. The degree of approximation to the real value function $Q^\pi$ is measured by the least squares error $E$. Q-learning uses the estimated value function $Q_\eta(s, a)$ instead of the real value function $Q^\pi(s, a)$. The discrepancy between the estimate $Q_\eta(s, a)$ and the expectation $Q^\pi(s, a)$ is reduced as the parameters are updated in every epoch. Starting from a random $Q_\eta(s, a)$, the agent updates the values continuously by making decisions and obtaining the corresponding rewards. By selecting the actions with the highest $Q_\eta(s, a)$, the agent can increase its future rewards. Ultimately, the control policy $\pi$ is obtained by the Q-learning algorithm. In the training procedure, the BiLSTM, the attention layer and the Tree-LSTM are pre-trained initially. All the parameters of the BiLSTM are denoted $\theta_0$, all the parameters of the attention layer are denoted $\theta_1$ and all the parameters of the Tree-LSTM are denoted $\theta_2$; these are the main training parameters. Deep learning is used to represent the features, and RL is used to combine these three tasks in the final decision process. Standard pipeline architectures fail to let information flow in a sequential manner, whereas this RL method combines all the tasks sequentially and also makes decisions. The decisions may be poor initially, but good stability is obtained after several epochs. The parameters are updated globally in this architecture, and eventual convergence is therefore achieved. Hence the RL method readily obtains feedback from decision-making, enabling the data to flow easily through the global architecture. The Q-learning training procedure is given in Algorithm 4.
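The idea of fitting a parameterized $Q_\eta(s, a)$ by reducing a squared error can be illustrated with a small sketch. To keep it short, a linear model stands in for the MLP and plain stochastic gradient descent stands in for RMSprop; both substitutions, the discount factor `gamma`, the learning rate `lr` and the function names are assumptions made for illustration and are not taken from the source.

```python
import numpy as np

def q_values(phi, eta):
    """Linear stand-in for the MLP: one weight row of eta per action."""
    return eta @ phi

def q_learning_update(eta, phi, a, r, phi_next, terminal, gamma=0.9, lr=0.01):
    """One semi-gradient Q-learning step on the squared TD error.

    phi, phi_next : state features (e.g. s_1 or s_2), shape (n_features,)
    a             : index of the action taken
    terminal      : True if the transition ends in the end state s_e
    """
    if terminal:
        target = r
    else:
        target = r + gamma * np.max(q_values(phi_next, eta))
    td_error = target - q_values(phi, eta)[a]
    eta = eta.copy()
    # gradient descent on 0.5 * td_error**2 with the target held fixed
    eta[a] += lr * td_error * phi
    return eta, td_error
```

For the two-step decision process of Section III-A, a transition such as $(s_1, a_1, r_1, s_e)$ would be passed with `terminal=True`, while a transition that leads on to $s_2$ would bootstrap from the best next action value.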

IV. RESULTS AND DISCUSSION
The proposed deep learning models have first been evaluated on the University of Bonn dataset for epilepsy classification. The models have then been evaluated on the dataset obtained from the Institute of Psychiatry and Neurology, Poland. In the Bonn dataset, the epilepsy data are categorized into sets A, B, C, D and E. The normal category is present in sets A and B, the interictal category in sets C and D, and the ictal category in set E. The classification problems discussed here are A-E, B-E, C-E, D-E, AB-E, AC-E, CD-E, ACD-E and ABCD-E. For the schizophrenia dataset, the problem is simply the normal case versus the schizophrenia case. The datasets are described in detail in [51], [52]. In the epilepsy dataset, 100 single-channel EEG recordings are present, each with a sampling rate of 173.61 Hz and a duration of 23.6 seconds. Each time series is sampled into 4097 data points, which are further split into 23 chunks, so that about 2300 samples are present in each category. For the deep learning techniques, the 2300 EEG signals are randomly divided into ten non-overlapping folds, as 10-fold cross validation is adopted for evaluation. In the schizophrenia dataset, each channel has about 225,000 samples, so the data are arranged in a matrix of size [5000 × 45]. As there are 19 channels, the full data are arranged as [5000 × 45 × 19]. The schizophrenia EEG samples are likewise randomly divided into ten non-overlapping folds for 10-fold cross validation.
The SAE reduces the dimensionality of the input, whose size is about (4097 × 100) for the epileptic dataset and (5000 × 45) for the schizophrenia dataset, to about (2500 × 50) for the epileptic dataset and (5000 × 15) for the schizophrenia dataset. These useful features are provided by the bottleneck of the SAE and fed to PSO/CSO/BA. The bottleneck comprises 9000 hidden units. After PSO/CSO/BA, a total of 4500 features are obtained. The classifier comprises two hidden layers and an output layer with 2250, 500 and 2 units, respectively. Softmax regression is used in the output layer to give the probability of each class. Dropout is used between the fully connected layers to prevent overfitting. The class with the maximum probability is taken as the final decision of the classifier. Cross entropy is used as the cost function of the classifier, and a weight decay term is added to it. The SAE minimizes the cost function, and PSO/CSO/BA selects the most important features, which are given as input to the DNN. Training is completed in about 50 iterations with a batch size of 10. The sparsity parameter $p$ is set to 0.08, the weight decay $\lambda$ to 0.01 and the sparse penalty term $\beta$ to 4. To adjust the classifier parameters, the deep neural network classifier was fine-tuned over the last 20 iterations so that the cost function of the Softmax was minimized. The Adam optimizer is used for parameter updates. The model was evaluated using 10-fold cross validation.
For the RBATQ deep learning model, the hyperparameters are as follows. The state size of all the LSTM units is set to 250 and the dimension of the hidden layer is fixed at 100. The non-linear function used is tanh. The dropout rate is set to 0.75, the initial learning rate is 0.002, the mini-batch size is 25 and the max-norm regularization constraint is set to 3. The performance metrics analyzed are sensitivity, specificity and accuracy, and they are tabulated in Table I.
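The chunking arithmetic, the 10-fold split and the three reported metrics can be illustrated with a short sketch. It assumes that each 4097-point recording is cut into 23 equal chunks of 178 points with the last 3 points discarded, that folds are formed by a random permutation, and that the positive class is labeled 1; these details and all function names are assumptions, since the source does not spell them out.

```python
import numpy as np

def chunk_recording(signal, n_chunks=23):
    """Split one recording into equal-length chunks, dropping the remainder."""
    chunk_len = len(signal) // n_chunks        # 4097 // 23 = 178
    usable = chunk_len * n_chunks              # 4094 points are kept
    return np.asarray(signal)[:usable].reshape(n_chunks, chunk_len)

def kfold_indices(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k non-overlapping folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for labels in {0, 1}.

    Assumes both classes occur in y_true (otherwise a ratio is undefined).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy
```

With 100 recordings per category, 23 chunks per recording give the 2300 samples mentioned above, and each of the ten folds then holds 230 of them.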
On analyzing Table I ... The Good Detection Rate (GDR) and the error rate of the deep learning models are plotted in Figs. 3 and 4, respectively. As inferred from Fig. 3, the proposed SASDL-PSO model produces a high GDR, followed by the proposed SASDL-CSO model and the SASDL-BA model. The proposed RBATQ model and the ordinary SAE-DNN model produce a comparatively low GDR. As inferred from Fig. 4, a low error rate is obtained for the proposed SASDL-PSO model. A high error rate is obtained for the SAE-DNN model, and the remaining three models also have slightly higher error rates than the proposed SASDL-PSO model.

A. Comparison With Previous Works for Epilepsy Bonn Dataset and Schizophrenia Dataset
Though thousands of papers on epilepsy and schizophrenia classification are published every year, a few important and recent works that analyzed many combinations of the epilepsy problem have been selected, and the results are compared in Tables II and III. The best result of 98.55% has been obtained for the A-E problem with the proposed SASDL-PSO model, 98.03% for the B-E problem with the SAE-DNN model, 98.11% for the C-E problem with the proposed SASDL-PSO model, 97.52% for the D-E problem with the proposed SASDL-PSO model, 98.4% for the AB-E problem with the proposed SASDL-PSO model, 98.55% for the CD-E problem with the proposed SASDL-PSO model, 97.25% for the AC-E problem with the proposed SASDL-PSO model, 97.89% for the ACD-E problem with the proposed SASDL-PSO model, 98.48% for the ABCD-E problem with the proposed SASDL-PSO model, and 98.07% for the BCD-E problem with the proposed SASDL-PSO model. For schizophrenia classification, the best result of 97.95% is obtained with the proposed SASDL-PSO model. The proposed results are broadly similar to the previous state-of-the-art results, sometimes exceeding them and sometimes falling short of them by a minor margin. The main intention of this work is to analyze swarm-based deep neural networks along with reinforcement-based Q-learning for epilepsy and schizophrenia datasets and to report the results.

V. CONCLUSION
EEG is the most standard tool used by researchers and clinicians to study and analyze the neuronal dynamics of the human brain. For EEG-based analysis of various neurological disorders, visual inspection of these huge datasets is very difficult. Feature extraction techniques and automated classification schemes have therefore been developed in the past. With the advent of deep learning, manual feature extraction is no longer necessary, as it is handled by the deep learning process itself. In this paper, two novel deep learning techniques, SASDL based on swarm intelligence and RBATQ based on reinforcement learning, are proposed and tested on two EEG datasets, an epilepsy dataset and a schizophrenia dataset. For the epilepsy dataset, the highest classification accuracy of 98.55% was obtained with the proposed SASDL-PSO method and 97.75% with the proposed RBATQ method. For the schizophrenia dataset, the highest classification accuracy of 97.95% was obtained with the proposed SASDL-PSO method and 94.97% with the proposed RBATQ method. Future work aims to develop further deep learning models to classify EEG datasets with high classification accuracy. Moreover, the developed deep learning models are planned to be applied to other biosignal datasets such as Electrocardiogram (ECG), Photoplethysmogram (PPG) and Electrooculogram (EOG) for the diagnosis of various medical disorders.

Algorithm 1:
PSO Implementation to the DNN.
Input: Population size Popul_size, generation gen
popul ← Initialize the particles randomly until the total number of particles reaches Popul_size;
g_best, i ← Empty, 0
while i < gen do
    for particle p in popul do
        p ← Position update of p using the standard PSO operation.
        fitness ← Compute the fitness for p using the standard fitness evaluation criterion
        Update the fitness of p with fitness
        if fitness > fitness of the personal best then
            Update the personal best of p with p;
        end if
    end for
    g_best ← Best particle among the current g_best and popul
    i ← i + 1;
end while
Return g_best
Post-process it by sending it to the DNN.
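The loop of Algorithm 1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the inertia weight `w`, the acceleration constants `c1` and `c2`, the search bounds, and the quadratic `toy_fitness` are hypothetical stand-ins that do not come from the paper, where the fitness would instead be derived from the DNN being tuned.

```python
import random


def toy_fitness(position):
    # Hypothetical stand-in for the DNN-based fitness: peak value 0 at (0.3, ..., 0.3).
    return -sum((x - 0.3) ** 2 for x in position)


def pso_maximize(fitness, dim, popul_size=20, gen=50, bounds=(-1.0, 1.0),
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO that maximises `fitness` (assumed parameters)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialise particles randomly until Popul_size is reached.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(popul_size)]
    vel = [[0.0] * dim for _ in range(popul_size)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    top = max(range(popul_size), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[top][:], pbest_fit[top]

    for _ in range(gen):
        for k in range(popul_size):
            # Position update of particle k using the standard PSO operation.
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            fit = fitness(pos[k])
            # Update the personal best when the new fitness is higher.
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], fit
        # Global best: best among the current g_best and the population.
        top = max(range(popul_size), key=lambda k: pbest_fit[k])
        if pbest_fit[top] > gbest_fit:
            gbest, gbest_fit = pbest[top][:], pbest_fit[top]
    return gbest, gbest_fit


best_position, best_value = pso_maximize(toy_fitness, dim=2)
```

In a setting like the one described in the paper, the returned `best_position` would be decoded into network parameters or hyperparameters before being passed to the DNN; that decoding step is not specified in this section and is left out here.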

Algorithm 4:
Q-Learning Training Procedure for the Proposed RBATQ Method.
Start BiLSTM, Attention mechanism and Tree-LSTM with random parameters η = 0
Pre-training of BiLSTM, Attention mechanism and Tree-LSTM process
For every epoch = 1, 2 do
    For every input signal X do
        Utilize deep learning model for automated feature extraction of X and produce S_1 and S_2
        For t = 1, 2 do
            r, s = reward and state after considering the action π(s)
            a = π(s )

epilepsy classification. Then the proposed deep learning models have been evaluated on the dataset obtained from the Institute of Psychiatry and Neurology, Poland. As far as the Bonn dataset is concerned, the epilepsy datasets are categorized into sets A, B, C, D and E. The normal category is present in sets A and B, the interictal category in sets C and D, and the ictal category in set E. The classification problems discussed here are A-
. ., x_d)^T and the initial population generation with host nests x_c
while (t < MaxGeneration)
    Random generation of a solution by Lévy flight
    Evaluation of the fitness F_c by the cuckoo
    Random choosing of the nest among n, say d
    if (F_c > F_d)
        New solution replaces d
    end if
    Abandon a fraction (p_α) of worse nests
    Generate new nests and their respective solutions
    Project only the best or quality solutions
    Analyze the current best by ranking the solutions
end while
Post-process it by sending it to the DNN.
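The cuckoo search loop above can be illustrated with a short Python sketch. It is written under assumptions that are not taken from the paper: Lévy steps are drawn with Mantegna's method (exponent `beta = 1.5`), the step scale, bounds and abandonment fraction `p_a` are arbitrary defaults, and `toy_fitness` is a hypothetical replacement for the DNN-based fitness.

```python
import math
import random


def toy_fitness(position):
    # Hypothetical stand-in for the DNN-based fitness: peak value 0 at (0.3, ..., 0.3).
    return -sum((x - 0.3) ** 2 for x in position)


def levy_step(rng, beta=1.5):
    """One Levy-distributed step via Mantegna's method (assumed choice)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def cuckoo_search(fitness, dim, n=15, max_generation=100, p_a=0.25,
                  bounds=(-1.0, 1.0), step=0.1, seed=0):
    """Basic cuckoo search that maximises `fitness` (assumed parameters)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(value):
        return min(hi, max(lo, value))

    # Initial population of n host nests.
    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    fits = [fitness(x) for x in nests]

    for _ in range(max_generation):
        # A cuckoo generates a new solution from a random nest by Levy flight.
        c = rng.randrange(n)
        new = [clip(x + step * levy_step(rng)) for x in nests[c]]
        f_c = fitness(new)
        # Compare with a randomly chosen nest d and replace it if the cuckoo is better.
        d = rng.randrange(n)
        if f_c > fits[d]:
            nests[d], fits[d] = new, f_c
        # Abandon a fraction p_a of the worst nests and build new ones at random.
        worst_first = sorted(range(n), key=lambda k: fits[k])
        for k in worst_first[:int(p_a * n)]:
            nests[k] = [rng.uniform(lo, hi) for _ in range(dim)]
            fits[k] = fitness(nests[k])

    # Rank the solutions and keep the current best.
    top = max(range(n), key=lambda k: fits[k])
    return nests[top], fits[top]


best_nest, best_value = cuckoo_search(toy_fitness, dim=2)
```

Because only the worst nests are abandoned and a nest is replaced only by a better solution, the best fitness in the population never decreases from one generation to the next in this sketch.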
Input: Bat population initialization x_i and v_i (i = 1, 2, . . ., n)
Frequency initialization f_i, pulse rate initialization r_i, loudness initialization A_i
while (t < Max number of iterations)
    Generation of new solutions by frequency adjustment
    Velocity updation is done
    The location/solutions updation is also done
    if (rand > r_i)
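For orientation, the general shape of a bat algorithm loop can be sketched in Python as follows. This follows the commonly described textbook form of the algorithm and is not claimed to match the authors' variant: the frequency range, the loudness decay `alpha`, the pulse-rate growth `gamma`, the initial rate `r0`, the velocity update that pulls each bat toward the current best, and the hypothetical `toy_fitness` are all assumptions made for illustration.

```python
import math
import random


def toy_fitness(position):
    # Hypothetical stand-in for the DNN-based fitness: peak value 0 at (0.3, ..., 0.3).
    return -sum((x - 0.3) ** 2 for x in position)


def bat_algorithm(fitness, dim, n=20, max_iter=100, f_min=0.0, f_max=1.0,
                  alpha=0.9, gamma=0.9, r0=0.5, bounds=(-1.0, 1.0), seed=0):
    """Textbook-style bat algorithm that maximises `fitness` (assumed parameters)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(value):
        return min(hi, max(lo, value))

    # Initialise positions x_i, velocities v_i, loudness A_i and pulse rates r_i.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]
    fit = [fitness(p) for p in x]
    loud = [1.0] * n
    rate = [0.0] * n
    top = max(range(n), key=lambda k: fit[k])
    best, best_fit = x[top][:], fit[top]

    for t in range(1, max_iter + 1):
        mean_loud = sum(loud) / n
        for i in range(n):
            # New solution by frequency adjustment, then velocity and location update.
            f_i = f_min + (f_max - f_min) * rng.random()
            for d in range(dim):
                v[i][d] += (best[d] - x[i][d]) * f_i  # assumed: pull toward the best
            cand = [clip(x[i][d] + v[i][d]) for d in range(dim)]
            # if (rand > r_i): local random walk around the current best solution.
            if rng.random() > rate[i]:
                cand = [clip(best[d] + mean_loud * rng.uniform(-1.0, 1.0))
                        for d in range(dim)]
            cand_fit = fitness(cand)
            # Accept improvements with probability tied to loudness, then
            # lower the loudness and raise the pulse rate of this bat.
            if cand_fit > fit[i] and rng.random() < loud[i]:
                x[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                rate[i] = r0 * (1.0 - math.exp(-gamma * t))
            if cand_fit > best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit


best_bat, best_value = bat_algorithm(toy_fitness, dim=2)
```

The decaying loudness and rising pulse rate gradually shift the search from broad random walks around the best solution toward the frequency-driven moves of individual bats.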

TABLE I
PERFORMANCE ANALYSIS OF THE PROPOSED DEEP LEARNING TECHNIQUES

, it is inferred that for the epileptic dataset (A-E), a good classification accuracy of 97.75% is obtained when utilizing RBATQ model, 98.55% accuracy with SASDL-PSO model, 98.50% accuracy with SASDL-CSO model, 98.26% accuracy with SASDL-BA model and 96.52% for SAE with DNN model. For the epileptic dataset (B-E), a good classification accuracy of 97.42% is obtained when utilizing RBATQ model, 97.85% accuracy with SASDL-PSO model, 97.64% accuracy with SASDL-CSO model, 97.55% accuracy with SASDL-BA model and 98.03% for SAE with DNN model. For the epileptic dataset (C-E), a good classification accuracy of 94.57% is obtained when utilizing RBATQ model, 98.11% accuracy with SASDL-PSO model, 96.62% accuracy with SASDL-CSO model, 96.18% accuracy with SASDL-BA model and 95.8% for SAE with DNN model. For the epilep-

TABLE II
COMPARISON WITH PREVIOUS RESULTS FOR THE EPILEPSY BONN DATASET

TABLE III
COMPARISON WITH PREVIOUS RESULTS FOR THE SCHIZOPHRENIA DATASET

been compared with them and reported in Tables