Depth of anesthesia prediction via EEG signals using convolutional neural network and ensemble empirical mode decomposition

According to a recently conducted survey on the mortality rate of surgical complications, 47% of such cases are due to anesthetic overdose. This indicates an urgent need to moderate the level of anesthesia. Recently, deep learning (DL) methods have played a major role in estimating the depth of anesthesia (DOA) of patients and an essential role in controlling anesthesia overdose. In this paper, electroencephalography (EEG) signals are used for the prediction of DOA. EEG signals are very complex, and interpreting them may require months of training and advanced signal processing techniques. It is a point of debate whether DL methods are an improvement over the existing traditional EEG signal processing approaches. One DL algorithm is the convolutional neural network (CNN), a very popular algorithm for object recognition whose applications are growing widely and whose design reflects the processing hierarchy of the human visual system. In this paper, various decomposition methods are used to extract features from the EEG signal. After the necessary signal values are acquired in image format, several CNN models are deployed to classify DOA according to the Bispectral Index (BIS) and the signal quality index (SQI). The EEG signals were converted into a frequency-domain mapping with high efficiency and reliability. The best trained model gives an accuracy of 83.2%. Hence, this provides scope for further research in the domain of visual mapping of DOA using EEG signals and DL methods.


Introduction
Anesthesia is one of the most vital parts of surgical procedures, and it is essential to monitor the depth of anesthesia (DOA). Measuring and monitoring DOA still poses a challenge for doctors and researchers. Accurate analysis and prediction of anesthesia levels in a patient during surgery facilitates drug administration and prevents both awareness and anesthesia overdose, thus improving patient outcome. Traditional anesthesia monitoring relies on experienced anesthesiologists observing the patient's physiological response.
These methods might provide inaccurate results, as they depend heavily on the experience of the doctor or anesthesiologist and do not consider external factors such as noise interfering with the actual signal values. A study conducted by the World Health Organization [1] shows that the mortality rate due to anesthesia overdose is significantly high among surgical complications. This indicates a need for an improved monitoring system for surgical procedures to improve patient care. A considerable amount of research has been carried out to establish the relation between DOA and the various features that describe the level of anesthesia in a patient. As proposed by the authors of [2], the spontaneous change in the brain's electrical activity during the transition between the different levels of anesthesia can be recorded using electrodes placed on the scalp (i.e., electroencephalography (EEG)).
In recent years, the classification of DOA from the EEG spectrum has gained momentum through various feature extraction methods. Although it cannot be assumed that EEG-based DOA classification is the optimum method, research in this field appears quite promising. Since the EEG spectrum is observed to provide substantial information about the anesthesia level, different analysis methods have been adapted and deployed. These include time-frequency domain analysis and the wavelet transform (WT) [3]. A comparison between the short-time Fourier transform (STFT) and the continuous wavelet transform (CWT) was carried out in [4], which shows that STFT is more efficient in real-time processing while CWT produces high resolution and high performance that can be used in clinical settings. The authors of [5] used the nonlinear property of the EEG signals and nonlinear chaotic parameters to identify the anesthetic depth levels. Their results show that an Elman network yields an overall accuracy of 99% in detecting the anesthetic depth levels. Hutt [6] deployed a linear neural population model which predicts the concentration of the anesthetic propofol using the power spectrum of EEG signals. Zhang et al. [7] analyzed spatio-temporal patterns in the EEG using Lempel-Ziv analysis. Various pattern recognition methods for classifying different cognitive tasks were evaluated in [8], with an accuracy of 93% using machine learning algorithms. However, previous works utilizing high-performance GPUs were carried out over limited data sets using specific task-oriented features for classification. This compromises accuracy and makes it difficult to resolve the differences between individual patients' characteristics.
Achieving better performance involves a trade-off between computational time and the complexity of the feature extraction method. Therefore, real-time processing and monitoring require simple feature extraction methods with shorter computational time. This will enhance the patients' experience and result in higher accuracy for a larger dataset. In this work, a simple feature extraction method is applied to the EEG signals, and CNN-based classifiers are used for DOA.
Previous research carried out in biomedical engineering focused on epilepsy [9], emotion recognition [10], sleep [11] and motor imagery [12]. To bring more light into the field of anesthesia level analysis, a deep learning algorithm is proposed for DOA monitoring. It is vital to monitor the patient's DOA during surgery under general anesthesia. If the level of anesthesia is too low during the surgical operation, the patient will have slight awareness or feel slight pain, resulting in some postoperative memory impairment [13]. Moreover, long-term maintenance of deep anesthesia can lead to other complications in patients; hence anesthesia management is very important [14]. From the research already done in [15], it is evident that there is a correlation between the brain wave activity at different frequency components of the EEG signals and different phenomena. From a clinical point of view, the raw EEG has usually been described in terms of frequency bands: gamma (greater than 30 Hz), beta (13-30 Hz), alpha (8-12 Hz), theta (4-8 Hz), and delta (less than 4 Hz). With the induction of anesthetics there is a significant drop in the activity of the high-frequency beta and alpha bands, while an increased response is observed in the low-frequency bands during deep anesthesia [16]. This in turn allows the EEG activity intensity during the anesthetic induction period, the maintenance period and the recovery period to be compared as features in the time-frequency domain image. With the recent advances in artificial intelligence, computer vision and computer hardware, CNN models are preferred over traditional machine learning algorithms and traditional ANNs. This is the motivation behind proposing the EEG signal spectrum, which is similar to the clinical approach. Since the EEG has been converted to a spectrum, it is quite easy to use a colormap to generate images for the CNN algorithm to process.
Various studies in [17] show that CNN-based classification models surpass traditional classification models. With a large data set and an adequate hardware setup, it becomes easier to implement CNN models, which can solidify research on measuring DOA.
Although CNNs are known to be complex algorithms, they offer better accuracy for larger data sets and are simpler to implement and analyze. This is the motivation for assessing DOA based on the EEG spectrum. Raw EEG signals are in themselves not sufficient to provide much information about the brain activity of a patient, so there is a need to extract the characteristics of the EEG signals that will assist in classifying the DOA. Processing raw EEG signals is quite a challenging task, as they have a low signal-to-noise ratio (SNR) and the brain activity measurement is often buried under multiple sources of hidden information and environmental, physiological and activity-specific artifacts. Various noise reduction and filtering methods have been discussed earlier to extract the true brain activity. EEG signals are also non-stationary, with statistics that vary across time. As a result, poor accuracy may be observed for smaller training data, and different results might be obtained at different times for the same patient; hence it is essential to gather sufficient data to overcome this discrepancy. A lot of work has been done to handle the inter-subject variability of EEG signals. For time-frequency domain analysis, the short-time Fourier transform (STFT) [18] has been used to visualize the non-stationary property of the EEG signals in different cognitive states. The authors of [19] used STFT and autoregressive modeling to effectively detect the burst suppression caused by different anesthetic levels. Various other works in [20] were carried out to generate the power spectrogram of the EEG signals. All these reflect the success of STFT in determining the practical fluctuations in brain activity with changes in anesthetic features. However, another accurate method for feature extraction is EEMD, which represents the signal at different frequency bands.
The success of this method has been presented in [21], where EMD was used for EEG feature extraction. Ji et al. [22] implemented both DWT and EMD for EEG feature extraction. The authors of [23] suggested that the use of Multivariate Empirical Mode Decomposition (MEMD) provides a more robust approach to noise. As a result, in this work, the EEMD method is used to extract features of the EEG during different stages of anesthesia. EEMD addresses the shortcomings of EMD by reducing the inter-mode mixing and the wide frequency band coverage of EMD signals.

Signal acquisition
This study has been approved by the Research Ethics Committee, National Taiwan University Hospital (NTUH) in Taiwan. Furthermore, written informed consent was obtained from the patients. In total, data were collected from 50 patients between the ages of 23 and 72 years who underwent ear, nose and throat surgery at NTUH, as shown in [24]. The research consists of four major areas: Signal Acquisition, Pre-processing, Feature Extraction and Prediction. Figure 1 shows the proposed methodology carried out in this study. The datasets were collected over complete surgeries under general anesthesia and include an average of 2.5 hours of raw EEG signals and the anesthesia record sheets for operations in the anesthesiology department, NTUH. The datasets are processed at 5-second intervals, which generates about 15,400 samples, sufficient for experimentation. A Philips IntelliVue MP60 physiological monitor, which includes the Bispectral Index (BIS) Quatro Sensor module, is used to acquire the signals, and a portable computer is used for data logging [25]. Other vital signs such as heart rate, blood pressure and SpO2 are also logged using the MP60 monitor. In addition, raw EEG, ECG and PPG signals are logged. The BIS monitor gives a dimensionless numeric variable ranging from 0-100 for assessing the anesthesia level while the patient is under surgical operation. The BIS monitor displays the EEG signal, and the values BIS ≤ 40, 40 < BIS ≤ 60 and BIS > 60 correspond to the DOA levels Anesthesia Deep, Anesthesia OK and Anesthesia Light, as shown in Table 1. Additionally, the monitor displays the Signal Quality Index (SQI), which is calculated based on the impedance and artifacts. The SQI ranges from 0-100. Poor signal quality is defined as SQI < 50. The monitor is serially connected to the NPort through UART (serial communication) and uses the TCP/IP protocol for data transmission. The NPort transmits the received data wirelessly to the repeater.
The data received from the repeater is then transmitted to the PC. The connection is verified using ping and handshake; when the connection is released, the transmission stops. Figure 2 shows a block diagram of the signal acquisition process. On processing the EEG signal, it is observed that the amount of data is not uniform across the DOA levels; hence it is often said that medical data sets are unbalanced or biased. It is important to acquire a similar dataset size for each DOA level for comparison purposes. One way to minimize the data imbalance is to use the same number of samples for each level of anesthesia. For classifying the EEG signals into their respective anesthesia levels, average BIS values are used for categorization, as shown in Table 1. An average BIS value between 40 and 60 is classified as anesthetic OK (AO) and can be considered "suitable for surgery"; a BIS level less than 40 is classified as anesthetic deep (AD), indicating deep anesthesia; and a BIS between 60 and 100 is classified as anesthetic light (AL), indicating light anesthesia that may only be preferable for certain types of operational procedures. It is quite natural for the EEG signal to be under the influence of convoluted environmental factors (e.g., electrode off, external frequency interference); a signal in this specific case can then be classified as signal polluted (SP) or noise.
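The labelling rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `label_segment` is hypothetical, and treating SQI < 50 as the trigger for the signal polluted (SP) label is an assumption, since the text defines poor signal quality as SQI < 50 but does not state how SP segments were actually identified.

```python
def label_segment(mean_bis, sqi, sqi_threshold=50.0):
    """Map the average BIS and SQI of one segment to a DOA label.

    Thresholds follow the text: BIS <= 40 -> AD, 40 < BIS <= 60 -> AO,
    BIS > 60 -> AL. Using SQI < 50 to flag SP is an assumption.
    """
    if sqi < sqi_threshold:
        return "SP"  # signal polluted / noise (assumed rule)
    if mean_bis <= 40:
        return "AD"  # anesthetic deep
    if mean_bis <= 60:
        return "AO"  # anesthetic OK
    return "AL"      # anesthetic light


if __name__ == "__main__":
    for bis, sqi in [(35, 90), (50, 90), (75, 90), (50, 30)]:
        print(bis, sqi, label_segment(bis, sqi))
```

Checking the quality flag first keeps artifact-contaminated segments out of the three anesthesia classes regardless of their BIS value.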

Pre-processing
To train a CNN model, the input must be a 2D array or an image, so one of the first tasks before training was to generate images from the raw EEG data that can provide results using deep learning methods. A vital point of this study is to process the raw EEG signals into a matrix-like format with the associated DOA level (i.e., the label data), which is crucial because there are many complications associated with processing EEG signals, as discussed earlier.
The raw EEG signal is processed and assigned to the three categories AL, AO, and AD. The EEG spectrum obtained using the EEMD method shows changes in features whenever there is a change in brain activity. The multi-component EEG signal is converted into intrinsic mode functions (IMFs). The different wave frequencies of the EEG signal vary with the depth of anesthesia, so there is a need to clearly distinguish the IMFs for the different wave frequencies. EMD has drawbacks in identifying closely spaced spectral bands and components appearing intermittently in the signal during decomposition. This is called mode mixing, and it can be reduced by adding white noise to the signal (known as EEMD). When the signal is decomposed using EEMD, the added white noise cancels out in the final mean of the corresponding IMFs. This method proves to be very useful in extracting the constituent components of the signal. EEMD proves to be very robust and reliable for feature extraction from non-stationary signals, which in our case is the EEG signal. The EEG signal comprises high- to low-frequency waves: beta (β) waves ranging from 12-35 Hz, alpha (α) waves from 8-12 Hz, theta (θ) waves from 4-8 Hz and delta (δ) waves from 0.5-4 Hz, while frequencies beyond 40 Hz can be classified as noise. Each of the constituent frequency components of the EEG signal varies with the DOA level; hence it is used as the classification characteristic. To generate the EEG spectrum plots, EEMD is used, which gives the spectral images of all four constituent frequency components of the brain wave, and these images were then divided according to the DOA levels AO, AL and AD.
The raw EEG signals were recorded at a sampling frequency of 125 Hz and processed in intervals of 5 seconds. With the appropriate window size, the signals are processed to generate EEG signals with respect to time. EEG signals are non-stationary and their characteristics are better observed in the frequency domain, so the signals are decomposed using the EEMD method to obtain the characteristics of the EEG signal at different frequency values. The signal is often compromised by noise, which is mainly induced by external factors such as loss of signal during the collection phase or cross connection in the hardware setup. Using the EEMD method, the noise can be separated out and only the necessary frequency bands used for the analysis. Previously implemented methods for spectral analysis show that there are chances of feature distortion of the EEG signal. Using STFT for spectral analysis also has the problem of selecting the window type, but EEMD seems to overcome this, as it decomposes the original signal directly into the constituent signals that make it up. Accordingly, the EEG data is filtered, decomposed and segmented as mentioned above. For all 50 patients, the data is processed and a new DOA index reflecting the three consciousness levels is obtained using the CNN method. For the signal pre-processing, a window of 5 seconds is considered because the monitor outputs a BIS value every 5 seconds. In addition, the sampling frequency of the raw EEG signal from the Philips IntelliVue MP60 physiological monitor is 125 Hz, so 625 sample points are obtained for every 5-second window corresponding to one BIS value. Therefore, each 5-second window of 625 sample points of raw EEG is used for further pre-processing with EEMD and power spectrum analysis. Since the raw EEG signals are not adequate to train a prediction model, a further feature extraction process has to be carried out.
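The windowing arithmetic above (125 Hz × 5 s = 625 samples per BIS value) can be illustrated with a short sketch. The function name `segment_eeg` is hypothetical, and dropping the incomplete trailing window is an assumption, since the text does not say how leftover samples were handled.

```python
import numpy as np

FS = 125                              # sampling frequency in Hz (from the text)
WINDOW_S = 5                          # window length in seconds (one BIS value)
SAMPLES_PER_WINDOW = FS * WINDOW_S    # 625 sample points per window


def segment_eeg(eeg, samples_per_window=SAMPLES_PER_WINDOW):
    """Split a 1-D EEG recording into non-overlapping fixed-length windows.

    Any incomplete trailing window is discarded (an assumption).
    Returns an array of shape (n_windows, samples_per_window).
    """
    eeg = np.asarray(eeg, dtype=float)
    n_windows = len(eeg) // samples_per_window
    usable = eeg[: n_windows * samples_per_window]
    return usable.reshape(n_windows, samples_per_window)


if __name__ == "__main__":
    recording = np.random.default_rng(0).normal(size=2000)  # synthetic stand-in
    print(segment_eeg(recording).shape)
```

Each row of the returned array then pairs with one BIS reading and is passed on to EEMD.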
To obtain differentiable features, we use the EEMD method to get the characteristics of the non-stationary signals at different frequency values. After the respective intrinsic mode functions are generated using EEMD, the IMFs are analyzed in the time-frequency domain using the power spectrogram. The spectrograms were converted into JPEG format to be used for CNN model training. Figure 3 shows a flowchart of the steps taken, from pre-processing through feature extraction to prediction.
Figure 3. Flowchart for pre-processing, feature extraction and prediction of the EEG signal into different levels of anesthesia using CNN models.

Feature extraction
The EEMD method used in this study derives the simple intrinsic characteristics dynamically without any prior knowledge of the system. The first step of EEMD is to add independent, uniformly distributed, zero-mean white noise, with an intensity matching that of the noise in the signal, and to apply EMD to generate a set of IMFs. This step is iterated N times to generate an ensemble of IMF sets, and the ensemble is then averaged to obtain the final set of IMFs. The basic working of the decomposition can be summarized as follows:
(1) The original signal is assigned to x(t).
(2) The local maxima and minima of x(t) are calculated.
(3) The upper and lower envelopes, fmax(t) and fmin(t), are generated using cubic spline interpolation between the local maxima and between the local minima, respectively.
(4) The mean value of the envelopes is subtracted from x(t).
The residue r(t) is then formed by removing the extracted component from the original signal, and Steps (1)-(7) are repeated until x(t) cannot be decomposed further. The original signal is then expressed as:
x(t) = \sum_{i=1}^{n} \mathrm{IMF}_i(t) + \mathrm{res}(t)
where n is the total number of IMFs and res(t) is the residual component. Applying the EEMD method to the raw EEG signals of the 50 patients yields the intrinsic mode function (IMF) plots for the different brain waves and the noise signal shown in Figure 4. Figure 5 shows the Fourier transform of the different IMFs at different frequencies. For the analysis, the first four IMFs are used. The IMF ranging from 0-52 Hz is considered the polluted signal or noise, the IMF in the interval of 10-33 Hz is considered the beta (β) wave, the signal varying from 7-11 Hz is the alpha (α) wave, and the signal in the interval of 3-7 Hz is the theta (θ) wave. The remaining IMFs are neglected, as their contribution to the original signal is minimal; only certain frequency components contribute significantly to the original signal.
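The sifting and ensemble-averaging procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a fixed number of sifting passes replaces a formal IMF stopping criterion, the signal endpoints are used as extra envelope knots, Gaussian white noise is assumed, and the noise level (0.2 times the signal standard deviation) and the number of trials are illustrative defaults.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema to build envelopes
    last = len(x) - 1
    # Endpoints are added as knots so the splines span the whole segment
    # (a simple boundary treatment chosen for this sketch).
    upper = CubicSpline(np.r_[0, maxima, last], np.r_[x[0], x[maxima], x[-1]])(t)
    lower = CubicSpline(np.r_[0, minima, last], np.r_[x[0], x[minima], x[-1]])(t)
    return x - (upper + lower) / 2.0


def emd(x, max_imfs=4, n_sift=10):
    """Plain EMD with a fixed number of sifting passes per IMF (simplified)."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = sift_once(residue)
        if h is None:  # residue can no longer be decomposed
            break
        for _ in range(n_sift - 1):
            nxt = sift_once(h)
            if nxt is None:
                break
            h = nxt
        imfs.append(h)
        residue = residue - h
    return np.array(imfs).reshape(len(imfs), len(residue)), residue


def eemd(x, n_trials=50, noise_std=0.2, max_imfs=4, seed=0):
    """Average the IMFs of many noise-perturbed copies of the signal."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    acc = np.zeros((max_imfs, len(x)))
    for _ in range(n_trials):
        noisy = x + rng.normal(0.0, noise_std * np.std(x), size=len(x))
        imfs, _ = emd(noisy, max_imfs=max_imfs)
        acc[: len(imfs)] += imfs  # trials with fewer IMFs contribute zeros
    return acc / n_trials


if __name__ == "__main__":
    t = np.arange(625) / 125.0  # one 5-second window at 125 Hz
    signal = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 2 * t)
    print(eemd(signal, n_trials=10).shape)
```

For real recordings, a maintained EMD/EEMD library would normally be preferable to a hand-written sifting loop; the sketch is only meant to make the steps concrete.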
The time-frequency domain analysis using the power spectrogram gives a relationship between the time-frequency domain and the respective IMF plots. As a result, the raw EEG signals were converted into two-dimensional matrices, in which the vertical axis gives the sample points and the horizontal axis gives the frequency corresponding to the sample points. The raw signal was processed in intervals of 5 seconds, and each plot generated by EEMD was classified as AO, AL or AD according to the average BIS value, as mentioned above. Figure 6 shows the EEMD spectrum according to the DOA. These plots provide a visual mapping of the brain activity without any external setup and also reduce the cost and time of computation. It is worth noting that, by taking the above steps, the raw input values are digitized in accordance with the BIS values suggested by experienced anesthesiologists.
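A power spectrogram of one IMF can be computed with `scipy.signal.spectrogram`, as in the sketch below. The segment length (`nperseg=64`), the overlap (`noverlap=48`) and the decibel scaling are illustrative assumptions; the text does not report the spectrogram parameters that were actually used, and the function name is hypothetical.

```python
import numpy as np
from scipy.signal import spectrogram


def imf_power_spectrogram(imf, fs=125, nperseg=64, noverlap=48):
    """Return frequencies, times and power (in dB) for one IMF window.

    nperseg and noverlap are assumed values chosen for illustration.
    """
    freqs, times, power = spectrogram(imf, fs=fs, nperseg=nperseg,
                                      noverlap=noverlap)
    power_db = 10.0 * np.log10(power + 1e-12)  # small offset avoids log(0)
    return freqs, times, power_db


if __name__ == "__main__":
    t = np.arange(625) / 125.0
    imf = np.sin(2 * np.pi * 10 * t)  # synthetic 10 Hz stand-in for an IMF
    freqs, times, power_db = imf_power_spectrogram(imf)
    peak = freqs[np.argmax(power_db.mean(axis=1))]
    print(power_db.shape, peak)
```

The resulting 2-D array can then be rendered with a colormap and saved as an image for the CNN.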

Convolutional neural network
As we are dealing with images in anesthesia spectrum classification, it is reasonable to use a CNN model for predicting the DOA levels from the EEG power spectrogram. However, it is essential that the best-fit model is selected for the classification, because most of the conventional models were trained over data sets consisting of general objects. Previous work has analyzed the spectrum of EEG signals using models such as CifarNet, AlexNet and VGGNet, which demonstrates the advances in the field of computer vision. In this research, AlexNet, VGG16Net, VGG19Net, InceptionRESV2, and CNNs with 5, 6 and finally 10 convolutional layers are deployed, as shown in Figure 7. Research conducted in [26][27][28][29] shows that better accuracy is attained using complex neural networks. However, training a complex CNN demands high computational time and GPU capacity. After using various models, it is shown that models with pre-trained weights from the ImageNet dataset give less accurate prediction results. Considering the GPU capacity and the sample data, a simple convolutional neural network can be used. A CNN with 5 layers is used for simple analysis, with much smaller 3 × 3 filters in each convolutional layer combined as a sequence of convolutions. Further improvement was made using the AlexNet model, a 6-layer deep CNN and a 10-layer deep CNN, which use multiple smaller-kernel filters stacked one after the other. VGG16 and VGG19 are 16 and 19 layers deep, respectively, with 3 × 3 filters used at different stages of the convolutional layers instead of a larger filter used at a single point of the model. The back end for model training was the TensorFlow framework, and various Python libraries were used for our analysis. For training the 5-layer CNN, the input is an RGB image of 128 × 128 × 3 and the output is three classes.
The other models were trained with 128 × 128 input images, except for the VGG16 and VGG19Net models. Changing the dimension of the input and using multiple non-linear layers to increase the depth of the network increases the capacity of the model to differentiate between the complex features of the input EEG spectrum images at lower cost. Given a smaller receptive field, i.e., the effective area of the input image on which an output depends, multiple non-linear layers can increase the depth of the network, enabling it to learn more complex features at a lower cost. For VGG16 training, the image size was set to 224 × 224 × 3, dropout was added between the very dense layers, and minor changes were made to the activation function (i.e., ReLU, tanh). Simple processing, minimal optimization and mathematical operations on multi-dimensional arrays are used to achieve the best fit. The learning rate for the models varied from 0.01 to 0.0001, and batch sizes of 32, 64 and 128 were used to attain the best fit for the model. Each of the models was trained over different numbers of epochs, varying from 50 to 200, to get smoother accuracy curves. Our study focuses on comparing the different CNN models with a simple feature extraction method for DOA classification. All the models were trained over the same dataset and a comparative analysis was carried out.
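The argument for stacking small 3 × 3 filters instead of using one larger filter can be checked with a short calculation. The sketch below assumes stride-1 convolutions with the same number of input and output channels and ignores biases; the channel count of 64 is an illustrative value, not one reported in the text, and the function names are hypothetical.

```python
def stacked_receptive_field(n_layers, kernel=3):
    """Side length of the receptive field of n stacked stride-1 convolutions."""
    return 1 + n_layers * (kernel - 1)


def conv_weights(kernel, channels):
    """Weights in one conv layer with `channels` inputs and outputs (no bias)."""
    return kernel * kernel * channels * channels


if __name__ == "__main__":
    channels = 64  # illustrative value
    # Three stacked 3x3 layers see the same 7x7 area as one 7x7 layer ...
    print(stacked_receptive_field(3), stacked_receptive_field(1, kernel=7))
    # ... but with fewer weights and two extra non-linearities in between.
    print(3 * conv_weights(3, channels), conv_weights(7, channels))
```

With 64 channels, three 3 × 3 layers use 110,592 weights against 200,704 for a single 7 × 7 layer, which is the "lower cost" referred to above.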

Prediction performance
Initially, a small sample size is used for our analysis and the models are compared. For a data set of 25 patients, Table 2 shows an accuracy of 74% for the 5-layer deep CNN, an improved accuracy of 81.2% for the 6-layer CNN, and the best accuracy of about 87.8% for the 10-layer deep CNN model. The AlexNet model gave an accuracy of 75.6%, while VGG16 and VGG19 gave 76.7% and 74.3%, respectively. The image set is divided into three sections, namely a training set, a validation set and a test set: 70% of the images are used for training, 20% for validation and 10% for testing. The GPU used in this work is the Nvidia T4, which reduces the computational time to one-tenth. Complete training of the VGG16Net takes around 2.3 hours, while the 5-layer CNN takes 1.2 hours. The images are complex, high-pixel inputs, so deep convolutional layers provide an advantage in detecting the features of the power spectrogram plots. As discussed earlier, the features of the EEG signal change with time, so better accuracy can be obtained with a larger sample size. At the beginning of training, the training steps and batch size are optimized for the small sample size, but the parameters have to be adjusted repeatedly to get better accuracy. In this way, the CNN models provide the flexibility to change the layer size, the sub-layer size, the kernel size, the learning rate and the batch size to attain the maximum accuracy of the trained model.
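The 70/20/10 division can be sketched as a shuffled index split. This is an illustration only: the text does not state whether images were shuffled individually or grouped by patient, so the random image-level shuffle, the fixed seed and the function name `split_indices` are assumptions.

```python
import numpy as np


def split_indices(n_samples, train=0.7, val=0.2, seed=0):
    """Shuffle sample indices and split them into train/val/test parts.

    The test set receives whatever remains after the train and val shares.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])


if __name__ == "__main__":
    train_idx, val_idx, test_idx = split_indices(1000)
    print(len(train_idx), len(val_idx), len(test_idx))
```

A split grouped by patient would be a stricter alternative, because 5-second windows from the same recording can be highly similar.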
Another parameter that needs to be monitored is the maximum number of epochs, which depends on the size of the data set and is determined by the model's ability to reach a steady state. The model is configured with the selected settings and trained on the entire dataset of 50 patients. During training, the ModelCheckpoint callback is used, which is important for obtaining the best results. The best model is saved at every epoch so that the model weights can later be used in the testing stage. After including all the images for training, Table 3 shows that the accuracy of the 5-layer CNN is 72.5%, while an accuracy of 74.6% is achieved with AlexNet, and VGG16Net and VGG19Net give accuracies of 80.1% and 77.4%, respectively. The best accuracy of 83.2% is observed for the model with 10 layers. All the models are trained with early stopping criteria; a batch size of 128 was selected for AlexNet and VGG16Net, while a batch size of 64 is used for the 5-layer CNN model. The datasets used in this study maintain a balanced distribution in the training and testing procedures to avoid over- and under-fitting of the classes. As a result, it can be concluded that the proposed method provides a robust and reliable benchmark for DOA level classification. The errors of the models can be observed using the confusion matrix. The confusion matrix differs for each model because of the variability of the EEG spectrum features, and the CNN model helps to overcome this individual variability and offers more precision and consistency. The highest error is observed for the AL level, while the AD level gave the best accuracy with the least error. This means that further fine-tuning or better models can be used to minimize the error and help to analyze the features of the EEG signal efficiently. However, classifying the DOA level according to BIS values of 40-60 and below 40 is not a standard way of classifying DOA levels.
There is no sharp boundary for classification; hence there is some anomalous behavior for the AL and AD classes. It is observed that the best accuracy is obtained with the relatively simple 10-layer CNN, while the deeper VGG16Net and VGG19Net models produce lower accuracy. This trend can be explained by the use of pre-trained weights: the VGG16Net uses ImageNet weights for classification, which might not capture the features of the EEG spectrum images, thereby giving lower accuracy. In general, the predictions of the classification models are in accordance with expectations and give a reasonable error rate, and DOA level prediction using CNN and the EEMD feature extraction method was successful.
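The per-class errors read off a confusion matrix can be computed as in the sketch below. The labels and predictions are made-up toy values used only to show the calculation; they are not results from this study, and the function names are hypothetical.

```python
import numpy as np

LABELS = ["AL", "AO", "AD"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm


def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals,
                     out=np.zeros(len(cm)), where=totals > 0)


if __name__ == "__main__":
    # Toy example, not data from the study.
    y_true = ["AL", "AL", "AO", "AO", "AD", "AD"]
    y_pred = ["AO", "AL", "AO", "AO", "AD", "AD"]
    cm = confusion_matrix(y_true, y_pred)
    print(cm)
    print(dict(zip(LABELS, per_class_recall(cm))))
    print("overall accuracy:", np.trace(cm) / cm.sum())
```

Comparing the per-class values makes it easy to see which DOA level, such as AL in the discussion above, contributes most of the error.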

Discussion
The above results validate our approach and open up potential fields of research in the time-frequency domain. The conversion of the raw EEG signals into spectral plots using the EEMD method for DOA evaluation proves to have certain advantages over the existing methods. In this study, simple pre-processing methods are used without any additional hardware setup. With the proposed framework, convoluted mathematical calculations and manual segmentation of the DOA level are avoided. This saves a lot of time and cost, and to some extent reduces the need for experts. Work still needs to be done to reach a stage where machines or computer vision might be able to replace anesthesiologists for classification tasks. With the help of the different CNN models, the reliability of the proposed method for DOA level classification can be verified. The application field is not limited to a particular area; its uses are widespread. Furthermore, CNNs with different numbers of layers are used to explore the effectiveness of the proposed method. A few of the advantages of the proposed work are: 1) All the modeling and processing of the EEG signals was done without the use of external hardware or manual support. 2) The use of the EEMD method provides more robustness and reliability and helps to overcome the challenging problem of inter-mode mixing of the EEG signals. Although the proposed work seems quite promising, there are certainly a few shortcomings in our findings. Firstly, because the available medical dataset is biased, our model might not perform as expected. Secondly, because different types of anesthetic drugs were used, we cannot differentiate the DOA depending on the type of drug, and the model can show poor performance. The CNN models tend to show poor performance during transitions of state; in particular, the model falls short of predicting the AL class correctly, as shown in Figure 8.
The reason is that the transition states are often quite unstable, and rapid changes of brain activity might occur that go unnoticed by our CNN-based models.
In this work, conventional deep learning models are trained to analyze each input as an image; however, the features used in this study do not consider the time factor. That means the time-frequency plots have no meaningful temporal dynamics that are consistent across the 5-second segments. Hence, the next research step is to develop specific deep learning models which consider temporal dynamics. Recurrent neural networks (RNN) and long short-term memory (LSTM) models can deal with problems involving time and continuity. By combining CNN and RNN (or LSTM) layers in multi-layer models, both image features and sequence features can be extracted for training. Convolutional, long short-term memory, fully connected deep neural networks (CLDNN) are a deep learning architecture proposed by Google Inc. [30], which uses multi-layer convolution to extract features and then uses LSTM layers; for vocabulary tasks, it was reported to improve by 4-6% compared with LSTM alone. With the help of recent advances in computer vision, it is possible to tune the model parameters, neural network layers, optimization technique, activation functions, batch normalization and fine-tuning to improve the accuracy. From the training results in Table 4, it can be seen that a deep CNN can accurately identify the DOA features in the EEG signals of the anesthesia patients, and the wrong classification results are within acceptable error. The results support the use of the CNN framework for DOA level estimation, as many of the models showed accuracy over 80%; the accuracy plot of the best trained model for datasets of 25 and 50 patients is shown in Figure 9. With these kinds of frameworks, the amount of anesthetic to be infused can be decided without delay depending on the type of surgical procedure. The scope of CNN models is very wide in this field, as they are very efficient and have universal application.
One of the most interesting attributes of a CNN is its ability to learn features by itself: even if the data are non-uniform, a deep CNN can learn suitable weights for feature extraction, and these weights can later be reused to train on similar input images. Although the advantages of CNNs are significant, the stability and accuracy of DOA classification still need improvement. In future work, adding more features, such as a patient's ECG and PPG signals, and establishing the relation between these features and DOA may help in categorizing anesthesia levels. With high-speed GPUs, even deeper CNN models such as VGG19, InceptionNet, GoogLeNet and ResNet can be used. Additional data will make the models more reliable and help them capture the varying nature of EEG signals. Another approach is to use CNN models that might provide the desired accuracy with a limited patient dataset.
Although many CNN-based methods have been proposed for EEG classification [31,32], the need for large datasets remains an issue. It is difficult to obtain large datasets from hospitals or from companies other than consumer internet companies (e.g., Google, Facebook, Amazon). In addition, the large datasets needed to train a CNN model require more CPU and GPU power and memory, which helps explain the success of deep learning in big technology companies that own supercomputers. For small datasets, several methods address this problem, such as transfer learning [33], generative adversarial networks (GANs) [34] and semi-supervised learning [35]. However, all these methods suffer from the black-box problem: we do not know why they produce a given result. Recently, there has been much discussion of explainable AI. From our point of view, Lalitha's method [5] is a good approach to explainable AI.
In that paper, the nonlinear properties of the EEG signal (i.e., nonlinear chaotic parameters such as the correlation dimension (CD), Lyapunov exponent (LE) and Hurst exponent (HE)) were used to identify anesthetic depth levels. These three nonlinear parameters play a role similar to the features extracted in the convolutional layers of a CNN model. The problem is that we do not know how the convolutional layers perform this feature extraction; they act as a black box. In contrast, the features extracted by Lalitha's method have physical meaning. In our previous study [23], a similar method was used, but feature extraction was based on entropy because of the chaotic nature of the EEG signal. Such methods all use an ANN to model DOA and usually report good accuracy. The problem with this approach is that it requires domain knowledge of the system: based on physical or engineering principles, one must find good parameters to represent it. A CNN needs no background knowledge of the system; one can use an existing CNN toolbox to build the model, provided that a large dataset and substantial computing power are available. Which approach is better is still debatable. However, considering how human beings learn, learning seems closer to Lalitha's method. In this study, fairly simple feature extraction methods (i.e., EEMD plus power spectrum analysis) are used with a CNN classifier. Reducing the number of CNN layers reduces the training time and the memory needed to store the weights. In addition, smaller feature sets make the model less opaque, which is a step towards explainable AI.
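To make the notion of a physically meaningful feature concrete, the sketch below computes the power spectrum of a single 5-second segment and sums the power in a frequency band. It is only an illustration: the EEMD step is omitted, a naive discrete Fourier transform is used instead of an FFT library, and the function names, the 125 Hz sampling rate in the usage note and the band limits are assumptions rather than the settings used in this study.

```python
import cmath
import math


def power_spectrum(segment, fs):
    """Return (frequencies in Hz, power) of a real-valued segment.

    Uses a naive O(n^2) discrete Fourier transform, which is adequate for a
    short segment and keeps the sketch free of external dependencies.
    """
    n = len(segment)
    mean = sum(segment) / n
    centered = [x - mean for x in segment]  # remove the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_power(freqs, power, low, high):
    """Sum the spectral power for frequencies f with low <= f < high."""
    return sum(p for f, p in zip(freqs, power) if low <= f < high)
```

For example, for a synthetic 10 Hz sine wave sampled at an assumed 125 Hz for 5 seconds, `band_power(freqs, power, 8, 13)` is far larger than `band_power(freqs, power, 0.5, 4)`. Unlike the weights of a convolutional layer, each such value can be read directly as the energy of the signal in a named frequency band.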

Conclusions
Significant research has been done on the estimation of DOA, and several breakthroughs have been made in this field. Many assessments of DOA have been based on conventional manual processing of the EEG, and a few works have shown visual mapping of EEG attributes in the time-frequency domain. Much research has used raw EEG signals as time-series input for RNN model training. The recent momentum in computer vision facilitates the use of CNN models capable of assessing the DOA level. Using the EEMD method for feature extraction opens up a new scope for research, as this approach has rarely been explored. EEMD provides robustness and makes feature extraction easier, without the need to manually select and classify EEG signal characteristics. By using different classes such as AL, AO and AD, we make it easier for the anesthesiologist to monitor and evaluate the state of a patient's brain during the infusion of different anesthetic drugs, and to take basic action to prevent the disparity caused by different drug usage and by anesthesia overdose. This further benefits the patient, since psychological trauma may occur if the level of anesthesia is not monitored properly during the transition state. This work shows that there is a significant correlation between EEG and DOA. The EEMD method offers a novel approach to extracting and analyzing EEG features with minimal feature engineering, and it provides an opportunity to establish safer surgical procedures using simpler DOA predictive devices.