Electroencephalography-Based Depression Detection Using Multiple Machine Learning Techniques

The growth of biomedical engineering has made depression diagnosis via electroencephalography (EEG) an active research topic. Two significant challenges for this application are the complexity and non-stationarity of EEG signals. Additionally, individual differences may hamper the generalization of detection systems. Given the association between EEG signals and demographic characteristics such as gender and age, and the influence of these characteristics on the incidence of depression, it is preferable to include demographic factors in EEG modeling and depression detection. The main objective of this work is to develop an algorithm that recognizes depression patterns in EEG data. Following a multiband analysis of the signals, machine learning and deep learning techniques were used to detect patients with depression automatically. The EEG data were taken from MODMA, a multi-modal open dataset for studying mental disorders. The dataset contains recordings from a traditional 128-electrode elastic cap and from a wearable 3-electrode EEG collector intended for widespread applications. In this project, 128-channel resting-state EEG recordings are considered. A CNN trained for 25 epochs achieved an accuracy of 97%. The patient's status is divided into two basic categories: major depressive disorder (MDD) and healthy control. Obsessive-compulsive disorders, addiction disorders, conditions brought on by trauma and stress, mood disorders, schizophrenia, and anxiety disorders are further examples of mental illnesses discussed in this paper. The study suggests that combining EEG signals with demographic data is promising for the diagnosis of depression.


Introduction
Depression is a common mood disorder that can result in persistent sadness, loss of interest, and problems with memory and attention. Cognitive impairment and long-lasting, profound low mood are common in depressed patients. In severe cases, paranoia and illusions may also occur [1]. It is therefore crucial to diagnose depression while it is still treatable, as early diagnosis can even save a patient's life [1,2]. The mechanisms behind protracted negative moods and depression are currently the subject of intense research into the human brain.
A scale-based interview conducted by a psychiatrist is the most common technique for diagnosing depression. EEG coherence is a strong indicator of integrated neuronal correlation when analyzing the linearly dependent relationships between the frequency bands of EEG signals collected from brain areas or electrodes [3]. This measure generates a symmetric two-dimensional matrix. High coherence between two EEG signals indicates coherent neuronal oscillations, implying interconnection between neural populations. Low coherence, on the other hand, indicates independent activity. EEG coherence has proven to be a highly effective method for analyzing the brain activity of individuals with depression [3], Alzheimer's disease (AD) [4], and Parkinson's disease [5]. The outcomes of the existing approaches for diagnosing depression, however, depend on the psychiatrist's expertise and are labor-intensive.
Furthermore, because of the stigma associated with the condition and its nature, depressed people are less inclined to seek care. As a result, many individuals with depression receive incorrect diagnoses and inadequate care, delaying their recovery. Finding practical and reliable ways to identify depression is therefore a growing area of research. With the latest innovations in sensor and mobile technologies, analyzing physiological data for diagnosing mental diseases opens up a new opportunity for a precise and objective tool for anxiety identification. Along with much other clinical information, the EEG reflects deeply personal human cognitive function [6,7]. The EEG records, at the scalp, the rhythmic, spontaneous electrical activity of cells in the brain. Since the discovery of electrical activity in the monkey brain and the first recording of the human EEG signal, scientists have used EEG data to investigate the association between brain function and mental disorders [8].
The EEG-based depression detection system is depicted in Figure 1: after EEG recording, the raw EEG signals are processed by a CNN, then passed to an LSTM for sequence learning, and finally sent to automated recognition.
Diagnostics 2023, 13, x FOR PEER REVIEW
The current widely used method for examining functional brain connectivity employs network analysis to transform a functional brain matrix into a graph. After the structural features of the graph are characterized, the clustering coefficient and the characteristic path length, two indices that describe a graph and correspond to two significant aspects of functional brain organization, functional segregation and integration [9], are used to distinguish between people with neurological abnormalities and healthy individuals. These two indices can also capture the network's significant attributes accurately. Random network topology and small-world network architecture have been demonstrated repeatedly in Alzheimer's disease ...
primary rhythm, which is present for most of their lives. After the age of thirteen, it tends to dominate resting EEG traces. The beta wave is a normal rhythm commonly observed in individuals who are alert or anxious, as well as those with their eyes open. This state is associated with analytical problem-solving, decision-making, and processing information about the surrounding environment. It is the most common brain state during waking hours when an individual is actively listening and thinking. Gamma waves, which have the highest frequency, are associated with cognitive processing, attention, and perception [24].
A typical approach for classifying normal and depressive EEG signals is shown in Figure 2. EEG studies of depression often analyze the data using the properties of the EEG signals. The local feature extraction module, also known as the signal processing module, comes before the LSTM and the classification module, which all proceed to the CNN. The multifaceted three-electrode EEG monitoring system successfully developed by Intelligent Solutions Lab [25] is part of the current contribution to this field; it was used to build a database of depressed patients and healthy controls. The EEG signal data come from MODMA, a multi-modal open dataset for analyzing mental illnesses. The EEG dataset contains data from a wearable 3-electrode EEG collector for widespread applications and from a standard 128-electrode elastic cap, and the EEG data are stored in three datastores. Accordingly, this research concentrates on a pervasive EEG-based depression detection system built with cutting-edge data processing methods and machine learning.
Auditory tones were used as the external stimuli. These tones were presented through headphones to both the healthy control group and the group of patients diagnosed with major depressive disorder. To keep the experiment consistent, the stimuli were presented in a passive listening paradigm in which participants were instructed to listen attentively to the sounds without any active response or task. This approach allowed us to isolate the brain responses to the auditory stimuli without confounding factors that could arise from a specific task or cognitive demand. The tones had a duration of 100 ms, which is a standard duration in ERP studies, and the inter-stimulus interval was 1000 ms. This interval ensured that the stimuli were presented in a controlled and consistent manner, allowing us to measure the brain's response to each tone in isolation. Consistent and controlled stimulus presentation is critical in ERP studies, as it allows reliable measurement of the brain's response to external stimuli.
For the ERP analysis, we selected several local peaks, namely N1, P2, N2, and P3, because these components have previously been shown to be sensitive to depression-related alterations in brain function. The amplitude and latency of these peaks were used to compare the two groups. N1, P2, N2, and P3 are common event-related potential (ERP) components in neurophysiological research. N1 (Negative 1) is a negative deflection that occurs approximately 80-150 ms after stimulus onset. It is thought to reflect early sensory processing and the allocation of attention to the stimulus. P2 (Positive 2) is a positive deflection that occurs approximately 150-250 ms after stimulus onset. It is thought to reflect higher-order cognitive processing and attention, such as identifying and categorizing the stimulus. N2 (Negative 2) is a negative deflection that occurs approximately 200-300 ms after stimulus onset. It is thought to reflect cognitive processes related to stimulus evaluation, including working memory, attention, and decision-making. P3 (Positive 3), also known as the P300 or the "oddball" response, is a positive deflection that typically occurs 300-500 ms after stimulus onset. It is thought to reflect cognitive processes related to stimulus evaluation, including memory updating and response preparation. Both the mean and maximum amplitudes within a specific time window for each local peak, as well as the latency of each peak from stimulus onset, were determined to identify differences in brain function between healthy controls and patients with major depressive disorder.

The following EEG recordings are available to support this research; of these, the 128-channel resting-state signals are considered in this project:
• Event-related potentials in response to external stimulation, recorded over 128 channels: 24 patients with major depressive disorder and 29 healthy controls.
• Resting-state 128-channel recordings: 24 persons with major depressive disorder and 29 without the condition.
• Resting-state 3-channel recordings: 28 people with major depressive disorder and 29 healthy control subjects, as detailed in the section below.
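The peak measures described above (mean amplitude, maximum amplitude, and latency within a component-specific window) can be illustrated with a minimal sketch. The window boundaries are the approximate ranges quoted in the text; the function name `peak_measures`, the synthetic waveform, and the choice to treat the most negative sample as the "maximum" of a negative component are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

# Search windows in ms after stimulus onset, taken from the ranges quoted in the text.
WINDOWS_MS = {"N1": (80, 150), "P2": (150, 250), "N2": (200, 300), "P3": (300, 500)}
NEGATIVE = {"N1", "N2"}  # negative-going components: the peak is the minimum (assumption)


def peak_measures(erp, times_ms):
    """Return mean amplitude, peak amplitude, and peak latency for each component."""
    out = {}
    for name, (lo, hi) in WINDOWS_MS.items():
        mask = (times_ms >= lo) & (times_ms <= hi)
        seg, seg_t = erp[mask], times_ms[mask]
        idx = np.argmin(seg) if name in NEGATIVE else np.argmax(seg)
        out[name] = {
            "mean_amp": float(seg.mean()),
            "peak_amp": float(seg[idx]),
            "latency_ms": float(seg_t[idx]),
        }
    return out


def bump(times_ms, center, width, amp):
    """Gaussian-shaped deflection used to build a synthetic ERP."""
    return amp * np.exp(-0.5 * ((times_ms - center) / width) ** 2)


# Synthetic single-channel averaged ERP at 250 Hz (illustrative only, not MODMA data).
fs = 250
times_ms = np.arange(0, 600, 1000 / fs)
erp = (bump(times_ms, 100, 15, -3.0) + bump(times_ms, 200, 20, 2.0)
       + bump(times_ms, 260, 15, -1.5) + bump(times_ms, 400, 40, 4.0))

measures = peak_measures(erp, times_ms)
for name, m in measures.items():
    print(name, m)
```

In a group comparison, these per-subject values would be computed on each participant's averaged ERP and then compared between the MDD and control groups.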
The purpose of this work is to use EEG data analysis to create an algorithm that automatically identifies depression tendencies. The study addresses the complexity and non-stationarity of EEG signals, as well as the individual variations that may affect the generalizability of detection systems. The proposed methodology integrates demographic information such as age and gender with EEG signals to increase the precision of depression diagnosis. Machine learning and deep learning methods are used to identify people with depression automatically. The use of wearable EEG technology and open datasets in practical applications is also covered. The ultimate objective is to aid the early and accurate identification of depression, which might improve patient outcomes. The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3 describes data collection, and Section 4 describes the methodology. Section 5 presents the results, Section 6 discusses them, and Section 7 concludes.

Literature Review
This section reviews studies that have used EEG signals and deep learning techniques for diagnosing and predicting depression. Ref. [7] reviewed research that used EEG data to identify two types of depression, bipolar disorder (BD) and MDD, using neural network and deep learning techniques. It searched papers published over the previous ten years using a variety of search engines and combinations of keywords, and then extracted valuable information from them. One of the strengths of this review is that it classified the datasets used, the techniques for analyzing or extracting features, and the algorithms in the publications [11,12]. It also provides many tables that present the extracted data and allow comparisons in different ways. An apparent weakness is that only about five articles, specifically on MDD diagnosis, were considered, whereas a review needs to draw on a larger number of publications. Additionally, the general concept and method of the reviewed papers should have been explained in more detail. The review by [26] focused on studies using deep learning techniques to investigate mental diseases, including depression. The four primary areas of this study were the detection of mental illness using clinical data, genetic data in disease diagnosis, analysis of various datasets, and social media data to estimate the risk of mental illness. Of the selected papers, published up to April 2021, only three studies that dealt with depression diagnosis or prediction used EEG datasets. The examined datasets were fully described in this study, and the opportunities and difficulties that could result from using each dataset were discussed in depth.
However, because the review was comprehensive and covered a wide range of mental illnesses, several studies on deep learning for depression diagnosis and prediction using EEG data were discussed only briefly.

CNNs have recently been used to investigate the possibility of EEG encoding and decoding. Ref. [11] proposed a parallel linear CNN to capture dynamic and static energy identifiers. In [10], a CNN was applied to features extracted from epileptic intraoperative EEG signals. In [27], EEG signals were transformed into multi-spectral images and decoded using a recurrent CNN. Ref. [10] used a 13-layer CNN model to detect depression. In light of these merits of CNNs, this work describes a 1D CNN to extract spatiotemporal representations of EEG signals. Convolutional neural networks (CNNs) have been recognized as an essential and reliable deep learning methodology. Owing to their notable success in computer vision, they have recently been applied more widely to biomedical signal and image processing problems [28]. Researchers have also concentrated on creating CNN-based computer-aided diagnosis systems for the medical industry. Ref. [29] developed a CNN model for analyzing EEG data and classifying the signal as normal, preictal, or seizure, with an overall accuracy of 89.8%. Ref. [30] provided a 1D CNN to classify normal and pathological EEG data automatically and reported a 21.10% classification error. Refs. [10,31] recently developed CNN models with eleven and thirteen layers to recognize depressed patients using EEG signals. Ref. [32] created a one-dimensional CNN-based model with 91.33% accuracy to detect cardiac arrhythmia from long-term ECG signal segments. In a comparison of Alzheimer's disease diagnosis with three distinct neural network models, the feed-forward neural network (FFNN), the block-based neural network, and the CNN, the CNN was the best classifier [33].
Ref. [34] proposed using a kernel eigen-filter-bank common spatial pattern to extract features from the EEG of twelve patients suffering from severe depression and twelve healthy individuals. Using leave-one-subject-out cross-validation, the study achieved a recognition rate of 80% with an SVM classifier. Ref. [35] estimated the power spectral density in multiple frequency bands (theta, beta, and alpha) as well as over the entire band of the EEG signal to classify forty depressive patients and forty healthy subjects.
A review of studies [4] examined how EEG signals and various classifiers could be used for tasks such as emotion identification and for identifying neurological diseases such as depression. Only four publications were from an earlier period; most of the papers were published between 1999 and 2020 and came from various sources, including journals, books, conferences, and theses. Only about ten articles related to the diagnosis of depression were considered. The review provided a comprehensive comparative assessment of the techniques and data used in the publications, organized into separate areas such as artifact removal, types of extracted features, dimensionality reduction, feature selection, and clustering algorithms [36]. The adopted datasets were summarized according to how they were collected and are presented under general and local recognition categories. The review also included details on functional neuroimaging methods. However, because it covered so many fields of research on mental health issues, it should have treated each one more thoroughly. Table 1 shows a list of past paper references with methodology used and results.

Table 1. Past paper references with methodology used and results (one study per line; cells separated by " | ").
Automatic clinical depression detection [6] | EEG, CNN, Transfer Learning | Visual abstract theta, alpha, and beta band EEG power is calculated. | CAD; ConvNet | The proposed system delivered an 85.82% accuracy rate.
CNN's use for recognizing mild depression [7] | EEG data, SVM, LR, and LNR are associated with MDD. | Ratio of features taken out of EEG signals in different frequency bands. | Elimination of recursive features, Pearson correlation coefficient | The development of this MDD detection framework may be integrated into a healthcare system to assist medical professionals in identifying MDD patients.
Framework for detecting depressive disorders with two stages of feature selection [8] | Symptoms of child anxiety related to the Children's Depression Inventory | Sample of 451 young adults and adolescents. | Multivariable linear regression | There was an increase in depression and somatic/panic symptoms in females, in addition to social anxiety and social phobia.
Symptoms of anxiety [37] | Decision Tree, Variance, SVM, and Feature Selection | 13 features in total were retrieved, and a subset of the 6500 total features was calculated. | RF Model, FDR-based feature selection, and tree-based feature selection | Calculations of the linear, non-linear, and power spectral features were made for each channel of the EEG data for each sub-band.

According to the literature evaluation, the present research gap is that only a few studies have used EEG data and deep learning algorithms to diagnose and predict depression. While some studies have used EEG signals to diagnose or predict depression, the majority have focused on other mental health issues. Furthermore, few studies have extensively discussed the difficulties associated with using EEG data and deep learning techniques for depression diagnosis and prediction. More research into the most appropriate methods for feature extraction from EEG signals is also needed to improve the accuracy of depression diagnosis and prediction. Finally, further study is needed to examine the efficacy of various deep learning approaches for diagnosing and predicting depression using EEG data.

The proposed technique addresses these constraints in using EEG data to diagnose depression. Multiband analysis is performed to capture the complexity and non-stationarity of EEG data. Demographic parameters such as gender and age are incorporated in the modeling to account for individual differences. Machine learning and deep learning approaches are used to improve accuracy and efficiency in automated depression diagnosis. The approach employs a large and diverse dataset (MODMA) spanning a variety of mental diseases, recorded with both traditional and wearable EEG collectors, which improves the system's generalizability and resilience.

Dataset
The EEG signal data are obtained from MODMA, a multi-modal open dataset for the investigation of mental disorders. The EEG dataset includes data from both a conventional 128-electrode elastic cap and a wearable 3-electrode EEG collector for widespread applications.
A 128-electrode elastic cap is a common piece of EEG recording equipment in research and clinical settings. It consists of a cap fitted over the participant's scalp, with 128 electrodes placed at specific locations according to the International 10-10 system. These electrodes detect electrical signals generated by the brain and transmit them to an amplifier, which amplifies the signals and converts them into digital data for further analysis. A wearable 3-electrode EEG collector is a newer type of EEG recording equipment designed for widespread applications. It typically consists of a small device worn on the forehead or behind the ear, with three electrodes placed in contact with the skin. These electrodes detect electrical signals generated by the brain and transmit them to a wireless receiver or a smartphone app, which records and analyzes the signal.

The participants' eyes were closed during EEG recording to reduce potential visual artifacts caused by eye movement. Lighting levels were also kept constant to minimize the effect of visual stimuli on brain activity. In addition, artifact correction methods such as time-domain signal filtering or spatial filtering were commonly used to further improve the quality of the EEG signals. Table 2 summarizes the characteristics of three different experiments conducted on participants with major depressive disorder and healthy controls. For EEG, there are three datastores. This project considers the 128-channel resting-state EEG recordings.

The inclusion criteria for participants in the MODMA dataset are an age between 18 and 55 years, normal or corrected-to-normal vision, and a primary or higher education level.
For participants diagnosed with major depressive disorder (MDD), the diagnostic criteria of the Mini-International Neuropsychiatric Interview (MINI) must be met, and the Patient Health Questionnaire 9-item (PHQ-9) score must be greater than or equal to 5. MDD patients must not have received any psychotropic drug treatment in the last two weeks. Control participants should not have a personal or family history of mental disorders. All participants must provide written informed consent. The inclusion criteria are intended to ensure that the study sample is representative of the population of interest and that the results generalize to the intended population.
The exclusion criteria for MDD patients in the MODMA dataset are other mental disorders or brain organ damage, serious physical illness, and severe suicidal tendencies. Participants with a personal or family history of mental disorders are excluded from the control group. In addition, participants who have abused or been dependent on alcohol or psychotropic drugs in the past year, and women who are pregnant, lactating, or taking birth control pills, are excluded from the study. These exclusion criteria are intended to ensure that participants have not been exposed to substances that could affect their brain function. They also help to minimize potential confounding factors that could influence the results, which increases the internal validity of the study.
An analysis of variance (ANOVA) was carried out to compare the mean age of the depression group and the healthy control group. It revealed no significant difference in mean age between the two groups (p > 0.05). Differences in EEG signals between the two groups are therefore unlikely to be caused solely by age and are more likely due to the presence of depression in the depression group.
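A one-way ANOVA on age between two groups can be sketched as follows. The ages below are randomly generated placeholders, not the MODMA values; only the group sizes (24 MDD, 29 controls) and the 18 to 55 age range come from the text. The use of `scipy.stats.f_oneway` is an assumption about tooling, since the paper does not say how the test was run.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Placeholder ages drawn uniformly from the inclusion range 18-55 (NOT real MODMA data).
ages_mdd = rng.integers(18, 56, size=24)  # 24 participants with MDD
ages_hc = rng.integers(18, 56, size=29)   # 29 healthy controls

# One-way ANOVA: the null hypothesis is that both groups share the same mean age.
f_stat, p_value = f_oneway(ages_mdd, ages_hc)

# p > 0.05 means no significant age difference is detected at the 5% level.
no_significant_difference = bool(p_value > 0.05)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}, "
      f"no significant age difference: {no_significant_difference}")
```

With only two groups, this test is equivalent to an independent two-sample t-test with equal variances (F equals t squared).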

Data Visualization
Typically, an EEG machine has a number of electrodes. The electrodes are positioned on the patient's scalp, where they measure voltage and convert it into signal data. For instance, if there are n electrodes, each electrode produces a time series of voltage values. Different parts of the brain have different voltage levels. The layout of a typical 128-channel headset is shown in Figure 3. Figure 4 displays the 128-channel voltage for the resting-state power spectral distribution at a sampling frequency of 250 Hz.
Frequency information for 128 channels of EEG signals is shown in the figure below. Figure 5 displays the power spectral distribution for 16 channels.
Alternating-current (AC) interference is responsible for the increase in power at 50 and 120 Hz. These spikes are treated as signal noise and are eliminated in the next part.
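The kind of power spectrum shown in these figures, including a narrow spike from AC interference, can be reproduced on synthetic data with a short sketch. The sampling frequency of 250 Hz comes from the text; the four-channel synthetic signal, the Welch estimator, and the 2 s segment length are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import welch

fs = 250  # sampling frequency in Hz, as stated in the text
rng = np.random.default_rng(0)
t = np.arange(20 * fs) / fs  # 20 s of samples

# Synthetic stand-in for a recording: noise, a 10 Hz rhythm, and 50 Hz AC interference.
n_channels = 4
eeg = rng.normal(0.0, 1.0, size=(n_channels, t.size))
eeg += 2.0 * np.sin(2 * np.pi * 10 * t)
eeg += 3.0 * np.sin(2 * np.pi * 50 * t)

# Welch power spectral density per channel; 2 s segments give 0.5 Hz resolution.
freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs, axis=-1)

# Locate the strongest component above 30 Hz in the channel-averaged spectrum.
mean_psd = psd.mean(axis=0)
high = freqs > 30
peak_freq = freqs[high][np.argmax(mean_psd[high])]
print(f"Largest high-frequency peak at {peak_freq:.1f} Hz")
```

On a real recording, `eeg` would be the channels-by-samples array loaded from the dataset, and the spike location would indicate which frequencies the subsequent filtering step has to suppress.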



Artifact Correction and Re-Referencing
Artifact correction and re-referencing are important steps in EEG data preprocessing to improve the quality of EEG signals and remove unwanted artifacts. In the context of this study, artifact correction was likely performed to remove any electrical noise or artifacts caused by muscle movement or eye blinks.
One common method for artifact correction is Independent Component Analysis (ICA), which separates EEG signals into independent components that correspond to different sources in the brain or outside the brain, such as muscle activity or eye movement. By identifying these components, the artifacts can be isolated and removed from the EEG signals.
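The ICA idea can be sketched on synthetic data: mixed channels are decomposed into components, one component is flagged as an artifact, and the channels are rebuilt without it. This sketch uses `FastICA` from scikit-learn and flags the component with the highest kurtosis; both choices, as well as the synthetic sources and mixing matrix, are assumptions for illustration and not the procedure used in the study.

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
fs = 250
t = np.arange(10 * fs) / fs

# Synthetic sources: a 10 Hz rhythm, sparse blink-like pulses, and background noise.
brain = np.sin(2 * np.pi * 10 * t)
blink = 5.0 * (np.sin(2 * np.pi * 0.5 * t) > 0.99)
noise = rng.laplace(size=t.size)
sources = np.column_stack([brain, blink, noise])

# Mix the sources into three "channels"; shape (n_samples, n_channels).
mixing = np.array([[1.0, 0.8, 0.3],
                   [0.6, 1.2, 0.5],
                   [0.9, 0.2, 1.0]])
eeg = sources @ mixing.T

# Decompose the channels into statistically independent components.
ica = FastICA(n_components=3, random_state=0)
components = ica.fit_transform(eeg)

# Heuristic (assumption): blink-like artifacts are spiky, so they have high kurtosis.
artifact_idx = int(np.argmax(kurtosis(components, axis=0)))

# Zero the artifact component and project back to channel space.
components_clean = components.copy()
components_clean[:, artifact_idx] = 0.0
eeg_clean = ica.inverse_transform(components_clean)
```

In practice, artifact components are usually identified with the help of reference channels or visual inspection rather than a single statistic.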
Re-referencing changes the reference electrode to improve the signal-to-noise ratio and enhance the detectability of EEG signals. In the context of this study, the EEG signals were likely referenced to a common reference electrode, or a reference-free method was applied. This eliminates or minimizes the impact of spatially distributed electrical activity that is unrelated to the underlying brain activity of interest.
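As an illustration of re-referencing, the following minimal sketch applies a common average reference, in which the mean across all channels is subtracted from every channel at each time point. This is one widely used scheme and is shown here only as an assumption; the text does not state which reference was actually used. The array shapes, the 250 Hz sampling rate, and the synthetic data are hypothetical.

```python
import numpy as np

def common_average_reference(eeg):
    """Re-reference EEG of shape (n_channels, n_samples) to the channel average."""
    eeg = np.asarray(eeg, dtype=float)
    # Subtract, at every time point, the mean voltage across all channels.
    return eeg - eeg.mean(axis=0, keepdims=True)

# Synthetic example: 128 channels sharing one common interference signal.
rng = np.random.default_rng(0)
n_channels, n_samples = 128, 1000
fs = 250.0  # assumed sampling rate in Hz
brain = rng.normal(size=(n_channels, n_samples))
shared = 5.0 * np.sin(2 * np.pi * 50 * np.arange(n_samples) / fs)
raw = brain + shared  # the shared signal is added to every channel

car = common_average_reference(raw)
```

Because the shared component is identical on every channel, it is removed entirely by the subtraction, while channel-specific activity is kept relative to the new average reference.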

Noise Removal
The spikes in the power spectral distribution are eliminated using a bandpass filter with a filter size of 50 Hz. Because the signals cannot be processed directly for feature extraction, the function built for this step also converts the EEG signal into a NumPy array. The smoothed power spectrum obtained after bandpass filtering is shown in Figure 6.
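The effect of bandpass filtering on a power-line spike can be sketched as follows. This is not the filter used in the study: it is a simple FFT-masking stand-in that zeroes all frequency bins outside a pass band. The 1 to 40 Hz pass band, the 250 Hz sampling rate, and the synthetic signal are assumptions chosen for illustration.

```python
import numpy as np

def fft_bandpass(signal, fs, low_hz, high_hz):
    """Keep only frequency components between low_hz and high_hz (inclusive)."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    # Zero the bins outside the pass band, then transform back to the time domain.
    return np.fft.irfft(spectrum * keep, n=signal.size)

def power_at(signal, fs, freq_hz):
    """Spectral power at the FFT bin closest to freq_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return power[np.argmin(np.abs(freqs - freq_hz))]

fs = 250.0                                # assumed sampling rate in Hz
t = np.arange(1000) / fs                  # 4 seconds of samples
alpha = np.sin(2 * np.pi * 10 * t)        # 10 Hz activity to preserve
mains = 2.0 * np.sin(2 * np.pi * 50 * t)  # 50 Hz power-line interference
noisy = alpha + mains

clean = fft_bandpass(noisy, fs, low_hz=1.0, high_hz=40.0)
```

In practice a designed filter (for example a Butterworth bandpass or a notch filter) is usually preferred over hard FFT masking, which can introduce ringing on real, non-periodic recordings.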

Feature Engineering
After the noise is removed, the EEG signals are converted to a NumPy array, and feature engineering is used to extract features that distinguish the EEG power spectrum of a healthy person from that of a mentally ill person. Two kinds of features are extracted: linear features and nonlinear features.
The linear features are the power in the alpha, beta, delta, and theta frequency bands, together with the mean, median, maximum, and minimum amplitude of the power signals. The nonlinear features are spectral entropy and singular value decomposition (SVD) entropy. All extracted features, linear and nonlinear, are stored in a Pandas data frame. The linear features are shown in Table 3.
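A minimal sketch of how the eight linear features could be computed for one channel is given below. The band edges are conventional values and are an assumption, since the text does not state them. The amplitude statistics are taken over the power spectrum, which is one reading of "amplitude of power signals". The function name and the test signal are hypothetical.

```python
import numpy as np

# Conventional EEG band edges in Hz (assumed; not stated in the text).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}

def linear_features(signal, fs):
    """Band powers and summary statistics of the power spectrum of one channel."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / signal.size  # simple periodogram
    features = {}
    for name, (low, high) in BANDS.items():
        in_band = (freqs >= low) & (freqs < high)
        features[f"power_{name}"] = float(power[in_band].sum())
    features["mean_amplitude"] = float(power.mean())
    features["median_amplitude"] = float(np.median(power))
    features["max_amplitude"] = float(power.max())
    features["min_amplitude"] = float(power.min())
    return features

fs = 250.0                       # assumed sampling rate in Hz
t = np.arange(1000) / fs
feats = linear_features(np.sin(2 * np.pi * 10 * t), fs)  # pure 10 Hz test tone
strongest = max(BANDS, key=lambda band: feats[f"power_{band}"])
```

For the 10 Hz test tone, the alpha band carries the most power, which is a quick sanity check that the band masks are aligned with the frequency axis.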

Non-Linear Features
Two nonlinear features are extracted from the supplied EEG signal datastore: spectral entropy and singular value decomposition (SVD) entropy. These two characteristics indicate how much valuable information is lost from the signal. The extracted nonlinear features are shown in Table 4.
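The two entropy features can be sketched as follows, using their common textbook definitions: spectral entropy is the Shannon entropy of the normalized power spectrum, and SVD entropy is the Shannon entropy of the normalized singular values of a time-delay embedding of the signal. The embedding order and delay are assumptions, and the exact variants used in the study are not stated.

```python
import numpy as np

def shannon_entropy(weights):
    """Shannon entropy in bits of non-negative weights, normalized to sum to 1."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(-(p * np.log2(p)).sum())

def spectral_entropy(signal):
    """Entropy of the normalized power spectrum."""
    power = np.abs(np.fft.rfft(np.asarray(signal, dtype=float))) ** 2
    return shannon_entropy(power)

def svd_entropy(signal, order=3, delay=1):
    """Entropy of the normalized singular values of a time-delay embedding."""
    x = np.asarray(signal, dtype=float)
    n_rows = x.size - (order - 1) * delay
    # Each column is a copy of the signal shifted by a multiple of `delay`.
    embedded = np.array([x[i * delay: i * delay + n_rows] for i in range(order)]).T
    singular_values = np.linalg.svd(embedded, compute_uv=False)
    return shannon_entropy(singular_values)

fs = 250.0  # assumed sampling rate in Hz
t = np.arange(1000) / fs
sine = np.sin(2 * np.pi * 10 * t)                     # regular signal
noise = np.random.default_rng(0).normal(size=t.size)  # irregular signal
```

Both measures are low for the regular sine wave and high for white noise, so they act as indicators of signal irregularity.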

Feature Allocation and Visualization
In this step, all of the features are collected into a single data frame and saved in a CSV file that is used in the following section. Table 5 below displays the Pandas data frame that serves as the feature datastore.

Visualization of Linear Features
The terms "transformation" and "function" both refer to something that takes in a number and produces a number, such as f(x) = 2x, where x is the input number and f(x) is the output number. Although functions are usually visualized using graphs, the term "transformation" suggests picturing something that moves, stretches, or squishes. Viewed this way, the function f(x) = 2x is multiplication by two acting on the number line: the point at one moves to where two was, the point at two moves to where four was, and so on. Figure 7 displays the four extracted EEG features: delta, theta, alpha, and beta.

Visualization of Power Spectral Features
The frequency and power characteristics of a signal are extracted using the block called spectral features. Unwanted frequencies can also be filtered out using low-pass and high-pass filters. Figure 8 shows the visualization of four power spectral features, including the minimum, maximum, median, and mean.

Model Development
First, features were extracted from the resting-state EEG data of several patients. These features are merged with the patient's condition and demographic information. The patient's health status serves as the response, and the extracted features are used as predictors. The patient's condition can be divided into two basic categories: major depressive disorder (MDD) and healthy control. Six additional MDD subtypes are considered: obsessive-compulsive disorder, addictive disorder, trauma and stress-related disorder, mood disorder, schizophrenia, and anxiety disorder. All the demographic data kept in the Pandas data frame are displayed in Table 6 below. The features are integrated with this data frame, as shown in Table 7, to generate the training datastore.

Data Preprocessing and Pre-Operation
The columns for age, gender, IQ, and serial number are removed from this data frame, along with any null entries. After the dataset has been analyzed, the target and predictor columns for classification are selected: the MDD column is set as the target column, and the remaining feature columns are set as predictor columns. The data are evaluated using 10-fold cross-validation.
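The 10-fold cross-validation split can be sketched with the standard library alone. The sample count of 50 and the function name are hypothetical; the sketch only shows how each record is held out for testing exactly once while the remaining nine folds are used for training.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits

splits = k_fold_indices(n_samples=50, k=10)  # 50 is an illustrative sample count
```

A model would be trained on `train_idx` and scored on `test_idx` for each of the ten splits, and the ten scores would then be averaged.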

Label Datastore
The label datastore is displayed below in Table 8.  Table 9 displays the feature datastore.

Classification Model
For the classification of each MDD based on the features that were taken from the EEG data, three models were created.

XGBoost
Extreme Gradient Boosting (XGBoost) is a distributed, scalable gradient-boosted Decision Tree (GBDT) machine learning framework. It provides parallel tree boosting and is a leading ML library for regression, classification, and ranking problems. Table 10 below lists the XGBoost parameters. According to the findings, feature optimization combined with the XGBoost algorithm improves classification accuracy. In this work, a number of features are extracted from the EEG brain signals, and the feature set is then optimized using the correlation matrix, information gain computation, and recursive feature elimination. Figure 9 depicts the overall operation of the XGBoost method: the data are preprocessed and segmented, features are extracted, and a correlation matrix is created. The data splitter then separates the data into training and testing sets. Finally, the XGBoost classifier presents the classification outcome. XGBoost has the following properties:
• Regularization: XGBoost offers a range of regularization penalties to prevent overfitting. These penalties support successful training and allow the model to generalize accurately.
• Non-linearity: XGBoost can recognize and learn from non-linear data patterns.
• Cross-validation: Built in and readily available.
• Scalability: XGBoost can handle large amounts of data by running on distributed servers and clusters such as Hadoop and Spark.

Random Forest Model
The Random Forest method builds an ensemble of Decision Trees, each constructed from a data sample drawn from the training set with replacement, known as a bootstrap sample. The RF model's parameters are shown in Table 11.

Model Parameters
The steps of the Random Forest algorithm are as follows and as shown in Figure 10:

• Step 1: The Random Forest technique uses n randomly chosen records from a data collection of k records.
• Step 2: A distinct Decision Tree is constructed for each sample.
• Step 3: Each Decision Tree will generate an output.
• Step 4: The final outcome for classification and regression is assessed using a majority vote or an average.
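The four steps above can be sketched in plain Python. To keep the example short, each "tree" is a one-split decision stump on a randomly chosen feature, and each bootstrap sample has the same size as the training set; both are simplifying assumptions and not the configuration used in the study. The toy data and all function names are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, rng):
    """Fit a one-split tree (stump) on one randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        left_label, right_label = majority(left), majority(right)
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        label = majority(labels)
        return (feature, float("inf"), label, label)
    return (feature,) + best[1:]

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step 1: draw a bootstrap sample (records chosen with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Step 2: build a separate tree for this sample.
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    votes = [predict_stump(stump, row) for stump in forest]  # Step 3: one output per tree
    return majority(votes)                                   # Step 4: majority vote

# Toy data: class 0 clusters near (0, 0), class 1 near (1, 1).
rows = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.0, 0.4),
        (0.9, 0.8), (0.8, 0.9), (1.0, 0.7), (0.7, 1.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
```

For regression, Step 4 would average the tree outputs instead of taking a vote.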

1D CNN Model
Building a neural network is a highly iterative process that involves adjusting a number of hyperparameters and trying out different architectures to maximize performance. Here, we begin by constructing a sequential CNN. It includes two convolution layers, one dropout layer, one max pooling layer, one flatten layer, one fully connected (dense) layer, and our classification layer. Table 12 lists the parameters for the 1D CNN model, whereas Table 13 lists the hyperparameters.

• Model Architecture

Epochs: 25
Batch size: 32
Learning rate: 0.001
Loss: Binary cross-entropy loss

CNNs consist of input, output, and hidden layers that aid in the processing and classification of images. The hidden layers include convolutional, pooling, ReLU, and fully connected layers. The CNN classification layer is displayed in Figure 11. A CNN is made up of multiple layers of artificial neurons. Artificial neurons are mathematical functions that compute the weighted sum of a number of inputs and output an activation value, loosely mirroring their biological counterparts. When an image is fed in, each layer of a ConvNet generates several activation maps, which are passed on to the following layer. The first layer typically extracts basic features, such as horizontal or diagonal edges. The next layer receives this output and can identify more complex features, such as corners and combinations of edges. Based on the activation map of the final convolution layer, the classification layer outputs a set of confidence scores (numbers between 0 and 1) that indicate how likely the image is to belong to each class.
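To make the layer sequence concrete, the sketch below runs a forward pass through a small 1D CNN written in plain NumPy: two convolution layers with ReLU, max pooling, flattening, a dense layer, and a sigmoid classification output. The layer order, the filter counts, the kernel size, and the input length of 10 features are assumptions for illustration, and the weights are random rather than trained. Dropout is omitted because it acts as the identity at inference time.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1D convolution. x: (length, in_ch); kernels: (k, in_ch, out_ch)."""
    k, _, out_ch = kernels.shape
    out_len = x.shape[0] - k + 1
    out = np.empty((out_len, out_ch))
    for i in range(out_len):
        # Weighted sum over the window of k time steps and all input channels.
        out[i] = np.tensordot(x[i:i + k], kernels, axes=([0, 1], [0, 1])) + bias
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the length axis."""
    n = (x.shape[0] // size) * size
    return x[:n].reshape(-1, size, x.shape[1]).max(axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 1))  # one sample: 10 features, 1 channel (hypothetical)

w1, b1 = rng.normal(scale=0.1, size=(3, 1, 8)), np.zeros(8)
w2, b2 = rng.normal(scale=0.1, size=(3, 8, 16)), np.zeros(16)
w_dense, b_dense = rng.normal(scale=0.1, size=(48, 16)), np.zeros(16)
w_out, b_out = rng.normal(scale=0.1, size=16), 0.0

h = relu(conv1d(x, w1, b1))           # first convolution layer  -> (8, 8)
h = relu(conv1d(h, w2, b2))           # second convolution layer -> (6, 16)
h = max_pool1d(h, size=2)             # max pooling layer        -> (3, 16)
flat = h.reshape(-1)                  # flatten layer            -> (48,)
dense = relu(flat @ w_dense + b_dense)   # dense layer           -> (16,)
prob = sigmoid(dense @ w_out + b_out)    # classification layer: P(class = 1)
```

During training, `prob` would be compared with the 0/1 label through the binary cross-entropy loss listed among the hyperparameters.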

Model Training Results
The sklearn library is used to import both models. Both models classify each MDD with an accuracy of more than 80%. Tables 14 and 15 show the outcomes of the two models.

Model Evaluation
The MODMA dataset is taken into consideration in order to predict depression using EEG signals from MDD patients and healthy control individuals. First, linear characteristics and nonlinear features are retrieved from the EEG signals. Additionally, MODMA offers data from patients, including demographic and psychological assessment data. After that, the features are integrated with the demographic information, which includes details such as gender, age, and MDD type. MDD is divided into six classes.

• Obsessive-compulsive disorder;
• Addictive disorder;
• Trauma and stress-related disorder;
• Mood disorder;
• Schizophrenia;
• Anxiety disorder.

Each class has two labels: patients with the disorder and healthy controls. In model training, the classes are used as the responses and the extracted features as predictors. The models are then assessed on the testing dataset to determine the accuracy of the three models for the six disorders.

Evaluation of Training Model
The following metrics are considered for evaluating the trained model: accuracy score, micro F1 score, macro F1 score, ROC curve, micro recall score, macro recall score, macro precision score, and micro precision score. In the equations below, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.
Accuracy: The ratio of the number of correct predictions to the total number of predictions, which represents how often the classifier makes the correct prediction.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$
Equation (1) expresses accuracy as the proportion of correctly classified data instances among all data instances.
If the dataset is unbalanced (that is, the negative and positive classes have different numbers of data instances), accuracy might not be an acceptable metric.
Precision: The proportion of predicted positives that are actually positive.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (2)$$
Precision is given in Equation (2). A good classifier's precision should ideally be 1 (high). Precision becomes 1 only when the numerator and denominator are equal, that is, when TP = TP + FP, which implies that FP is zero. As FP rises, the denominator exceeds the numerator and the precision value drops.
Recall: The fraction of actual positives that are successfully identified.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (3)$$
Recall is given in Equation (3). Recall for a good classifier should ideally be 1 (high). Recall becomes 1 only when the numerator and denominator are identical, that is, when TP = TP + FN, which implies that FN is zero. As FN increases, the denominator rises above the numerator and the recall value falls.
F1 score: The harmonic mean of recall and precision.
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (4)$$
The F1 score is given in Equation (4). The F1 score is 1 when precision and recall are both 1, and it can be high only when both precision and recall are high. Because it balances the two, the F1 score is a more useful metric than accuracy. The results are shown in Table 16 below.
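The metric definitions above can be checked with a short worked example. The counts below are invented for illustration and are not results from the study. The sketch also shows the difference between micro and macro averaging for the two-class case: micro averaging pools the counts of both classes before computing the score, whereas macro averaging computes the score per class and then takes the unweighted mean.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts with the patient class treated as positive.
tp, fp, fn, tn = 40, 10, 5, 45
patient = binary_metrics(tp, fp, fn, tn)

# The same predictions seen from the healthy-control class:
# its true positives are the tn above, and fp/fn swap roles.
control = binary_metrics(tp=tn, fp=fn, fn=fp, tn=tp)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = (patient["f1"] + control["f1"]) / 2

# Micro F1: pool TP, FP, and FN over both classes, then compute F1 once.
pooled_tp, pooled_fp, pooled_fn = tp + tn, fp + fn, fn + fp
micro_f1 = 2 * pooled_tp / (2 * pooled_tp + pooled_fp + pooled_fn)
```

When every sample receives exactly one label, the micro-averaged F1 equals the accuracy, which is why the macro scores are the more informative ones on unbalanced data.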

First Class Addictive Disorder
Early detection of depression symptoms is a crucial initial step towards evaluation, diagnosis, and behavior modification. The performance of a classification model is assessed using an N × N matrix termed a confusion matrix, where N is the total number of target classes. The RF model's parameters are listed in Table 17 below, and the confusion matrix for the RF model is shown in Figure 14.
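As a small illustration of the N × N confusion matrix, the sketch below counts predictions for N = 2 classes, with 1 standing for the patient class and 0 for healthy control. The label vectors are invented for the example. Rows index the true class and columns the predicted class, which is a common convention but an assumption here.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts the samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical model predictions
cm = confusion_matrix(y_true, y_pred, n_classes=2)

tn, fp = cm[0]  # true class 0: correctly rejected, falsely flagged
fn, tp = cm[1]  # true class 1: missed, correctly detected
```

The diagonal entries are the correct predictions, so their sum divided by the total gives the accuracy.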

Random Forest Classification Model
The model evaluation parameters for the RF classifier model are displayed in Table 17. Additionally, Figure 13a displays the RF classifier model's confusion matrix.

XGBoost classification Model
The first algorithm we employ is XGBoost with its standard classifier, the basic algorithm from the XGBoost library. Table 18 displays the XGBoost model's parameters. The confusion matrix for the XGBoost model is shown in Figure 14a, and Figure 14b shows the ROC graph for the XGBoost model. The graph contains two lines: the ROC curve of the model and the curve of a random classifier.
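The two lines in an ROC graph can be reproduced with a few lines of Python. The sketch below sweeps a decision threshold over the predicted scores, records the false positive rate and true positive rate at each step, and integrates the curve with the trapezoidal rule. The labels and scores are invented for illustration; a random classifier corresponds to the diagonal from (0, 0) to (1, 1), with an area of 0.5.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold over the scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        is_last = rank == len(order) - 1
        # Emit a point only after all samples sharing this score are counted.
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]           # hypothetical labels
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted probabilities
points = roc_points(y_true, scores)
auc = area_under_curve(points)
random_auc = area_under_curve([(0.0, 0.0), (1.0, 1.0)])
```

The further the model's curve lies above the diagonal, the larger its area and the better it separates the two classes.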

CNN Classification Model
A confusion matrix summarizes how well the classifier performed. The parameters of the CNN Classification model are shown in Table 19, and the confusion matrix for the CNN model is shown in Figure 15. Figure 16a shows the training and validation accuracy performance graph, where the blue graph shows the training accuracy and the orange graph shows the validation accuracy. Similarly, Figure 16b shows the training and validation loss performance graph. Figure 16c shows the ROC curve for the CNN Classification model.

Class Obsessive-Compulsive Disorder
In the ICD-10, OCD is classified under the broad category of neurotic, stress-related, and somatoform disorders, which also includes hypochondriacal disorder as a subgroup of somatoform disorders.

Random Forest Classifier Model
Random Forest is an ensemble classifier made up of several Decision Trees that outputs the class chosen by a majority vote of the individual trees. The RF classifier model's parameters are displayed in Table 20. Figure 17 depicts the RF classifier model's confusion matrix and shows the ROC curve for the RF classifier model. The ROC graph contains two lines: the ROC curve of the model and the curve of a random classifier. Table 21 displays the XGBoost model's parameters. Figure 18a depicts the XGBoost model's confusion matrix, and Figure 18 shows the ROC curve for the XGBoost classifier model, again alongside the curve of a random classifier. Table 22 lists the CNN model's parameters, whereas Figure 19 shows the confusion matrix. Figure 20a shows the training and validation accuracy performance graph. The blue graph shows the training accuracy and the orange graph shows the validation accuracy. Similarly, Figure 20b shows the training and validation loss performance graph. Figure 20c shows the ROC curve for the CNN Classification model.

Random Forest Classifier
The model evaluation parameters for the RF classifier model are displayed in Table 23. Additionally, Figure 21a displays the RF classifier model's confusion matrix, and Figure 21b shows the ROC curve for the RF classifier model. The ROC graph contains two lines: the ROC curve of the model and the curve of a random classifier. Table 24 lists the XGBoost model's parameters for the Class Trauma and Stress-related Disorder, and Figure 22 displays the confusion matrix. Table 25 lists the CNN model's parameters, while Figure 23 depicts the confusion matrix for the same model. Figure 24a shows the training and validation accuracy performance graph. The blue graph shows the training accuracy and the orange graph shows the validation accuracy. Similarly, Figure 24b shows the training and validation loss performance graph. Figure 24c shows the ROC curve for the CNN Classification model.

Class Mood Disorder
To identify mood disorders, we employ tree-based classification algorithms, specifically classification trees, along with the Random Forest, XGBoost, and CNN approaches.

CNN Model
Figure 26b shows the ROC curve for the XGBoost classifier model. The graph contains two lines: the ROC curve of the model and the curve of a random classifier. The parameters of the CNN model are shown in Table 28, while the CNN model's confusion matrix is shown in Figure 27. Figure 28a shows the training and validation accuracy performance graph. The blue graph shows the training accuracy and the orange graph shows the validation accuracy. Similarly, Figure 28b shows the training and validation loss performance graph. Figure 27c shows the ROC curve for the CNN Classification model.

Class Schizophrenia
ML algorithms can identify diseases like schizophrenia and support clinical decision-making with predictive models. To forecast the presence of hospitalized schizophrenia patients, machine learning techniques such as Decision Tree, Random Forest, XGBoost, and CNN are used. Table 29 provides the RF model's parameters for the Class Schizophrenia, and Figure 29a depicts the model's confusion matrix.

XGBoost Model
The XGBoost parameters are shown in Table 30, and Figure 30a provides the confusion matrix. Table 31 lists the CNN model's parameters, and Figure 31 displays the model's confusion matrix. Figure 32a shows the training and validation accuracy performance graph. The blue graph shows the training accuracy and the orange graph shows the validation accuracy. Similarly, Figure 32b shows the training and validation loss performance graph. Figure 31c shows the ROC curve for the CNN Classification model.

Anxiety Disorder
Data mining is able to find hidden patterns and associations that can be utilized to forecast generalized anxiety disorder, which leads to substantial insights. The Random Forest approach is one of the categorization data mining strategies that embeds good predictive properties for accurate prediction.

XGBoost Model
The XGBoost model's parameters are listed in Table 33, and Figure 34a provides the model's confusion matrix. Figure 33b shows the ROC curve for the RF classifier model. The graph contains two lines: the ROC curve of the model and the curve of a random classifier. Figure 36a shows the training and validation accuracy performance graph. The blue graph shows the training accuracy and the orange graph shows the validation accuracy. Similarly, Figure 36b shows the training and validation loss performance graph. Figure 36c shows the ROC curve for the CNN Classification model.

Discussion
Electroencephalograms (EEGs) serve as an important point of reference and an objective foundation for the detection and diagnosis of depression. To improve diagnostic accuracy, a high-performance hybrid neural network depression detection strategy using deep learning technology is proposed in this research. This research considers resting-state neurological signals recorded via 128 channels. Data from both an advanced wearable EEG collector and a traditional 128-electrode elastic cap are provided. The classification accuracy differed across the three models, and CNN classified the data more accurately than the other two models. The EEG signal data are collected from the multi-modal open dataset MODMA, which is employed in the study of mental diseases. The EEG dataset contains information from both a traditional 128-electrode elastic cap and a cutting-edge wearable 3-electrode EEG collector for widespread applications. There are three datastores for EEG. An EEG machine typically has many electrodes. The electrodes measure voltage at the patient's scalp and convert it into signal data. With n electrodes, for instance, each electrode generates a time series of voltage values. The voltage varies across different areas of the brain.
A bandpass filter with a 50 Hz filter size is used to remove the spikes in the power spectral distribution. Because signals cannot be analyzed directly for feature extraction, the programme also converts the EEG signal into a NumPy array. Feature engineering is then applied to the denoised data to extract features that differentiate the EEG power spectrum of a healthy individual from that of a mentally ill person. Two nonlinear features, spectral entropy and singular value decomposition (SVD) entropy, are extracted from the provided EEG signal datastore. These two properties indicate how much important information is lost from the signal. Features were first derived from the resting EEG data of a number of patients and then combined with details about the patient's condition and demographics. The extracted features are used as predictors, and the patient's current state of health is the response. The patient's state can be separated into two fundamental groups: major depressive disorder and healthy control. There are six additional MDD subtypes: obsessive-compulsive disorders, addiction disorders, disorders linked to trauma and stress, mood disorders, schizophrenia, and anxiety disorders.
In this study, three machine learning models (Random Forest, XGBoost, and a CNN-based model) are used to analyze MODMA data in order to diagnose depression. The study's objective is to identify features and link them to the appropriate labels, in this case MDD and healthy control. We translate these labels to 1 and 0. Six major depressive illnesses fall under the MDD umbrella. The models are trained using the features and the particular labels from the MDD class. The three models differed in classification accuracy, and the CNN classified the data more accurately than the other two, reaching a 97% accuracy rate when trained for 25 epochs.
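As a rough sketch of the label encoding and training step, the following example fits a Random Forest with scikit-learn on synthetic feature vectors. It is illustrative only: the mapping of MDD to 1 and healthy control to 0 is an assumption about the order of the encoding, the feature values are randomly generated rather than drawn from MODMA, and the hyperparameters are arbitrary choices, not those of this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumed encoding: MDD -> 1, healthy control (HC) -> 0.
LABEL_MAP = {"MDD": 1, "HC": 0}

# Synthetic stand-in for the extracted features (rows = subjects).
rng = np.random.default_rng(42)
n_per_class, n_features = 100, 6
X_hc = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
X_mdd = rng.normal(1.5, 1.0, size=(n_per_class, n_features))
X = np.vstack([X_hc, X_mdd])

status = ["HC"] * n_per_class + ["MDD"] * n_per_class
y = np.array([LABEL_MAP[s] for s in status])

# Stratified hold-out split keeps the class ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"held-out accuracy on synthetic data: {acc:.2f}")
```

With real recordings, several segments usually come from the same subject, so the split would need to be made per subject to avoid leaking a subject's data into the test set.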
The suggested strategy offers a number of advantages. First, on the same dataset, it distinguishes persons with depression from healthy participants more accurately than previous methods. Second, the network model includes an attention mechanism that considerably reduces training time. The results show that, by focusing computing resources on features with high weights, the attention mechanism decreases overhead.
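The form of the attention mechanism is not specified here, so the following is only a generic sketch of softmax attention pooling, meant to show how high-scoring elements receive most of the weight. The function name `attention_pool`, the array shapes, and the random `score_vector` (which stands in for a learned parameter) are all hypothetical.

```python
import numpy as np


def attention_pool(hidden, score_vector):
    """Softmax-weighted average of the rows of `hidden` (steps x features)."""
    scores = hidden @ score_vector          # one relevance score per step
    scores = scores - scores.max()          # stabilize the exponentials
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden              # weighted sum over steps
    return weights, context


rng = np.random.default_rng(1)
hidden = rng.standard_normal((5, 4))        # 5 hypothetical time steps, 4 features
score_vector = rng.standard_normal(4)       # stands in for a learned parameter
weights, context = attention_pool(hidden, score_vector)
print(weights.round(3), context.round(3))
```

The weights sum to one, and the step with the largest score dominates the pooled representation, so later layers operate on a single context vector rather than on every step.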

Comparative Analysis
Our model is built on three key classification techniques: XGBoost, Random Forest, and a CNN model. The CNN model gave the best accuracy, 97%, over 25 training epochs. With its large number of layers, the CNN model is designed to learn trends in the data with greater precision. Among prior work on depression identification with the MODMA dataset, "EEG diagnosis of depression based on multi-channel data fusion and clipping augmentation and convolutional neural network" attained a maximum accuracy of 90.02% with a CNN model, while the lowest accuracy among comparable works, 89.63%, was obtained with a CNN + GRU model. In our model, we employed a lower learning rate and a greater number of epochs, which helped the CNN model extract the greatest number of features from the dataset and reach the highest accuracy. Table 35 lists some of the most recent works. The suggested model for EEG-based depression detection using multiple ML techniques has several aspects that could make it useful in practical applications. One of its main benefits is its capacity to increase diagnostic precision for depression by fusing EEG signals with demographic information such as age and gender. This might lead to earlier and more efficient treatment, which would ultimately improve patient outcomes. Another practical feature is the use of open datasets, particularly the MODMA dataset for gathering EEG signal data. This broadens the information available for studying mental illnesses and improves the approach's usability and applicability in the real world. The method is effective and reliable, and it automates the detection of depression from EEG signals through machine learning and deep learning techniques.
This may lessen the amount of work clinicians have to do and increase the speed and precision of diagnosis. Furthermore, the proposed model can be applied widely with both conventional 128-electrode elastic caps and cutting-edge wearable 3-electrode EEG collectors. This makes data collection more flexible and convenient, increasing its accessibility to a wider range of patients.

Conclusions
To comprehensively examine the features of EEG signals and propose a high-performance technique for mental state detection, this study applied DL algorithms to EEG signals. The model's parameters and hyperparameters were then tuned through testing, and comparative experiments were conducted to confirm the approach's applicability and effectiveness. Based on the experimental data, the approach presented in this study is more effective at recognizing and diagnosing depression. With only a few iterations, our algorithm can dynamically extract EEG signal features and outperform earlier methods in classification performance. The method performs well on open datasets and establishes a technological foundation for the evaluation and diagnosis of depression. The approach utilized in this research has a 98% accuracy rate when applied to the public dataset.
Although the model is successful in identifying mental states, the following problems remain to be addressed. (1) The dataset used in this research can substantiate the observations made in this article, but it did not contain enough negative samples. In the future, we intend to gather more diagnostic data to enhance the generalization ability of the model. Additionally, as the main objective of this research was to diagnose depression, future studies on non-destructive treatments could be taken into consideration. (2) The EEG data of the study's participants were captured with heavy electrode caps that had to be pressed onto the scalp surface to make full contact. As a result, some users might have felt pain. Future studies could employ fewer, lighter electrodes with portable acquisition techniques such as ear-BCI. (3) Only a qualitative analysis of the psychological state was carried out in this investigation. Future studies might involve quantitative assessments of psychological status. Depression may be diagnosed depending on its intensity, which is further classified as normal, mild, moderate, or severe. The functional form could be established as a confirmation to exhibit the concentration level.
When diagnosing depression, it is beneficial to consider demographic factors other than age and gender, such as ethnicity and socioeconomic status. These factors may significantly affect the prevalence and severity of depression, but our paper did not specifically address them. Future studies can take them into account.