Machine Learning Model for Computer-Aided Depression Screening among Young Adults Using Wireless EEG Headset

Depression is a disorder that, if left untreated, can impair quality of life. EEG has shown great promise in distinguishing depressed individuals from depression control individuals, and it overcomes the limitations of traditional questionnaire-based methods. In this study, a machine learning-based method for detecting depression among young adults using EEG data recorded by a wireless headset is proposed. For this purpose, EEG data were recorded using an Emotiv Epoc+ headset. A total of 32 young adults participated, and the PHQ9 screening tool was used to identify depressed participants. Features such as skewness, kurtosis, variance, Hjorth parameters, Shannon entropy, and Log energy entropy, extracted from 1 to 5 sec data segments filtered at different frequency bands, were applied to KNN classifiers and to SVM classifiers with different kernels. In the AB band (8–30 Hz), 98.43 ± 0.15% accuracy was achieved by extracting Hjorth parameters, Shannon entropy, and Log energy entropy from 5 sec samples with a 5-fold CV using a KNN classifier. With the same features and classifier, an overall accuracy of 98.10 ± 0.11%, NPV = 0.977, precision = 0.984, sensitivity = 0.984, specificity = 0.976, and F1 score = 0.984 were achieved after splitting the data in a 70/30 ratio for training and testing with 5-fold CV. From the findings, it can be concluded that EEG data from an Emotiv headset can be used to detect depression with the proposed method.


Introduction
Depressive disorder is a highly prevalent mental illness. Sadness, loss of interest or enjoyment, feelings of guilt or low self-worth, disturbed sleep or appetite, fatigue, and difficulty concentrating are some characteristics of depression. It may affect a person's capacity to function in daily life or at work or school. According to the World Health Organization (WHO), in 2015 almost 4.4% of the world's population was suffering from depression [1]. Because of the COVID-19 pandemic, many people suffered from depression due to job loss, disrupted studies, the loss of close relatives, staying indoors, etc. One study showed a 19.3% increase in depression symptoms among people in the United States after COVID-19 [2]. Another study has shown the changes in obsession, depression, and quality of life in schizophrenia patients before and after COVID-19 [3]. When depression is severe, it can lead to suicide. Every year around 800 thousand people die by suicide [1]. In 2017, 13.2% of young adults (aged 18–25) in the U.S. suffered from depression, which was 5.1% less in the year 2009 [4]. Of the deaths of young people, around 9.1% are due to suicide [5]. In most suicide cases, people had psychiatric disorders, of which depression is the most common [6]. According to a recent study, insecure attachment styles are linked to greater problems such as depression, social anxiety, and suicidal thoughts [7]. So, depression is a major issue that should be diagnosed and treated at an early age to prevent suicide and to improve quality of life. There are various screening tools to detect depression. There are chronic social defeat stress models of depression, such as the Morris water maze test and T-maze test, to learn about the cognitive functions [8].
Traditionally, clinical questionnaire-based diagnoses are used to detect depression, where medical professionals (psychologists, psychiatrists, counselors, and physicians) interview and observe patients' behavior to determine depression [9]. The Multiple Sclerosis Depression Rating Scale (MSDRS) is a screening tool to evaluate depression in multiple sclerosis (MS) patients and may make fatigue and depressive symptoms more distinguishable [10]. Adjustment disorders with depressed mood are diagnosed using the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV TR), published by the American Psychiatric Association (APA) [11]. The Beck Depression Inventory-II (BDI-II), a 21-item self-report questionnaire, is frequently used to assess the severity of depression in adults and adolescents. To be more congruent with DSM-IV depression criteria, the BDI-II was redesigned in 1996 [12]. The Center for Epidemiologic Studies Depression (CES-D) Scale, a commonly used self-report depression symptom scale, was given to convenience samples of high school and college students [13]. The most popular tool for patient selection and follow-up in research studies on depression treatments is the Hamilton Depression Rating Scale (HDRS, often known as HAM-D) [14]. The Depression Anxiety Stress Scales 21 (DASS-21) is reliable and can easily be used by patients to understand their symptoms of depression, anxiety, and stress. It is based on 21 questions, of which 7 are for depression, 7 for anxiety, and 7 for stress [15]. The International Classification of Diseases 10th Revision (ICD-10) is authorized by the WHO. The ICD-10 Symptom Rating (ISR) is a brand-new 29-item self-rating questionnaire containing scales for evaluating eating disorders, OCD, depression, and anxiety [16]. The three-page Patient Health Questionnaire (PHQ9) is completely self-administered by the patient.
The PHQ evaluates eight diagnoses, broken down into subthreshold disorders (disorders whose criteria encompass fewer symptoms than are required for any specific DSM-IV diagnoses: other depressive disorder, probable alcohol abuse/dependence, somatoform, and binge eating disorder) and threshold disorders (disorders that correspond to specific DSM-IV diagnoses: major depressive disorder, panic disorder, other anxiety disorder, and bulimia nervosa) [17]. Sometimes depressed individuals are not willing to provide reliable information due to the nature of the disorder, which can lead to inaccurate diagnosis. So, it is essential to find more effective methods to diagnose depression.
There are also other methods to detect depression effectively. Studies indicate that depressed individuals have abnormal brain activity compared to depression control individuals. Functional magnetic resonance imaging (fMRI) is an imaging technique to investigate brain functionality and structure. Yang et al. have shown that depression can be identified using fMRI [18]. Electroencephalography (EEG) is a noninvasive technique for evaluating brain function. Using electrodes attached to the scalp, EEG analyzes the electrical activity of sizable groups of synchronously firing neurons in the brain [19]. EEG is used to detect electrical activity of the brain, which is being used as a diagnostic biomarker for depression. Depression creates emotional variation and unusual brain activity which can be detected by EEG; thus, EEG can identify depression [20]. MRI can provide more accurate results, but it is not a very practical way of detecting depression. MRI is expensive, can cause claustrophobia, is not portable, and is not easy to use [21]. On the other hand, EEG is portable, noninvasive, easy to use, gives higher temporal resolution, and is less expensive than other brain monitoring methods [22]. With the help of machine learning (ML) and deep learning, the electrical signals recorded from the EEG can be used to classify depressed and depression control individuals. There are existing studies that have shown great promise in diagnosing depression using ML. So, ML can diagnose depression by using the EEG signal. and EMD features and applied logistic regression to classify the data, and acquired 90.5% accuracy with 10-fold CV [23]. The same dataset was used by [24][25][26][27][28][29][30][31][32][33]. Here, Mahato and Paul extracted band power, interhemispheric asymmetry, relative wavelet energy, and wavelet entropy as features and used multilayered perceptron neural network (MLPNN) and radial basis function network (RBFN) classifiers, which similarly obtained 93.3% accuracy [24].
Ke [34]. Li et al. in the year 2019 [35] and 2020 [36] used a dataset containing EEG data of 24 depressed and 24 healthy participants. The depression detection was performed using the BDI-II screening tool and the EEG data was recorded using HydroCel Geodesic Sensor Net (HCGSN) device. Li et al. in the year 2019 achieved 85.62% accuracy by filtering the data from 1 to 40 Hz and using a computer-aided detection (CAD) system with a convolutional neural network (ConvNet) and 24-fold CV [35]. Li et al. in the year 2020 used 3 channels of data and filtered the data from 0.5 to 70 Hz. They achieved 80.74% accuracy using a CNN classifier [36]. Akbari et al. acquired data from 44 participants (22 depressed and 22 healthy) aged between 23 and 58 years. By extracting the rhythm feature by empirical wavelet transform and centered correntropy features with an SVM classifier, 98.33% accuracy was achieved using FP1-T3 channels data and 98.76% accuracy was achieved using FP2-T4 channels data [37]. Then, using reconstructed phase space of the EEG and genetic algorithm (GA) with an SVM classifier, 97.74% accuracy was achieved using FP1-T3 channels data and 99.3% accuracy was achieved using FP2-T4 channels data [38]. Then, they collected fluctuation index as features and used cascade forward neural network (CFNN) as a classifier, which achieved 99.5% accuracy using FP1-T3 channels data and 100% accuracy using FP2-T4 channels data [39]. Cai et al. used a dataset containing EEG data of 92 depressed and 121 healthy participants. PHQ9 was used as a screening tool to identify depression among the participants. They extracted peak, variance, Hjorth parameter, skewness, kurtosis, relative centroid frequency, absolute centroid frequency, relative power, absolute power, Kolmogorov entropy, Shannon entropy, co-complexity, correlation dimension, and power spectral entropy as features and achieved 79.27% accuracy using a KNN classifier [20]. Cai et al. 
used a dataset containing EEG data of 152 depressed and 113 healthy participants aged between 18 and 55 years. They extracted features such as variance, peak, kurtosis, inclination, Hjorth parameter, co-complexity, correlation dimension, power spectrum entropy, Kolmogorov entropy, and Shannon entropy. For classification, they used decision tree (DT) and achieved 76.4% accuracy [40]. Cai et al. in the year 2020 used a dataset containing EEG data of 86 depressed and 92 healthy participants aged between 18 and 55 years. Here, PHQ9 was used as the screening tool. They extracted linear features (band powers, center frequency, skewness, kurtosis, and peak of the whole band) and nonlinear features (variance, Hjorth's activity, power spectral entropy, Kolmogorov entropy, Shannon entropy, correlation dimension, and co-complexity). Using the KNN classifier, they achieved 86.98% accuracy [41]. Wu et al. used an EEG dataset consisting of data from 24 depressed (aged 29.7 ± 10.9) and 31 healthy (aged 29.75 ± 9.9) participants. The BDI-II and DSM-IV were used as screening tools for depression detection and HydroCel Geodesic Sensor Net (HCGSN) device was used for EEG recording. They extracted spectral power density (SPD) and the band power (BP) and achieved 83.64% using conformal kernel SVM (CK-SVM) as a classifier [42]. Acharya et al. used a dataset that contains EEG data from 15 depressed and 15 healthy individuals aged between 20 and 50 years. They applied a CNN classifier and achieved 93.54% accuracy using FP1-T3 channels (left hemisphere) data and 95.49% accuracy using FP2-T4 channels (right hemisphere) data [43]. Kim et al. used 30 depressed (aged 42.5 ± 16.96) participants' EEG data and 37 healthy (aged 29.75 ± 9.9) participants' EEG data for their research. They extracted four types of electrodermal activity features (dMSCL, dSDSCL, dSKSCL, and dNSSCR). 
They achieved 74% accuracy by applying the extracted features to a support vector machine recursive feature elimination (SVM-RFE) for feature selection and decision tree (DT) classification [44]. Bachmann et al. used an EEG dataset of 26 participants (13 depressed and 13 healthy) aged between 18 and 66. The ICD-10 screening tool was used to identify depression among participants and Neuroscan Synamps2 was used to record EEG signals. They used alpha band power variability, relative gamma power, spectral asymmetry index, Lempel-Ziv complexity, detrended fluctuation analysis, and Higuchi's fractal dimension as features, which they applied to the logistic regression classifier and achieved 92% accuracy [45]. Arora et al. used EEG data from 25 participants aged between 16 and 60 years. The DSM-IV was used as the screening tool. For features, they measured correlation dimension and co-complexity, then applied the features to an SVM classifier with RBF kernel and got 91% accuracy [46]. Ay et al. used a dataset containing EEG data from 15 depressed and 15 healthy participants aged between 20 and 50 years. They applied a CNN classifier to achieve 97.66% accuracy using FP1-T3 channels (left hemisphere) data and 99.12% accuracy using FP2-T4 channels (right hemisphere) data [47]. Peng et al. worked with a dataset with 27 depressed (aged 31.67 ± 10.94) and 28 healthy (aged 31.82 ± 8.76) participants with EEG data. The PHQ9 was used as a screening tool for depression detection and HydroCel Geodesic Sensor Net (HCGSN) device was used for EEG recording. They extracted phase lag index (PLI) and high discriminative power features. They achieved 92% accuracy by applying SVM with the linear kernel as a classifier [48]. Mohammadi et al. worked with EEG data from 60 participants (aged 32.4 ± 10.5). The participants were evaluated for depression using DSM-IV and BDI-II screening tools. 
They extracted fuzzy entropy, Katz fractal dimension, and fuzzy fractal dimension features, and then achieved 90% accuracy using fuzzy function based on neural network (FFNN) [49]. Wan et al. worked with 2 datasets. The first dataset contains EEG data from 35 participants aged between 20 and 56 years and the second dataset contains EEG data from 30 participants aged between 24 and 55 years. The DSM-IV and HAM-D were used as screening tools to evaluate depression among participants. For the features, they extracted wavelet features, power spectral entropy, co-complexity, approximate entropy, and wavelet entropy. Using the KNN classifier with a genetic algorithm for feature selection, they achieved 94.29% accuracy for the first dataset and by using the regression trees classifier with a genetic algorithm, they achieved 86.67% accuracy for the second dataset [50]. Zhu et al. used a dataset with EEG signals from 19 depressed (aged 21.1 ± 1.95) and 20 healthy (aged 20.11 ± 2.07) participants. The BDI-II was used as the screening tool and the data was recorded using HydroCel Geodesic Sensor Net (HCGSN) device. They extracted variance, maximum power, sum power, approximate entropy, Kolmogorov entropy, permutation entropy, Lempel-Ziv complexity, correlation dimension, Lyapunov exponent, singular-value decomposition entropy, min-entropy, Shannon entropy, spectral entropy, Hartley entropy, and co-complexity as features with the BestFirst algorithm for feature selection. They achieved 83.42% accuracy by using an SVM classifier with a linear kernel [51]. Thoduparambil et al. worked with a database taken from the Public Domain Dedication and License (PDDL) v1.0; the data was recorded using a Neuroscan Synamps2 system. They applied a CNN classifier and achieved 98.84% accuracy from the channels located at the left part of the brain and 99.07% accuracy from the channels located at the right part of the brain [52]. Čukić et al. 
used EEG data collected from 21 depressed and 20 healthy participants aged from 24 to 68 years. The depression was identified by ICD-10 screening tools and EEG was recorded using NicoletOne Digital EEG Amplifier. They extracted Higuchi's Fractal Dimension (HFD) and Sample entropy. They achieved 97.57% accuracy using multilayer perceptron, logistic regression, decision tree, and Naïve Bayes classifiers [53]. Mahato et al. worked with EEG data collected from 24 depressed (aged 35 ± 5.9) and 20 healthy (aged 36 ± 4.2) participants. The DSM-V and HAM-D were the screening tools used to identify depression. They extracted band power, interhemispheric asymmetry, paired asymmetry, sample entropy, and detrended fluctuation analysis as features and achieved 96.02% accuracy with SVM as a classifier [54]. Liu et al. worked with a dataset containing EEG data from 20 depressed and 19 healthy participants aged between 23 and 65 years. For screening depression among participants, HAM-D was used, and EEG data were recorded using a Neuroscan Quik-cap device. They measured quantified influence, phase synchronization, and functional integration, including degree, functional segregation, clustering coefficient, and characteristic path length, performed statistical analysis on the data, and used PCA for feature selection. They achieved 89.7% accuracy for the beta band using the SVM classifier [55]. Bai et al. used a dataset containing EEG data from 142 depressed and 71 healthy participants. They extracted absolute centroid, variance, relative power, absolute power, power spectral density, activity, skewness, kurtosis, spectral entropy, Higuchi's fractal dimension, Hjorth parameters, and detrended fluctuation analysis as features and achieved 81.16% accuracy using tree-based feature selection and a random forest classifier [56]. Uyulan et al. used EEG data collected from 46 depressed and 46 healthy participants aged between 20 and 51 years. 
The HAM-D was used as the screening tool to identify depression among participants and Neuroscan/Scan LT was used to record EEG data. Using CNN with MobileNet architecture, they achieved 89.33% from the left hemisphere and 92.66% from the right hemisphere. They also achieved 90.22% accuracy in the delta band using CNN with ResNet-50 architecture [57]. Avots et al. used EEG data collected from 20 participants (aged between 24 and 60 years) using the Cadwell Easy II EEG device. The HAM-D was used for screening purposes. They extracted alpha power variability, relative band power, spectral asymmetry index, Lempel-Ziv complexity, Higuchi fractal dimension, and detrended fluctuation analysis as features. Using ReliefF for feature selection, they achieved 95% accuracy with both KNN and decision tree classifiers [58]. Lei et al. worked with EEG data collected from 101 depressed participants, 82 participants with bipolar disorder, and 81 healthy participants using the Brain Products GmbH device. Using the CNN classifier, they achieved 96.88% accuracy with depressed vs. healthy and 97.3% accuracy with bipolar vs. healthy [59]. Zhao et al. used EEG data collected from 40 depressed participants and 38 healthy participants (aged 18.72 ± 0.36) using a device from Neuroscan. The BDI-II was used for screening depression. Microstate and Omega complexity features were extracted, and using SVM they achieved 76% accuracy [60]. Liu et al. worked with 2 datasets. The first dataset contains EEG data collected from 24 depressed and 29 healthy participants using HydroCel Geodesic Sensor Net (HCGSN) and the second dataset contains data collected from 16 depressed and 16 healthy participants. For the first dataset, PHQ9 was used for screening, and for the second dataset, BDI was used for screening. They achieved 89.63% accuracy from the first dataset and 88.56% accuracy from the second dataset using a CNN classifier with gated recurrent unit (GRU) [61]. Nassibi et al. 
worked with EEG data collected from 42 depressed (aged 18.64 ± 1.12) and 42 healthy (aged 19.04 ± 1.16) participants using Neuroscan Synamps2. The screening was performed using BDI-II. They extracted band power, relative band power, maximum power spectral density, power spectral density, median frequency, relative median, mean frequency, Shannon entropy, Hjorth parameters, root-mean-square, kurtosis, skewness, variance, and singular value as features and used neighborhood component analysis (NCA) for feature selection. Using the Naïve Bayes classifier, they achieved 91.8% accuracy [62]. Seal et al. used EEG data collected from 46 depressed and 46 healthy participants (aged between 20 and 51 years) using EEG Traveler Braintech 32+ CMEEG-01. The PHQ9 was used for screening. They extracted band power, mean, median, mode, mean cube, standard deviation, first difference, normalized first difference, second difference, normalized second difference, mobility, Pearson's coefficient of skewness, Shannon entropy, Alpha asymmetry 1, and Alpha asymmetry 2 as features and used the ANOVA test and correlation analysis for feature selection. They achieved 87% accuracy using the XGBoost classifier [63].

Contribution and Objectives.
We can observe that multiple research works have previously been performed using both machine learning and deep learning algorithms. We have noticed that over the years most of the datasets that have been created had participants of all ages. No dataset was created using young adults (aged 18-25 years). If we look at the depression screening tools, we can see that only a handful of works have been performed with PHQ9, which is a very good self-administrable screening method for depression. In previous works, the EEG recording devices that were used were bulky, wired, and not easy to use, and so may not serve as an ideal alternative to traditional methods of depression diagnosis. EEG-based depression detection should provide a better and more accurate diagnosis and should also be easy to use, so that patients who are not willing to face questionnaire- or interview-based diagnosis can have a better, more reliable, and easier solution.
We want to create an EEG-based depression detection technique that will be reliable and easy to use. For this, we want to (i) target young adults aged between 18 and 25 years (university students), (ii) use the PHQ9 screening tool for depression diagnosis, (iii) use the Emotiv EPOC+ EEG headset, which is wireless and easy to use, for data recording, and (iv) create a machine learning model which will be reliable and will give the best results.

Materials and Method
In this section, we will discuss the proposed method. The proposed method can be divided into a few parts. The flowcharts of the proposed method are shown in Figures 1-9.

Dataset Acquisition.
The target of this work is to identify depression among young adults. Initially, there was a survey to identify depressed and depression control participants aged between 18 and 25 years. Over 500 students participated in the initial survey. All the participants were university students from Independent University, Bangladesh (IUB). The Patient Health Questionnaire (PHQ9) was used in the initial survey to select depressed and depression control individuals. After the survey, 82 participants were selected to record EEG data, but only 64 participants were willing to give consent for EEG recording. The recording was performed a few weeks after the survey. All participants signed consent forms for the recording. For labeling purposes, the participants were asked to fill out the PHQ9 questionnaire before the EEG recording. During the recording, participants sat in a comfortable chair and were asked to sit still and keep their eyes closed. Figure 1 shows the flow chart for data acquisition. The recordings were 5 min long for each participant. Figure 2 shows the timeline of each recording session. This shows that a total of approximately 20 min was required for every session for each participant. Based on the PHQ9 scores, the recordings of 32 participants (16 males and 16 females, aged between 18 and 25 years) were chosen for this work.
Among the 32 participants, 19 were identified as depressed (age 21.6 ± 1.98) and 13 were identified as depression control participants (age 21.3 ± 2.06). Here, we have observed that among the depressed group, 74% were female participants. This also supports the WHO report that depression is more common among females than males [1].
Psychiatrists or mental health care centers typically favor the PHQ9-based depression screening approach. The entire exercise takes between two and five minutes to complete. Table 1 displays the severity measuring score and its accompanying labels.
For this study, only the participants who scored between 20 and 27 were selected as the depressed group, and the participants who scored between 0 and 4 were selected as the control group. The information on the participants is given in Table 2.
There are 14 EEG signals (one from each channel) collected from each participant. This gives 266 EEG signals from depressed participants (19 participants × 14 channels) and 182 EEG signals from depression control participants (13 participants × 14 channels). Here, a total of 448 EEG signals have been recorded for this work. Figure 4 shows EEG data collected from depression control and depressed participants. Figure 5 shows the brain maps of the depression control and depressed participants at 2 Hz, 6 Hz, 10 Hz, 22 Hz, and 40 Hz. Each of the frequency points represents a single point of a different sub-band (Delta, Theta, Alpha, Beta, and Gamma). A significant difference can be observed from the brain maps.

Segmentation.
From each subject, 5 min of data was recorded. So, we collected 32 samples, which were not enough for machine learning. Therefore, segmentation was required. Segmentation helps create smaller samples from a big sample. We carefully divided the raw data of each participant into multiple nonoverlapping segments with their corresponding channels. Each 5 min recording was segmented into 1, 2, 3, 4, and 5 second segments, creating 5 datasets with different sample lengths. Each segment contains an EEG signal from 14 channels. Table 3 contains the total number of samples and the group-wise number of samples after dividing the raw data into different segments. If we consider P equal to the total number of samples collected from participants, i represents the number of participants, T is the total number of samples in each recording, S is the segment length, C is the number of channels, and N is the number of samples after segmentation, then equations (1) and (2) relate these quantities, where i = 1, 2, ..., 32.
Figure 3: Channel locations, Emotiv headset, and channels at different regions of the brain.

Control Participants Raw Data
We have a total of 5 min of data, equivalent to 38400 samples (sampled at 128 Hz); if we follow equation (2), we can calculate the total number of samples we can get from each recording for the desired segment length. If we choose 5 sec (640 samples) as the segment length, then we will get 60 samples after segmentation. So, each participant's 5 min of data, 38400 × 14 samples, is equal to 60 × 640 × 14 samples when a 5-sec segment length is chosen, following equations (1) and (2).
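The arithmetic above can be checked with a small sketch of non-overlapping segmentation. This is only an illustration under stated assumptions: the function name `segment_recording` and the list-of-lists data layout are hypothetical, and the recording is a placeholder filled with zeros rather than real EEG data.

```python
def segment_recording(recording, fs=128, segment_seconds=5):
    """Split a multichannel recording into non-overlapping segments.

    recording: list of channels, each a list of samples of equal length.
    Returns a list of segments; each segment is a list of channels.
    """
    seg_len = fs * segment_seconds        # S: samples per segment
    total = len(recording[0])             # T: samples per channel
    n_segments = total // seg_len         # N: whole segments (any remainder is dropped)
    return [
        [channel[k * seg_len:(k + 1) * seg_len] for channel in recording]
        for k in range(n_segments)
    ]


# Hypothetical placeholder: one 5 min, 14-channel recording sampled at 128 Hz.
fs, n_channels, minutes = 128, 14, 5
recording = [[0.0] * (fs * 60 * minutes) for _ in range(n_channels)]

segments = segment_recording(recording, fs=fs, segment_seconds=5)
print(len(segments), len(segments[0]), len(segments[0][0]))  # 60 14 640
```

With the same placeholder, segment lengths of 1, 2, 3, and 4 sec give 300, 150, 100, and 75 segments per recording.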

Preprocessing.
We applied IIR Butterworth filters to all the channels for preprocessing the data. To find the optimal frequency band for the best result, we filtered the raw data into multiple sub-bands. The signal-processing steps are shown in Figure 6. Figure 7 shows the filtered data.
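As a rough illustration of IIR Butterworth band filtering, the sketch below cascades a second-order Butterworth high-pass section and a second-order Butterworth low-pass section. The filter order, the cascade structure, and the function names are assumptions made for this example; the text does not specify the filter design that was actually used.

```python
import math


def butter2_section(kind, cutoff_hz, fs):
    """Return (b, a) for a 2nd-order Butterworth low- or high-pass biquad."""
    w0 = 2.0 * math.pi * cutoff_hz / fs
    cos_w0 = math.cos(w0)
    q = 1.0 / math.sqrt(2.0)              # Q = 1/sqrt(2) gives a Butterworth response
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "low":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "high":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    else:
        raise ValueError("kind must be 'low' or 'high'")
    a = [1 + alpha, -2 * cos_w0, 1 - alpha]
    return b, a


def apply_biquad(b, a, x):
    """Filter the sequence x with one biquad (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def bandpass(x, low_hz, high_hz, fs=128):
    """Band-limit x by cascading a high-pass at low_hz and a low-pass at high_hz."""
    y = apply_biquad(*butter2_section("high", low_hz, fs), x)
    return apply_biquad(*butter2_section("low", high_hz, fs), y)


# Example: keep the 8-30 Hz band of a synthetic 10 Hz tone sampled at 128 Hz.
fs = 128
tone = [math.sin(2 * math.pi * 10 * n / fs) for n in range(5 * fs)]
filtered = bandpass(tone, 8, 30, fs)
```

A single forward pass like this introduces phase delay; zero-phase filtering would be a separate design choice.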

Feature Extraction.
Depending on the characteristics of EEG signals, linear or nonlinear features can be extracted. For this study, we extracted features such as variance, Hjorth activity, Hjorth mobility, Hjorth complexity, kurtosis, skewness, Shannon entropy, and Log energy entropy to create feature matrices (shown in Figure 8). The features were applied to the classifier individually and in combination to get the best possible features.

Variance.
Variance is a statistical measure that describes how data are distributed relative to their mean or expected value. It quantifies the degree of dispersion of a set of data.
(i) Hjorth activity: This represents the variance of a time function as well as the signal power. The activity provides a measurement of the squared standard deviation of the time domain signal's amplitude. This may represent the surface of the power spectrum in the frequency domain. The activity is given by the following equation, where x is a signal and t is the time:
$$\text{Hjorth Activity} = \mathrm{Var}(x(t)).$$
(ii) Hjorth mobility: The mobility parameter represents the mean frequency or the proportion of standard deviation of the power spectrum. Equation (4) represents Hjorth mobility, which is determined as the square root of the ratio of the variance of the first derivative of the signal x(t) to the variance of the signal x(t).
$$\text{Hjorth Mobility} = \sqrt{\frac{\mathrm{Var}\left(\frac{dx(t)}{dt}\right)}{\mathrm{Var}(x(t))}}. \qquad (4)$$
(iii) Hjorth complexity: Complexity provides an estimate of the signal's bandwidth and shows how closely a signal resembles a pure sine wave in terms of shape. Equation (5) represents Hjorth complexity. It is defined as the ratio of the mobility of the first derivative of the signal x(t) to the mobility of the signal x(t).
$$\text{Hjorth Complexity} = \frac{\text{Mobility}\left(\frac{dx(t)}{dt}\right)}{\text{Mobility}(x(t))}. \qquad (5)$$
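The three Hjorth parameters described above can be sketched for a sampled signal as follows. This is a minimal illustration, assuming the time derivative is approximated by the first difference between consecutive samples and that population variance is used; the function names are hypothetical.

```python
import math
from statistics import pvariance


def first_difference(x):
    """Discrete approximation of the time derivative of x."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth_parameters(x):
    """Return (activity, mobility, complexity) of the sampled signal x."""
    dx = first_difference(x)
    ddx = first_difference(dx)
    activity = pvariance(x)                                   # variance of the signal
    mobility = math.sqrt(pvariance(dx) / activity)            # mobility of x
    mobility_dx = math.sqrt(pvariance(ddx) / pvariance(dx))   # mobility of the derivative
    complexity = mobility_dx / mobility
    return activity, mobility, complexity


# Synthetic 10 Hz sine sampled at 128 Hz for 5 sec (640 samples).
fs = 128
sine = [math.sin(2 * math.pi * 10 * n / fs) for n in range(5 * fs)]
activity, mobility, complexity = hjorth_parameters(sine)
print(round(activity, 3), round(complexity, 2))
```

For a pure sine wave the complexity is close to 1, which matches the interpretation of complexity as a measure of similarity to a sine wave.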

Entropy.
A random process's degree of uncertainty can be gauged using entropy. It represents the signal's unpredictability. Without failure, rolling element equipment often produces a more random signal, but machines with failure typically produce a more predictable signal. Entropy is thought to be a powerful characteristic for identifying emotions in EEG signals. In this study, Shannon entropy and Log energy entropy were used.
(i) Shannon entropy: Shannon entropy is a metric for the unpredictability of a random variable or a random signal. The uncertainty and randomness increase with increasing entropy. It can be represented by the following equation:
$$H(x) = -\sum_{i=1}^{N} p(x_i) \log p(x_i),$$
where p(x_i) is the probability of the ith sample of the signal x and N denotes the length of the signal.
(ii) Log energy entropy: Log energy (LogEn) entropy is related to the energy of the signal. It is similar to wavelet entropy, but only uses the summation of logarithms of the probabilities. It is used to analyze the EEG signal's complexity. It can be defined by the following equation.
Here, x_i is the ith sample of the signal and N denotes the length of the signal.
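The two entropy features can be sketched as below. Both functions rest on assumptions that are not stated in the text: the Shannon entropy estimates probabilities from a fixed-width histogram (the bin count `n_bins` is hypothetical) and uses a base-2 logarithm, and the log energy entropy uses the common signal-processing convention of summing the logarithm of the squared samples, with a small `eps` added to avoid log(0).

```python
import math
from collections import Counter


def shannon_entropy(x, n_bins=16):
    """Histogram-based Shannon entropy (in bits) of the samples in x."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0                        # a constant signal carries no uncertainty
    width = (hi - lo) / n_bins
    # Assign each sample to a bin; the maximum value goes into the last bin.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in x)
    n = len(x)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def log_energy_entropy(x, eps=1e-12):
    """Sum of log(x_i^2) over the samples (assumed convention)."""
    return sum(math.log(v * v + eps) for v in x)


print(shannon_entropy([3.0] * 100))           # 0.0
print(shannon_entropy(list(range(16))))       # 4.0
```

Sixteen equally likely bins give the maximum of 4 bits for this bin count, while a constant signal gives zero.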

Kurtosis.
Kurtosis measures the peakedness of an EEG signal. When the signal has a normal distribution, the kurtosis is three, and when the signal does not have a normal distribution, the kurtosis is higher than three (for a heavier peak) or less than three (for a lighter peak). If the signal is x, x_i is its ith sample, the mean of the signal is \bar{x}, and its length is N, then kurtosis can be defined by the following equation:
$$\text{Kurtosis} = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^4}{\left(\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2\right)^2}.$$

Skewness.
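Skewness is a metric for a distribution's asymmetry. When the left and right sides of a distribution are not mirrored, the distribution is said to be asymmetrical. Skewness can be defined by the following equation:
$$\text{Skewness} = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^3}{\left(\sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}\right)^3}.$$
Here, x_i is the ith sample of the signal, \bar{x} is the mean of the signal, and N denotes the length of the signal.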
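Kurtosis and skewness can both be computed from central moments, as in the following sketch. It assumes the population (1/N) normalization, under which a normal distribution has a kurtosis of three, in line with the description above; the function names are hypothetical.

```python
def central_moment(x, order):
    """Population central moment of the given order."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def kurtosis(x):
    """Fourth central moment divided by the squared variance."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2


def skewness(x):
    """Third central moment divided by the cubed standard deviation."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(skewness(symmetric))   # 0.0
print(kurtosis(symmetric))   # 1.7
```

A symmetric sample has zero skewness, and a sample with a long right tail such as [0, 0, 0, 10] has positive skewness.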

Classification.
For classification, we used support vector machine (SVM) algorithms and K-nearest neighbor (KNN) algorithms (shown in Figure 9). Linear, quadratic, cubic, and Gaussian radial basis are the kernels we used for SVM, and fine KNN, medium KNN, coarse KNN, cosine KNN, cubic KNN, and weighted KNN are the different types of KNN classifiers we used to identify the best option for the project. The description of the classifiers is provided in Table 4. It is difficult to know which classifier will give the best outcome as not all data are the same. Our dataset is new and recorded using an Emotiv EPOC+ headset, unlike other depression-related datasets. So, we decided to apply multiple classifiers to find out which classifier will be the best for our dataset.
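To illustrate the basic idea behind a K-nearest neighbor classifier, the sketch below labels a query feature vector by majority vote among its k closest training vectors. It is a generic example assuming Euclidean distance and unweighted votes; the toy feature vectors and the function name `knn_predict` are hypothetical, and the sketch does not reproduce the specific KNN variants listed in Table 4.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=1):
    """Predict the label of query by majority vote among its k nearest neighbors."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors for illustration only.
train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["control", "control", "depressed", "depressed"]

print(knn_predict(train_X, train_y, (0.2, 0.1), k=1))   # control
print(knn_predict(train_X, train_y, (5.5, 5.0), k=3))   # depressed
```

The choice of k and of the distance metric changes the decision boundary, which is one reason several KNN variants are usually compared.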

Experiments.
We conducted several experiments for this research work. First, we extracted all the features from the full band (0.5-64 Hz) for the different sample lengths and fed the features to the classifiers separately and in combination to find the best features and sample length. After selecting the sample length and the features, we extracted those features from different frequency bands and fed them to the classifiers with 25 iterations to select the frequency range and the best classifier for this work. We used five-fold cross-validation for the validation. Cross-validation is very important as it tests the ability of a machine learning algorithm to classify new data and helps detect problems such as overfitting.
After that, we extracted features from different channels located at different regions of the brain (left hemisphere, right hemisphere, frontal lobe, parietal lobe, temporal lobe, and occipital lobe) from the selected frequency range and classified the data using the selected classifier to determine which channels give the best outcome. For this part, we used 70 percent of the data for training, with five-fold cross-validation on that 70 percent for validation, and the remaining 30 percent of the data for testing (shown in Figure 10). We used 10 iterations for all classifications, and each time the training and testing data were selected randomly.
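The 70/30 protocol with five-fold cross-validation on the training portion, repeated over 10 random splits, can be sketched as follows. This is an illustrative scikit-learn version, not the authors' implementation: the stratified splitting, the use of a 1-nearest-neighbor classifier, and the function name `holdout_with_cv` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

def holdout_with_cv(X, y, n_iterations=10, test_size=0.30, n_folds=5):
    """Repeat a random 70/30 split; validate with k-fold CV on the training part."""
    cv_scores, test_scores = [], []
    for seed in range(n_iterations):
        # A new random split per iteration (stratification is an assumption).
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed
        )
        clf = KNeighborsClassifier(n_neighbors=1)
        folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
        # Validation accuracy: mean over the folds of the training portion only.
        cv_scores.append(cross_val_score(clf, X_train, y_train, cv=folds).mean())
        # Testing accuracy: fit on all training data, score on the held-out 30 percent.
        test_scores.append(clf.fit(X_train, y_train).score(X_test, y_test))
    return np.array(cv_scores), np.array(test_scores)

# Synthetic data stands in for the EEG feature matrix.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
cv_acc, test_acc = holdout_with_cv(X, y)
print(f"validation: {cv_acc.mean():.3f} ± {cv_acc.std():.3f}")
print(f"testing:    {test_acc.mean():.3f} ± {test_acc.std():.3f}")
```

Reporting the mean and standard deviation over the iterations gives accuracy figures in the same "mean ± deviation" form used in the results.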

Performance Evaluation.
To evaluate the performance of the proposed experiments, we considered several performance parameters: accuracy, precision, negative predictive value (NPV), sensitivity, specificity, and F1 score.

Confusion Matrix.
In machine learning, a confusion matrix is a table that helps summarize and visualize the performance of a classification algorithm. It is an n-by-n matrix, where n is the number of classes, in which we can see the true and false predictions of a classification algorithm. We can get the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) from the confusion matrix. Here, TP stands for the correct predictions of the positive class, TN for the correct predictions of the negative class, FP for the incorrect predictions of the positive class, and FN for the incorrect predictions of the negative class (shown in Figure 11).

Accuracy.
Accuracy indicates how often the classification algorithm predicts correctly. It can be calculated using the following equation:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision.
Precision is also known as the positive predictive value (PPV). It is a performance parameter of an ML algorithm that tells us how reliable the positive predictions made by the algorithm are. The precision is calculated by equation (11). Here, the total number of true positives is divided by the sum of true positives and false positives.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{11}$$

Negative Predictive Value.
The negative predictive value (NPV) is calculated by equation (12). Here, the total number of true negatives is divided by the total number of negative predictions.

$$\text{NPV} = \frac{TN}{TN + FN} \tag{12}$$

Sensitivity.
It is also known as recall or true positive rate (TPR). It represents how well a classification algorithm can predict truly positive cases. It is calculated by dividing the total number of true positives by the sum of all true positives and false negatives, as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{13}$$

Specificity.
It is also known as selectivity or true negative rate (TNR). It represents how well a classification algorithm can predict truly negative cases. It is calculated as the total number of true negatives divided by the sum of true negatives and false positives (shown in equation (14)).

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{14}$$

F1 Score.
It is a measure of the classification accuracy of a binary classifier. By utilizing equation (15), the F1 score is calculated. The F1 score is the harmonic mean of precision and sensitivity.

$$F_1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{15}$$
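The six performance parameters can be computed from the four confusion-matrix counts as in the short sketch below. It uses the standard definitions (specificity as TN divided by TN plus FP, and F1 as the harmonic mean of precision and sensitivity); the function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the six performance parameters from confusion-matrix counts."""
    precision = tp / (tp + fp)      # positive predictive value
    sensitivity = tp / (tp + fn)    # recall, true positive rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "npv": tn / (tn + fn),          # negative predictive value
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),  # true negative rate
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Made-up counts, for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.85, the precision is 0.9, and the NPV is 0.8.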

Results and Discussion
First, we filtered the raw data at 0.5-64 Hz (full band) from the different segments (1-5 sec data). Then, we extracted the features to create the feature matrix. The feature matrix was then fed to the classifiers for classification. In this part, we used a 5-fold CV. The findings from different feature matrices can be seen in Figures 12-20. Each figure shows the accuracy of one feature or feature combination for the different classifiers at different sample lengths. Figures 12 and 13 show the results for the skewness and kurtosis features. Both features give poor results at all sample lengths and with all the classifiers. But Hjorth parameters (Figure 14), variance (Figure 15), and entropy (Figure 16) show better results, and the Hjorth parameters gave the best accuracy of 94% with the fine KNN classifier and 5-sec sample length. Next, we combined the Hjorth parameters and entropy (Figure 17), variance and entropy (Figure 18), skewness and kurtosis (Figure 19), and all the features except variance (Figure 20) and observed the accuracy at different sample lengths using the classifiers. We did not include the variance when we combined all the features (Figure 20) because the variance is the same as the first Hjorth parameter (Hjorth activity). After investigating all the results, we can see in Figure 17 that the Hjorth parameters and entropy give the best results among all combinations of features. Here, the highest accuracy is 96.5% using the 5-sec sample length with the quadratic SVM, cubic SVM, and fine KNN classifiers. From all these findings, we can conclude that extracting the Hjorth parameters (activity, mobility, and complexity) and the entropies (Shannon and Log energy) from 5-sec segments achieves the highest accuracy.

To analyze further, we extracted the Hjorth parameters and the entropy features from different frequency sub-bands (delta, theta, alpha, beta, and gamma) with 5-sec sample lengths from all channels. Then, we classified the features using all the classifiers with a 5-fold CV and 25 iterations.
We calculated the average accuracy and the standard deviation to identify the best classifier and frequency band. From Figure 21, we can observe that the beta band gives better classification accuracy than the other frequency bands. Table 5 shows the average accuracy and the standard deviation results. There we can observe that the beta band gives 97.22 ± 0.21 accuracy with cubic SVM and 97.213 ± 0.18 accuracy with weighted KNN. Here, weighted KNN is best as it has a lower standard deviation than cubic SVM, although cubic SVM gives 0.01% higher accuracy. The features of the beta band give the best accuracy because this band indicates logical thinking and thoughts, and it allows us to focus. A depressed person and a control person will have different thoughts and different levels of focus. Therefore, features from the beta band perform better for depression classification. After this, we wanted to see whether filtering the signal at any other frequency range would improve classification performance. So, we filtered the data at different frequency ranges, each of which includes multiple bands. After that, we extracted the features (Hjorth and entropy) from all the filtered data and classified them using all the classifiers. From Figure 22, we can observe the accuracy of the classifiers. Here, we can see that ABDT gives lower accuracy than the others, so we can exclude that frequency range. But this figure alone does not give a clear enough picture to choose the best classifier and frequency band.
If we observe the average accuracy and the standard deviation from Table 6, we can see that all the bands give better results with fine KNN. Here, ABT, AB, ABTG, and ABG gave accuracy higher than 98% with lower standard deviation. From this experiment, we can conclude that fine KNN performs better with ABT, AB, ABTG, and ABG.
To check the performance and reliability of the ABT, AB, ABTG, and ABG bands with the fine KNN classifier, we measured accuracy, precision, NPV, sensitivity, specificity, and F1 score. For this part, we divided the dataset in a 70/30 ratio for training and testing, and we applied a 5-fold CV for training with 10 iterations. Figure 23 shows the training, testing, and overall average accuracy of the 4 bands. Here, we can observe that ABG gives better training accuracy and AB gives better testing accuracy, and the overall accuracy of ABG is the highest. If we observe Table 7, we can see that in terms of overall accuracy, AB has a lower standard deviation (98.10 ± 0.11), whereas ABG has better accuracy but a higher standard deviation (98.20 ± 0.30). In addition, the AB band with fine KNN gives better NPV (0.977 ± 0.002), sensitivity (0.984 ± 0.002), and F1 score (0.984 ± 0.001). The other parameters, precision (0.984 ± 0.003) and specificity (0.976 ± 0.005), are also satisfactory.
Finally, we can conclude that by segmenting the dataset into 5-sec epochs, filtering the data in the 8 to 30 Hz (AB) frequency range, extracting the Hjorth parameters (activity, mobility, and complexity) and entropies (Shannon entropy and Log energy entropy), and using the fine KNN algorithm, we can create a classifier model that gives the highest accuracy for the dataset we created.
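The first steps of this pipeline (5-sec epochs, 8-30 Hz filtering, Hjorth parameters) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the 128 Hz sampling rate, the fourth-order zero-phase Butterworth filter, and all function names are assumptions, and the two entropy features are left out because their exact definitions are not restated in this section.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128  # assumed sampling rate in Hz (not stated in this section)

def epochs(signal, seconds=5, fs=FS):
    """Cut a 1-D signal into non-overlapping epochs, dropping any remainder."""
    n = int(seconds * fs)
    usable = len(signal) // n * n
    return np.asarray(signal[:usable]).reshape(-1, n)

def bandpass(data, low_hz, high_hz, fs=FS, order=4):
    """Zero-phase Butterworth band-pass along the last axis (filter choice assumed)."""
    b, a = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs)
    return filtfilt(b, a, data)

def hjorth(x):
    """Hjorth activity, mobility, and complexity of one epoch."""
    dx = np.diff(x)    # first difference approximates the first derivative
    ddx = np.diff(dx)  # second difference
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

# Synthetic one-channel signal: 10 Hz (inside AB) plus 50 Hz (outside AB).
t = np.arange(30 * FS) / FS
raw = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 50 * t)

segments = epochs(raw)                # shape (6, 640): six 5-sec epochs
filtered = bandpass(segments, 8, 30)  # keep the alpha-to-beta range
features = np.array([hjorth(seg) for seg in filtered])  # shape (6, 3)
print(features.round(3))
```

In a multi-channel recording, the same three values would be computed per channel and concatenated with the entropy features to form one row of the feature matrix.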
So far, we have used features extracted from all channels (whole brain) to train machine-learning models. Furthermore, we explored different regions of the brain. For this, we created feature matrices by extracting features from the channels located on the left side of the brain (left hemispheric data), from the channels located on the right side of the brain (right hemispheric data), from the channels located on the frontal lobe of the brain (frontal lobe data), from the channels located on the temporal lobe of the brain (temporal lobe data), from the channels located on the parietal lobe of the brain (parietal lobe data), and from the channels located on the occipital lobe of the brain (occipital lobe data). Then, we compared the classifier performance using those data with data from the whole brain (all channels). Figure 24 shows the training, testing, and overall accuracies of the brain regions. It is clear from the bar chart that the whole brain gives better accuracy compared to other regions of the brain. From the chart, we can also observe that the left

Figure 21: Classification accuracy using different sub-bands.
Figure 22: Classification accuracy using different combinations of sub-bands.
Figure 23: Training, testing, and overall accuracy of different combinations of bands using fine KNN classifier.
Figure 24: Training, testing, and overall accuracy of different brain regions using fine KNN classifier.

In Table 8, we can also observe that the whole brain gives better results in terms of accuracy, precision, NPV, sensitivity, specificity, and F1 score. So, using the whole brain region (all channels data) for depression detection is the best approach for our work.
We can compare our work with the existing work to see the significance of our research. Table 9 compares existing research conducted with state-of-the-art methods in recent years against our work.

Conclusion
In this work, we have recorded EEG data of young adults (19 depressed and 13 control) evaluated by the PHQ9 screening tool and proposed a machine learning approach to learn about the EEG properties for depression detection.
We conducted multiple experiments with the reported machine learning (SVM and KNN) classifiers with our recorded data. The first experiment we conducted was on segmentation to find the sample length most suitable for ML. From our experiments, we have identified that 5-second segments are suitable for our work. Then, we have identified a suitable frequency range from various experiments that improves performance using features that are related to depression detection. We have found that Hjorth parameters along with Shannon entropy and Log energy entropy provide better results than the other reported features, and the beta band (12-30 Hz) gives the highest accuracy of 97.21 ± 0.21% with 25 iterations and 5-fold CV using weighted KNN compared to the other sub-bands. By combining the sub-bands, we have also investigated some other frequency ranges. We have found that by taking the range from alpha to beta, 8-30 Hz (AB), we can improve ML performance and achieve 98.43 ± 0.15% accuracy with 25 iterations and a 5-fold CV with the fine KNN classifier. Using AB (8-30 Hz), we can see a significant improvement of 1.22% in accuracy together with a lower standard deviation. To further investigate the reliability, we divided the dataset 70/30 for training and testing with 5-fold CV and 10 iterations. In this experiment, we have found that the ML performance is better when choosing the AB (8-30 Hz) band with the fine KNN classifier, with an accuracy of 98.10 ± 0.11%, precision of 0.984 ± 0.003, NPV of 0.977 ± 0.002, sensitivity of 0.984 ± 0.002, specificity of 0.976 ± 0.005, and F1 score of 0.984 ± 0.001. Then, we analyzed the ML performance in different regions of the brain and concluded that using the whole brain for depression detection gives the highest accuracy. The proposed method can detect depression among young adults with minimum requirements compared to other related works.
Our proposed work can serve as a complement to the traditional screening tool-based depression diagnosis. This method will be able to help in treatments by cross-checking the condition before and after the treatment. Wired EEG headsets are expensive, bulky, and inconvenient to use, whereas a wireless EEG headset is less expensive and easy to use. Using wireless EEG headsets, multiple setups can be arranged, which will require less manpower and can automatically screen for depression among young adults.

Limitations and Future Work.
We faced a few limitations and challenges during this work. We focused on a very selective study population: the subjects are young adults who are private university students from Bangladeshi urban culture belonging to a specific socioeconomic status. So, the study results may be different if the study population belongs to a different age group, attends a public university, or has a different socioeconomic status. During EEG recording, some participants had thick or long hair, which made it difficult to fit the headset with good connectivity. A few participants were unable to sit still during EEG recording, which created artifacts, so we had to re-record their data. The dataset was imbalanced, as the numbers of control and depressed participants were not equal. We were able to conduct only one session of EEG recording, and because of that we were unable to analyze changes over different periods. So, in the future, we will record the EEG data from the participants at different periods to monitor and analyze the changes. In our next work, we will explore the effects of artifact removal on our dataset to improve the quality of the recorded signal. In the future, we will explore more feature extraction and feature selection methods to improve the performance of ML algorithms. We will also analyze channel selection methods to identify the best combination of channels to improve our findings. We will explore deep-learning models with our dataset. We will also investigate other mental health issues such as anxiety and stress for screening using EEG and machine learning.

Data Availability
The recorded EEG data are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.