Distinguishing Parkinson’s Disease with GLCM Features from the Hankelization of EEG Signals

This study proposes a novel method that uses electroencephalography (EEG) signals to classify Parkinson's Disease (PD) patients and demographically matched healthy controls. The method exploits the reduced beta activity and amplitude decrease in EEG signals that are associated with PD. The study involved 61 PD patients and 61 demographically matched controls, and EEG signals were recorded under various conditions (eyes closed, eyes open, eyes both open and closed, on-drug, off-drug) from three publicly available EEG data sources (New Mexico, Iowa, and Turku). The preprocessed EEG signals were classified using gray-level co-occurrence matrix (GLCM) features obtained through the Hankelization of the EEG signals. The performance of classifiers with these novel features was evaluated using extensive cross-validation (CV) and leave-one-out cross-validation (LOOCV) schemes. Under 10 × 10-fold CV, the method was able to differentiate PD groups from healthy control groups using a support vector machine (SVM) with accuracies of 92.4 ± 0.01, 85.7 ± 0.02, and 77.1 ± 0.06 for the New Mexico, Iowa, and Turku datasets, respectively. In a head-to-head comparison with state-of-the-art methods, this study showed an improvement in the classification of PD patients and controls.


Introduction
More than 200 years have passed since James Parkinson's seminal article was published and, during that time, understanding of this disease has grown substantially [1]. Parkinson's Disease (PD) is the second most common neurodegenerative disorder after Alzheimer's, with a prevalence that increases with age, affecting approximately 2-3% of people over 65 years of age [2,3]. According to the World Health Organization (WHO) [4] and the Parkinson's Foundation [5], the number of people with PD has doubled in the last 25 years, to over 10 million individuals globally.
PD is a degenerative process that primarily affects the substantia nigra, a structure of the basal ganglia, and other pigmented brainstem neurons. The neurons afflicted by PD are unable to release dopamine, a messenger substance between the brain and the rest of the body. As a result, both motor and non-motor symptoms develop, including mild tremor and changes in posture, walking, and facial expressions. Over time, these symptoms progress to loss of balance and slowness of movement, frequent falls, stiffness, hallucinations and delusions, mood and sleep disturbances, and cognitive failures [6]. The main clinical manifestations of PD are resting tremor, bradykinesia, rigidity, and postural reflex dysfunction [7].
The accuracy of clinical diagnoses, especially in the early stages of PD, has not significantly improved for about 25 years, and the average accuracy across 20 studies is around 80.6%. Correct diagnosis of PD is an essential requirement for patient counseling and therapy management [8]. Currently, the diagnosis of PD relies on medical observation and clinical examination.

Datasets
In this study, three publicly available datasets were used to classify PD. For all datasets, EEG was recorded from 64 sintered Ag/AgCl electrodes positioned in the standard 10-20 layout at a 500 Hz sampling frequency.
The first data source is from the University of New Mexico [15,23]. Hereinafter, this data source will be named UNM. UNM contains records from a total of 54 subjects, comprising 27 PD patients and 27 controls. There are two session records for each patient in the PD group, taken 7 days apart: one while on dopaminergic medication and one 12 h after the last dose. For all individuals, there are two types of records: eyes closed for 1 min and eyes open for 1 min.
The second data source is from the University of Iowa [15,23]. Hereinafter, this data source will be named UI. UI contains records from a total of 28 subjects, comprising 14 PD patients and 14 controls. These records were collected only with eyes open and on-drug.
The third data source is from the University of Turku [24]. Hereinafter, this data source will be named UT. UT contains records from a total of 40 subjects, comprising 20 PD patients and 20 controls. These records were collected in both eyes-open and eyes-closed sessions while on-drug.
Based on these three sources of records, seven different variants were created. Detailed information is provided in Table 1. In all three data sources, the control group was demographically matched with the PD group in terms of age and gender. The groups did not differ in any measure of education or premorbid intelligence. Detailed information about the data sources is given in Table 2.

Preprocessing
First, all the records were organized to facilitate processing in the MATLAB 2022a (MathWorks, Inc., Natick, MA, USA) environment and to make them easily applicable for analysis. The organized records were preprocessed using EEGLAB in MATLAB [25]. Only the channels common to all records, a total of 57 channels, were used for analysis; the other channels were excluded. The common channel information can be found in the Supplementary Materials: Figure S1 for the layout of the EEG electrodes and Table S1 for the channel names. All records were filtered with a 1-50 Hz band-pass filter and then re-referenced to the channel average. To remove non-brain noise from the records, such as muscle, eye, heart, and line noise, independent component analysis (ICA) [26] was performed, and components labeled as noise with a probability between 0.9 and 1 were rejected. This process left only components related to brain activity in the records. As an example, Figure 1 shows the differences between a preprocessed and an unpreprocessed EEG record, the latter exhibiting baseline shift, sudden large peaks, and a non-normalized structure. The preprocessing operations eliminated all these issues.


Feature Extraction and Selection
In the feature extraction process, the channels in each subject's records are segmented into a total of 50 windows, a number determined experimentally. Afterward, the segmented data in each window were transformed into a 2D picture via a Hankel matrix, as a novel approach. The Hankel matrix is created with MATLAB's hankel function. For a vector signal such as a segmented EEG signal (X1), let X2 denote X1 flipped left to right. The first column of the Hankel matrix is X1, and the following columns are produced by shifting X1 one position at a time, forming the upper triangular part. The last row of the Hankel matrix is X2, and the remaining rows are produced by shifting X2 one position at a time, forming the lower triangular part. Details are given in Figure 2.
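The construction above can be sketched as follows. The study used MATLAB's hankel function; the snippet below is a simplified, stdlib-only Python stand-in for illustration, in which the first column is the signal X1, the last row is its flip X2, and every anti-diagonal is constant:

```python
def hankelize(x):
    """Build a square Hankel matrix from a 1-D signal x.

    The first column is the signal itself (X1) and the last row is the
    signal flipped left to right (X2), mirroring the description of
    MATLAB's hankel(X1, X2). Every anti-diagonal holds a constant value.
    """
    n = len(x)
    x2 = x[::-1]  # X2: the flipped signal, used below the anti-diagonal
    return [[x[i + j] if i + j < n else x2[i + j - n + 1] for j in range(n)]
            for i in range(n)]
```

For example, `hankelize([1, 2, 3])` yields `[[1, 2, 3], [2, 3, 2], [3, 2, 1]]`: the first column is the signal, the last row is its flip.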

Using this Hankel matrix, the GLCM was created with MATLAB's graycomatrix function. This function calculates how often the gray-level intensity of the pixel of interest in an image co-occurs with the gray-level intensity of a specified neighboring pixel. First, the Hankel matrix is scaled to 8 gray levels with MATLAB's default settings. The GLCM is then calculated over the scaled matrix [21,27]. Details are given in Figure 3.
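A minimal sketch of this step, assuming graycomatrix's default right-neighbor offset and 8 gray levels (the study used MATLAB's graycomatrix; this is a simplified Python stand-in):

```python
def quantize(mat, levels=8):
    """Linearly rescale matrix values to integer gray levels 0..levels-1,
    a simplified stand-in for graycomatrix's default gray-level scaling."""
    flat = [v for row in mat for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[min(int((v - lo) / span * levels), levels - 1) for v in row]
            for row in mat]

def glcm(mat, levels=8):
    """Count co-occurrences of gray levels between each pixel and its
    right neighbor (the default offset) over the scaled matrix."""
    q = quantize(mat, levels)
    C = [[0] * levels for _ in range(levels)]
    for row in q:
        for a, b in zip(row, row[1:]):
            C[a][b] += 1
    return C
```

Entry `C[a][b]` counts how often gray level `a` is immediately followed by gray level `b` along each row.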
Haralick texture features were extracted from each segment using the GLCM, and a total of 19 features were obtained per segment [21,28-30]. Brynolfsson's MATLAB code was used to obtain these features [31], and the notation for computing the Haralick texture features and GLCM features is provided in the Supplementary Materials, Tables S2 and S3.
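To make the feature step concrete, here is a sketch of two of the 19 Haralick-style statistics (contrast and energy) computed from a co-occurrence matrix; the full set used in the study, via Brynolfsson's code, also includes correlation, entropy, homogeneity, and others:

```python
def haralick_subset(C):
    """Contrast and energy (angular second moment) of a GLCM C.

    C is first normalized to a joint probability distribution p(i, j);
    contrast weights each cell by the squared gray-level difference,
    and energy sums the squared probabilities.
    """
    total = sum(sum(row) for row in C) or 1
    p = [[v / total for v in row] for row in C]
    n = len(p)
    contrast = sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
    energy = sum(v * v for row in p for v in row)
    return contrast, energy
```

A purely diagonal GLCM (all co-occurring levels equal) has zero contrast, while a uniform GLCM has minimal energy.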
The features obtained from each channel's segments were concatenated into a vector. For each channel, 50 windows with 19 features each were extracted, giving 50 × 19 = 950 features per channel. Each EEG recording consisted of 57 channels, giving 57 × 950 = 54,150 features per subject.
To reduce the number of features in the analysis, feature selection was applied using the Chi-Square (chi2) method [32,33], which uses a statistical test to select the features that are most informative for a particular model. The chi2 test assesses the relative variances of two distributions and determines which features depend most on the output class label. The chi2 value of each feature in the dataset was calculated, and all features were sorted by their chi2 values in decreasing order. The first 100, 250, 500, 1000, 2500, and 5000 features with the highest chi2 scores were empirically selected for the classification process.
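The ranking just described can be sketched as follows, assuming the feature values are discretized so that a chi2 contingency statistic against the class label can be formed (a simplified stand-in for the chi2 selection used in the study):

```python
from collections import Counter

def chi2_score(feature, labels):
    """Chi-square statistic between a discretized feature column and the
    class label: sum over cells of (observed - expected)^2 / expected."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_tot = Counter(feature)
    l_tot = Counter(labels)
    score = 0.0
    for f in f_tot:
        for l in l_tot:
            expected = f_tot[f] * l_tot[l] / n
            observed = joint.get((f, l), 0)
            score += (observed - expected) ** 2 / expected
    return score

def top_k(feature_columns, labels, k):
    """Rank feature columns by chi2 score, descending, and keep the
    indices of the first k (e.g. k = 100, 250, ..., 5000)."""
    scores = [(chi2_score(col, labels), idx)
              for idx, col in enumerate(feature_columns)]
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scores[:k]]
```

A feature that perfectly tracks the class label scores highest and is kept; one independent of the label scores near zero and is dropped.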

Classification
Using the extracted and selected features, 10 × 5-fold and 10 × 10-fold cross-validation classification was performed with feed-forward network (FF), support vector machine (SVM), and k-nearest neighbor (KNN) classifiers at MATLAB's default settings.

Feed-Forward Network
Many classification problems are not linearly separable. We can separate the classes in such nonlinear problems by introducing more hyperplanes, or threshold units. This is usually done by adding an extra layer of threshold units, each of which does a partial classification of the input and sends its output to a final layer, which assembles partial classifications into a final classification. Such a network is called a multi-layer perceptron or feed-forward network (FF) [34].
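The idea that extra threshold units assemble partial classifications can be shown on the classic XOR problem, which no single hyperplane separates. The weights below are hand-set for illustration only, not learned and not part of the study's classifier:

```python
def xor_net(x1, x2):
    """A tiny hand-set feed-forward network: two hidden threshold units
    each draw one hyperplane, and the output unit combines their partial
    classifications to solve XOR."""
    step = lambda v: 1 if v > 0 else 0
    h1 = step(x1 + x2 - 0.5)    # fires when at least one input is 1
    h2 = step(-x1 - x2 + 1.5)   # fires when at most one input is 1
    return step(h1 + h2 - 1.5)  # AND of the two partial classifications
```

Each hidden unit performs a linear partial classification; only their combination in the final layer yields the nonlinear decision.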

Support Vector Machine
Support vector machine (SVM) is a machine learning algorithm (MLA) that minimizes classification risk and offers solutions to a range of linear and nonlinear problems. It is used to solve binary classification problems and can also be applied to multiclass problems. Depending on the structure of the data, SVM is divided into two groups: linear SVM and nonlinear SVM. SVM is an effective learning algorithm for complex datasets, identifying patterns that are difficult to analyze [35].

K-Nearest Neighbor
K-nearest neighbor (KNN) is a classification algorithm that classifies an object using the training examples closest to it in distance or most similar in characteristics. Calculating KNN involves several steps, including choosing the value of K and computing the Euclidean distance from the object to each training example. The object is then assigned to the class that is most common among the K training examples with the smallest Euclidean distances [36].
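The steps above can be sketched as a short stdlib-only function (the study used MATLAB's KNN at default settings; this Python version is for illustration):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among the k training points with the
    smallest Euclidean distance to x."""
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

With `k=3`, a query near two class-"a" points and far from the class-"b" cluster is voted into class "a".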
The default settings of these classifiers used in this study are given in Table 3. The aim of this study is to determine which classifier provides the best results among others. Additionally, the success of the classifier was examined by using the leave-one-out cross-validation (LOOCV) method for each dataset.

Cross-Validation
Next, 5- and 10-fold cross-validation were used to evaluate the performance of the model. This verification method proceeds as follows: first, the entire dataset is randomly divided into 5 (or 10) folds; then, a single fold is retained as validation data to test the model, while the remaining 4 (or 9) folds are used to train the prediction model. This process is carried out 5 (or 10) times, so that every fold is used once as test data. Finally, the 5 (or 10) results are averaged to obtain the final prediction [37].
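The fold assignment can be sketched as follows (a generic illustration of k-fold splitting, not the study's MATLAB implementation):

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle sample indices and split them into k near-equal folds.

    Returns k (train, test) index pairs: each fold serves once as the
    test set while the remaining folds form the training set.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]
```

Repeating this with 10 different shuffles gives the 10 × k-fold scheme used in the study, with the k per-fold results averaged each round.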

Leave-One-Out Cross-Validation
The success of the classifiers was also examined with this method. Here, the data are segmented by subject: one subject is used for testing and the remaining subjects for training. This process is repeated until each subject has been used for testing [38].
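The subject-wise splitting can be sketched in a few lines (an illustration of the scheme, not the study's code):

```python
def loocv_splits(subject_ids):
    """Leave-one-subject-out splits: each subject forms the test set
    exactly once, and all other subjects form the training set."""
    return [([s for s in subject_ids if s != held_out], [held_out])
            for held_out in subject_ids]
```

Splitting by subject rather than by segment keeps all segments of a subject on one side of the split, which avoids leaking a test subject's data into training.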

Performance Parameters
To evaluate the performance of these models, the area under the receiver operating characteristics (ROC) curve (AUC), accuracy (ACC), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), and negative predictive values (NPV) were calculated.
AUC is a performance measure that quantifies how well a binary classifier can distinguish between two classes. A value of 1 indicates a perfect classifier, while a value of 0.5 indicates a completely random classifier. ACC measures the proportion of correctly classified instances among all instances. SENS measures the proportion of true positive instances among all actual positive instances. SPEC measures the proportion of true negative instances among all actual negative instances. PPV measures the proportion of true positive instances among all instances that the classifier predicted as positive. NPV measures the proportion of true negative instances among all instances that the classifier predicted as negative.
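The four count-based measures follow directly from the confusion matrix, as sketched below; AUC is computed separately from the ranked classifier scores, so it is not derived from these counts alone:

```python
def metrics(tp, fp, tn, fn):
    """ACC, SENS, SPEC, PPV and NPV from confusion-matrix counts."""
    return {
        "ACC":  (tp + tn) / (tp + fp + tn + fn),
        "SENS": tp / (tp + fn),   # true-positive rate
        "SPEC": tn / (tn + fp),   # true-negative rate
        "PPV":  tp / (tp + fp),   # precision
        "NPV":  tn / (tn + fn),
    }
```

For example, 8 true positives, 2 false positives, 9 true negatives, and 1 false negative give an accuracy of 17/20 = 0.85.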

Results
This study proposed a method based on the Hankelization and GLCM features to distinguish the PD group from the demographically matched control group. The method was validated using seven variants of datasets (UNM_All, UNM_Closed, UNM_Open, UNM_Off, UI, UT_Closed, and UT_Open) from three different data sources (UNM, UI, and UT). In this study, different success rates were obtained in the classifications.
Overall, the SVM with 10 × 10-fold cross-validation using 1000 features yielded the best classification performance among the classifiers, although some datasets showed better results with other classification methods. The results obtained with the 10 × 10-fold CV SVM using 1000 features are presented in Table 4. When Table 4 is examined, the best results for the 10 × 10-fold CV SVM were 94.9% AUC for UNM_Open, 92.41% ACC for UNM_All, 94.29% SENS for UI, 91.85% SPEC for UNM_All, 91.96% PPV for UNM_All, and 92.96% NPV for UNM_All. In terms of accuracy, according to Table 4, the UNM_All dataset yielded the best results for the UNM data source, the UI dataset for the UI data source (which has only one variant), and the UT_Closed dataset for the UT data source. The classification performance results for the 10 × LOOCV SVM, used to validate the results, are presented in Table 5. When Table 5 is examined, the 10 × 10-fold CV SVM and 10 × LOOCV SVM had very similar results for all datasets, with accuracies of 93.7% for UNM_All, 89.26% for UNM_Closed, 87.04% for UNM_Open, 83.33% for UNM_Off, 82.5% for UI, 76.92% for UT_Closed, and 59.75% for UT_Open. A comparison of the classification performance with previous state-of-the-art methods is presented in Table 6, where the proposed method is represented by its 10 × LOOCV results.

Discussion
We have designed and developed a novel method for the classification of PD patients versus healthy, demographically matched control groups using GLCM features obtained from the Hankelization of EEG signals, which projects a one-dimensional (1D) signal into a two-dimensional (2D) picture. Our proposed method achieves high accuracies in PD detection on these datasets and offers several advantages compared to previous methods reported in the literature.
Firstly, we achieved an increase of approximately 10% in accuracy compared to previous methods. Some researchers have reported moderate accuracy results, such as 88.51% by Shah [39], 69.2% by Sugden [43], 85.4% by Anjum et al. [15], 78% by Chaturvedi et al. [42], and 88.5% by Aljalal et al. [48]. Other researchers reported quite high accuracy results, such as 99.2% by Lee et al. [50], 94.1% by Sugden [43], 99.58% and 99.41% by Aljalal [48,51], 94.3% by Vannesta [45], and 99.62% by Yuvaraj [46]. However, some studies reported in the literature have had issues such as data leakage from the training dataset to the test dataset, unbalanced classes, demographically unmatched groups, or artificially replicated EEG records, leading to very high accuracies. In our study, we addressed these issues by splitting the training and test datasets without any data leakage, using short EEG records, ensuring demographically matched groups, and not artificially replicating EEG records, and we still achieved quite high accuracies.
Secondly, while some previous studies [42,45,46] performed only one round of cross-validation, we confirmed the classification of the data by performing 10 rounds of cross-validation and LOOCV.
Thirdly, we did not tune the optimization parameters of the classifiers and instead left them at their default settings. While tuning these parameters could potentially yield better results, it could also introduce problems such as overfitting.
Fourthly, we observed that our method can achieve good results in both on-drug and off-drug sessions.
Finally, our feature extraction approach, which transforms the signal into a picture using Hankelization and obtains Haralick-GLCM features from EEG signals, has not been used before in the literature and is, therefore, a very novel method. The gold-standard diagnosis of neurodegenerative diseases such as PD requires post-mortem assessment; clinical diagnoses are approximately 90% accurate relative to pathology-confirmed ones, which is very close to the accuracy reported here.
However, this method has some limitations that need to be addressed in future studies. Firstly, the method is time-consuming. Increasing the number of windows reduces the amount of data in each segment, but the number of extracted features becomes quite high. Reducing the number of windows increases the amount of data in each segment, and the size of the images obtained from the Hankelization of each EEG segment grows considerably. In both cases, the processing load increases, which lengthens feature extraction. In our experiments, we obtained the best result within an acceptable time using 1000 features and the standard 50 windows. Instead of extracting a large number of features and then selecting among them, directly extracting a small number of features and passing them to the classifier would probably give faster results.
Secondly, channel selection can be performed to reduce the number of features to be extracted, making the method work faster.
Finally, in future studies, separating the EEG signals into sub-bands instead of filtering them globally between 1 and 50 Hz may give better results for distinguishing PD and control groups. In conclusion, our proposed method showed promising results for the classification of PD patients versus healthy, demographically matched control groups, and it has several advantages over previous methods on these datasets. Moreover, as an EEG-based classifier, it is low-cost, easily accessible, and widely applicable compared to other methods. Future work with a larger sample will demonstrate the true generalizability of this method.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics13101769/s1, Figure S1: Common Electrode Locations; Table S1: Common Channel Names List; Table S2: Notations to compute Haralick Texture Features; Table S3: GLCM Texture Features. All performance parameters and evaluation results in this article are given in xls format.