Autism Spectrum Disorder Diagnostic System Using HOS Bispectrum with EEG Signals

Autistic individuals often have difficulties expressing or controlling emotions and have poor eye contact, among other symptoms. The prevalence of autism is increasing globally, underscoring the need to address this concern. Current diagnostic systems have particular limitations; hence, some individuals go undiagnosed or the diagnosis is delayed. In this study, an effective autism diagnostic system using electroencephalogram (EEG) signals, which are generated from electrical activity in the brain, was developed and characterized. The pre-processed signals were converted to two-dimensional images using the higher-order spectra (HOS) bispectrum. Nonlinear features were extracted thereafter, and then reduced using locality sensitive discriminant analysis (LSDA). Significant features were selected from the condensed feature set using Student's t-test, and were then input to different classifiers. The probabilistic neural network (PNN) classifier achieved the highest accuracy of 98.70% with just five features. Ten-fold cross-validation was employed to evaluate the performance of the classifier. The developed system can thus serve as a decision support tool to assist healthcare professionals in diagnosing autism.


Introduction
A shortfall in social interaction and nonverbal communication emerging as early as the first three years of life is recognized as autism spectrum disorder (ASD). ASD is a multifactorial neurodevelopmental disorder that stems from genetic or non-genetic factors [1]. The etiology of ASD includes genes such as EN2 (Engrailed 2) [2], the UBE3A (Ubiquitin protein ligase E3A) locus, the GABA (Gamma-aminobutyric acid) system, and the serotonin transporter [3], which have been found to be linked to cerebellar development. Environmental factors such as low birth weight, unusually short gestation period, viral infections, hypoxia, mercury exposure, and maternal diabetes are believed to contribute to ASD in young children [4,5]. Poor eye contact; difficulty expressing, controlling, or understanding emotions; intensified focus on a single thing; delayed speech; and social withdrawal are some tell-tale signs of ASD [6]. About 1 in 160 children is diagnosed with ASD [6], and the prevalence has risen over the past 20 years [7]. The possibility of female genes exhibiting particular protective effects against autistic impairments [8] may explain why ASD affects males primarily [9,10] as compared to females. At present, the gold standard for autism detection is the assessment of behavioral, historical, and parent-report information by a team of experts. However, this process is long-winded [11]; hence, diagnosis at an early stage may be delayed. Breakthroughs in neuroimaging modalities such as magnetic resonance imaging (MRI) have led to the discovery that the amygdala is a main part of the brain related to the onset of autism [12]. In an MRI study, Howard et al. [13] reported an increase in bilateral amygdala volume as well as a decrease in hippocampal and parahippocampal gyrus volumes in ASD patients. In a voxel-based whole-brain examination study, Abel et al.
[14] reported an increase in left amygdala volume, as well as in the right inferior and middle temporal gyri. However, these techniques exhibit some disadvantages. MRI scans are expensive as compared to computed tomography (CT) scans [15], while CT and positron emission tomography (PET) scans are prone to analysis errors due to artifacts produced by head motion [16]. A cost-effective, computer-aided brain diagnostic system (CABDS) for the detection of ASD could therefore be very beneficial for autism analysis. The electroencephalogram (EEG), a record of the brain's electrical activity, provides useful information regarding its state. Hence, EEG signals are commonly used to detect brain diseases such as depression [17], epilepsy [18], schizophrenia [19], autism [20,21], and Parkinson's disease [22].

Data Used
The instruments used to establish the pre-diagnosis criteria for ASD included qualitative behavioral assessment by experts using internationally established descriptive standards, such as the Childhood Autism Rating Scale (CARS), Autism Treatment Evaluation Checklist (ATEC), Psychoeducational Profile (PEP3), and Social Responsiveness Scale (SRS). Thereafter, EEG signals were acquired from 37 normal and 40 autistic children aged between 4 and 13 years. Each group comprised approximately equal numbers of males and females. The autistic children were recruited from normal schools and centers of special education in Jordan. Informed consent was obtained from each parent prior to commencement of the study.

Recording and Pre-Processing of Signals
Brain signals from the entire brain were recorded for 20 min while participants remained in the resting state. Each record had 64 channels of varying length, and the sampling frequency of each channel was 500 Hz. The frequency range considered was 0.3–40 Hz, and all signals were discretized to a common length of 5519 samples. After segmentation, the higher-order spectra (HOS) bispectrum [23,24] was obtained, and nonlinear features were extracted from the HOS bispectrum plots. Figure 1 presents the proposed methodology.
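The pre-processing stage described above can be sketched as follows. The sampling rate, band limits, and segment length come from the text; the filter order and the synthetic input channel are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of the pre-processing stage: band-pass filtering to 0.3-40 Hz at a
# 500 Hz sampling rate, then truncation to the common length of 5519 samples.
# The channel data below is synthetic; the 4th-order filter is an assumption.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 500.0          # sampling frequency (Hz), as stated in the text
LOW, HIGH = 0.3, 40.0
N_SAMPLES = 5519    # common length all signals were discretized to

def preprocess_channel(x, fs=FS):
    """Band-pass filter one EEG channel and truncate to a fixed length."""
    b, a = butter(4, [LOW / (fs / 2), HIGH / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, x)      # zero-phase filtering
    return filtered[:N_SAMPLES]

rng = np.random.default_rng(0)
raw = rng.standard_normal(10000)      # stand-in for one recorded channel
seg = preprocess_channel(raw)
print(seg.shape)                      # (5519,)
```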

HOS Bispectrum
The HOS bispectrum is obtained from the segmented ASD EEG signals. It is a nonlinear method that helps capture the phase information present in the EEG signal.
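A direct (FFT-based) bispectrum estimate can be sketched as below. This is a generic segment-averaging implementation of the technique, not the authors' exact code; the FFT length and the Hann window are illustrative choices.

```python
# Minimal direct-method bispectrum estimate: average the triple product
# X(f1) X(f2) X*(f1 + f2) over non-overlapping windowed segments.
import numpy as np

def bispectrum(x, nfft=128):
    """Estimate |B(f1, f2)| over the first nfft//2 frequencies."""
    half = nfft // 2
    win = np.hanning(nfft)
    segs = [x[i:i + nfft] for i in range(0, len(x) - nfft + 1, nfft)]
    idx = np.arange(half)
    B = np.zeros((half, half), dtype=complex)
    for s in segs:
        X = np.fft.fft(s * win)
        # f1 + f2 < nfft is guaranteed since f1, f2 < nfft // 2
        B += X[idx][:, None] * X[idx][None, :] * np.conj(X[idx[:, None] + idx[None, :]])
    return np.abs(B) / len(segs)

rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 0.1 * np.arange(1024)) + 0.1 * rng.standard_normal(1024)
Bmag = bispectrum(x)
print(Bmag.shape)   # (64, 64)
```

The resulting 2-D magnitude array is what the study renders as bispectrum plot images before textural feature extraction.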

Feature Extraction
Textural features are widely used in image analysis. These features allow images to be separated into regions of interest and classified thereafter. Textural features are valuable because they capture crucial characteristics such as smoothness, consistency, and roughness of an image [25]. Textural parameters describe the spatial distribution of intensity levels in a neighborhood. Textural features used in image analysis include co-occurrence-matrix-based, difference-vector-based, and run-length-matrix-based features. In this study, nonlinear run-length-matrix-based features were extracted after pre-processing. The features included the log energy, Kapoor entropy, max entropy, Rényi entropy [26], Shannon entropy [27], Vajda entropy [28], Yager entropy [29], short run emphasis [30], long run emphasis [31], gray-level nonuniformity [31], run length nonuniformity [31], run percentage [31], low gray-level run emphasis (LGRE) [32], high gray-level run emphasis (HGRE) [30], short run low gray-level run emphasis (SLGRE) [32], short run high gray-level run emphasis (SHGRE), long run low gray-level run emphasis (LLGRE) [30], and long run high gray-level run emphasis (LHGRE).
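To make two of the listed entropy features concrete, the sketch below computes Shannon and Rényi entropies from a 2-D magnitude image treated as a probability distribution. The normalization and the random test image are illustrative assumptions, not the paper's exact procedure.

```python
# Shannon and Renyi entropy of a 2-D bispectrum magnitude, computed by
# normalizing the image to a probability distribution. Illustrative sketch.
import numpy as np

def shannon_entropy(img):
    p = img.ravel() / img.sum()
    p = p[p > 0]                      # ignore zero-probability bins
    return -np.sum(p * np.log2(p))

def renyi_entropy(img, alpha=2.0):
    p = img.ravel() / img.sum()
    p = p[p > 0]
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

img = np.abs(np.random.default_rng(2).standard_normal((64, 64)))
print(shannon_entropy(img), renyi_entropy(img))
```

The Rényi entropy reduces to the Shannon entropy in the limit alpha → 1, and is never larger than it for alpha > 1.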

Description of Features
As EEG signals exhibit nonlinear characteristics, nonlinear features are used for classification of normal and anomalous signals [33]. Additionally, nonlinear features were used because they capture the complicated dynamic variations of EEG signals better than linear features do [34]. The short-run emphasis parameter increases when short runs dominate fine-grained image textures. Similarly, long-run emphasis increases when long runs dominate textures that are coarse or have sizeable uniform areas. Both short- and long-run emphasis features describe the distribution of the corresponding short or long uniform runs in an image [35].
In LGRE, the feature metric increases as runs of low gray value dominate the texture. Analogously, the HGRE measurement spikes when the texture is dominated by runs of high gray value. Both low and high gray-level run emphasis features define the distribution of low or high gray-level runs within an image [32]. In gray-level nonuniformity, the parameter increases when the runs are concentrated in a few gray levels, whereas in run length nonuniformity, the metric increases when the runs are concentrated in a few run lengths. Together, these features describe the non-uniformity of the gray levels or of the lengths of the homogeneous runs [32].
The run percentage feature details the homogeneousness of the histogram, and is at its peak when all runs are of uniform length regardless of gray level [35]. In SLGRE, the feature metric increases as short runs of low gray value dominate the texture. The SHGRE measurement increases as short runs with elevated intensity levels govern the texture. Both parameters generally describe the distribution of short homogeneous runs with either high or low gray levels [32]. As for LHGRE, it increases when long runs of high gray value dominate the texture. The LLGRE measurement increases as long runs with low gray levels dominate [35]. Both features define the distribution of long homogeneous runs with high or low gray levels [32].
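The run-length features discussed above follow directly from the gray-level run-length matrix. The sketch below implements four of them (SRE, LRE, LGRE, HGRE) using the standard Galloway-style definitions; the toy matrix and the 1-based index convention are illustrative assumptions.

```python
# Run-length features from a gray-level run-length matrix P, where P[i, j]
# counts runs of gray level i+1 with run length j+1 (1-based in the math).
import numpy as np

def glrlm_features(P):
    Nr = P.sum()                                  # total number of runs
    i = np.arange(1, P.shape[0] + 1)[:, None]     # gray levels (1-based)
    j = np.arange(1, P.shape[1] + 1)[None, :]     # run lengths (1-based)
    return {
        "SRE":  (P / j**2).sum() / Nr,   # short run emphasis
        "LRE":  (P * j**2).sum() / Nr,   # long run emphasis
        "LGRE": (P / i**2).sum() / Nr,   # low gray-level run emphasis
        "HGRE": (P * i**2).sum() / Nr,   # high gray-level run emphasis
    }

P = np.array([[4, 1, 0],    # toy matrix: 3 gray levels, run lengths 1..3
              [2, 2, 1],
              [1, 0, 2]], dtype=float)
feats = glrlm_features(P)
print(feats["SRE"], feats["LRE"])   # SRE is at most 1; LRE is at least 1
```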

Feature Reduction and Selection
The extracted features were then subjected to locality sensitive discriminant analysis (LSDA) [36], a feature reduction technique. Data reduction techniques are employed to transform the features to a low-dimensional space for the discriminant analysis of data points [36]. LSDA works by discovering the local manifold structure and finding a projection that maximizes the margin between data points from dissimilar classes in each local area. Unlike LSDA, other data reduction techniques such as principal component analysis (PCA) and linear discriminant analysis (LDA) cannot discover the underlying structure when the data lie close to a submanifold of the ambient space; only the Euclidean structure is identified [36]. Being more advantageous, LSDA was thus used in this study. The reduced feature set was thereafter subjected to the independent-samples t-test [37] in order to select the most significant features. Features with p-values ≥ 0.05 were discarded, whereas the remainder were used for classification.
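The significance-screening step can be sketched as follows. The group sizes (37 normal, 40 ASD) and the p < 0.05 criterion come from the text; the random stand-in features and the effect size are fabricated for illustration and are not the real LSDA outputs.

```python
# Keep only features whose two-sample t-test p-value falls below 0.05.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
normal = rng.standard_normal((37, 10))          # 37 normal subjects (synthetic)
autism = rng.standard_normal((40, 10)) + 0.9    # 40 ASD subjects, shifted mean

keep = []
for f in range(normal.shape[1]):
    _, p = ttest_ind(normal[:, f], autism[:, f])
    if p < 0.05:                 # discard features with p >= 0.05
        keep.append(f)
print(keep)
```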

Classification
A range of classifiers were explored in this study for the discrimination of classes. LDA [38] implements Fisher's linear discriminant in its basic form: it estimates the probability that a new input belongs to each class and assigns the class with the largest probability. Quadratic discriminant analysis (QDA) [39], an extension of LDA, was also used. It does not assume that the class covariances are equal; if they do happen to be equal, the decision boundary becomes linear and QDA reduces to LDA. The k-nearest neighbor (KNN) [40] classifier was also employed in this study; a feature vector is assigned the class most common among its k nearest neighbors. Another classifier explored was the probabilistic neural network (PNN). PNN comprises layers wherein the hidden layer computes the probability density and the summation layer combines the results. The support vector machine (SVM) generalizes well in high-dimensional spaces with small training data sizes and can achieve high accuracy [41,42]. Hence, the SVM with radial basis function (SVM-RBF) kernel [43] and polynomial kernels [44] of orders 1, 2, and 3 was also used. The RBF kernel is more adept than linear kernels because it can map samples with nonlinear relationships into a higher-dimensional space. The 10-fold cross-validation [45] technique was used to evaluate the performance of the classifiers. Table 1 presents the classification results of the classifiers used. From the results obtained, it is evident that the PNN classifier achieved the highest accuracy, sensitivity, specificity, and positive predictive value of 98.70%, 100%, 97.30%, and 97.56%, respectively, besting the other classifiers. Table 2 presents the significant features selected using the t-test after LSDA feature reduction.
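The evaluation protocol can be sketched with scikit-learn, assuming standard estimator defaults; PNN is omitted because scikit-learn does not provide one, and the synthetic data merely stands in for the selected features.

```python
# Several of the classifiers named above, each scored with 10-fold
# cross-validation on synthetic two-class data. Sketch only: estimator
# hyperparameters here are sklearn defaults, not the authors' settings.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=5,
                           n_informative=4, n_redundant=0, random_state=0)
models = {
    "LDA":       LinearDiscriminantAnalysis(),
    "QDA":       QuadraticDiscriminantAnalysis(),
    "k-NN":      KNeighborsClassifier(n_neighbors=5),
    "SVM-RBF":   SVC(kernel="rbf"),
    "SVM-poly2": SVC(kernel="poly", degree=2, coef0=1.0),
}
for name, clf in models.items():
    acc = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name}: {acc:.3f}")
```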
As seen in Figure 2, only five features were needed to obtain the highest accuracy with the PNN model. Lower accuracies were obtained with the SVM-RBF, k-NN, and SVM polynomial 3 classifiers when fewer features were used to train the models. Hence, PNN is the most desirable classifier for differentiating ASD from normal EEG signals. Figure 3 presents the box plot of the top LSDA features. The box plot was generated using the five most significant features, LSDA 13, LSDA 8, LSDA 9, LSDA 11, and LSDA 7, with p < 0.05, as seen in Table 2. It is observable that the mean of the LSDA features was generally higher in the autism group than in the normal group. This could be due to higher variability in the autism class. Figure 4a,b shows the bispectrum plots of the normal and ASD classes, respectively, acquired from one channel (channel 64). More bispectrum plots of the normal and ASD classes for channels 10 and 50 are shown in Figure 5a,b and Figure 6a,b, respectively. From these plots, it can be seen that the bispectrum patterns of the two classes were unique and distinct. Thus, the features used in our study had high discriminatory capacity.

Table 3 summarizes prior studies in which a CABDS and EEG signals were used to assist in autism diagnosis. In the first study, the discrete wavelet transform (DWT) was employed to decompose the acquired EEG signals. The signals were then mixed with artifacts and subjected to fast independent component analysis (ICA) to obtain independent components. The signals were subsequently grouped into six different cases with different artifacts. The proposed method achieved an average correlation coefficient of 0.757 and regression of 0.699, demonstrating this to be an acceptable method for ASD detection [46].

Table 3. A summary of studies using computer-aided brain diagnostic system (CABDS) for the prediction/diagnosis of ASD using electroencephalogram (EEG) signals.
DWT was also employed in the second study to decompose pre-processed EEG signals, thereby obtaining sub-bands. Entropy values were then computed on these bands to form the feature vector, which was input to an artificial neural network (ANN). Ten-fold cross-validation was used for evaluation. Based on the area under the receiver operating characteristic (ROC) curve together with statistical measures, the highest accuracy of 99.7% was obtained for DWT coupled with Shannon entropy [47].
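A one-level wavelet decomposition with a band entropy, as in the pipeline just described, can be sketched as below. A hand-rolled Haar transform is used here for self-containment; the cited study used multi-level DWT, and this simplified version is an assumption for illustration.

```python
# One level of the Haar DWT followed by Shannon entropy of a coefficient
# band. Sketch only: the cited work decomposed into multiple sub-bands.
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: approximation and detail coefficients."""
    x = x[: len(x) // 2 * 2]                      # enforce even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2)          # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)          # high-pass (detail)
    return a, d

def band_shannon_entropy(c):
    """Shannon entropy of a band, normalizing |coefficients| to sum to 1."""
    p = np.abs(c) / np.abs(c).sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

x = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 512))
a, d = haar_dwt(x)
print(len(a), band_shannon_entropy(a))   # 256 coefficients in each band
```

Being orthonormal, the Haar transform preserves signal energy across the two bands, which is why band-wise entropies summarize the full signal without loss.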

Discussion
In the third study, a power spectral analysis was performed on pre-processed signals. The relative and absolute power were computed per frequency band, after which coherence indices were calculated for six intra-hemispheric and eight inter-hemispheric brain regions, respectively. Large differences in EEG power were reported between the groups, with greater delta and theta power found in the frontal and posterior regions [48].
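The relative band-power computation used in such studies can be sketched with Welch's PSD estimate. The band edges below are conventional delta/theta/alpha/beta ranges, which the cited work may define differently; the 6 Hz test tone is fabricated for illustration.

```python
# Relative power per EEG band within 0.5-30 Hz, from Welch's PSD estimate.
import numpy as np
from scipy.signal import welch

FS = 500.0
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def relative_band_power(x, fs=FS):
    """Fraction of 0.5-30 Hz power falling in each conventional band."""
    f, pxx = welch(x, fs=fs, nperseg=1024)
    total = pxx[(f >= 0.5) & (f <= 30)].sum()
    return {name: pxx[(f >= lo) & (f < hi)].sum() / total
            for name, (lo, hi) in BANDS.items()}

rng = np.random.default_rng(5)
t = np.arange(5 * int(FS)) / FS
x = np.sin(2 * np.pi * 6 * t) + 0.3 * rng.standard_normal(len(t))
rel = relative_band_power(x)
print(max(rel, key=rel.get))   # the 6 Hz tone makes "theta" dominant
```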
Similarly, the wavelet transform was also employed in another study to decompose the acquired EEG signals into six frequency bands, after which nonlinear features were extracted from these bands. The recursive feature elimination algorithm was used to select significant features, which were fed to a support vector machine with radial basis function (SVM-RBF) classifier. High sensitivity and specificity values of nearly 100% were achieved for early detection of ASD [49].
Nonlinear features were extracted from time and frequency domains in the subsequent study, reporting that nonlinear features served as good indicators of early stages of ASD [50].
The spectral power and mean coherence parameters were computed from the EEG signals in another study. Student's t-test was used to identify significant differences in intergroup comparisons. It was reported that the spectral power of the theta rhythm was lower in autistic children than in healthy children, whereas the gamma power was larger [51].
In a separate study, variance in time and modified multiscale entropy features were extracted from pre-processed signals and fed to different classifiers. The highest accuracy of 79% was yielded with the naïve Bayes classifier [52].
In another study, the childhood autism rating scale coupled with statistical measures was used to examine the relationship between EEG anomalies and autism severity level. It was reported that the relationship between EEG anomalies and severity of autism was statistically significant [53].
After pre-processing the EEG signals, principal component analysis (PCA) was employed for dimensionality reduction, prior to extracting recurrence quantification analysis (RQA) nonlinear features from the signals, in another study. The SVM classifier coupled with leave-one-subject-out validation yielded a high classification accuracy of 92.9% [54].
Multiscale entropy (MSE) features were explored for the identification of ASD severity level in children, in another unique study. The MSE patterns that were obtained revealed that children with mild ASD had increased sample entropy values as compared to those with severe ASD. Also, the MSE values and physical representations were reported to represent children according to mild and severe ASD [55].
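Multiscale entropy, as used in that study, coarse-grains the signal at several scales and computes sample entropy at each. The sketch below uses common parameter defaults (m = 2, r = 0.2·SD); the implementation and the white-noise test signal are illustrative, not the cited study's code.

```python
# Multiscale entropy sketch: coarse-grain at scales 1..3, then compute a
# simple O(n^2) sample entropy at each scale.
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy with tolerance r relative to the signal's SD."""
    r *= np.std(x)
    n = len(x)
    def match_count(mm):
        templates = np.array([x[i:i + mm] for i in range(n - mm)])
        c = 0
        for i in range(len(templates)):
            dist = np.max(np.abs(templates - templates[i]), axis=1)
            c += np.sum(dist <= r) - 1   # exclude the self-match
        return c
    B, A = match_count(m), match_count(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else float("inf")

def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return x[: n * scale].reshape(n, scale).mean(axis=1)

rng = np.random.default_rng(4)
noise = rng.standard_normal(1000)
mse = [sample_entropy(coarse_grain(noise, s)) for s in (1, 2, 3)]
print(mse)
```

Plotting such values against scale yields the MSE curve whose shape separated mild from severe ASD in the cited work.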
Elsewhere, EEG signals were extracted from children as they were subjected to images of different facial expressions (happiness, sadness, and calmness). A hybrid model was developed thereafter to map to the feature space. The mapping process was optimized and the resulting vector was input to the SVM classifier. The proposed method was able to discriminate normal versus ASD classes successfully [56].
In the next study, an artefact-free EEG segment was employed to calculate input values for successive analyses. The Implicit Function as Squashing Time (I-FAST) algorithm was employed subsequently for the selection of predictive parameters. The resulting invariant feature vector was then input to several classifiers, in which a highest accuracy of 92.8% was achieved with the random forest classifier coupled with leave-one-out cross-validation [57].
In another study, three different datasets were explored: eye, EEG, and a combination of both data. For each set, Fast Fourier Transform (FFT), entropy, and statistical features were extracted. PCA or sequential feature selection was used to obtain significant features, which were then input to different classifiers. The best performing models were naïve Bayes and logistic classifiers, which obtained an accuracy of 100% with the combination of eye and EEG data, whereas an accuracy of 100% was achieved with the logistic and deep neural network classifiers with only eye data [58].
In the next study, statistical features were extracted from the pre-processed EEG signals prior to and after the application of the discrete wavelet transform. Correlation-based feature selection was used thereafter to select significant features. The features were then input to various classifiers. A highest accuracy of 93% was achieved with the random forest classifier, using k-fold validation [59].
In the second-to-last study, the mean power spectral density features of the EEG were computed after pre-processing. The features were then input to SVM and artificial neural network (ANN) classifiers, and confusion matrices were used to validate model performance. For classification without emotions, the highest accuracy of 90.5% was yielded by the ANN classifier; for classification with emotions, the ANN classifier again achieved the highest accuracy, of 92.5% [60].
Lastly, global functional connectivity was computed after brain signals were acquired. Statistical analyses were conducted thereafter, and the results were supported by the autism diagnostic interview coupled with clinical evaluations. It was reported that the difference in global functional connectivity values between the high-risk (HR) and low-risk (LR) ASD groups and the comparison groups was insignificant. In addition, the increase in alpha-range networks between the HR and LR groups and the comparison groups was insignificant [61].
From Table 3, it is apparent that nonlinear features have been prevalently used to diagnose ASD [49,50,54,55,57]. SVM classifiers have also been commonly employed to classify EEG signals for the detection of ASD [52,54,56,58–60], similar to our study. Among the classification studies, [52,54,57,59,60] achieved lower accuracies than ours. Although higher classification accuracies of 100% [58] and 99.71% [47] were achieved in two studies, smaller data sizes were used for training in both. Although the results achieved in [30] are comparably high, that study reports both classification and correlation results, unlike our study, which focused on classification alone. The remaining studies in Table 3 did not discuss classification; only correlation or comparison results were discussed. Hence, with the high accuracy obtained and the larger dataset used as compared with most studies in Table 3, our proposed method is robust, as it has been tested on more data. There are several benefits and drawbacks of our technique:

Benefits:

1. The recommended technique allows for rapid and accurate diagnosis of ASD.

2. The diagnostic method is non-invasive.

3. The method is promising, as the model used has been validated by 10-fold cross-validation.

Drawbacks:

1. Feature extraction and selection processes are done manually.

2. The technique currently supports only a small data size; thus, sizeable datasets cannot be studied for early detection.

Summary
Both genetic and non-genetic factors may contribute to ASD. Disturbingly, its prevalence has been rising steadily over the past 20 years. Current diagnostics are lengthy, costly, or invasive, and exhibit other limitations. Hence, we have proposed a non-invasive and cost-effective CABDS to detect autism. After pre-processing, the EEG signals were converted to two-dimensional images using the HOS bispectrum. Nonlinear features were extracted thereafter and reduced using LSDA. Student's t-test was then employed to obtain significant features from the reduced feature set, which were input to various classifiers. The highest accuracy of 98.70% was yielded by the PNN classifier. Ten-fold cross-validation was utilized to evaluate classifier performance. This robust system can potentially be used by healthcare professionals as a decision support tool for ASD detection.

Future work
In future work, we intend to gather a large volume of data over a period of a few years to use for the early detection of autism in children. Additionally, with this sizable dataset, we aim to use a deep learning model for classification [21,62–65]. With more data, the model can be trained more thoroughly and is thus anticipated to perform better. Early detection of ASD significantly assists patients as well as caregivers in better managing the disorder.