Functional analysis of ADHD in children using nonlinear features of EEG signals

Attention deﬁcit hyperactivity disorder is a neurodevelopmental condition associated with varying levels of hyperactivity, inatten-tion, and impulsivity. This study investigated brain function in children with attention deﬁcit hyperactivity disorder using measures of nonlinear dynamics in electroencephalogram signals during rest. During eyes-closed resting, 19 channel electroencephalogram signals were recorded from 12 ADHD and 12 normal age-matched children. The multifractal singularity spectrum, the largest Lyapunov exponent, and approximate entropy were employed to quantify the chaotic nonlinear dynamics of these electroencephalogram signals. As conﬁrmed by Wilcoxon rank sum test, the largest Lyapunov exponent over left frontal-central cortex exhibited a signiﬁcant difference between attention deﬁcit hyperactivity disorder subjects and the age-matched control groups. Further, mean approximate entropy was signiﬁcantly lower in attention deﬁcit hyperactivity disorder subjects in prefrontal cortex. The singularity spectrum was also considerably altered in attention deﬁcit hyperactivity disorder subjects when compared to control children. Evaluation of these features was performed with two classiﬁers: a support vector machine and a radial basis function neural network. For better comparison, subject classiﬁcation based on frequency band power was assessed using the same types of classiﬁers. Nonlinear features provided better discrimination between attention deﬁcit hyperactivity disorder and control than band power features. Under four-fold cross-validation testing, the support vector machine gave 83.33% accurate classiﬁcation results.


Introduction
Attention deficit hyperactivity disorder (ADHD) refers to a heterogeneous neurodevelopmental condition or set of conditions in which children or adults exhibit varying levels of inattention, impulsivity and physical restlessness and/or hyperactivity symptoms [1]. Approximately 3-7% of school-aged children are affected by ADHD [1]. According to the Diagnostic and Statistical Manual of Mental Disorders (DSM-V) this syndrome has three clinical subtypes: predominantly inattentive, predominantly hyperactive-impulsive, and combined [2]. Currently, diagnosis of ADHD is based on levels of symptoms listed as DSM-V diagnostic criteria, as assessed from diagnostic interviews.
Assessment of electroencephalogram (EEG) characteristics associated with ADHD, has generated substantial interest, resulting in a large number of investigations [3][4][5][6][7][8][9][10][11][12][13][14][15][16]. Most of these studies concern frequency-domain indicators, typically absolute and relative power estimates for different frequency bands or ratios of power in different frequency bands [7,8,[10][11][12][13]15]. Although simple to compute and visualize, these approaches are not capable of measuring the nonlinear structure of EEG brain dynamics. Methods from nonlinear dynamics and chaos theory have been used to investigate nonlinear properties of brain dynamics. EEG coherence provides useful information about the coupling of brain activity in different regions [8]. However, the inability of coherence to characterize nonlinear interdependencies, specifically for non-stationary time series, has led to the use of nonlinear synchronization methods for analysing functional brain connectivity [3-6, 9, 14, 16].
According to Stam [17], distinct states of brain activity have different chaotic dynamics which might be quantified by nonlinear measures including entropy and Lyapunov exponents. Approximate entropy (ApEn) has proved to be particularly useful for characterizing short, noisy time series. It is capable of providing a robust, model-independent, and information-theoretic estimate of dynamical complexity [14]. A recently published study has investigated the nonlinear features of EEG signals in ADHD and normal adult subjects performing a continuous performance test [9].
Fractality is another common property of time series representing the dynamics of complex systems. Brain activity, as a complex dynamical system, exhibits a multifractal structure [18,19]. Fractal analysis of EEG time series has proved to be useful for describing brain activity during sleep [20]. Fetterhoff et al. [21] show that multifractal firing patterns of hippocampal spike trains are more complex during a working memory task and significantly declined with the administration of memory impairment in rats. Zorick et al. [19] showed the capability of MFDFA to interfere with the recognition of changes in states of consciousness.
To the knowledge of the authors, the present study is the first to assess multifractal spectra of EEG signals in ADHD subjects. It was predicted that multifractal analysis of EEG signal might provide useful information about the dynamics of brain electrical activity in ADHD subjects. Nonlinear EEG dynamic features were employed, including ApEn, the largest Lyapunov exponent (LLE), and multifractal spectra, to discriminate between children with ADHD and age-matched controls.

Materials and methods
The aim of this research was to analyse nonlinear dynamical properties of EEG signals in children with ADHD and to present a classification system based on these features. The algorithm consisted of four steps: signal preprocessing, feature extraction, feature reduction, and subject classification. Fig. 1 shows all analytic steps involved and subsequently reported here.

Dataset and pre-processing
The dataset employed in this study was EEG time series recorded at the Atieh Comprehensive Centre for Psychology and Nerve Disorders, Tehran, Iran [3]. The subject group comprised 24 right-handed children 7-12 years old (12 ADHD and 12 age-matched healthy subjects). The ADHD group included all clinical subtypes (hyperactiveimpulsive, inattentive, and combined). All subjects were asked to sit comfortably in a silent room with eyes-closed. The EEG data were collected according to the 10-20 International system (Fp1, Fp2, F3,  F4, C3, C4, P3, P4, O1, O2, F7, F8, T3, T4, T5, T6, Fz, Cz, and Pz,) with eyes-closed during rest [3]. The eyes-closed condition was preferred as the preference was to analyse ADHD brain dynamics in the absence of external stimuli. Each recording consisted of 19 channels of EEG data with a sampling frequency rate of 256 Hz and 16-bit resolution [3]. Signal pre-processing was performed in Matlab (MatlabR2013a). Artefact rejection was based on both visual inspection and com-puterized review. To remove out-of-band noise, EEG time series were lowpass filtered at 60 Hz and highpass filtered at a 1 Hz cut-off frequency. Line noise suppression was by notch filtering at 50 Hz. Signals were also visually analysed by an expert and any artefacts discarded. This procedure reserved about one-minute of artefact-free EEG trace out of a three-minute duration signal (15,000 time samples).

Feature extraction
Four types of feature extraction methods were selected, frequency band powers and three nonlinear features: the LLE, ApEn, and the height and width of the multifractal singularity spectrum of the EEG time series. With 19 channels for each subject, a total of 24,198 features were extracted for the analysis. The following feature extraction methods were applied to selected one minute durations of noise-free signal obtained from each channel.

EEG frequency band power
EEG signals were filtered with a 16th order Butterworth band-pass filter to extract four typical frequency bands: delta (0.5-4 Hz), theta (4-8 Hz), alpha (8)(9)(10)(11)(12)(13), and beta (13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30). For each band, power was calculated by invoking Perceval's theorem and summing the signal in the time domain [22]. Overall procedure for functional brain dynamic analysis of ADHD and control children using nonlinear dynamical features of EEG signals. First step: Signal was preprocessed to remove any noise and artefacts. Followed by calculation of LLE, ApEn, and multifractal spectra for every channel of the recorded EEG signals. In parallel with nonlinear features, frequency band power estimates were also calculated. Subsequently, the principal component coefficients of these features were computed and the 15 largest components of each feature category employed as final features. Finally, subject group classification was implemented with two classifiers: a support vector machine classifier and a radial basis function neural network.

Largest Lyapunov exponent
Lyapunov exponents quantify exponential divergence or convergence of nearby trajectories in a state space. Since it may indicate chaotic dynamics in a system, existence of positive LLE is of special importance [17]. Rosenstein et al. [23] have suggested a straightforward approach to calculate LLE [23]. The basic idea of the algorithm is that the average growth rate of inter-vector differences starting from pairs of nearest neighbours is bounded by the largest Lyapunov component. Based on Taken's theorem [24], reconstruction of trajectories from time series in a state space is given by: where X i is the state of the system at discrete time i, which for a N point EEG signal x = x 1 , x 2 , . . . , x N is calculated from: where J is the lag delay and m is the embedding dimension. So X would be an M × m matrix in which M = N-(m−1)J. To avoid information loss, the lag delay and embedding dimension are determined so that the reconstructed trajectories in the new state space are not folded [17]. After reconstructing the trajectories, the initial distance from the " j th point" X j to its nearest neighbour Xĵ is expressed as: where d j (0) is the initial distance and · · · denotes the Euclidean norm. In Rosenstein's algorithm, the growth rate of inter-vector differences starting from nearest neighbour pairs is bounded by LLE [23]. Therefore, LLE(λ 1 ) of the EEG signal is calculated from: where ∆t is the sampling period of the EEG signal and d j (i) is the distance between the jth pair of the nearest neighbours after i discrete time steps. A positive Lyapunov exponent implies that any two points that are initially very close together will diverge exponentially quickly in the direction of the positive Lyapunov exponent [23].
Since reconstructing a trajectory in a state space with the embedding dimension lower than its minimum embedding dimension, causes the trajectory to be folded, the minimum embedding dimension must be calculated first [17]. Moreover, very small lag delays may bring about almost linear trajectory reconstruction with high correlations between consecutive phase space points and very large delays may disregard the deterministic structure of the sequence [17]. According to previous investigations, the minimum embedding dimension for EEGs varies mostly in the range of 9 ± 3 [3,4]. The minimum embedding dimension of the EEG time series employed here, for all channels and all subjects, calculated using the false nearest neighbour algorithm [25], varied in the range of (6-15), so the optimum embedding dimension employed was greater than 15. In this study, the embedding dimension was assumed to be 15 because this value was the maximum value found for minimum embedding dimension in the dataset analysed here. Another important factor was the optimum lag delay. As it gave a better result at the classification step, after testing several different values from 1 to 10, a lag delay of J = 1 was selected for use here.

Approximate entropy
The apen, as suggested by Pincus [26], is a nonlinear measure that quantifies the complexity or irregularity of a system. It measures the logarithm of the frequency at which the neighbourhoods of temporal patterns with the same length within a certain distance by one time point augmentation in phase space remain close [26]. Given the embedding dimension m, and time lag J, the state space of the N dimensional EEG signal x = x 1 , x 2 , . . . , x N was computed from Equation (2). ApEn was computed from the correlation integral C m (r), which represents the number of points within distance r from the ith point of the EEG time series when the signal is embedded in a phase space with embedding dimension m. The correlation integral was then calculated as: In eqn (5), Θ is the Heaviside function and X i and X j are vectors in the state space and r is a threshold value for distance. Finally, the ApEn is defined as [26]: The embedding dimension can be found as described above. In some applications a value between 10 and 25 % of the data standard deviation (SD) is used for the threshold value [27]. Here, the parameter r (distance) was set to be 10 % of the SD.

Multifractal detrended fluctuation analysis
Detrended fluctuation analysis (DFA) is a widely-used method for exploring the fractal properties of a time series. Nonetheless, time series with a complex structure are multifractal and multifractal detrended fluctuation analysis (MFDFA) has been proposed for evaluation of their fluctuations [28]. The MFDFA procedure consists of five steps. In the first step, the structure of the EEG signal must be converted into a random walk [28].
where x denotes the average amplitude of an EEG time series. Next, the profile Y (i) is divided into Ns ≡ int(N/s) non-overlapping segments with equal lengths s. For each segment, the root mean square (RMS) variance is calculated from Equation (8), in which ν = N s + 1, . . . , 2N s and y ν (i) is the fitting polynomial in segment ν: To obtain the qth-order fluctuation function, the mean RMS value over all the segments is calculated from: Spatial and temporal variations in the scale-invariant structure of a multifractal time series require Steps two and three to be repeated at several different time scales s. Finally, the scaling behaviour of the fluctuation functions is determined by analysing log-log plots of F q (s) versus s for each value of q: where h(q) is called the q-order Hurst exponent [28]. For multifractal EEG time series, there is significant dependence of h(q) on q [28]. For positive values of q, h(q) describes the scaling behaviour of segments with large fluctuations. For negative values of q, h(q) describes the scaling behaviour of segments with small fluctuations [28]. The q-order Hurst exponent h(q), is only one of several factors used to parameterize the multifractal structure of an EEG time series. It is related to the scaling exponents τ(q) by: Thereafter, scaling exponents can be converted into the q-order singularity exponent (α) and the q-order singularity dimension ( f (α)) with the following equations to obtain a multifractal singu-larity spectrum: where τ (q) is the derivative of τ(q). The width and shape of the multifractal spectrum are valuable factors for distinguishing different multifractal structures [28]. In this study, the width (dα) and its height (d f (α)) were used to evaluate the multifractal spectrum of EEG signals:

Feature reduction using principal component analysis
Principal component analysis (PCA) was used to extract features strongly correlated to each other. In this procedure, the principal component coefficients of the dataset are calculated and the k largest components are considered to give the joint features of the dataset that capture the most data variance in k dimensions [29]. Considering the feature vector X with mean µ x and a covariance matrix of C x ,e the eigenvectors and eigenvalues of the features can be calculated and sorted into an order of descending eigenvalues. In this case, transforming a data vector X gives: where A K is a matrix consisting of the first k eigenvectors of the covariance matrix C x and the components of y are then calculated as the orthogonal transformations of X. Finally, the original data vector X can be approximately reconstructed from y by: In this study, the principal component coefficients of previously extracted features (Frequency band powers as well as nonlinear features) were calculated and the 15 largest coefficients of each category were then used as the final feature set.

Support vector machines classifier
The support vector machine (SVM) proposed by Vapnik [30] is a supervised machine learning algorithm preferred for many classification problems. The aim of the SVM is to compute an optimal separating hyperplane to which the distance from each nearest data sample in each class is maximized. In non-separable cases, SVM offers a solution using kernel mapping. To allow separation in such cases, the data can be projected into a higher-dimensional feature space using a nonlinear function (φ (·)).
In general, the formulation of the hyperplane is as follows: in which W and b are a weight vector and a bias term, respectively. To find such an optimum hyperplane, the optimization problem is: The above-mentioned problem is solved using Lagrangian optimization theory [31]. Here, linear, polynomial, and radial basis function (RBF) kernel functions were tested. The RBF kernel led to better discrimination accuracy.

Radial basis function neural networks classifier
The radial basis function neural network (RBFNN), a supervised machine learning algorithm, is an artificial neural network that uses radial basis functions as activation functions [32]. The output of the network is a linear combination of radial basis functions of the inputs and neuron node parameters. A RBFNN typically has three layers: input, hidden with a nonlinear RBF activation function, and a linear output layer. For an input vector X, the output of the network is given by: where N is the number of neurons in the hidden layer, C i is the centre vector for neuron node i, and a i is the weight of neuron node i in the output neuron node [32]. The most commonly used radial basis function (ρ) is Gaussian: RBFNNs are typically trained by back propagation [32]. In this study, the MATLAB neural network toolbox was used to implement a RBFNN. It is important to consider the optimum value for the RBFNN spread parameter. It should be large enough that the radial basis layer neuron nodes respond to overlapping regions in the input space, but not so large that all neuron nodes respond in essentially the same manner.

Result
To evaluate the proposed algorithm, noise removal was first performed. The features mentioned above for the noise-free one-minute long EEG signals were then extracted. Because of the small sample size, evaluation of discrepancies in nonlinear feature category between ADHD and normal subjects was performed using the nonparametric Wilcoxon rank sum test. The most highly distinguished LLEs were the scalp channels Fp1 (p = 0.0379), Fp2 (p = 0.0499), F3 (p = 0.0379), C3 (p = 0.0148), F7 (p = 0.0148), and Cz (p = 0.0148).
The average LLE of the ADHD group was significantly higher than the control group at left frontal central scalp electrode channels This suggests that for those channels EEG chaoticity for ADHD subjects was higher than the age-matched control group.
Similarly, the Wilcoxon rank sum test revealed significant group differences for the ApEn measure. Mean ApEn in ADHD subjects was lower than for the control group. Mean ApEn for ADHD subjects trended lower at all 19 EEG channels, reaching statistical significance at channels Fp1 (p = 0.0351) and Fp2 (p = 0.0531) (Fig. 2).
High values of ApEn indicate a more complex and irregular system and low values, a lower complexity system. Fig. 2 shows that the complexity of the EEG signals in the ADHD group was lower over the prefrontal scalp.
The height of the multifractal spectrum for ADHD subjects at most scalp channels was dissimilar to that of controls. No significant differences were observed between the widths of the multifractal spectrum in the two groups, except for channels Fp1 (p = 0.0166) and F7 (p = 0.0404).
The height of the multifractal spectrum statistical evaluation result is shown in Fig. 3. Multifractal spectrum differences between ADHD and control subjects were found in some frontal and right scalp channel signals including Fp1 (p = 0.0304), F4 (p = 0.0086), F7 (p = 0.0404), F8 (p = 0.0141), P4 (p = 0.0226), T4 (p = 0.0141), O2 (p = 0.0464), Cz (p = 0.0262), and Pz (p = 0.0086). For ADHD subjects the multifractal spectrum was higher than the control group over these scalp electrodes. Fig. 2. Results of the Wilcoxon rank sum test for the mean ApEn feature between ADHD and control group at channels: (a) Fp1, (b) Fp2. Mean ApEn in subjects with ADHD was significantly lower than control in these two scalp electrodes. Fig. 3. Black disks indicate channels with multifractal spectrum height differences (p-value < 0.05) between ADHD and normal subjects. The multifractal spectrum for ADHD subjects was significantly higher than the control group over these scalp electrodes.
After evaluating the nonlinear features of channel signals, their ability to discriminate the two groups was considered. The first 15 principle components in each feature category (band power and nonlinear) were chosen; these 15 PCs preserved 90% of signal variance. Next, two classifiers, an SVM and an RBFNN, were constructed and the first 15 principle components in each feature category were separately analyzed. 80% of the data were used to train the classifiers; the other 20% were kept for testing. The four-fold cross validation procedure was used for evaluating the classifiers' performance. Fig. 4 summarizes the results of applying frequency band power values in the classifiers. As shown in Fig. 4, using alpha power in the SVM gave the highest accuracy (83.33%). Moreover, beta and delta power using RBFNN gave the least accurate classification discrimination (58.33%).
In Table 1, the results of using nonlinear features (LLE, ApEn, and the multifractal spectrum) are given for both classifiers. When it was used as input to the SVM classifier, the multifractal spectrum features produced the highest accuracy, 79.17%. Accuracy was lowest when ApEn was used as the classifier input for SVM (59%). Largest Lyapunov exponents with RBFNN exhibited a high accuracy (79.17%) in classifying two groups as well.  Finally, classification results using combined band power and nonlinear features are shown in Table 2. SVM accuracy for frequency band power (79.17%) did not change after PCA. Accuracy of band power features with RBFNN before PCA (83.33%) declined after PCA (70.83%). The highest classification accuracy (83.33%) was achieved when nonlinear features were applied to the SVM classifier. This result remained unchanged after applying PCA.
In general, using nonlinear features in the SVM classifier provided much better classification of ADHD and controls. Findings confirmed that these nonlinear features increased the accuracy of classification compared to using band power features.

Discussion
In this study, resting EEG from 12 ADHD subjects and 12 controls were analysed using nonlinear analysis metrics. It was believed that nonlinear analysis would offer new tools for exploration of brain information processing. Thus, three measures of nonlinear dynamics were explored: LLEs, ApEns, and the multifractal spectrum of oneminute noise-free EEG signals extracted from 19 scalp channels. To visualize the effects of each feature set, the mean and SD of all features for the two groups are shown in Fig. 5. It is clear that the MF spectrum height contains more discriminative information.
A Wilcoxon rank sum test revealed that mean LLE for ADHD subjects was significantly higher than the control group in scalp electrode channels over left frontal and left central scalp. Even though the LLE distinguished the two groups, it could not be concluded with certainty that these results indicate impaired cortical information processing in these brain regions.
ApEn was significantly lower in subjects with ADHD in scalp channels over prefrontal scalp. This suggests that the complexity of cortical dynamics in prefrontal regions of ADHD subjects may be lower than that of control subjects. Reduced EEG complexity has been associated with elevated dysfunction in some previous studies [14,17,33,34]. Ikawa et al. [34] described region-specific correlations between neuropsychological performance and dynamical EEG complexity in subjects with attention deficit disorder. Sohn et al. [14] also suggested an association between reduced cognitive performance on the continuous performance test in ADHD adolescents and impaired cortical information processing, as indicated by the lower complexity of the EEG. Therefore the reduced ApEn in ADHD children reported for this study might possibly relate to their relative inability to deal with demands for flexible information processing, also noted to be of difficulty for ADHD adults.
To the authors' best knowledge, this is the first time that MFDFA has been used for discrimination of ADHD. Several recent studies suggest that changes in the multifractal structure of EEG signal reflect changes in the adaptability of underlying physiological process [19,21]. According to the results shown in Fig. 3, the shape of the multifractal spectrum, including both its width and height differed in ADHD and healthy children. Specifically, in scalp channels over the right hemisphere the multifractal spectrum for ADHD subjects was higher than the age-matched control subjects. The elevated height and width of multifractal spectrum in children with ADHD show a more complex pattern in these children compared to healthy subjects. These results are in line with previous study of epileptic subjects which show that the degree of multifractality of EEG for patients in an epileptic seizure are much higher when compared to normal healthy people [18].
It has been suggested that symptoms of ADHD originate from dysfunction of neural systems in brain regions associated with attention and response inhibition [7]. A vast majority of previous investigation provides evidence for a deficit in the allocation of neural resources in the frontal cortices [11][12][13]35]. A neuropsychological approach was previously used to examine the frontal lobe and right parietal lobe theories of attention deficit hyperactivity disorder [36]. Hence, lower ApEn at prefrontal and higher LLE at frontal scalp electrodes (Fp1, Fp2, F3, F7) might suggest a lower complexity and predictability of underlying neural systems at these scalp regions. Also the distinct multifractal spectrum shape at frontal (Fp1, F4, F7, F8) and right parietal scalp channels might indicate impaired cortical activity at these regions compared to that of healthy subjects.
These features, as well as spectral power features, were used for classification (by use of two different classifiers: SVM and RBFNN). Classification accuracy was significantly higher when nonlinear features were used as input. In classifications using frequency band power, the highest accuracy (83.33%) was achieved using the alpha band power. For nonlinear features, using multifractal spectral features in an SVM classifier produced the highest accuracy (79.17%). This did not improve after PCA. Maximal accuracy (83.33%) was achieved using a combination of nonlinear features in an SVM classifier. These results confirmed that the nonlinear features of scalp EEG signals tested here can increase the accuracy of classification compared to using band power features.

Conclusion
The ability of nonlinear dynamical analysis to discriminate EEG signals of children with ADHD in comparison with age-matched healthy subjects was investigated. To do so, three nonlinear dynamical measures, LLE, ApEn, and the multifractal spectra of EEG signals were extracted from different channels of the scalp and were classified. Results showed that mean LLE, mean ApEn, and mean multifractal spectrum shape were significantly different for both groups. Using nonlinear features along with SVM yielded a high accuracy (83.3%) of ADHD identification. This was the first investigation to employ MFDFA for the discrimination of ADHD. However, small sample size was a limitation of the study. It is suggested that nonlinear measures along with large datasets may reveal more aspects of cognitive information processing deficit in children with ADHD.
This study investigated nonlinear properties of EEG signals during rest. Nevertheless, the allocation of neural resources differs when a subject directs their attention in an experimentally controlled situation. Therefore, the investigation of nonlinear measures of cognitive performance tasks should be further explored in future studies.