Spatial–temporal graph convolutional network for Alzheimer classification based on brain functional connectivity imaging of electroencephalogram

Abstract Functional connectivity of the human brain, representing the statistical dependence of information flow between cortical regions, contributes significantly to the study of the intrinsic brain network and its functional mechanisms. To fully explore its potential for the early diagnosis of Alzheimer's disease (AD) from electroencephalogram (EEG) recordings, this article introduces a novel dynamical spatial–temporal graph convolutional neural network (ST‐GCN) for better classification performance. Unlike existing studies that rely on either the topological brain function characteristics or the temporal features of EEG, the proposed ST‐GCN simultaneously considers the adjacency matrix of functional connectivity across multiple EEG channels and the corresponding dynamics of each single EEG channel. Unlike traditional graph convolutional neural networks, the proposed ST‐GCN makes full use of the constrained spatial topology of functional connectivity and the discriminative temporal dynamics captured by 1D convolution. We conducted extensive experiments on a clinical EEG data set of AD patients and Healthy Controls. The results demonstrate that the proposed method achieves a higher classification accuracy (92.3%) than state‐of‐the‐art methods. This approach can not only help diagnose AD but also help us better understand the effect of normal ageing on brain network characteristics before the condition can be accurately diagnosed from resting‐state EEG.

and limbic regions could account for AD-associated disturbances in brain function (Masliah et al., 2001; Scheff et al., 2006, 2007; Terry et al., 1991). Synaptic dysfunction, which is more extensive than the corresponding neuronal loss when analysed in the same brain regions, has been shown to be the best neuropathological correlate of cognitive impairment in AD patients (Davies et al., 1987; DeKosky & Scheff, 1990; Scheff et al., 2007; Terry et al., 1991). The pathological progression of AD leads to cortical disconnections that manifest as functional connectivity alterations (Nobukawa et al., 2020). Electroencephalography (EEG) is a noninvasive method for studying alterations and degeneration of the brain's bioelectrical function.
Recording electric potential differences on the scalp, EEG is one of the first measurements that directly reflect the functioning of synapses in real time (Jelic, 2005; Michel et al., 2009). In contrast to functional MRI or PET, which detect indirect metabolic signals, EEG offers several additional advantages: noninvasiveness, high temporal resolution, wide availability, low cost, and direct access to neuronal signalling (Michel et al., 2009; Smailovic et al., 2018).
In recent years, EEG-based AD diagnosis has developed along two main lines. The first is the statistical analysis of topological brain function characteristics, in which statistical biomarkers of AD are sought by analysing graph-topographic or dynamic patterns. For graph topography, Jalili (2017) constructed functional brain networks and the associated graph-theoretic metrics from EEG to discriminate AD from Healthy Controls (HC). Similarly, Tylová et al. (2018) used permutation entropy to measure the chaotic behaviour of EEG and observed a statistically significant decrease in permutation entropy at all channels in AD. Fan et al. (2018) employed multiscale entropy as a biomarker characterising nonlinear complexity at multiple temporal scales to capture the topographic pattern of AD. For dynamic patterns, Zhao et al. (2019) proposed a method to measure the nonlinear dynamics of functional connectivity for distinguishing between AD and HC, and Tait et al. (2020) developed a biomarker combining microstate transition complexity with a spectral measure. The second line comprises data-driven machine learning and deep learning methods that use various input features for AD classification.
Among these machine learning methods, deep learning has shown exceptional classification accuracy. Convolutional neural networks (CNNs), a leading deep learning architecture for data in Euclidean space, have outperformed conventional machine learning algorithms in classification accuracy (Craik et al., 2019). Using spectral features of EEG extracted by the fast Fourier transform for AD diagnosis, Bi and Wang (2019) developed a discriminative convolutional Boltzmann machine, and Ieracitano et al. (2019) and Deepthi et al. (2020) each proposed a CNN model.
Li, Wang, et al. (2021) extracted characteristics of AD by combining the latent factors output by the encoder of a variational auto-encoder applied to EEG. Based on time-frequency analysis using the continuous wavelet transform (CWT), Huggins et al. (2021) proposed an AlexNet model to classify AD, mild cognitive impairment, and HC subjects. Using a connectivity matrix derived from EEG, Alves et al. (2021) presented a CNN for classifying AD and schizophrenia. These CNN applications to AD classification (Bi & Wang, 2019; Deepthi et al., 2020; Huggins et al., 2021; Ieracitano et al., 2019; Li, Wang, et al., 2021) focus on learning locally and continuously varying multiscale features in Euclidean space from the EEG signals and neglect functional connectivity features. Although Alves et al. (2021) used the connections between EEG channels as the CNN input, their model neglected channel-specific temporal features, and the topology of the input connections cannot be modelled effectively because it depends on the order in which the EEG channels are arranged.
The convolutional filters of the CNN architectures in these applications cannot fully mine the multiscale topological interaction information among EEG channels.
Given the complexity of EEG signals in the spatial and temporal domains, extracting more abstract geometric features with deep learning for better generalisation remains challenging. The structure–function connectivity network of EEG is non-Euclidean data because the channels are discrete and discontinuous in the spatial domain: each EEG channel can be considered a node, with cross-channel interactions between nodes.
Geometric graph-based deep learning methods therefore provide a more suitable way to learn the cross-channel topologically associated features of EEG. Built on graph theory, graph convolutional neural networks (GCNs) have been developed specifically to handle highly multirelational graph data by jointly leveraging node-specific sequential features and cross-node topologically associated features in the graph domain (Gallicchio & Micheli, 2010; Gori et al., 2005; Scarselli et al., 2008; Sperduti & Starita, 1997). In the past two years, GCNs have been applied to the diagnosis of various brain disorders, such as the evaluation of autism spectrum disorder (ASD) in children, the detection of epilepsy (Zeng et al., 2020; Zhao et al., 2021), seizure prediction, and epilepsy classification. To the best of our knowledge, no AD diagnostic approaches based on GCN-related models have been reported.
To enable this application, an adjacency matrix representing the topological association between EEG channels must be constructed as the key input of the GCN. EEG functional networks are widely used in cognitive neuroscience, for example in studies of decision making, emotion recognition (Li, Liu, et al., 2019), and schizophrenia (Li, Wang, et al., 2019). Traditional statistical correlation measures used for this purpose include the Pearson correlation (PC) (Zhao et al., 2021), Tanh nonlinearity, average correlation coefficients (Zeng et al., 2020), and covariance, which cannot fully represent complex brain connectivity. Many more advanced functional connectivity (FC) measures have been developed, such as the phase locking value (PLV) and phase lag index (PLI) in the time domain (Franciotti et al., 2019; Mormann et al., 2000; Van Mierlo et al., 2014), magnitude squared coherence (MSC) and the imaginary part of coherence (IPC) in the frequency domain (Al-Ezzi et al., 2021; Babiloni et al., 2005; Van Diessen et al., 2015; Wendling et al., 2009), and wavelet coherence (WC) in the time-frequency domain (Franciotti et al., 2019). These measures quantify the degree of synchronisation between brain regions and the alterations in complex behaviour produced by interactions among widespread brain regions (Babiloni et al., 2005; Sakkalis, 2011; Tafreshi et al., 2019; Van Mierlo et al., 2014), and they have been shown to be important for AD classification using statistical methods (Jalili, 2017; Zhao et al., 2019) and machine learning methods (Nobukawa et al., 2020; Song et al., 2018; Yu et al., 2019). However, research combining FC with GCNs is limited, especially for AD. Using these FC measures to construct the input adjacency matrix of a GCN may provide more insight into brain functional interactions and lead to higher classification accuracy for brain-related disorders.
In this article, a novel spatial-temporal GCN (ST-GCN) is proposed to classify AD versus HC, benefiting from adjacency matrices constructed by a variety of FC measures together with the raw EEG recordings.
We tested six adjacency matrices, based on PC, MSC, IPC, WC, PLV, and PLI, using EEG recordings from patients with AD and from HCs. ST-GCN jointly leverages cross-channel topological connectivity features and channel-specific temporal features. To the best of the authors' knowledge, this is the first attempt to use a GCN to distinguish between AD and HC based on EEG recordings.

| Spatial-temporal graph convolutional network
In 1997, Sperduti and Starita first applied neural networks to directed acyclic graphs (Sperduti & Starita, 1997), which motivated early studies on GCNs (Gallicchio & Micheli, 2010; Gori et al., 2005; Scarselli et al., 2008). Currently, there are two basic approaches to generalising convolutions to structured graph data: spatial-based and spectral-based GNNs. Spatial-based GNNs define graph convolutions by rearranging vertices into grid forms that can be processed by normal convolutional operations (Niepert et al., 2016; Yu et al., 2017). Bruna et al. (2013) presented the first prominent spectral-based GCN by applying convolutions in the spectral domain with graph Fourier transforms. Since then, spectral-based GNNs have been increasingly improved, approximated, and extended (Defferrard et al., 2016; Henaff et al., 2015; Kipf & Welling, 2016; Levie et al., 2018) to reduce the computational complexity (Kipf & Welling, 2016). Intuitively, a graph convolution handles the complexity of graph data by generalising the 2D convolution, motivated by the successful applications of CNNs in Euclidean space (Wu et al., 2020).
An image can be regarded as a special case of graph data: each pixel is a node whose neighbours are determined by a filter, and a 2D convolution takes the weighted average of the neighbouring pixel values of each node. Similarly, a graph convolution can be performed by taking the weighted average of a node's neighbourhood information; unlike in images, however, this neighbourhood is unordered and variable in size.
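The neighbourhood-averaging view above can be illustrated with a minimal NumPy sketch (not the authors' code): each node's new value is the weighted average of its neighbours' values, with the weights taken from a row-normalised adjacency matrix. The function name and the toy 4-node path graph are hypothetical; note that node 0 has one neighbour while node 1 has two, which is the "variable in size" neighbourhood that a fixed 2D filter cannot handle.

```python
import numpy as np

def neighbourhood_average(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Replace each node's value by the weighted average of its neighbours' values.

    W: (n, n) non-negative adjacency matrix (hypothetical example input).
    x: (n,) signal with one value per node.
    """
    degree = W.sum(axis=1)
    # Row-normalise W so each node's neighbour weights sum to 1; isolated nodes get 0.
    P = np.divide(W, degree[:, None], out=np.zeros_like(W, dtype=float),
                  where=degree[:, None] > 0)
    return P @ x

# Toy path graph 0-1-2-3 with unit edge weights (hypothetical).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])
print(neighbourhood_average(W, x))  # [2. 2. 3. 3.]
```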
As shown in Figure 1a, the proposed ST-GCN architecture is composed of two spatial-temporal convolutional blocks (ST-Conv Blocks), each formed of one spatial graph convolution layer (Spatial Graph-Conv) and two temporal convolution layers (Temporal 1D-Conv).
The number of stacked ST-Conv Blocks can be adjusted to the complexity of the specific case.
Layer normalisation is applied within every ST-Conv Block to prevent overfitting. The EEG channels X and the adjacency matrix W are processed jointly by the ST-Conv Blocks to explore spatial and temporal dependencies coherently. A flatten layer integrates the features to generate the final AD/HC classification.
For comparison, we designed a classical temporal convolutional neural network (T-CNN), shown in Figure 1b, inspired by related works (Deepthi et al., 2020; Huggins et al., 2021; Li, Wang, et al., 2021). The two models differ in that T-CNN takes only the EEG channels X as input, without the adjacency matrix W, and uses no spatial graph convolution unit in its intermediate feature layers. In all other respects, T-CNN is the same as ST-GCN. For brevity, we describe the structural details of each part of ST-GCN below.

| Spatial Graph-Conv
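Based on the concept of spectral graph convolution, we introduce the graph convolution operator "$*_{\mathcal{G}}$", which multiplies a signal $x \in \mathbb{R}^{n}$ in the spatial domain by a kernel Θ:

$$\Theta *_{\mathcal{G}} x = \Theta(L)\,x = \Theta\left(U\Lambda U^{\top}\right)x = U\,\Theta(\Lambda)\,U^{\top}x, \qquad (1)$$

where the graph Fourier basis $U \in \mathbb{R}^{n\times n}$ is the matrix of eigenvectors of the normalised graph Laplacian $L = I_n - D^{-1/2} W D^{-1/2} = U\Lambda U^{\top} \in \mathbb{R}^{n\times n}$ ($I_n$ is an identity matrix, $D \in \mathbb{R}^{n\times n}$ is the diagonal degree matrix with $D_{ii} = \sum_j W_{ij}$, and $\Lambda \in \mathbb{R}^{n\times n}$ is the diagonal matrix of eigenvalues of L). By this definition, a graph signal x is filtered by the kernel Θ through multiplication between Θ and the graph Fourier transform $U^{\top}x$ (Shuman et al., 2013).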
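We use Chebyshev polynomials and first-order approximations (Kipf & Welling, 2016) here to reduce the expensive computation of kernel Θ in Equation (1), which requires $O(n^2)$ multiplications with the graph Fourier basis. Kernel Θ can be restricted to a polynomial of Λ as $\Theta(\Lambda) = \sum_{k=0}^{K-1}\theta_k\Lambda^k$, where $\theta \in \mathbb{R}^{K}$ is a vector of polynomial coefficients and K is the kernel size, which determines the maximum radius of the convolution from the central nodes. Chebyshev polynomials $T_k(x)$ are traditionally used to approximate the kernel as a truncated expansion of order K − 1,

$$\Theta(\Lambda) \approx \sum_{k=0}^{K-1}\theta_k\,T_k(\tilde{\Lambda}), \qquad \tilde{\Lambda} = \frac{2\Lambda}{\lambda_{\max}} - I_n,$$

where $\lambda_{\max}$ denotes the largest eigenvalue of L (Hammond et al., 2011). The graph convolution in Equation (1) can then be rewritten as

$$\Theta *_{\mathcal{G}} x = \Theta(L)\,x \approx \sum_{k=0}^{K-1}\theta_k\,T_k(\tilde{L})\,x, \qquad (2)$$

where $T_k(\tilde{L}) \in \mathbb{R}^{n\times n}$ is the Chebyshev polynomial of order k evaluated at the scaled Laplacian $\tilde{L} = 2L/\lambda_{\max} - I_n$. By recursively computing K-localised convolutions through this polynomial approximation, the computational cost of Equation (1) can be reduced to $O(K|\varepsilon|)$ as in Equation (2), where $|\varepsilon|$ is the number of edges in the graph. By stacking multiple localised graph convolutional layers with a first-order approximation of the graph Laplacian, a layer-wise linear formulation can be defined (Kipf & Welling, 2016). A further assumption, $\lambda_{\max} \approx 2$, can be made because of the scaling and normalisation in neural networks. Thus, Equation (2) can be simplified to

$$\Theta *_{\mathcal{G}} x \approx \theta_0 x + \theta_1\left(\frac{2}{\lambda_{\max}}L - I_n\right)x \approx \theta_0 x - \theta_1\left(D^{-1/2} W D^{-1/2}\right)x,$$

where $\theta_0$ and $\theta_1$ are two shared parameters of the kernel. To constrain the parameters and stabilise numerical performance, $\theta_0$ and $\theta_1$ are replaced by a single parameter θ by letting $\theta = \theta_0 = -\theta_1$, which gives $\Theta *_{\mathcal{G}} x = \theta\left(I_n + D^{-1/2} W D^{-1/2}\right)x$. By renormalising W and D as $\tilde{W} = W + I_n$ and $\tilde{D}_{ii} = \sum_j \tilde{W}_{ij}$, respectively, the graph convolution can be expressed as

$$\Theta *_{\mathcal{G}} x = \theta\left(\tilde{D}^{-1/2}\tilde{W}\tilde{D}^{-1/2}\right)x.$$

The graph convolution operator "$*_{\mathcal{G}}$" defined on $x \in \mathbb{R}^{n}$ can be extended to multidimensional tensors. For a signal with $C_i$ channels, $X \in \mathbb{R}^{n\times C_i}$, the graph convolution can be generalised to

$$y_j = \sum_{i=1}^{C_i}\Theta_{i,j}(L)\,x_i \in \mathbb{R}^{n}, \qquad 1 \le j \le C_o,$$

with the $C_i \times C_o$ vectors of Chebyshev coefficients $\Theta_{i,j} \in \mathbb{R}^{K}$ ($y_j$ is the output after graph convolution, and $C_i$ and $C_o$ are the sizes of the input and output feature maps, respectively).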
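As a hedged illustration of Equation (2) and of the renormalised first-order form, the NumPy sketch below filters a single-channel graph signal with the Chebyshev recursion $T_k(\tilde{L})x = 2\tilde{L}\,T_{k-1}(\tilde{L})x - T_{k-2}(\tilde{L})x$, which needs only matrix-vector products rather than the eigenvector basis U. It is not the authors' implementation; the function names, the random 23-channel adjacency matrix, and the coefficient values are hypothetical, and $\lambda_{\max}$ is computed exactly here instead of using the $\lambda_{\max} \approx 2$ approximation.

```python
import numpy as np

def normalised_laplacian(W):
    """L = I_n - D^{-1/2} W D^{-1/2} for a symmetric, non-negative adjacency matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def cheb_graph_conv(W, x, theta):
    """Theta *G x ≈ sum_k theta_k T_k(L_tilde) x as in Equation (2); K = len(theta)."""
    L = normalised_laplacian(W)
    lam_max = np.linalg.eigvalsh(L).max()   # exact largest eigenvalue (could be approximated by 2)
    L_tilde = 2.0 * L / lam_max - np.eye(len(W))
    t_prev, t_curr = x, L_tilde @ x          # T_0(L~) x and T_1(L~) x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        # Chebyshev recursion using matrix-vector products only.
        t_prev, t_curr = t_curr, 2.0 * L_tilde @ t_curr - t_prev
        out = out + theta[k] * t_curr
    return out

def first_order_graph_conv(W, x, theta):
    """Renormalised first-order form: theta * D~^{-1/2} W~ D~^{-1/2} x with W~ = W + I_n."""
    W_tilde = W + np.eye(len(W))
    d_inv_sqrt = 1.0 / np.sqrt(W_tilde.sum(axis=1))
    return theta * (d_inv_sqrt[:, None] * W_tilde * d_inv_sqrt[None, :]) @ x

rng = np.random.default_rng(0)
A = rng.random((23, 23))
W = (A + A.T) / 2.0                  # hypothetical symmetric adjacency for 23 channels
np.fill_diagonal(W, 0.0)
x = rng.standard_normal(23)          # one value per channel (C_i = 1)
theta = np.array([0.5, -0.2, 0.1])   # hypothetical coefficients, K = 3

y_cheb = cheb_graph_conv(W, x, theta)
y_first = first_order_graph_conv(W, x, 0.5)
```

The recursion and the spectral form $U\,\Theta(\Lambda)\,U^{\top}x$ give the same result up to floating-point error; the recursion is what makes the $O(K|\varepsilon|)$ cost possible for sparse graphs.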
A graph convolution for 2D variables is denoted as "$\Theta *_{\mathcal{G}} X$" with $\Theta \in \mathbb{R}^{K\times C_i\times C_o}$. The input of ST-GCN is composed of M frames of EEG channel graphs, as shown in Figure 1a. Each frame $X_t$ can be regarded as a matrix whose row i is the $C_i$-dimensional value of $X_t$ at the ith node in graph $\mathcal{G}_t$, as $X \in \mathbb{R}^{n\times C_i}$ (in this case, $C_i = 1$). For each time step t of M, the same graph convolution with the same kernel Θ is applied to $X_t \in \mathbb{R}^{n\times C_i}$ in parallel. Thus, the graph convolution can be generalised to 3D variables, denoted "$\Theta *_{\mathcal{G}} \chi$" with $\chi \in \mathbb{R}^{M\times n\times C_i}$.

Figure 1: The flowchart of (a) the proposed ST-GCN framework and (b) the T-CNN framework for comparison. X is sized M×N (M = 25, the length of each channel's mini-epoch; N = 23, the number of channels) and the input W is sized N×N. The (i, j)th entry of the adjacency matrix W denotes the spatial coupling strength between the ith and jth of the 23 channels; its calculation is presented in section 2.2. $K_t$ is the size of the temporal 1D-Conv filter, set to 3 here. $C_l$ (l = 1, 2,…, 6) is the number of filters in each layer. The estimated computational complexities of ST-GCN and T-CNN are 54M and 52M, respectively.
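The generalisation to 3D variables can be sketched in a few lines of NumPy, assuming a Chebyshev kernel $\Theta \in \mathbb{R}^{K\times C_i\times C_o}$: the basis $T_0(\tilde{L}),\dots,T_{K-1}(\tilde{L})$ is built once and then contracted with every frame of $\chi$, so all M frames share the same kernel. This is an illustrative sketch, not the authors' code; M = 25 and n = 23 follow Figure 1, while K, $C_o$, the helper names, and the random inputs are hypothetical.

```python
import numpy as np

def scaled_laplacian(W):
    """L~ = 2L/lam_max - I_n with L = I_n - D^{-1/2} W D^{-1/2} (all degrees assumed positive)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return 2.0 * L / np.linalg.eigvalsh(L).max() - np.eye(len(W))

def chebyshev_basis(L_tilde, K):
    """Stack T_0(L~), ..., T_{K-1}(L~) into an array of shape (K, n, n)."""
    T = [np.eye(len(L_tilde)), L_tilde]
    for _ in range(2, K):
        T.append(2.0 * L_tilde @ T[-1] - T[-2])
    return np.stack(T[:K])

def graph_conv_3d(chi, T, Theta):
    """Theta *G chi for chi of shape (M, n, C_i) and Theta of shape (K, C_i, C_o).

    The same kernel is applied to every frame t; for each output channel j this computes
    y_j = sum_i Theta_{i,j}(L) x_i with Theta_{i,j}(L) = sum_k Theta[k, i, j] T_k(L~).
    """
    return np.einsum('kab,tbi,kij->taj', T, chi, Theta)

rng = np.random.default_rng(1)
M, n, C_i, C_o, K = 25, 23, 1, 4, 3      # M, n from Figure 1; C_o and K hypothetical
A = rng.random((n, n))
W = (A + A.T) / 2.0                      # hypothetical adjacency matrix
np.fill_diagonal(W, 0.0)
T = chebyshev_basis(scaled_laplacian(W), K)
chi = rng.standard_normal((M, n, C_i))   # M frames of single-valued channel graphs
Theta = rng.standard_normal((K, C_i, C_o))
out = graph_conv_3d(chi, T, Theta)       # shape (M, n, C_o)
```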

| Temporal 1D-Conv
Following Gehring et al. (2017), who showed that CNNs train quickly in sequence modelling, we employ a fully convolutional structure along the temporal axis to capture the sequential dynamics of EEG recordings. As shown in Figure 1a, a temporal convolutional layer contains a 1D convolution with a kernel of width $K_t$ followed by a rectified linear unit (ReLU) nonlinearity.
For each node in graph $\mathcal{G}$, the temporal convolution covers $K_t$ neighbouring input elements, so each convolution shortens the sequence by $K_t - 1$. The input of the temporal convolution for each node can thus be regarded as a length-M sequence with $C_i$ channels, $Y \in \mathbb{R}^{M\times C_i}$. The convolution kernel $\Gamma \in \mathbb{R}^{K_t\times C_i\times C_o}$ maps the input Y to an output of size $(M-K_t+1)\times C_o$. Similarly, the temporal convolution can be generalised to 3D variables by applying the same convolution kernel Γ to every node in $\mathcal{G}$, denoted "$\Gamma *_{\mathcal{T}} Y$" with $Y \in \mathbb{R}^{M\times n\times C_i}$.
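A minimal NumPy sketch of "$\Gamma *_{\mathcal{T}} Y$" followed by ReLU is shown below, assuming a 'valid' convolution (implemented as cross-correlation, as deep learning libraries do) and a kernel shared by all nodes. It shows the length reduction from M to $M - K_t + 1$; the names, the filter count $C_o = 8$, and the random inputs are hypothetical, and this is not the authors' TensorFlow code.

```python
import numpy as np

def temporal_conv(Y, Gamma, bias=0.0):
    """Gamma *T Y followed by ReLU.

    Y: (M, n, C_i) input; Gamma: (K_t, C_i, C_o) kernel shared by all n nodes.
    'Valid' convolution (cross-correlation), so the output has M - K_t + 1 time steps.
    """
    K_t = Gamma.shape[0]
    M_out = Y.shape[0] - K_t + 1
    out = np.full((M_out, Y.shape[1], Gamma.shape[2]), bias, dtype=float)
    for k in range(K_t):
        # Kernel tap k sees the input shifted by k steps along the time axis.
        out += np.einsum('tni,io->tno', Y[k:k + M_out], Gamma[k])
    return np.maximum(out, 0.0)  # ReLU nonlinearity

rng = np.random.default_rng(2)
Y = rng.standard_normal((25, 23, 1))     # M = 25 samples, n = 23 channels, C_i = 1
Gamma = rng.standard_normal((3, 1, 8))   # K_t = 3 as in Figure 1; C_o = 8 hypothetical
Z = temporal_conv(Y, Gamma)              # shape (23, 23, 8): 25 - (3 - 1) = 23 time steps
```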
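The input and output of the ST-Conv Blocks are all 3D tensors. For the input $x^{l} \in \mathbb{R}^{M\times n\times C_l}$ of block l, the output $x^{l+1} \in \mathbb{R}^{(M-2(K_t-1))\times n\times C_{l+1}}$ is computed by

$$x^{l+1} = \Gamma_1^{l} *_{\mathcal{T}} \mathrm{ReLU}\left(\Theta^{l} *_{\mathcal{G}} \left(\Gamma_0^{l} *_{\mathcal{T}} x^{l}\right)\right),$$

where $\Gamma_0^{l}$ and $\Gamma_1^{l}$ are the upper and lower temporal kernels within block l, respectively; $\Theta^{l}$ is the spectral kernel of the graph convolution; and ReLU(·) denotes the rectified linear unit function. After stacking the two ST-Conv Blocks, the output features are fused by a flatten layer (Figure 1a). A final output $Z \in \mathbb{R}^{n\times c}$ is obtained from the fully connected layer, and the classification result is calculated by applying a sigmoid function,

$$\hat{y} = \mathrm{sigmoid}(Z\omega + b),$$

where $\omega \in \mathbb{R}^{c}$ is a weight vector and b is a bias. We use the binary cross-entropy loss as the training objective.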
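To make the block equation and the shape bookkeeping concrete, the following forward-pass sketch stacks two ST-Conv Blocks on one 25 × 23 mini-epoch and checks that the time axis shrinks by $2(K_t - 1)$ per block (25 → 21 → 17). It is a simplified sketch under stated assumptions, not the authors' model: the graph convolution uses the renormalised first-order operator, layer normalisation is omitted, the output head is simplified to flatten → single logistic unit, and all filter counts, weights, and the label are hypothetical.

```python
import numpy as np

def temporal_conv(x, gamma):
    """Gamma *T x: valid 1D convolution over time, same kernel for every node.
    x: (M, n, C_in), gamma: (K_t, C_in, C_out) -> (M - K_t + 1, n, C_out)."""
    k_t = gamma.shape[0]
    m_out = x.shape[0] - k_t + 1
    return sum(np.einsum('tni,io->tno', x[k:k + m_out], gamma[k]) for k in range(k_t))

def renormalised_adjacency(W):
    """A_hat = D~^{-1/2} (W + I_n) D~^{-1/2}, the first-order graph operator."""
    W_tilde = W + np.eye(len(W))
    d_inv_sqrt = 1.0 / np.sqrt(W_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * W_tilde * d_inv_sqrt[None, :]

def graph_conv(x, A_hat, theta):
    """First-order Theta *G x applied to every frame; theta: (C_in, C_out)."""
    return np.einsum('ab,tbi,io->tao', A_hat, x, theta)

def st_conv_block(x, A_hat, gamma0, theta, gamma1):
    """x^{l+1} = Gamma_1 *T ReLU(Theta *G (Gamma_0 *T x^l)); layer normalisation omitted."""
    h = temporal_conv(x, gamma0)
    h = np.maximum(graph_conv(h, A_hat, theta), 0.0)
    return temporal_conv(h, gamma1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
M, n, K_t = 25, 23, 3
channels = [1, 8, 8, 16, 16, 16, 16]    # hypothetical: input + C_1..C_6
A = rng.random((n, n))
A_hat = renormalised_adjacency((A + A.T) / 2.0)   # hypothetical symmetric W
x = rng.standard_normal((M, n, channels[0]))      # one mini-epoch of 23 channels

for blk in range(2):                                # two stacked ST-Conv Blocks
    c0, c1, c2, c3 = channels[3 * blk: 3 * blk + 4]
    gamma0 = 0.1 * rng.standard_normal((K_t, c0, c1))
    theta = 0.1 * rng.standard_normal((c1, c2))
    gamma1 = 0.1 * rng.standard_normal((K_t, c2, c3))
    x = st_conv_block(x, A_hat, gamma0, theta, gamma1)

features = x.reshape(-1)                        # flatten layer
w = 0.01 * rng.standard_normal(features.size)   # hypothetical fully connected weights
p_ad = sigmoid(features @ w + 0.0)              # predicted probability of AD
y_true = 1.0                                    # hypothetical label (1 = AD, 0 = HC)
bce = -(y_true * np.log(p_ad) + (1.0 - y_true) * np.log(1.0 - p_ad))
```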
All models were trained on the CPU of a Dell desktop (DESKTOP-D3UM3P9) using the TensorFlow platform under Microsoft Windows 10, with the Adam optimiser.

| Adjacency matrix
The spatial information carried by EEG signals plays an important role in AD/HC classification (Babiloni et al., 2005; Sakkalis, 2011; Tafreshi et al., 2019; Van Mierlo et al., 2014). The adjacency matrix $W \in \mathbb{R}^{n\times n}$ of each mini-epoch, which represents the spatial correlation among the channel signals $X \in \mathbb{R}^{n\times M}$, is one of the inputs of the ST-GCN model shown in Figure 1a. In some GCN-based studies of brain disorders (Zeng et al., 2020; Zhang et al., 2021; Zhao et al., 2021), the relationship between EEG channels lacks effective prior guidance, and the adjacency matrix cannot ensure that the coupling information between channels is used. To address these issues, we apply functional connectivity measures, which have proven useful in AD classification, to construct an adaptive adjacency matrix that extracts spatial coupling features.
The raw EEG signals of each channel and the associations between channels are modelled by a graph. Each node of the graph carries a feature vector, namely the raw mini-epoch EEG data of the corresponding channel.

| Pearson correlation
The best-known functional connectivity measure is correlation, namely the Pearson correlation coefficient. It quantifies the instantaneous linear interdependence between two signals based on their amplitudes in the time domain and ranges from −1 to 1. The Pearson correlation coefficient between signals X and Y is defined as

$$\rho_{XY} = \frac{E\left[(X-\mu_X)(Y-\mu_Y)\right]}{\sigma_X\,\sigma_Y},$$

where E is the expected value, $\mu_X$ and $\mu_Y$ are the means, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of the X and Y time series.
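A PC adjacency matrix for one mini-epoch can be computed directly from the definition above, as in the sketch below (hypothetical function name and random data shaped like Figure 1's 23 channels × 25 samples). Whether negative coefficients are kept, rectified, or replaced by absolute values before use as edge weights is not stated in this section; the sketch simply returns the signed coefficients.

```python
import numpy as np

def pearson_adjacency(epoch):
    """Pearson correlation matrix W for one mini-epoch.

    epoch: (n_channels, n_samples).
    W[i, j] = E[(X_i - mu_i)(X_j - mu_j)] / (sigma_i * sigma_j).
    """
    centred = epoch - epoch.mean(axis=1, keepdims=True)
    cov = centred @ centred.T / epoch.shape[1]
    sigma = np.sqrt(np.diag(cov))
    return cov / np.outer(sigma, sigma)

rng = np.random.default_rng(4)
epoch = rng.standard_normal((23, 25))   # hypothetical 23 channels x 25 samples
W_pc = pearson_adjacency(epoch)
```

The result matches `np.corrcoef(epoch)`, which could be used instead; the explicit version only mirrors the formula term by term.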

| Magnitude squared coherence
MSC is a linear method that estimates the coupling between two signals in the frequency domain from their power spectral densities (PSDs).
The MSC of signals X and Y can be written as

$$\mathrm{MSC}_{XY}(f) = \frac{\left|S_{xy}(f)\right|^{2}}{S_{xx}(f)\,S_{yy}(f)},$$

where $S_{xx}(f)$ and $S_{yy}(f)$ are the PSDs of signals X and Y, respectively, and $S_{xy}(f)$ is their cross PSD at frequency f.
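The MSC formula can be sketched with SciPy's Welch-based spectral estimators; the same cross-spectrum also gives an IPC-style measure, $|\mathrm{Im}\,S_{xy}| / \sqrt{S_{xx}S_{yy}}$. Several parts are assumptions not stated in this section: the IPC definition (the standard imaginary-part-of-coherency form, with absolute value taken), the reduction to one adjacency entry by averaging over a band, the 256 Hz sampling rate, the 1–30 Hz band, the Welch segment length, and all function names.

```python
import numpy as np
from scipy.signal import coherence, csd, welch

def band_msc(x, y, fs, band, nperseg):
    """Mean |S_xy|^2 / (S_xx S_yy) over a frequency band (Welch estimates)."""
    f, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    mask = (f >= band[0]) & (f <= band[1])
    return cxy[mask].mean()

def band_abs_icoh(x, y, fs, band, nperseg):
    """Mean |Im(S_xy)| / sqrt(S_xx S_yy) over a band (assumed IPC definition)."""
    f, sxy = csd(x, y, fs=fs, nperseg=nperseg)
    _, sxx = welch(x, fs=fs, nperseg=nperseg)
    _, syy = welch(y, fs=fs, nperseg=nperseg)
    mask = (f >= band[0]) & (f <= band[1])
    return np.mean(np.abs(sxy[mask].imag) / np.sqrt(sxx[mask] * syy[mask]))

def adjacency(epoch, measure, **kwargs):
    """Fill an (n, n) matrix with a pairwise measure; epoch: (n_channels, n_samples)."""
    n = epoch.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            W[i, j] = measure(epoch[i], epoch[j], **kwargs)
    return W

fs = 256.0                               # hypothetical sampling rate
rng = np.random.default_rng(5)
epoch = rng.standard_normal((4, 1024))   # hypothetical 4 channels x 4 s
W_msc = adjacency(epoch, band_msc, fs=fs, band=(1.0, 30.0), nperseg=256)
W_ipc = adjacency(epoch, band_abs_icoh, fs=fs, band=(1.0, 30.0), nperseg=256)
```

On the diagonal, MSC equals 1 and the IPC-style value equals 0, since a signal's cross-spectrum with itself is real.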

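| Imaginary part of coherence

| Wavelet coherence
Given input signals x and y, the wavelet cross-spectrum around time t and frequency f can be derived from their wavelet transforms $W_x$ and $W_y$ as

$$SW_{xy}(t,f) = \int_{t-\delta/2}^{t+\delta/2} W_x(\tau,f)\,W_y^{*}(\tau,f)\,d\tau,$$

where * denotes the complex conjugate and δ is a frequency-dependent time scale. WC at time t and frequency f is then derived as

$$WC_{xy}(t,f) = \frac{\left|SW_{xy}(t,f)\right|}{\left[SW_{xx}(t,f)\,SW_{yy}(t,f)\right]^{1/2}}.$$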
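The wavelet coherence above can be sketched in NumPy with a complex Morlet transform and a moving integration window whose length δ scales as 1/f. The wavelet family, its width (7 cycles), the smoothing length (3 cycles), the sampling rate, the frequency grid, and the averaging over time and frequency to obtain one adjacency entry are all assumptions for illustration, not the authors' settings; the function names are hypothetical.

```python
import numpy as np

def morlet_transform(x, fs, freq, n_cycles=7.0):
    """Complex Morlet wavelet transform W_x(t, f) at one frequency (assumed wavelet and width)."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)
    t = np.arange(-3.5 * sigma_t, 3.5 * sigma_t, 1.0 / fs)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t ** 2 / (2.0 * sigma_t ** 2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))
    if len(wavelet) > len(x):
        raise ValueError("signal is shorter than the wavelet at this frequency")
    return np.convolve(x, wavelet, mode="same")

def wavelet_coherence(x, y, fs, freqs, smooth_cycles=3.0):
    """WC_xy(t, f) = |SW_xy| / sqrt(SW_xx SW_yy), integrating over delta = smooth_cycles / f."""
    wc = []
    for f in freqs:
        wx, wy = morlet_transform(x, fs, f), morlet_transform(y, fs, f)
        width = max(1, int(round(smooth_cycles / f * fs)))   # delta in samples, frequency dependent
        box = np.ones(width)
        s_xy = np.convolve(wx * np.conj(wy), box, mode="same")
        s_xx = np.convolve(np.abs(wx) ** 2, box, mode="same")
        s_yy = np.convolve(np.abs(wy) ** 2, box, mode="same")
        wc.append(np.abs(s_xy) / np.sqrt(s_xx * s_yy))
    return np.array(wc)   # shape (len(freqs), len(x))

fs = 256.0                                  # hypothetical sampling rate
rng = np.random.default_rng(6)
t = np.arange(0, 4.0, 1.0 / fs)
common = np.sin(2 * np.pi * 10.0 * t)       # shared 10 Hz component
x = common + 0.5 * rng.standard_normal(t.size)
y = common + 0.5 * rng.standard_normal(t.size)
freqs = np.arange(8.0, 31.0, 2.0)           # hypothetical 8-30 Hz grid
wc = wavelet_coherence(x, y, fs, freqs)
wc_entry = wc.mean()                        # one possible scalar adjacency entry (assumption)
```

By the Cauchy–Schwarz inequality the values lie in [0, 1], and a signal has WC equal to 1 with itself.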

| Phase locking value
Phase synchronisation rests on the observation that two oscillating signals can be phase-synchronised even when their amplitudes are not synchronised. The PLV is frequently used to quantify the strength of phase synchronisation (Van Mierlo et al., 2014). The instantaneous phase of a signal x is given by

$$\phi_x(t) = \arctan\frac{\tilde{x}(t)}{x(t)},$$

where $\tilde{x}(t)$ is the Hilbert transform of x(t), defined as

$$\tilde{x}(t) = \frac{1}{\pi}\,\mathrm{PV}\int_{-\infty}^{\infty}\frac{x(\tau)}{t-\tau}\,d\tau,$$

where PV refers to the Cauchy principal value. The PLV for two signals is then defined as

$$\mathrm{PLV} = \left|\frac{1}{N}\sum_{j=1}^{N} e^{\,i\left(\phi_x(j\Delta t)-\phi_y(j\Delta t)\right)}\right|,$$

where Δt is the sampling period and N is the number of samples of each signal. PLV ranges between 0 and 1, where 0 indicates a lack of synchronisation and 1 indicates strict phase synchronisation.
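The PLV definition can be sketched with `scipy.signal.hilbert`, which returns the analytic signal $x + i\tilde{x}$; `np.angle` of that signal is the four-quadrant form of $\arctan(\tilde{x}/x)$. The sketch below is illustrative only: the function names, the 256 Hz rate, and the test signals are hypothetical. A 10 Hz sine and a smaller, phase-shifted copy of it have a constant phase difference, so their PLV is 1 even though their amplitudes differ.

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_phase(x):
    """phi_x(t) from the analytic signal x + i*H[x] (four-quadrant arctan)."""
    return np.angle(hilbert(x))

def plv(x, y):
    """PLV = |(1/N) * sum_j exp(i * (phi_x(j dt) - phi_y(j dt)))|."""
    dphi = instantaneous_phase(x) - instantaneous_phase(y)
    return np.abs(np.mean(np.exp(1j * dphi)))

fs = 256.0                                            # hypothetical sampling rate
t = np.arange(0, 2.0, 1.0 / fs)                       # 512 samples, 20 full cycles at 10 Hz
x = np.sin(2 * np.pi * 10.0 * t)
y = 0.3 * np.sin(2 * np.pi * 10.0 * t - np.pi / 3)    # different amplitude, constant lag
noise = np.random.default_rng(7).standard_normal(t.size)

plv_locked = plv(x, y)       # phase-locked pair
plv_noise = plv(x, noise)    # unrelated signals
```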

| Phase lag index
Like PLV, PLI is calculated from the relative phase difference between two signals, but it captures the asymmetry of the distribution of these phase differences:

$$\mathrm{PLI} = \left|E\left[\mathrm{sign}\left(\phi_x(j\Delta t)-\phi_y(j\Delta t)\right)\right]\right|,$$

where $\phi_x(j\Delta t)-\phi_y(j\Delta t)$ is the phase difference between the two signals, sign is the signum function, E is the expected value, and |·| denotes the absolute value. PLI ranges between 0 and 1, where 0 may indicate no coupling and 1 indicates perfect phase locking.

Ag/AgCl electrodes at a sampling frequency of 2 kHz, implementing a modified 10-10 system overlapping the 10-20 international system of electrode placement, with a referential montage (linked earlobe reference). Thirty-minute resting-state (task-free: participants were instructed to rest and refrain from thinking of anything specific) EEG
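Returning to the PLI definition above, the sketch below implements it with `scipy.signal.hilbert`. Taking the sign of $\sin(\Delta\phi)$ is a common implementation choice (an assumption here) and equals the sign of the phase difference wrapped to $(-\pi, \pi)$. The function name and test signals are hypothetical. Because sign(0) = 0, two signals with exactly zero phase lag give a PLI of 0, while a consistent non-zero lag gives a PLI of 1.

```python
import numpy as np
from scipy.signal import hilbert

def pli(x, y):
    """PLI = |E[sign(phase difference)]|, using sin() to wrap the difference into (-pi, pi)."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.sign(np.sin(dphi))))

fs = 256.0                                              # hypothetical sampling rate
t = np.arange(0, 2.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10.0 * t)
y_lagged = np.sin(2 * np.pi * 10.0 * t - np.pi / 4)     # consistent non-zero lag

pli_lagged = pli(x, y_lagged)   # consistent lag: PLI close to 1
pli_zero_lag = pli(x, 2.0 * x)  # identical phases (zero lag): PLI of 0
```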

| Wavelet coherence as adjacency matrix
Because WC gave the best classification performance when used as the adjacency matrix, we further analysed the WC matrices of all ADs and HCs statistically. We averaged the full-band WC adjacency matrices over all eyes-open (EO) and eyes-closed (EC) mini-epochs for AD and HC separately (Figure 5) to examine their statistical characteristics. For ADs, the EC state has a slightly higher coupling strength than EO in the temporo-occipital area (the middle part of the adjacency matrix), whereas for HCs, the EC state has a slightly higher overall coupling strength than EO. In both EO and EC, the strongest interchannel correlation of ADs is below 0.8 (the temporo-occipital area in the middle of the left two images of Figure 5), while HCs show some interchannel correlations close to 1 (the upper-left frontocentral and bottom-right posterior areas of the right two images of Figure 5). In both EO and EC, the connectivity between AD channels is lower than that of HCs in the bottom-right corner (posterior area).

Figure 6: Threshold of averaged WC adjacency matrices of ADs and HCs
Figure 7: The 3D brain mapping of averaged WC adjacency matrices for EO state of ADs
Figure 8: The 3D brain mapping of averaged WC adjacency matrices for EO state of HCs
Figure 9: The 3D brain mapping of averaged WC adjacency matrices for EC state of ADs
Figure 10: The 3D brain mapping of averaged WC adjacency matrices for EC state of HCs
To show the connectivity between channels in the bottom-left corner visually for both EO and EC states, we thresholded the left two AD adjacency matrices in Figure 5 to screen out channel pairs with connectivity below 0.15. Figure 6 reports the corresponding cross-channel indices for HCs. Using the BrainNet software, the 3D connectivity distributions in the brains of ADs and HCs are displayed for EO (Figures 7 and 8) and EC (Figures 9 and 10), respectively. Comparing Figures 7 and 8
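The averaging and thresholding steps described above can be sketched as follows: the WC matrices of one group and eye state are averaged over mini-epochs, and channel pairs whose averaged connectivity falls below the 0.15 threshold are listed. The function names and the two tiny 3-channel example matrices are hypothetical; the real analysis uses 23 channels and the full set of mini-epochs per group.

```python
import numpy as np

def group_mean_adjacency(adjacency_stack):
    """Average WC adjacency matrices over mini-epochs; adjacency_stack: (num_epochs, n, n)."""
    return np.mean(adjacency_stack, axis=0)

def weak_pairs(mean_adjacency, threshold=0.15):
    """Channel pairs (i < j) whose averaged connectivity falls below the threshold."""
    i, j = np.triu_indices(mean_adjacency.shape[0], k=1)
    keep = mean_adjacency[i, j] < threshold
    return list(zip(i[keep].tolist(), j[keep].tolist()))

stack = np.array([
    [[1.0, 0.10, 0.6], [0.10, 1.0, 0.3], [0.6, 0.3, 1.0]],
    [[1.0, 0.12, 0.4], [0.12, 1.0, 0.1], [0.4, 0.1, 1.0]],
])  # two hypothetical 3-channel mini-epochs
mean_W = group_mean_adjacency(stack)
pairs = weak_pairs(mean_W)   # [(0, 1)]
```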

| CONCLUSIONS
We proposed a spatial-temporal graph convolutional network (ST-GCN) for classifying Alzheimer's disease and Healthy Control groups by jointly leveraging cross-channel topological association features and channel-specific temporal features of EEG recordings. Unlike current leading GCN applications for diagnosing brain disorders, this method uses brain functional connectivity measures to explore the complex interactive information between EEG channels as well as the dynamic information of each single channel. The main goal of this work was to determine whether cross-channel topologically associated features constrained by functional connectivity can reveal more hidden information in the data and extend the applicability of GCN-based algorithms. On the clinical AD and HC EEG recordings, ST-GCN with wavelet coherence as the adjacency matrix achieved the highest classification accuracy. On the tested data set, the overall classification accuracy of ST-GCN is higher than that of the classical T-CNN for both eye states and across frequency bands, which suggests that spatial topology constraints help mine brainwave features and thereby improve classification accuracy. Unlike existing studies for AD diagnosis that are based on either topological brain function characteristics of EEG (Fan et al., 2018; Jalili, 2017; Tait et al., 2020; Tylová et al., 2018; Zhao et al., 2019) or temporal dynamic features (Bi & Wang, 2019; Deepthi et al., 2020; Huggins et al., 2021; Ieracitano et al., 2019; Li, Wang, et al., 2021), the proposed ST-GCN simultaneously considers the adjacency matrix of functional connectivity across multiple EEG channels and the corresponding dynamics of each single EEG channel. It therefore has the potential to detect AD-related anomalies not only in the frequency response of local areas but also in the functional connectivity across different regions.
Furthermore, visualising the wavelet coherence adjacency matrices increases the transparency of this solution by providing evidence of brain anomalies in terms of functional connectivity, which may increase trust in the developed AI-based solution. This algorithm offers a potentially effective strategy for application to other brain disorders.
In the present study, because of the limited number of subjects in the data set, leave-one-subject-out and other cross-subject validation schemes are not discussed, to avoid biased conclusions caused by insufficient data.
The accuracies of a hold-out cross-subject validation for ST-GCN and T-CNN are given in Tables S1 and S2 (Supporting Information). In this case, 65% of the subjects were used for training and the remaining 35% for testing. Although the overall accuracy drops substantially for both methods, the proposed ST-GCN still outperforms T-CNN.
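The key property of this hold-out scheme is that mini-epochs are split by subject, so no subject contributes data to both the training and the test set. A minimal sketch, assuming one subject ID per mini-epoch, is given below; the function name, the random seed, and the 20-subject example are hypothetical, and whether the original split was stratified by group (AD/HC) is not stated here, so the sketch does not stratify.

```python
import numpy as np

def subject_holdout_split(subject_ids, train_fraction=0.65, seed=0):
    """Return (train_idx, test_idx) over mini-epochs with no subject in both sets."""
    subjects = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(subjects)                              # random subject order
    n_train = int(round(train_fraction * len(subjects)))
    train_mask = np.isin(subject_ids, subjects[:n_train])
    return np.flatnonzero(train_mask), np.flatnonzero(~train_mask)

subject_ids = np.repeat(np.arange(20), 50)   # hypothetical: 20 subjects x 50 mini-epochs
train_idx, test_idx = subject_holdout_split(subject_ids)
```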
To reduce volume conduction effects from a common reference, bipolar derivations were used to assess the differences between pairs of electrodes in the two cohorts of subjects. This approach (the use of bipolar pairs of electrodes) reduces, but does not eliminate, the effects of volume conduction. We recognise that this work is based on sensor-level scalp EEG analysis, and we do not claim to precisely localise the spatial sources underlying the EEG sensor findings.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.