A Novel Method to Identify Mild Cognitive Impairment Using Dynamic Spatio-Temporal Graph Neural Network

Resting-state functional magnetic resonance imaging (rs-fMRI) has been widely used in research on identifying mild cognitive impairment (MCI), whose patients are at a relatively higher risk of progressing to Alzheimer's disease (AD). However, most machine learning and deep learning methods rarely analyze rs-fMRI data from the perspectives of both spatial structure and the temporal dimension. To make full use of rs-fMRI data, this study constructed a dynamic spatio-temporal graph neural network model consisting of three main modules: a temporal block, a spatial block, and a graph pooling block. The proposed model extracts information from the BOLD signals of each subject's fMRI data and from the spatial structure of the functional connections between brain regions, which improves the model's classification decisions. In the three-way classification of AD, MCI and normal controls (NC), the model reached an accuracy of 83.78%, outperforming previously reported methods. This shows that the model can effectively learn spatio-temporal features and that the dynamic spatio-temporal approach plays an important role in distinguishing these groups of subjects. In summary, this paper proposes an end-to-end dynamic spatio-temporal graph neural network model that uses both the temporal and the spatial information in rs-fMRI data and improves three-class classification performance among AD, MCI and NC.


I. INTRODUCTION
ALZHEIMER'S disease (AD) is an irreversible, progressive neurodegenerative disease that primarily affects the elderly and is one of the most prevalent types of dementia [1]. Some studies [2] estimate that, as the global population continues to age, 1 in 85 people worldwide will be affected by AD in the future. AD is characterized by symptoms including language problems, cognitive decline, severe memory impairment [3], emotional instability, behavioral problems, and other common phenomena as the disease progresses [4]; these symptoms seriously restrict patients' daily lives and can even put patients in danger as the disease intensifies [5].
Mild cognitive impairment (MCI) is the prodromal stage of AD [1]. The Alzheimer's Disease Neuroimaging Initiative (ADNI) divides MCI into two types: early mild cognitive impairment (eMCI) and late mild cognitive impairment (lMCI). Most studies have reported that lMCI patients convert to AD more readily than eMCI patients [6], [7], [8].
An increasing body of evidence [8], [9], [10] indicates that MCI patients have a 10% to 15% annual risk of converting to AD, and the probability that an MCI patient converts to AD within five years exceeds 50% [11]. It is therefore particularly important to establish a set of biomarkers and achieve prompt diagnosis, so that timely treatment can alleviate symptoms and reduce the danger to patients. At present, the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-cog) is considered the gold standard [12] for evaluating the progression of Alzheimer's disease in clinical trials. Diagnosis also currently relies on clinicians' reading of medical images; because clinicians judge a patient's state from their own experience, misdiagnosis can occur.
Computer science and technology have been widely used in many fields, and computer-aided diagnosis can help doctors make more reliable judgments about patients [13]. The need for computer-aided diagnosis arises mainly from four factors: the complexity of the medical diagnosis process, the large amount of clinical data related to the disease, the prior knowledge required to guide diagnostic decisions, and the substantial increase in computer storage and computing capabilities. Resting-state functional magnetic resonance imaging (rs-fMRI) has received growing interest for evaluating the stages of AD and MCI [14], [15], [16], [17], [18], [19], [20]. Rs-fMRI is one of the most powerful techniques for measuring the intrinsic activity of the brain in the resting state; compared with other medical imaging techniques such as X-ray, CT and ultrasound, it is non-invasive, involves no radiation, offers high temporal resolution, and is applicable to tissues and organs throughout the body.
Most deep learning methods have been shown to outperform traditional machine learning methods, in part because deep learning can build end-to-end models. However, these models [21], [22], [23] rely on static functional connectivity (FC) to capture global characteristics rather than on dynamic functional connectivity, and therefore ignore changes in spatial structure and along the temporal dimension during the resting state. Meanwhile, several studies have demonstrated that functional connectivity changes over time rather than remaining static, which is very important for pathological analysis. Convolutional neural networks (CNNs) have revolutionized the analysis of data with structured, Euclidean spatial structure, such as text, images, and videos. Such data have a relatively fixed number of neighbors per element and are translation invariant, which allows convolutional kernels to be shared across Euclidean space. CNNs currently outperform traditional machine learning methods significantly in classification performance on various datasets. Moreover, Lin et al. [24] pointed out that brain functions and cognitive states change dynamically during the resting state rather than remaining in a single constant state. Hence, many recent efforts [14], [16], [25], [26], [27], [28], [29] have shifted toward dynamic functional connectivity analysis to guide and interpret disease diagnosis.
Dynamic functional connectivity is highly desirable for characterizing both local and global information along the temporal and spatial dimensions [16], which allows a better analysis of how differences in connectivity between brain regions relate to AD and MCI.
Although traditional deep learning methods have been applied to extract spatial features in the Euclidean domain and have made great progress [30], many practical applications involve data represented in non-Euclidean spaces, which lack the local, ordered neighborhood structure that these operations rely on.
The performance of traditional deep learning on non-Euclidean data is unsatisfactory. In recent years, graph neural networks (GNNs) have attracted great attention from researchers precisely because of their strong interpretability and excellent performance on graph-structured data. GNNs have been widely used in recommendation systems, drug discovery and development [31], road traffic forecasting, and other domains. A GNN can be regarded as a generalized convolution [32]: it embeds node information, propagates information along the edges of the graph, and combines node features, edge features, and graph structure to train the model. The brain is a complex network with unique topological properties. When a GNN is used to model the brain, the regions of interest (ROIs) correspond to nodes, and the relationships between ROIs are defined as edges.
In this regard, we propose a GNN model based on information from the temporal dimension and the spatial structure to classify subjects as normal controls, MCI, or AD. Furthermore, we evaluated other models on the same dataset and found that our model outperformed previously reported methods.

A. Data Acquisition
In this study, we used rs-fMRI time series collected at Xuanwu Hospital, Capital Medical University, Beijing, China. All participants provided written informed consent. The dataset contains a total of 645 scans from 473 subjects, including 211 normal controls (NCs), 156 MCI subjects and 106 AD subjects. After preprocessing, 388 scans from these subjects were retained: 145 NC, 165 MCI and 78 AD scans. The preprocessing is described in the following section.

B. Data Preprocessing
All resting-state functional images were preprocessed and analyzed with a standard procedure using the Data Processing Assistant for Resting-State fMRI (DPARSF 4.2, http://www.restfmri.net/forum/DPARSF) and Statistical Parametric Mapping (SPM12, http://www.fil.ion.ucl.ac.uk/spm). The preprocessing steps were as follows: (1) the first 10 time points were removed to eliminate signal non-equilibrium effects of the scanner and to allow participants to adapt to the scanning noise; (2) slice-timing correction was applied to the remaining volumes; (3) head motion was corrected by rigid-body realignment with 6 parameters, and scans with a maximum head motion exceeding 3.0 mm translation or 3° rotation were discarded; (4) the images were spatially normalized to the Montreal Neurological Institute (MNI) EPI template and resampled to 3 × 3 × 3 mm³ voxels; (5) the normalized images were spatially smoothed with a 6-mm full-width at half-maximum (FWHM) Gaussian kernel; (6) detrending; (7) nuisance covariate regression; (8) temporal band-pass filtering from 0.01 to 0.08 Hz. The demographic information of the subjects is shown in Table I.
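Steps (1) and (6) to (8) act on time series and can be illustrated compactly. The following is a minimal NumPy sketch, not the DPARSF implementation: it assumes ROI signals of shape (T, N), an illustrative repetition time of 2 s (the TR is not reported here), optional nuisance regressors, and an ideal FFT band-pass filter, and it merges detrending and nuisance regression into one least-squares fit. Nilearn's `nilearn.signal.clean` offers comparable `detrend`, `confounds`, `low_pass`, `high_pass` and `t_r` options.

```python
import numpy as np

def clean_roi_signals(ts, tr=2.0, n_discard=10, confounds=None, low=0.01, high=0.08):
    """Sketch of preprocessing steps (1), (6), (7) and (8) on ROI time series.

    ts: array of shape (T, N), T time points by N ROIs.
    tr: repetition time in seconds (ASSUMED; not reported in the text).
    confounds: optional (T, C) nuisance regressors, e.g. head-motion parameters.
    """
    ts = np.asarray(ts, dtype=float)[n_discard:]          # (1) drop the first volumes
    t = np.arange(ts.shape[0], dtype=float)
    columns = [np.ones_like(t), t]                        # intercept + linear trend
    if confounds is not None:
        columns.extend(np.asarray(confounds, dtype=float)[n_discard:].T)
    design = np.column_stack(columns)
    # (6) detrending and (7) nuisance regression as one least-squares fit
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    resid = ts - design @ beta
    # (8) ideal band-pass filter: zero every frequency outside [low, high] Hz
    freqs = np.fft.rfftfreq(resid.shape[0], d=tr)
    spec = np.fft.rfft(resid, axis=0)
    spec[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spec, n=resid.shape[0], axis=0)
```

An ideal frequency-domain filter is used only because it is short and exact to verify; a Butterworth filter is the more common practical choice.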

C. Dynamic FC Construction
In this part, we describe how dynamic functional connectivity is generated in order to classify subjects and to understand how the brain changes. The brain was parcellated into 116 regions of interest (ROIs) according to the automated anatomical labeling (AAL) atlas, and the ROI time series were then extracted with a sliding-window approach. The entire fMRI time course was split into multiple sub-segments of equal length, $x^{k} \in \mathbb{R}^{N \times L}$, $k \in \{1, \ldots, K\}$, where $N$ is the number of nodes and $L$ is the length of each segment. The number of segments is

$$K = \left\lfloor \frac{T - L}{S} \right\rfloor + 1,$$

where $T$ denotes the total number of time points, $L$ the length of the sliding window, $S$ the step size, and $K$ the number of sliding-window segments. Next, we computed the functional connectivity matrix of each sub-segment with the Pearson correlation coefficient (PCC), $c^{k}_{ij} = \mathrm{corr}(x^{k}_{i}, x^{k}_{j})$, keeping the sign of the PCC; $c^{k}_{ij}$ measures the strength of functional connectivity between brain areas $i$ and $j$ in the $k$-th sub-segment. To reduce the influence of weak functional connections between brain regions and of noise within regions, each functional connectivity matrix was normalized, and the $K$ matrices $C^{k} = [c^{k}_{ij}]$ were accumulated as $X = \sum_{k=1}^{K} C^{k}$. A further normalization then mapped $X$ to $X'$ with values in $[-1, 1]$. To simplify the graph structure and remove the influence of signal noise, we converted the normalized connectivity matrix $E = X' \in \mathbb{R}^{N \times N}$ into a binary adjacency matrix by thresholding. Specifically, we assign a threshold value $\tau$ to each subject and obtain the new adjacency matrix $A$ as

$$A_{ij} = \begin{cases} 1, & E_{ij} \ge \tau, \\ 0, & \text{otherwise.} \end{cases}$$
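The windowing, correlation, accumulation and thresholding steps can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions, not the authors' code: the ROI signals are an (N, T) array, the number of windows is K = ⌊(T − L)/S⌋ + 1, the normalization of the accumulated matrix to [−1, 1] is taken to be division by K (the exact normalization is not specified), edges are kept when the normalized value is at least τ, and self-loops are removed. With an example length of T = 230 and the window settings L = 29, S = 25 from the experimental setup, K = ⌊201/25⌋ + 1 = 9.

```python
import numpy as np

def num_windows(T, L, S):
    """Number of sliding-window segments, K = floor((T - L) / S) + 1."""
    return (T - L) // S + 1

def dynamic_fc(ts, L, S):
    """ts: (N, T) ROI time series. Returns a (K, N, N) stack of windowed Pearson FC."""
    N, T = ts.shape
    K = num_windows(T, L, S)
    # np.corrcoef treats each row (ROI) as a variable; signs are kept
    return np.stack([np.corrcoef(ts[:, k * S:k * S + L]) for k in range(K)])

def binary_adjacency(fc_windows, tau):
    """Accumulate the K windowed FC matrices, rescale to [-1, 1], then threshold."""
    X = fc_windows.sum(axis=0)            # X = sum_k C^k, entries in [-K, K]
    X_norm = X / fc_windows.shape[0]      # ASSUMED normalization: divide by K
    A = (X_norm >= tau).astype(np.int8)   # keep edges whose value is at least tau
    np.fill_diagonal(A, 0)                # ASSUMED: no self-loops
    return X_norm, A
```

Dividing by K is simply the mean of the windowed correlations, which is one natural way to bring the sum back into [−1, 1].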

D. Graph Definition
As mentioned above, in this section we model the brain network using the functional connectivity between ROIs. A brain network can be represented as an undirected graph $G = (V, E)$, where the ROIs are the graph nodes $V = \{v_1, v_2, \ldots, v_N\}$. The brain is parcellated into $N$ ROIs based on the AAL atlas, so the nodes of different subjects are aligned by their anatomical locations. We define the correlations between ROIs as the graph edges $E \in \mathbb{R}^{N \times N}$, where $e_{ij}$ represents the edge linking vertices $v_i$ and $v_j$. The features obtained by passing the original time series through the temporal block are used as the node features. The flowchart of the graph construction is illustrated in Fig 1.
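For message-passing libraries, the dense binary adjacency matrix is usually converted into an edge list. The NumPy sketch below (the function name `to_edge_list` is hypothetical) builds a 2 × E coordinate-format index that stores each undirected edge $(v_i, v_j)$ in both directions, with optional edge weights. With PyTorch Geometric, such arrays would typically be wrapped with `torch.from_numpy` and passed as `edge_index` and `edge_attr` to `torch_geometric.data.Data`, with the temporal-block outputs as `x`.

```python
import numpy as np

def to_edge_list(A, W=None):
    """Convert a binary N x N adjacency matrix into a 2 x E COO edge index.

    A symmetric A yields each undirected edge in both directions, the usual
    convention for undirected graphs in message-passing frameworks.
    W (optional) holds edge weights, e.g. normalized FC values, and is
    returned as a length-E edge-attribute vector aligned with the index.
    """
    src, dst = np.nonzero(A)              # row-major order of nonzero entries
    edge_index = np.stack([src, dst])
    edge_attr = None if W is None else W[src, dst]
    return edge_index, edge_attr
```

For a three-node chain, the index lists the edges (0, 1), (1, 0), (1, 2), (2, 1), each carrying its FC weight.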

E. Model Construction
1) Temporal Block: To capture the dynamic properties of the time series, we use a temporal convolutional network (TCN), a variant of the convolutional neural network designed for sequence modeling that combines ideas from RNN and CNN architectures and can automatically learn feature representations from the temporal dynamics of each brain region. Preliminary evaluations of TCNs have shown that simple convolutional architectures outperform typical recurrent networks such as LSTMs on multiple tasks and datasets, while exhibiting longer effective memory.
TCNs are built from causal convolutions within a 1D fully convolutional network. Zero padding and cropping ensure that each subsequent layer has the same length as the previous one. A TCN has no information leakage from the future into the past: the output at time $t$ is convolved only with elements from time $t$ or earlier in the previous layer.
The history covered by a simple causal convolution grows only linearly with network depth, which makes causal convolution difficult to apply to sequence tasks with long-range dependencies. Dilated convolution effectively addresses this receptive-field problem: when the dilation factor grows exponentially with depth, the receptive field expands exponentially as the network becomes deeper. More specifically, the dilated causal convolution of an ROI time series $x$ with a filter $f \in \mathbb{R}^{n}$ at time $t$ is

$$F(t) = (x *_{d} f)(t) = \sum_{j=0}^{n-1} f(j) \cdot x_{t - d \cdot j},$$

where $d$ is the dilation factor, $n$ is the filter size, and $t - d \cdot j$ points toward the past. Residual connections have been shown to be an efficient way to learn modifications to an identity mapping rather than the entire transformation, and they allow information to be transmitted across layers, which is useful for very deep networks. Hence, the temporal block may be able to capture subtle disruptions of brain function caused by the disease, allowing our model to capture differences between brain regions.
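The dilated causal convolution F(t) = Σ_{j=0}^{n−1} f(j)·x_{t−d·j} can be checked with a direct NumPy implementation. This is a single-channel sketch for illustration, not the authors' TCN: it uses left zero padding so that no future samples are read, a helper giving the receptive field 1 + (n − 1)·Σd of stacked layers, and a simplified residual unit ReLU(x + F(x)); the sub-modules in the experimental setup additionally contain BatchNorm1d, dropout and multiple channels.

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """F(t) = sum_{j=0}^{n-1} f[j] * x[t - d*j], treating x[t'] = 0 for t' < 0.

    Left zero-padding of (n - 1) * d samples keeps the output length equal to
    the input length and guarantees F(t) only uses x[0..t] (causality).
    """
    n = len(f)
    pad = (n - 1) * d
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(f[j] * xp[pad + t - d * j] for j in range(n))
                     for t in range(len(x))])

def receptive_field(n, dilations):
    """Receptive field of stacked dilated causal convolutions with kernel size n."""
    return 1 + (n - 1) * sum(dilations)

def residual_block(x, f, d):
    """Simplified residual unit: ReLU(x + F(x)), so F learns a modification of x."""
    return np.maximum(0.0, x + dilated_causal_conv(x, f, d))
```

With kernel size 3 and dilations 1, 2, 4, 8, four layers already see 31 time points, whereas four undilated layers would see only 9.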
2) Spatial Block: To represent the complex spatial structure of the brain as a simple embedding, we introduce the MetaLayer proposed by Battaglia et al. [33]. This framework supports constructing complex modules from simple building blocks and generalizes and extends various neural networks, much as fully connected layers are stacked into multilayer perceptrons and convolutional layers are stacked into convolutional neural networks. GNNs were originally designed to learn meaningful node-level representations, so a commonly adopted approach to generating a graph-level representation is to globally summarize all node representations in the graph.
Specifically, the graph is taken as input, and the features of each node $x$, the attributes of all edges edge_attr, and the global features are processed by a node model, an edge model, and a global model, respectively, which update the corresponding feature information. Finally, an updated graph with the same structure as the original graph is output.
The spatial block takes as input the adjacency matrix describing the structure of the entire brain and the output of the previous block. We feed the entire graph into the block, which outputs a graph representation in which the edge attributes and the features of the nodes $i$ and $j$ are updated. More details are given in [34], [35].
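PyTorch Geometric provides `torch_geometric.nn.MetaLayer`, which wraps user-defined edge, node and global models. The NumPy sketch below mirrors one such edge, node and global update without that library. Assumptions (not from the paper): each sub-model is a single random-weight linear layer with ReLU (the model described in the experimental setup uses two Linear layers for its edge and node models), inputs are concatenated, and incoming edge messages are mean-aggregated per node. The helper `mlp` and all shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim):
    """Hypothetical one-layer perceptron with random weights (illustration only)."""
    W = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
    return lambda h: np.maximum(0.0, h @ W)

def meta_layer(x, edge_index, edge_attr, u, edge_mlp, node_mlp, global_mlp):
    """One graph-network update in the style of Battaglia et al.

    1) edge model:   e'_ij = phi_e([x_i, x_j, e_ij, u])
    2) node model:   x'_i  = phi_v([x_i, mean of incoming e'_ji, u])
    3) global model: u'    = phi_u([mean_i x'_i, u])
    The graph structure (edge_index) itself is left unchanged.
    """
    src, dst = edge_index
    n_edges, n_nodes = len(src), x.shape[0]
    e_new = edge_mlp(np.hstack([x[src], x[dst], edge_attr, np.tile(u, (n_edges, 1))]))
    agg = np.zeros((n_nodes, e_new.shape[1]))
    np.add.at(agg, dst, e_new)                         # sum messages per target node
    deg = np.bincount(dst, minlength=n_nodes)[:, None]
    agg = agg / np.maximum(deg, 1)                     # mean aggregation
    x_new = node_mlp(np.hstack([x, agg, np.tile(u, (n_nodes, 1))]))
    u_new = global_mlp(np.hstack([x_new.mean(axis=0), u]))
    return x_new, e_new, u_new
```

The ordering (edges first, then nodes, then the global vector) lets each node update see freshly updated edge information.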
3) Pooling Block: In this part, we design the pooling block of the graph neural network. Pooling layers combined with convolutional networks have achieved great success in deep neural networks. In recent studies, the standard recipe for obtaining a graph-level representation is graph convolution layers followed by graph pooling. Graph pooling can also reduce the size of the input and enlarge the receptive field: it coarsens the information of several nodes into the features of a single node for the final downstream classification task. Average pooling, max pooling and add pooling are the simplest ways to directly obtain an embedding of the whole graph. However, these global pooling approaches may ignore hierarchies in the graph, which makes it harder to build an effective model for whole-graph prediction: they cannot aggregate node information hierarchically and do not take neighborhood features or the relations between nodes into account. To account for the different importance of brain regions in the classification task, some researchers have focused on hierarchical pooling in GNNs, which samples or groups nodes into subgraphs.
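The flat readouts mentioned above reduce an n × d node-feature matrix to a single d-dimensional vector. The NumPy sketch below (function name hypothetical) makes the limitation concrete: the readout never receives the adjacency matrix, so two graphs with the same node features but different connectivity produce identical embeddings.

```python
import numpy as np

def global_readouts(X):
    """Mean, max and add (sum) readouts over the node axis of X (n x d).

    No adjacency matrix is involved, so graph structure is ignored entirely.
    """
    X = np.asarray(X, dtype=float)
    return {"mean": X.mean(axis=0), "max": X.max(axis=0), "add": X.sum(axis=0)}
```

For `X = [[1, 4], [3, 2]]` the mean, max and add readouts are `[2, 3]`, `[3, 4]` and `[4, 6]`, and they stay the same under any reordering of the nodes.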
Here, we adopt the pooling method DIFFPOOL [36], which gradually samples or groups nodes, leveraging node features and graph structure in a hierarchical manner, until the graph is coarsened into a single node. DIFFPOOL clusters nodes into groups and collapses each cluster into a node of a smaller graph, repeating until a single node represents the whole graph. This method can effectively handle the different contributions of brain regions to the prediction. To obtain the embeddings of the pooled nodes, a brain graph is defined by the adjacency matrix $A \in \mathbb{R}^{n \times n}$ and the node feature matrix $X \in \mathbb{R}^{n \times d}$, where $n$ is the number of nodes and $d$ is the dimension of the node features. To execute the DIFFPOOL operation, we construct a function $f_{diffpool}$ that updates the adjacency matrix and the node embeddings of each layer:

$$(A^{l+1}, X^{l+1}) = f_{diffpool}(A^{l}, X^{l}),$$

where $A^{l+1} \in \mathbb{R}^{m \times m}$ and $X^{l+1} \in \mathbb{R}^{m \times d}$ are the adjacency matrix and node features of layer $l+1$, and $m$ is smaller than $n$; the operation is repeated until the entire graph is reduced to the representation of a single node. To implement DIFFPOOL, the embedding matrix and the cluster assignment matrix learned at layer $l$ by the GNN layers are defined as

$$Z^{l} = f_{gnn\_embed}(A^{l}, X^{l}), \qquad S^{l} = \mathrm{softmax}\left(f_{gnn\_pool}(A^{l}, X^{l})\right),$$

where $f_{gnn\_embed}$ and $f_{gnn\_pool}$ denote the GNN embedding and GNN pooling layers, $Z^{l}$ is the output of the embedding layers, and $S^{l} \in \mathbb{R}^{n_l \times n_{l+1}}$ is the cluster assignment matrix, whose entries give the probability that each node in layer $l$ is assigned to each cluster in layer $l+1$; the output dimension of $f_{gnn\_pool}$ therefore equals the number of clusters $n_{l+1}$. Here $n_l$ and $n_{l+1}$ are the numbers of nodes in layers $l$ and $l+1$, with $n_{l+1}$ smaller than $n_l$. The coarsened node features and adjacency matrix are then

$$X^{l+1} = {S^{l}}^{\top} Z^{l}, \qquad A^{l+1} = {S^{l}}^{\top} A^{l} S^{l}.$$

These outputs are fed to the next layer. Finally, the single-node embedding is flattened and passed to a fully connected layer to predict the disease stage. The architecture of our model is shown in Fig 2.
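One DIFFPOOL coarsening step (embedding Z from one GNN, assignment S as a row-wise softmax of another GNN's output, then X′ = SᵀZ and A′ = SᵀAS) is sketched in NumPy below. Assumptions: both GNNs are replaced by a single mean-aggregation graph convolution with random weights (the model itself uses SAGEConv-based layers), and the auxiliary link-prediction and entropy losses used when training DIFFPOOL are omitted. The example maps 116 ROIs to 29 clusters, i.e. 116 × 0.25, assuming the pooling rate of 0.25 applies to the node count.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def propagate(A, X):
    """Mean aggregation over each node's neighbors and itself: D^-1 (A + I) X."""
    A_hat = A + np.eye(A.shape[0])
    return (A_hat / A_hat.sum(axis=1, keepdims=True)) @ X

def diffpool_step(A, X, W_embed, W_pool):
    """Coarsen (A, X) with n_l nodes into a graph with W_pool.shape[1] nodes."""
    H = propagate(A, X)
    Z = np.maximum(0.0, H @ W_embed)           # embedding GNN output Z^l
    S = softmax(H @ W_pool, axis=1)            # soft cluster assignment S^l
    X_next = S.T @ Z                           # X^{l+1} = S^T Z
    A_next = S.T @ A @ S                       # A^{l+1} = S^T A S
    return A_next, X_next, S
```

Because every row of S is a probability distribution, each ROI spreads its features softly over the clusters, which keeps the step differentiable end to end.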

A. Experimental Setup
The proposed model architecture, shown in Fig 2, was implemented with PyTorch, PyTorch Geometric [37], and Nilearn [38] in Python on an NVIDIA RTX 3090 GPU with 24 GB of memory.
The proposed model consists of three modules. The temporal block consists of two TCN sub-modules, each containing two Conv1d layers, two ReLU activations, two BatchNorm1d layers, two dropout layers, and residual connections. The output channels of the four Conv1d layers are 8, 16, 32, and 64, respectively; the resulting features are then flattened and fed into a Linear layer. The temporal block reduces the temporal feature dimension of each ROI from 229 to 128. The spatial block consists of an edge model and a node model, each composed of two Linear layers, and outputs the updated node and edge features. The DIFFPOOL module consists of three GNN embedding layers and two GNN pooling layers; each GNN embedding layer contains three SAGEConv layers, three BatchNorm1d layers, and a concatenation function that stitches together the outputs of the three SAGEConv layers. Finally, the result passes through a Linear layer. The graph pooling rate is 0.25.
We divided the brain into 116 ROIs as the nodes of the graph. The hyperparameters during training were as follows: learning rate 5e-3, weight decay 0.01, dropout 0.3, batch size 32, 250 epochs, and input and embedding sizes of 229 and 128, respectively. The sliding window length was set to 29 time points with a step size of 25. Adaptive moment estimation (Adam) was used as the optimizer and NLLLoss as the loss function. We used StepLR to decay the learning rate dynamically, multiplying the learning rate of each parameter group by gamma = 0.7 every step_size epochs.
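Two details of this training setup are sketched below as NumPy re-implementations, not PyTorch code. First, `torch.nn.NLLLoss` expects log-probabilities, so the classifier output must pass through a log-softmax before the loss. Second, StepLR multiplies the learning rate by gamma once every `step_size` epochs, which gives the closed form base_lr × gamma^⌊epoch/step_size⌋. The step size is not reported, so `step_size=50` below is a hypothetical value.

```python
import numpy as np

def step_lr(epoch, base_lr=5e-3, gamma=0.7, step_size=50):
    """Learning rate under a StepLR-style schedule: base_lr * gamma ** (epoch // step_size).

    step_size=50 is a HYPOTHETICAL value; the text does not report it.
    """
    return base_lr * gamma ** (epoch // step_size)

def log_softmax(z):
    """Row-wise log-softmax of class scores, computed stably."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def nll_loss(log_probs, target):
    """Mean negative log-likelihood; like NLLLoss, it expects log-probabilities."""
    return -log_probs[np.arange(len(target)), target].mean()
```

Under these assumptions the rate drops from 5e-3 to 5e-3 × 0.7⁴ ≈ 1.2e-3 by the last of the 250 epochs, and a uniform prediction over AD, MCI and NC gives a loss of ln 3.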
To simplify the brain network, improve the signal-to-noise ratio [39], and reduce the influence of weak functional connections between brain regions and of noise within regions, we applied a threshold of τ = 0.6, a value we consider well suited to expressing the characteristics of dynamic functional connectivity. The subjects' data were divided into training, validation and test sets in a 6:2:2 ratio to determine the optimal parameters. If a subject was scanned multiple times, all of that subject's scans were assigned to the same set (training, validation or test) to prevent data leakage. Five-fold cross-validation was used on the training set. The training set is used to train the model, the validation set to tune the hyperparameters, and the test set to evaluate the model's generalization performance.
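The subject-level 6:2:2 split that keeps repeated scans of one subject together can be sketched as follows. This NumPy sketch assumes one subject ID per scan and shuffles unique subjects (not scans) with a fixed seed. It does not stratify by diagnosis, which the text does not specify. For the five-fold validation inside the training set, scikit-learn's `GroupKFold` applies the same subject grouping.

```python
import numpy as np

def subject_level_split(subject_ids, ratios=(0.6, 0.2, 0.2), seed=0):
    """Return train/val/test scan indices so that no subject appears in two sets."""
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(subjects)                              # shuffle subjects, not scans
    n_train = int(round(ratios[0] * len(subjects)))
    n_val = int(round(ratios[1] * len(subjects)))
    groups = np.split(subjects, [n_train, n_train + n_val])
    # every scan follows its subject into exactly one of the three sets
    return [np.flatnonzero(np.isin(subject_ids, g)) for g in groups]
```

Splitting by subject means the 6:2:2 ratio holds for subjects, so the scan counts per set can deviate slightly from it.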
We used classification accuracy, precision, recall, F1 score (F1), and the area under the receiver operating characteristic curve (AUC) as the evaluation criteria for classification performance.
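Because the task has three classes, precision, recall and F1 must be averaged over classes. The text does not state the averaging scheme, so the sketch below assumes macro averaging and computes the metrics from a confusion matrix. AUC is omitted here; a one-vs-rest multiclass AUC can be obtained with scikit-learn's `roc_auc_score(..., multi_class="ovr")` given predicted class probabilities.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes=3):
    """Accuracy plus macro-averaged precision, recall and F1 (ASSUMED macro averaging)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)                 # rows: true class, cols: predicted
    tp = np.diag(cm).astype(float)
    pred_count, true_count = cm.sum(axis=0), cm.sum(axis=1)
    zeros = np.zeros_like(tp)
    precision = np.divide(tp, pred_count, out=zeros.copy(), where=pred_count > 0)
    recall = np.divide(tp, true_count, out=zeros.copy(), where=true_count > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return {"accuracy": tp.sum() / cm.sum(), "precision": precision.mean(),
            "recall": recall.mean(), "f1": f1.mean()}
```

Classes that are never predicted get a precision of 0 rather than a division error, which matches the usual convention.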

B. Performance of Different Thresholds
To verify that the dynamic spatio-temporal graph neural network model can effectively identify a subject's disease stage, we examined the threshold τ, because the functional connectivity structure of the brain differs significantly across thresholds. τ was set to 0, 0.2, 0.4, 0.6 and 0.8, and the classification results for AD, MCI and NC are shown in Fig 3.

C. Performance of Different Architectures
We conducted ablation experiments to determine the effect of different modules on classification performance; the results are shown in Table II. For brevity, the static temporal module is referred to as ST, the dynamic temporal module as DT, the static spatial module as SS, and the dynamic spatial module as DS. The four module architectures were implemented as described in the sections above. The dynamic spatio-temporal graph neural network composed of the DT and DS modules is denoted "DT+DS", and the other models are named in the same way. Table II reports the classification performance of the ablation experiments comparing architectures built from different modules. These experiments confirm the contribution of each part of the model and test whether the dynamic spatio-temporal graph neural network boosts performance in predicting subject labels.
Table II shows how the different modules affect model performance in terms of accuracy and the other evaluation metrics. ST+SS, which contains only the static spatial block and the pooling block without any additional modules, clearly has the worst performance of the four models; in this model, the BOLD signal of each ROI is input directly as the node feature. Integrating the temporal block into ST+SS yields the DT+SS model, which can capture the changing characteristics of the time series in different brain regions. Compared with ST+SS, DT+SS improves accuracy and AUC for classifying AD, MCI and NC.
Replacing the static spatial module in ST+SS with the dynamic spatial module yields the ST+DS model, which improves all evaluation metrics by a larger margin. This shows that the rich feature information of the graph structure helps reveal correlations between brain regions, and that taking both the spatial structure and the temporal dimension into account is more meaningful.
To further study the effect of the dynamic temporal block, we added it to ST+DS. The resulting DT+DS model reaches an accuracy of about 83.78%, 5.4% higher than the model without the DT module, and the other metrics also improve.

D. Differences Between Brain Regions
The differences in brain regions among the three groups are visualized in Fig 5, whose box plots show the statistically significant differences between brain regions according to independent-sample t-tests; the statistically significant brain regions can also be located in Fig 5. The results reveal that the precentral gyrus, orbital middle frontal gyrus, olfactory cortex, medial superior frontal gyrus, insula, anterior cingulate gyrus and collateral cingulate gyrus, posterior cingulate gyrus, hippocampus, lingual gyrus, transverse temporal gyrus, middle temporal areas, inferior cerebellum and vermis 7 are significantly discriminative areas for the AD, MCI and NC classification tasks.
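The independent-sample t-test compares one feature, for example the functional connectivity of one ROI pair, between two groups. The sketch below computes Student's pooled-variance t statistic and its degrees of freedom in NumPy. `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` is the same test, additionally returns the p value. Running the test for each pair of groups (AD vs NC, MCI vs NC, AD vs MCI) is an assumption about how the three groups were compared.

```python
import numpy as np

def independent_t(a, b):
    """Student's two-sample t statistic with pooled variance, and its degrees of freedom."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    # pooled estimate of the common variance from both groups
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2
```

For the samples [1, 2, 3, 4] and [3, 4, 5, 6], the pooled variance is 5/3 and t = −2·√(6/5) ≈ −2.19 with 6 degrees of freedom.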
As described in the previous section, we further investigated the specific differences between groups in a data-driven manner and mapped the between-group differences in inter-regional connectivity, according to their p values, onto the glass-brain plots in Fig 4, which show the functional connections between brain areas. The thickness of each line indicates the strength of the connection between brain regions: thicker, darker lines indicate stronger connections. Notably, the discrimination in these brain areas is largest between AD and NC, followed by MCI and NC, and smallest between AD and MCI, which suggests that functional connectivity is readily affected by disease progression. Comparing the glass-brain plots of the different groups in Fig 4 yields several interesting observations. Evidently, the difference between patients and healthy normal controls is significant. First, AD and NC differ in the largest number of brain regions, indicating that NC subjects have more discriminative connectivity regions associated with cognition. Second, comparing the connections across the three subfigures shows that the connectivity of the orbital middle frontal gyrus and the olfactory cortex with other brain regions decreases, which suggests that these regions may also be linked to AD/MCI and could provide crucial information for AD/MCI prediction.
These findings are consistent with previous studies on AD/MCI classification, and the differences among the three groups also illustrate that functional connectivity is mainly affected by disease progression. Although deep neural networks are usually regarded as black boxes, we further studied how effectively our model identifies functional connectivity regions that serve as significant biomarkers in NC, MCI and AD subjects. These results illustrate that the model can extract spatial and temporal information from rs-fMRI data for the classification of AD, MCI and NC.

IV. DISCUSSION
We propose a graph representation model that integrates spatial and temporal information using the dynamic functional connectivity of rs-fMRI data, taking advantage of TCNs and GNNs to learn spatio-temporal information for the classification task. The framework relies mainly on the temporal block and the spatial block to explore common and complementary information across subjects, leveraging time series signals and the relationships among brain nodes and edges. The temporal block integrates the BOLD signals and generates lower-dimensional features that characterize the temporal-domain signals; it not only refines global- and local-level temporal information but also improves the robustness and generalization performance of our model.
In this section, we first evaluate the impact of the different components on model performance, then discuss the brain regions with statistically significant differences between groups, and finally present limitations and possible directions for future research.

A. The Influence of Spatio-Temporal Information on Model Performance
Static methods [40], [41] have been widely developed in previous research to capture functional connectivity features, but they are not sufficient to identify significant changes during the MRI scan. Researchers' interest has recently been shifting toward dynamic features [4], [16], [26], [28], known as dynamic functional connectivity (dFC). dFC can reveal dynamic changes in functional brain networks and represent complex, high-order cognitive information, making it a useful supplement to traditional static functional connectivity (sFC).
Temporal information is highly mixed in the original fMRI signal. Therefore, the temporal block uses a temporal convolutional network with advanced components such as dilations and residual connections to capture dynamic changes. As shown in Table II, ST+SS performs worse than the other three models, possibly because the original BOLD signal contains redundant features and noise that degrade classification performance. Conventional methods, moreover, have shown limited power to identify changes over the time course. Graph neural networks are a promising data-driven tool for non-Euclidean problems in many fields. Recently, several graph neural networks [10], [24], [42] have been used to predict patient status and to understand the human brain network [24]. The DIFFPOOL method we applied may help to better characterize the similarity of brain regions, which is much needed in the field.
Comparing the proposed model with other models, we found that graph neural networks perform better than deep neural networks applied in Euclidean space, and that the dynamic spatio-temporal graph neural network achieves good classification results. The metrics are shown in Table III. The traditional CNN does not perform well on the three-class classification of AD, MCI and NC. STNets, a spatio-temporal dependency modeling network, is also slightly inferior to GNEA, which adopts a GNN architecture, because GNEA fully considers the structural relationships between brain regions.

B. Statistical Analysis
Following Salvador et al. [43], we merged the 116 brain regions into 7 areas: frontal, parietal, occipital, temporal, medial temporal, subcortical and cerebellum, each containing its corresponding ROIs. The average connectivity values of the ROIs in each of the seven areas were calculated, then normalized and thresholded to obtain the results shown in Fig 6. The color bar represents the average functional connectivity values of the different areas. As the disease progresses, functional connectivity decreases in certain brain regions and increases in others. The strength of functional connectivity between the parietal and frontal areas decreased. A positive correlation coefficient indicates synchronization between brain regions, and the stronger the correlation, the larger the value. The strength of functional connectivity of several brain regions weakened as the disease developed, and the self-correlation within the medial temporal lobe and within the temporal region gradually weakened as well.
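Averaging the 116 × 116 ROI matrix into a 7 × 7 area matrix can be written with a one-hot membership matrix M, so that MᵀCM sums the connectivity over every pair of areas. In the NumPy sketch below, the ROI-to-area labels are hypothetical placeholders (the assignment following Salvador et al. is not listed here), self-correlations are excluded from the within-area averages, and the subsequent normalization and thresholding are not shown.

```python
import numpy as np

def region_level_fc(fc, roi_to_region, n_regions=7):
    """Average an N x N FC matrix into an R x R matrix of mean connectivity
    between areas (off-diagonal) and within areas (diagonal)."""
    labels = np.asarray(roi_to_region)             # HYPOTHETICAL ROI -> area labels
    n = len(labels)
    M = np.eye(n_regions)[labels]                  # N x R one-hot membership
    off = 1.0 - np.eye(n)                          # mask that drops self-correlations
    sums = M.T @ (fc * off) @ M                    # summed FC for every area pair
    counts = M.T @ off @ M                         # number of ROI pairs per area pair
    # areas with a single ROI have no within-area pair and are reported as NaN
    return np.divide(sums, counts, out=np.full_like(sums, np.nan), where=counts > 0)
```

With four ROIs split into two areas, for instance, the off-diagonal entry is simply the mean of the four cross-area correlations.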

TABLE III PERFORMANCE OF DIFFERENT DEEP LEARNING MODELS
Comparing MCI and NC in Fig 6, the functional connectivity between the parietal and temporal regions, and between the temporal and frontal regions, was significantly weakened, falling below the specified threshold of 0.5 so that these connections do not appear. The functional connectivity between the frontal and subcortical regions also decreased in AD and MCI, indicating that this connection weakened, leading to a decline in the functions of the relevant brain regions. Fig 7 shows the brain regions with significant differences among the three groups of subjects. Overall, the box plots of the different brain regions in each sub-figure of Fig 7 follow consistent tendencies, rising or falling together. In particular, in the panel for the connection between the central anterior gyrus and the transverse temporal gyrus (first row, fourth column), the median values of AD, MCI and NC increase in sequence, indicating that the functional connectivity between these regions declines significantly as the disease progresses. In contrast, the connection between the insula and the middle temporal gyrus (fourth row, first column) also differs significantly among the three groups, but the median values of AD, MCI and NC decrease successively; that is, patients show stronger functional connectivity than normal controls. This may reflect a compensatory response to reduced function between certain other brain regions.
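The trend reading used above, ordering the AD, MCI and NC medians for a single connection, can be written as a small helper. This is only a sketch: the function name `median_trend`, its labels and the toy values are hypothetical, and the "compensatory" label restates the interpretation given in the text rather than the result of a statistical test.

```python
import numpy as np

def median_trend(ad, mci, nc):
    """Label one connection by the ordering of its group medians.
    Each argument is a 1-D sequence of per-subject FC values."""
    m_ad, m_mci, m_nc = np.median(ad), np.median(mci), np.median(nc)
    if m_ad < m_mci < m_nc:
        return "declining"      # NC > MCI > AD: connectivity weakens with disease
    if m_ad > m_mci > m_nc:
        return "compensatory"   # AD > MCI > NC: patients exceed controls
    return "non-monotonic"

# toy per-subject values, for illustration only
print(median_trend([0.1, 0.2, 0.3], [0.3, 0.4, 0.5], [0.5, 0.6, 0.7]))  # declining
```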
An advantage of building interpretability on the variation of the functional connectivity adjacency matrix is that it clearly describes the brain regions our module selects as clusters. Examining the connections between brain regions in the glass-brain renderings of Fig 4, we found that, compared with normal controls, MCI and AD patients differ significantly in the number of functional connections in the above-mentioned brain regions. We also observe that the node degree of the central anterior gyrus and other brain regions is significantly lower in patients than in normal control subjects. In contrast to these previous research findings, we propose that two further areas, the orbitofrontal cortex and the olfactory cortex, may also be involved in changes in the cognitive function of the brain, which in turn may be factors affecting mild cognitive impairment and Alzheimer's disease.
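Node degree, as compared between patients and normal controls above, can be computed from a thresholded functional connectivity matrix. A minimal sketch: the threshold `tau`, the use of absolute correlation values, and the helper names `node_degree` and `mean_degree` are assumptions for illustration, since this section does not specify how edges were binarized.

```python
import numpy as np

def node_degree(fc, tau):
    """Binary degree of each node in a graph whose edges are |r| > tau."""
    adj = np.abs(np.asarray(fc, dtype=float)) > tau
    np.fill_diagonal(adj, False)    # exclude self-connections
    return adj.sum(axis=1)

def mean_degree(fc_list, tau):
    """Average node degree over a group of subjects' FC matrices."""
    return np.mean([node_degree(fc, tau) for fc in fc_list], axis=0)

fc = np.array([[1.0, 0.8, 0.1],
               [0.8, 1.0, -0.6],
               [0.1, -0.6, 1.0]])
print(node_degree(fc, tau=0.5))  # [1 2 1]
```

Subtracting `mean_degree` over the patient group from `mean_degree` over the NC group gives a per-ROI degree difference of the kind described for the central anterior gyrus.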

V. CONCLUSION
In this paper, we propose a novel end-to-end dynamic spatio-temporal graph neural network that simultaneously learns the temporal and spatial information of dynamic functional connectivity to classify NC, MCI and AD, improving the discriminative ability of the representations learned from rs-fMRI time-course data. In our experiments, the dynamic temporal block and the dynamic spatial block each provided advantages over their static counterparts. Specifically, the temporal and spatial blocks capture the variation of dynamic functional connectivity across multiple sub-segments, and the pooling block coarsens the node-level embeddings into a graph-level representation. Experiments show that our dynamic spatio-temporal graph neural network not only improves classification performance but also offers a new perspective for understanding biomarkers for NC, MCI and AD prediction.
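To make the pipeline summarized above concrete, the following NumPy sketch runs one untrained forward pass: the BOLD series is split into sub-segments, a dynamic FC graph is built per segment, node features pass through a temporal projection and one graph-convolution layer, and the node embeddings are pooled into a graph-level vector. This is a simplified stand-in, not the proposed architecture: a linear projection replaces the temporal block, a standard symmetric-normalized graph convolution replaces the spatial block, mean pooling replaces the cluster-based pooling block, and the non-overlapping windows, threshold `tau` and random weights `w_t`, `w_s` are hypothetical.

```python
import numpy as np

def dynamic_fc(ts, n_segments):
    """Split a (T, N) BOLD series into equal, non-overlapping sub-segments
    and return the segments plus one N x N Pearson FC matrix per segment."""
    usable = ts.shape[0] // n_segments * n_segments  # drop volumes that do not fill a segment
    segments = np.split(ts[:usable], n_segments)
    return segments, [np.corrcoef(seg, rowvar=False) for seg in segments]

def gcn_layer(adj, x, w):
    """Symmetric-normalized graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # degrees are >= 1, so no division by zero
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)

def subject_embedding(ts, n_segments, tau, w_t, w_s):
    """Untrained forward pass: dynamic graphs -> node features -> GCN -> pooling."""
    segments, fcs = dynamic_fc(ts, n_segments)
    graph_embs = []
    for seg, fc in zip(segments, fcs):
        adj = (np.abs(fc) > tau).astype(float)  # hypothetical binarization of FC by tau
        np.fill_diagonal(adj, 0.0)
        x = seg.T @ w_t                    # temporal stand-in: (N, L) signal -> (N, d_t) features
        h = gcn_layer(adj, x, w_s)         # spatial stand-in: (N, d_t) -> (N, d_s)
        graph_embs.append(h.mean(axis=0))  # mean pooling: node level -> graph level
    return np.mean(graph_embs, axis=0)     # average the graph embeddings over sub-segments

rng = np.random.default_rng(0)
ts = rng.standard_normal((180, 116))       # synthetic series: 180 volumes x 116 ROIs
n_segments = 4                             # 4 segments of 45 volumes each
w_t = 0.1 * rng.standard_normal((45, 16))  # random weights; a trained model would learn these
w_s = 0.1 * rng.standard_normal((16, 8))
emb = subject_embedding(ts, n_segments, tau=0.3, w_t=w_t, w_s=w_s)
print(emb.shape)  # (8,)
```

Note that the first dimension of `w_t` must equal the segment length. In an end-to-end model, `w_t` and `w_s` would be learned jointly and the graph-level vector passed to a three-class (NC/MCI/AD) classifier.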

VI. LIMITATIONS AND FUTURE WORK
Several technical issues need to be considered in future work to further improve the performance of the proposed method. First, a meaningful next step is to use data from more modalities and a more powerful feature extractor to boost performance. This would take into account the heterogeneity of features across modalities and make the information contained in the features more comprehensive. Second, in our previous work, we explored the relationships between subjects by projecting the similarity of brain spatial structure between patients and combining it with other features to classify different groups. However, that work ignored the temporal information of brain regions and the hierarchical structure among them. Finally, given the attention that the Transformer architecture has attracted from researchers, we will investigate integrating a graph attention mechanism with the dynamic spatio-temporal model to predict the progression of AD/MCI, and evaluate the model on multisite datasets covering other brain diseases.

Fig. 1. The pipeline for constructing the graph structure. Each node represents an ROI, and the temporal information obtained through temporal-block processing is used as the node feature.

Fig. 2. The architecture of the dynamic spatio-temporal graph neural network. The graph-structure module with temporal information is denoted DT, and the spatial-block module is denoted DS.

Fig. 3. Model performance for different threshold values τ. Results for models with different thresholds are shown in different colors.

Fig. 4. Functional connections between different areas rendered on a glass brain. Line thickness indicates the strength of the connection between brain regions: thicker, darker lines indicate stronger connections.

Fig. 5. Boxplots of the connections between brain regions that show statistically significant group differences in an independent-samples t test.

Fig. 6. Areas with functional connectivity strength greater than 0.5.

Fig 7 shows the pairs of brain areas that differ significantly between the groups; statistical analysis revealed 16 such pairs, consistent with Fig 5 and Fig 4.
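The per-connection group comparison behind such significant pairs (an independent-samples t test, as stated in the caption of Fig 5) can be sketched with SciPy's `stats.ttest_ind`. Assumptions for this sketch: per-subject FC matrices are stacked as `(subjects, N, N)`, only the upper triangle is tested, the significance level `alpha = 0.05` is hypothetical, and no multiple-comparison correction is applied, since this section does not say whether one was used. The data below are synthetic.

```python
import numpy as np
from scipy import stats

def significant_pairs(group_a, group_b, alpha=0.05):
    """Independent-samples t test on every ROI pair (upper triangle).
    group_a, group_b: arrays of shape (subjects, N, N) of per-subject FC."""
    a, b = np.asarray(group_a), np.asarray(group_b)
    rows, cols = np.triu_indices(a.shape[1], k=1)   # each undirected connection once
    result = stats.ttest_ind(a[:, rows, cols], b[:, rows, cols], axis=0)
    hits = np.flatnonzero(result.pvalue < alpha)    # uncorrected p-values
    return [(int(rows[k]), int(cols[k])) for k in hits]

rng = np.random.default_rng(1)
nc = rng.normal(0.0, 0.1, size=(30, 5, 5))    # synthetic FC for 30 "NC" subjects
mci = rng.normal(0.0, 0.1, size=(30, 5, 5))   # synthetic FC for 30 "MCI" subjects
mci[:, 0, 1] -= 0.5                           # inject a group difference in one connection
pairs = significant_pairs(nc, mci)
print((0, 1) in pairs)  # True
```

With 116 ROIs there are 6,670 tested pairs, so a correction such as false-discovery-rate control would usually be considered before reporting a count of significant pairs.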

Fig. 7. Views of the functional connections between different brain regions. To display these connections conveniently, we divide them into three groups: coronal plane, sagittal plane and horizontal plane. Each group of bar plots corresponds to the functional connections between different brain regions.

TABLE II PERFORMANCE OF DIFFERENT MODULES IN THE ABLATION EXPERIMENT