Functional connectivity network estimation with an inter-similarity prior for mild cognitive impairment classification

Functional connectivity network (FCN) analysis is an effective technique for modeling human brain patterns and diagnosing neurological disorders such as Alzheimer’s disease (AD) and its early stage, mild cognitive impairment (MCI). However, accurately estimating biologically meaningful and discriminative FCNs remains challenging due to the poor quality of functional magnetic resonance imaging (fMRI) data and our limited understanding of the human brain. FCNs exhibit an inter-similarity structure: similar regions of interest tend to share similar connection patterns. Inspired by this property, we propose a functional brain network modeling scheme that encodes the inter-similarity prior into a graph-regularization term, which can be easily solved with an efficient optimization algorithm. To illustrate its effectiveness, we conducted experiments to distinguish MCI patients from normal controls based on their respective FCNs. Our method outperformed the baseline and state-of-the-art methods, achieving 88.19% classification accuracy. Furthermore, post hoc inspection of the informative features showed that our method yielded more biologically meaningful functional brain connectivity.

Mathematically, an FCN can be formulated as a graph, in which each node corresponds to a specific region-of-interest (ROI) in the brain and each edge describes the relation between the blood-oxygen-level-dependent (BOLD) signals associated with a pair of ROIs. The most widely used FCN estimation models are based on second-order statistics (or correlations) and, according to a recent review [24], these correlation-based methods are generally more sensitive than complex high-order methods. Therefore, in this paper, we mainly focus on correlation-based methods, and briefly review some of them, including Pearson's correlation (PC) [25], sparse representation (SR) [26,27], and their variants. However, the FCN commonly has more "topological structures" than just sparsity (Sporns 2011). Several recent studies have proposed more discriminative FCNs with improved estimations to diagnose neurodegenerative diseases. Most of these can be explained under a regularization framework, which illustrates that a reliable FCN estimation model should not only fit the data well, but also effectively encode priors of the brain organization [28]. In practice, the commonly used priors include sparsity, modularity, group-sparsity, low-rank and scale-free [19,25,26,28,29], which can be transformed into corresponding regularization terms for FCN estimation. Moreover, the priors can also be transferred from the data modelling [23] or other high-quality data [30]. Such approaches commonly improve the performance of FCNs and their diagnostic accuracy.
In this study, inspired by the fact that similar ROIs in FCNs tend to have similar connection patterns (i.e., inter-similarity structure), we present a novel FCN estimation scheme that encodes this prior in the form of a graph regularizer. We formulate the prior as a graph-learning model with an additional graph/manifold regularizer for FCN estimation, and further propose an efficient global optimization algorithm. The proposed method does not compete with other FCN estimation models, since it only provides an effective inter-similarity module for FCN estimation.

Network visualization
For visual comparison of the FCNs estimated by the PC, SR, GR and SGR methods, we constructed an FCN adjacency matrix W for each method (Figure 1), with all weights normalized to the interval [−1, 1] for ease of comparison across the different methods. Figure 1 shows that the full correlation-based FCN (PC) has a different topology from the partial correlation-based FCNs (i.e., SR, GR and SGR), since they capture different statistical information through different data-fitting terms. In addition, compared with SR and GR, the FCN estimated by SGR tends to be better organized, illustrating the effectiveness of the proposed method.

MCI identification
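A set of quantitative measures, including accuracy, sensitivity, specificity, and area under the curve (AUC), is used to evaluate the classification performance of the four methods (PC, SR, GR and SGR). The first three measures are defined as follows:
$$\text{Accuracy} = \frac{\text{TruePositive} + \text{TrueNegative}}{\text{TruePositive} + \text{FalsePositive} + \text{TrueNegative} + \text{FalseNegative}}$$
$$\text{Sensitivity} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}}$$
$$\text{Specificity} = \frac{\text{TrueNegative}}{\text{TrueNegative} + \text{FalsePositive}}$$
Here, TruePositive is the number of positive subjects that are correctly classified in the MCI identification task. Similarly, TrueNegative, FalsePositive and FalseNegative are the numbers of the corresponding subjects.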
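The three count-based measures can be computed directly from the confusion counts. The following is a minimal sketch that assumes the standard definitions of accuracy, sensitivity and specificity; the function name and the example counts are hypothetical and are not taken from the experiments reported here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # fraction of MCI subjects correctly identified
    specificity = tn / (tn + fp)   # fraction of NC subjects correctly identified
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only (not the paper's results).
acc, sen, spe = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(acc, sen, spe)
```

AUC is not covered by this sketch because it requires the continuous decision scores of the classifier rather than the four counts.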
The MCI vs NC classification results on the ADNI dataset are given in Table 1 and Figure 2, with SGR achieving the best results. As seen in Table 1, the partial correlation-based methods work better than the PC method, which reveals the effectiveness of partial correlation information. In addition, the SGR method strongly outperforms the SR and GR methods, which demonstrates the effectiveness of both sparsity and inter-similarity priors.

Sensitivity to network model parameters
The final classification accuracy is particularly sensitive to the network model parameters. Figure 3 shows the classification accuracy for different parameter combinations in the proposed SGR method, computed by the LOO test on all of the subjects. The best accuracy (93.70%) is achieved with $\lambda = 2^{1}$ (for sparsity) and $\gamma = 2^{5}$ (for inter-similarity).

Consensus connections
As the selected connections in each inner loop might be different, we recorded the consensus connections for the classification model in each inner LOOCV loop. As mentioned above, we selected the consensus connections with p-value < 0.01 in each loop; they are shown in Table 2. Most of these discriminative connections were distributed in the frontal, occipital, and parietal lobes. The consensus connections included both enhanced and weakened functional connections in MCI patients. Furthermore, we projected them onto the corresponding subnetworks and found that most consensus connections were distributed in the default mode network (DMN), fronto-parietal task control network, and sensory/somatomotor hand network.

Hub regions of functional network
According to the definition of "hubs", we identified hub nodes of the FCN estimated by SGR with $\lambda = 2^{1}$ and $\gamma = 2^{5}$ in MCI patients and NCs. As shown in Table 3, the common hubs of MCI patients and NCs were located mainly in the bilateral middle frontal gyrus, bilateral inferior temporal gyrus, right superior frontal gyrus, right insula and right fusiform gyrus. Most of them were distributed in the DMN, fronto-parietal task control network and salience network. Furthermore, it is notable that some hubs were present only in MCI patients and absent in NCs, such as the left superior frontal gyrus and left insula. Meanwhile, some hubs were present only in NCs and not in MCI patients; they were located in the right middle temporal gyrus, left precentral gyrus and left postcentral gyrus. These discriminative brain regions between MCI patients and NCs were distributed mainly in the DMN, fronto-parietal task control network and sensory/somatomotor hand network.

Altered topological properties of functional networks in MCI patients
Based on the FCNs estimated by SGR with $\lambda = 2^{1}$ and $\gamma = 2^{5}$, several global graph theory metrics (Table 4), including the clustering coefficient ($C_p$), shortest path length ($L_p$), normalized clustering coefficient ($\gamma$), normalized characteristic path length ($\lambda$), small-worldness ($\sigma$), global efficiency ($E_{global}$) and modularity ($Q$), were calculated to characterize the topological properties of the functional networks in the MCI and NC groups. As shown in Table 4, both groups satisfy $\gamma = C_p^{real} / C_p^{rand} > 1$, $\lambda = L_p^{real} / L_p^{rand} \approx 1$ and $\sigma = \gamma / \lambda > 1$. Therefore, the FCNs estimated by SGR in MCI patients and NCs showed small-world topological attributes [59]. This means that the brain networks of the two groups maintain an economic and efficient organization that optimizes the balance between local specialization and global integration [60][61][62]. Further comparison suggested that the $L_p$ values of MCI patients were lower than those of the NC group (P<0.01), which indicated a reduction of network integration in global information processing in MCI patients. Moreover, the decreased values of $\gamma$ and $Q$ in MCI patients suggest a reduction of network segregation in local information processing.
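The small-world criteria above reduce to three ratios. The sketch below shows that arithmetic only; it assumes the clustering coefficients and path lengths of the real and random reference networks have already been computed elsewhere, and both the function name and the input values are hypothetical.

```python
def small_world_indices(c_real, c_rand, l_real, l_rand):
    """Return (gamma, lam, sigma) for a network and its random reference.

    gamma: normalized clustering coefficient, C_p(real) / C_p(rand)
    lam:   normalized characteristic path length, L_p(real) / L_p(rand)
    sigma: small-worldness, gamma / lam
    """
    gamma = c_real / c_rand
    lam = l_real / l_rand
    sigma = gamma / lam
    return gamma, lam, sigma


# Hypothetical values for illustration only (not taken from Table 4).
gamma, lam, sigma = small_world_indices(c_real=0.5, c_rand=0.25,
                                        l_real=2.2, l_rand=2.0)
is_small_world = gamma > 1 and sigma > 1
print(gamma, lam, sigma, is_small_world)
```

Here `lam` is the graph metric and should not be confused with the sparsity parameter λ of the SGR model, which shares the same symbol in the text.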

DISCUSSION
Here, we proposed a new method to estimate functional connectivity networks (FCNs) to improve the accuracy of FCN-aided disease diagnosis. We introduced a graph regularizer into the proposed FCN learning framework for estimating inter-similarity FCNs, and combined it with a sparse penalty for constructing FCNs that are both sparse and inter-similar, which illustrated that the proposed method scales well.

AGING
We used the estimated FCNs to identify MCIs from NCs, and our experimental results showed that the proposed method outperforms state-of-the-art methods. Indeed, it achieved an 88.19% classification accuracy based on a simple feature selection (by means of t-tests with a fixed p value) and classification (via linear support vector machines (SVMs) with default parameter C = 1) pipeline.
We explored the selected consensus features (i.e., network connections) in our method and found that most of the selected features tend to be biologically meaningful according to recent studies (Greicius, 2008; Albert et al., 2011), which further illustrated our method's effectiveness. Moreover, the analysis of graph theory attributes based on our method can be used to further characterize altered patterns and pathological mechanisms underlying the topological properties of brain networks in MCI.
Our simple graph/manifold regularizer was used to estimate an inter-similarity FCN for each subject separately. However, FCNs from different subjects tend to share some similar structures [18,19], and thus the proposed method may lose group information. Therefore, we propose developing and applying a "group constraint", such as Group LASSO [63] or tensor low rank [5], to improve FCN computation.
The experiments in our methodological study here constitute a simple verification method for validating the effectiveness of the inter-similarity scheme, without considering other factors (e.g., similarity matrix or classification model). Therefore, we adopted the simplest Pearson Correlation matrix and linear SVM model. Future studies could further improve MCI classification performance.
The distribution of consensus connections and hub nodes indicated that the discriminative features obtained by our proposed method were mainly located in the frontal, occipital and parietal lobes of MCI patients. Projecting them onto the corresponding subnetworks, we found that most of these brain regions belonged to the DMN, the fronto-parietal task control network, and the sensory/somatomotor hand network, especially the DMN. Previous studies, such as [64] and [65], have pointed out that the DMN facilitates the early diagnosis and prediction of AD. Our results, obtained with the proposed method, likewise showed that the DMN provided the most discriminative information, reproducing this finding.
The topological properties analyzed in our study suggested that the networks of both MCI patients and NCs had small-world attributes at the global level. That is, the brain networks of the MCI and NC groups conform to an "economic small-world" organization, which can provide rapid, real-time information processing across separate brain regions to maximize efficiency at minimal cost, eliciting resilience against pathological attacks [60,61,66]. Further comparison suggested that the value of $L_p$ in MCI patients was lower than that in the NC group, which indicated a reduction of network integration in global information processing in the former. Moreover, the decreased values of $\gamma$ and $Q$ in MCI patients further suggested a reduction of network segregation in local information processing. Therefore, the altered pattern of topological properties obtained by our proposed method indicated a disruption of network integration and segregation in MCI patients, which further demonstrated the pathological mechanisms of the FCN.

In summary, the FCN commonly has more topological structures than just sparsity [13,14]. Due to the limited understanding of the human brain, estimating the "ideal" FCN for exploring brain patterns or diagnosing neurological disease is still an active field of research. Here, we focused on the inter-similarity of FCNs, formulated it as a graph-regularizer constraint, and validated the proposed method on MCI classification. Our results illustrated that additional topological priors can effectively improve diagnostic performance. Our post-hoc analyses further showed that more biologically meaningful functional brain connections were obtained by incorporating the inter-similarity prior.

Data acquisition
To test the proposed method, we analyzed publicly available neuroimaging data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.ucla.edu) [31]. ADNI was launched in 2003 by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, the Food and Drug Administration, private pharmaceutical companies and nonprofit organizations. Initially, the goal of ADNI was to define biomarkers for use in clinical trials and to determine the best way to measure the treatment effects of AD therapeutics. To alleviate head motion effects and artifacts, we followed previously published strategies [33,34]. We calculated the framewise displacement (FD) and excluded from subsequent analyses any subject with more than 2.5 min (50 frames) of data with FD > 0.5 [35]. Finally, based on the automated anatomical labeling (AAL) atlas [36], the pre-processed BOLD time series were partitioned into 116 ROIs, and we arranged these time series into a data matrix $X \in \mathbb{R}^{137 \times 116}$.
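The motion-based exclusion rule can be sketched as follows. The text does not state how FD was computed, so this example assumes a commonly used definition (the sum of absolute frame-to-frame changes in the six rigid-body parameters, with rotations converted to arc length on a sphere of radius 50 mm) and assumes FD is expressed in millimetres. The function names, the column layout of `motion` and the radius are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def framewise_displacement(motion, radius=50.0):
    """FD per frame from a (T, 6) array of realignment parameters.

    Assumed layout: columns 0-2 are translations in mm,
    columns 3-5 are rotations in radians.
    """
    diffs = np.abs(np.diff(motion, axis=0))
    diffs[:, 3:] *= radius                 # rotation -> arc length in mm
    fd = diffs.sum(axis=1)
    return np.concatenate([[0.0], fd])     # first frame has no predecessor


def exclude_subject(fd, fd_threshold=0.5, max_bad_frames=50):
    """True if more than `max_bad_frames` frames exceed the FD threshold."""
    return int(np.sum(fd > fd_threshold)) > max_bad_frames


# Synthetic example: 137 frames of a subject drifting 1 mm per frame.
motion = np.zeros((137, 6))
motion[:, 0] = np.arange(137)
fd = framewise_displacement(motion)
print(exclude_subject(fd))
```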

Functional brain network estimation
After obtaining the fMRI data matrix X from the R-fMRI data, the subsequent task is FCN estimation. The most commonly used FCN estimation methods are based on correlation and, since they are more sensitive than some complex higher-order methods [14], we focused on them in this study. For notation, we first define the data matrix (i.e., BOLD signal matrix) $X \in \mathbb{R}^{T \times N}$, where T is the number of volumes and N is the number of ROIs. The fMRI time series associated with the ith ROI is represented by $x_i \in \mathbb{R}^{T}$, $i = 1, \dots, N$. In addition, such an approach can also be adopted for data of other modalities, such as EEG [37,38].

Related methods
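As the simplest FCN estimation scheme, Pearson's correlation (PC) is widely used to study FCNs [39]. The PC-based FCN is given by
$$W_{ij} = \frac{(x_i - \bar{x}_i)^T (x_j - \bar{x}_j)}{\sqrt{(x_i - \bar{x}_i)^T (x_i - \bar{x}_i)}\,\sqrt{(x_j - \bar{x}_j)^T (x_j - \bar{x}_j)}} \quad (1)$$
In Eq. (1), $x_i - \bar{x}_i$ is a centralized counterpart of $x_i$, where every entry of $\bar{x}_i$ equals the temporal mean of $x_i$. Because of the noise mixed into the fMRI data, PC always generates dense FCNs. Thus, a threshold is often used to sparsify the PC-based FCNs by filtering out noisy or weak connections.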
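PC estimation followed by thresholding can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the rule of keeping a fixed fraction of the strongest edges by absolute weight is an assumption about how the threshold is applied.

```python
import numpy as np


def pearson_fcn(X):
    """Pearson correlation FCN from a (T, N) BOLD matrix."""
    Xc = X - X.mean(axis=0)                  # centralize each ROI time series
    Xn = Xc / np.linalg.norm(Xc, axis=0)     # scale each column to unit norm
    return Xn.T @ Xn


def keep_strongest(W, keep_fraction):
    """Keep the given fraction of strongest off-diagonal edges by |weight|."""
    A = W.copy()
    np.fill_diagonal(A, 0.0)
    iu = np.triu_indices_from(A, k=1)
    weights = np.abs(A[iu])
    n_keep = int(round(keep_fraction * weights.size))
    if n_keep < weights.size:
        # Smallest weight that is still kept; everything weaker is removed.
        cutoff = np.sort(weights)[::-1][n_keep - 1] if n_keep > 0 else np.inf
        A[np.abs(A) < cutoff] = 0.0
    return A


rng = np.random.default_rng(0)
X = rng.standard_normal((137, 116))          # synthetic stand-in for BOLD data
W_pc = pearson_fcn(X)
W_sparse = keep_strongest(W_pc, keep_fraction=0.9)   # drop the weakest 10%
```

With real data, `X` would be the subject's 137 × 116 BOLD matrix instead of random numbers.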
PC measures the full correlation between ROIs and therefore neglects the interactions among multiple ROIs and their confounding effects. In contrast, partial correlation regresses out the confounding effects of the other ROIs. However, partial correlation-based methods can easily be ill-posed because they require inverting the covariance matrix $\Sigma = X^T X$. A common solution is to incorporate an $l_1$-norm regularizer into the partial correlation model [26], which also naturally incorporates the sparsity prior of the FCN; the resulting model is the sparse representation (SR). In matrix form, the SR model is:
$$\min_{W} \|X - XW\|_F^2 + \lambda \|W\|_1, \quad \text{s.t. } W_{ii} = 0,\ \forall i = 1, \dots, N \quad (4)$$
Note that the $l_1$-norm regularizer in Eq. (4) plays a key role in achieving a sparse and stable solution [26].
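One way to solve an $l_1$-regularized partial correlation model of this kind is proximal gradient descent (ISTA). The sketch below assumes the objective $\|X - XW\|_F^2 + \lambda \|W\|_1$ with a zero diagonal, which is the usual form of SR in the literature; the zero-diagonal constraint, the function names and the iteration count are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(Z, tau):
    """Element-wise proximal operator of tau * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


def sparse_representation_fcn(X, lam, n_iter=200):
    """Proximal gradient for min ||X - XW||_F^2 + lam * ||W||_1, diag(W) = 0."""
    n = X.shape[1]
    G = X.T @ X
    # The gradient of the data-fitting term is Lipschitz with constant
    # 2 * lambda_max(X^T X); its inverse is a safe step size.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(G)[-1])
    W = np.zeros((n, n))
    for _ in range(n_iter):
        grad = 2.0 * (G @ W - G)
        W = soft_threshold(W - step * grad, step * lam)
        np.fill_diagonal(W, 0.0)             # exclude self-connections
    return W


rng = np.random.default_rng(0)
X = rng.standard_normal((137, 10))
X -= X.mean(axis=0)
W_sr = sparse_representation_fcn(X, lam=2.0 ** -5)
```

The returned matrix is generally not symmetric; averaging it with its transpose, `(W_sr + W_sr.T) / 2`, is a common post-processing choice and is likewise an assumption here.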
According to a recent review [1], functional brain network (FBN) estimation methods, from simple to complex, include Pearson's correlation (PC), partial correlation [40], regularized partial correlation [41], Bayesian networks [42], structural equation modeling [43], and dynamic causal modeling [44]. Each of these methods, in our view, can be considered a trade-off among biological interpretability, computational efficiency, and statistical robustness. Consequently, we can naturally incorporate a regularization term and statistical information into the objective function to construct a general platform for estimating FCNs. More specifically, the platform can be formulated as a matrix-regularized learning framework:
$$\min_{W} f(X, W) + \lambda R(W), \quad \text{s.t. } W \in \Delta \quad (5)$$
where f(X, W) models the statistical information of the FCN, and R(W) is the regularization term for incorporating biological priors of the FCN and stabilizing the solutions. In addition, specific constraints such as symmetry or positive semi-definiteness may be included in Δ to shrink the search space of W, which provides an effective way of obtaining a better FCN. The hyper-parameter λ controls the balance between the first (data-fitting) term and the second (regularization) term.
In fact, many recently proposed FCN estimation models [45][46][47][48] can be unified under this regularized framework with different designs of the two terms in Eq. (5). Popular data-fitting terms include $\|W - X^T X\|_F^2$, used in Eq. (2), and $\|X - XW\|_F^2$, used in Eq. (4), while the most popular regularization term is the $l_1$-norm [49]. Beyond unifying the existing methods, the regularized framework also provides a platform for developing new FCN estimation methods. In the following section, we explain our proposed model based on this framework.

Our methods
As we mentioned above, the regularization-based FCN estimation framework provides an effective scheme for incorporating the biological or physical priors of FCN. In this paper, we try to encode the inter-similarity prior (similar nodes tend to have similar connections) into the FCN estimation. The basic motivation is given in Figure 5.
In particular, we suppose that if two ROIs are similar, their connections to the other ROIs should follow a similar pattern. In this way, we naturally formulate the inter-similarity prior as a graph regularizer built on the Laplacian matrix L, computed as $L = I - D^{-1/2} S D^{-1/2}$, where S is the similarity graph between ROIs and D is a diagonal matrix with entries $D_{ii} = \sum_j S_{ij}$. The graph S can be defined in many ways, such as by Pearson's correlation, a morphological network [50], a network estimated from high-quality data [30], or a predefined graph (e.g., must-connect or cannot-connect constraints). In this study, we consider only the positive connections of PC to construct L.
Moreover, since the estimated FCN should also be sparse, we further incorporate the $l_1$-norm penalty into the FCN estimation; the resulting sparse and graph-regularized (SGR) model combines the data-fitting term with the $l_1$-norm penalty (weighted by λ) and the graph regularizer (weighted by γ). In addition, we adopt the partial correlation for the data-fitting term due to its efficiency and effectiveness. The resulting objective can be handled by a range of optimization algorithms [51][52][53][54]. Here, we select the proximal method [55,56]; see Table 5.
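The construction of the Laplacian from the positive part of the PC matrix can be sketched as follows. The formula for L is the normalized graph Laplacian; removing self-loops from S and the handling of isolated nodes are assumptions made for this example, and the function names are hypothetical.

```python
import numpy as np


def positive_pc_graph(X):
    """Similarity graph S: positive Pearson correlations, no self-loops."""
    S = np.corrcoef(X, rowvar=False)
    S = np.maximum(S, 0.0)          # keep only positive connections
    np.fill_diagonal(S, 0.0)        # assumption: self-similarity is ignored
    return S


def normalized_laplacian(S):
    """L = I - D^{-1/2} S D^{-1/2}, with D_ii = sum_j S_ij."""
    d = S.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nonzero = d > 0
    d_inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])   # isolated nodes stay 0
    return np.eye(S.shape[0]) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]


rng = np.random.default_rng(0)
X = rng.standard_normal((137, 116))   # synthetic stand-in for BOLD data
S = positive_pc_graph(X)
L = normalized_laplacian(S)
```

For a non-negative symmetric S, the eigenvalues of this L lie between 0 and 2, so a quadratic penalty built on L is convex.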
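A proximal gradient solver for a sparse and graph-regularized model can be sketched as below. The exact SGR objective is not recoverable from the text, so this example assumes the form $\|X - XW\|_F^2 + \lambda \|W\|_1 + \gamma\,\mathrm{tr}(W L W^T)$ with a zero diagonal, in which the trace term penalizes differences between the (degree-normalized) columns of W belonging to similar ROIs. The objective, the constraint, the function names and the parameter values are all assumptions for illustration, not the algorithm summarized in Table 5.

```python
import numpy as np


def soft_threshold(Z, tau):
    """Element-wise proximal operator of tau * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


def sgr_objective(X, W, L, lam, gamma):
    """Assumed SGR objective: fit + lam * l1 + gamma * tr(W L W^T)."""
    fit = np.linalg.norm(X - X @ W, "fro") ** 2
    return fit + lam * np.abs(W).sum() + gamma * np.trace(W @ L @ W.T)


def sgr_fcn(X, L, lam, gamma, n_iter=300):
    """Proximal gradient on the assumed SGR objective with diag(W) = 0."""
    n = X.shape[1]
    G = X.T @ X
    # Lipschitz constant of the gradient of the smooth part.
    lip = 2.0 * np.linalg.eigvalsh(G)[-1] + 2.0 * gamma * np.linalg.eigvalsh(L)[-1]
    step = 1.0 / lip
    W = np.zeros((n, n))
    for _ in range(n_iter):
        grad = 2.0 * (G @ W - G) + 2.0 * gamma * (W @ L)   # L is symmetric
        W = soft_threshold(W - step * grad, step * lam)
        np.fill_diagonal(W, 0.0)
    return W


# Synthetic data with a shared component so that ROIs are positively correlated.
rng = np.random.default_rng(0)
X = rng.standard_normal((137, 10)) + 0.5 * rng.standard_normal((137, 1))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)
S = np.maximum(X.T @ X, 0.0)
np.fill_diagonal(S, 0.0)
d = np.maximum(S.sum(axis=1), 1e-12)          # guard against isolated nodes
L = np.eye(10) - S / np.sqrt(np.outer(d, d))
W = sgr_fcn(X, L, lam=2.0 ** -5, gamma=2.0 ** -5)
```

Setting `gamma=0` reduces this sketch to the sparse model alone, and setting `lam=0` leaves only the graph regularizer.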

Experimental setting
To validate the proposed FCN method, we conducted experiments on training a classifier to identify MCI patients from NCs based on the estimated FCNs. We adopted the SR and PC methods as baselines for comparison. Since the FCN matrix is symmetric, we used its upper triangular elements as input features for classification. In our experiment, each FCN had 116 nodes and thus produced 6,670 features (corresponding to the 6,670 functional connections between 116 ROIs). Compared to the sample size (fewer than two hundred), the feature dimension was very high, which not only implied expensive computations but would also affect the generalization of the proposed methods. As pointed out in [18], both the feature selection and the classifier design have a large influence on accuracy. Thus, in this study, we adopted the simplest feature selection method (t-test with p value < 0.01) and the widely used SVM classifier [58], since our main focus was FCN estimation.
Due to the small sample size, we used the leave-one-out (LOO) cross-validation strategy to assess the performance of the methods, in which only one subject is left out for testing while the others are used to train the models and obtain the optimal parameters. To choose the optimal parameters, an inner LOO cross-validation was conducted on the training data using a grid-search strategy.
More specifically, for the regularization parameters λ and γ, the candidate values ranged in $[2^{-5}, 2^{-4}, \dots, 2^{4}, 2^{5}]$; for the hard threshold of PC, we used 11 sparsity levels ranging in [1%, 10%, ..., 90%, 100%]. For example, 90% means that 10% of the weak edges were filtered out from the FCN. It should be noted that the features selected by their p-values can be highly complementary to other features, improving the classification result. Thus, to alleviate this issue, the feature selection approach was applied only to the training data.
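The feature count follows from taking the strictly upper triangular part of a 116 × 116 matrix, which has 116 × 115 / 2 = 6,670 entries. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def fcn_to_features(W):
    """Vectorize the strictly upper triangular part of a symmetric FCN."""
    iu = np.triu_indices(W.shape[0], k=1)   # k=1 skips the diagonal
    return W[iu]


W = np.zeros((116, 116))                    # placeholder FCN
features = fcn_to_features(W)
print(features.shape)                       # (6670,)
```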
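The parameter grids and the nested leave-one-out structure described above can be laid out as follows. This sketch only enumerates candidate values and index splits; the feature selection and SVM training that would run inside each split are omitted, and all names are hypothetical.

```python
# Candidate values for the regularization parameters: 2^-5, 2^-4, ..., 2^5.
reg_grid = [2.0 ** k for k in range(-5, 6)]

# Sparsity levels for thresholding PC: 1%, 10%, 20%, ..., 100%.
sparsity_levels = [0.01] + [k / 10 for k in range(1, 11)]

# All (lambda, gamma) combinations searched for SGR.
param_pairs = [(lam, gamma) for lam in reg_grid for gamma in reg_grid]


def leave_one_out(indices):
    """Yield (training indices, held-out index) for each subject."""
    indices = list(indices)
    for held_out in indices:
        yield [i for i in indices if i != held_out], held_out


def nested_loo_plan(n_subjects):
    """Yield (outer_test, inner_train, inner_val) index triples.

    Feature selection and model fitting would use only `inner_train`
    (or the full outer training set for the final model), so the
    outer test subject never influences the selected features.
    """
    for outer_train, outer_test in leave_one_out(range(n_subjects)):
        for inner_train, inner_val in leave_one_out(outer_train):
            yield outer_test, inner_train, inner_val


plan = list(nested_loo_plan(4))             # tiny example with 4 subjects
print(len(reg_grid), len(sparsity_levels), len(param_pairs), len(plan))
```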

AUTHOR CONTRIBUTIONS
All authors developed the proposed algorithm and architecture. Wei-kai Li and Xiao-wen Xu designed the evaluation experiments. Xin Gao and Wei Jiang preprocessed the fMRI data. Pei-jun Wang revised the manuscript. All authors contributed to the preparation of the article, figures, and charts.

CONFLICTS OF INTEREST
There are no conflicts of interest including any financial, personal, or other relationships with people or organizations for any of the coauthors related to the work described in the article.