1 Introduction

Brain connectivity depicts the functional relations between different brain regions [1]. Time-varying changes in brain connectivity have been increasingly studied in recent years [2]. Many works [3,4,5] have investigated how brain connectivity changes during adolescence and how it differs between age groups, e.g., children and young adults.

A number of statistical learning models, e.g., group independent component analysis [6] and canonical correlation analysis (CCA) [7], have been applied to multi-modal studies to analyze complementary information across imaging modalities, and to imaging-genetics studies to detect interactions between genetic factors [8], e.g., single nucleotide polymorphisms (SNPs), and endophenotypes, e.g., functional magnetic resonance imaging (fMRI). Among these methods, CCA has been widely used to detect multivariate correlations between two datasets; it reduces data dimensionality by projecting high-dimensional data into lower-dimensional spaces. Many variants of CCA, e.g., multiple CCA [9], multi-set CCA [10], sparse CCA [11], and structured sparse CCA [12], have been developed to address more specific challenges in real data applications. Despite its wide application, however, CCA produces canonical variables that carry no label-related information, which limits its applications and makes its output harder to interpret. To address this limitation, Gross et al. [13] proposed collaborative regression, which identifies label-related correlations by incorporating regression into the CCA objective function. However, the simulation results in [13] suggest that collaborative regression may predict poorly. This may be due to a restriction on the coefficient vectors, which requires the projection used for correlation and the projection used for regression to point in the same direction.

In this paper, we propose a novel model, deep collaborative learning (DCL), which addresses the limitation of collaborative regression by combining correlation analysis and regression through deep networks, with the aim of achieving higher classification accuracy and better correlation detection. We evaluate DCL experimentally on brain connectivity data, where the results also reveal differences in functional connectivity between age groups and between cognition groups.

The rest of the paper is organized as follows. Section 2 describes the limitations of existing methods and how the proposed model addresses them. Section 3 describes the brain connectivity data and presents the experimental results. Section 4 concludes the paper and discusses the results, possible limitations, and extensions of the work.

2 Method

2.1 Overview of Linear Canonical Correlation Analysis (CCA)

Canonical correlation analysis (CCA) [7] is widely used to analyze linear correlations between two datasets. It projects the original data onto the optimal directions (canonical loading vectors), i.e., the directions along which the projected data have the highest Pearson correlation.

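Suppose we have two data matrices \(X_1 \in \mathbb {R} ^{n\times p}\) and \(X_2 \in \mathbb {R} ^{n\times q}\). CCA seeks two projection matrices \(U_1\) and \(U_2\) by optimizing the following objective function

$$\begin{aligned}&(U_1^{*},U_2^{*}) = \mathop {\text{ argmax }}\limits _{U_1,U_2} \frac{U_1'X_1'X_2U_2}{\Vert X_1U_1 \Vert _2 \Vert X_2U_2 \Vert _2} \end{aligned}$$
(1)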
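As a concrete reference point for linear CCA in Eq. (1), the NumPy sketch below solves for several canonical pairs at once by whitening each dataset and taking a singular value decomposition of the whitened cross-covariance. It is a minimal illustration, not the authors' implementation: the function names, the centering step, and the small ridge term `reg` added for numerical stability are our assumptions. The returned loadings satisfy `U1.T @ S11 @ U1 = I`, mirroring the constraint stated later for Eq. (4).

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def linear_cca(X1, X2, k, reg=1e-8):
    """Top-k canonical loadings U1 (p x k), U2 (q x k) and canonical correlations."""
    X1 = X1 - X1.mean(axis=0)                      # center each dataset
    X2 = X2 - X2.mean(axis=0)
    S11 = X1.T @ X1 + reg * np.eye(X1.shape[1])    # ridge term for stability (assumption)
    S22 = X2.T @ X2 + reg * np.eye(X2.shape[1])
    S12 = X1.T @ X2
    W1, W2 = inv_sqrt(S11), inv_sqrt(S22)
    T = W1 @ S12 @ W2                              # whitened cross-covariance
    A, s, Bt = np.linalg.svd(T)                    # singular values = canonical correlations
    U1 = W1 @ A[:, :k]                             # U1' S11 U1 = I
    U2 = W2 @ Bt.T[:, :k]                          # U2' S22 U2 = I
    return U1, U2, s[:k]

rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 1))                        # latent signal shared by both datasets
X1 = np.hstack([Z, rng.normal(size=(n, 4))]) @ rng.normal(size=(5, 5))
X2 = np.hstack([Z, rng.normal(size=(n, 2))]) @ rng.normal(size=(3, 3))

U1, U2, rho = linear_cca(X1, X2, k=2)
proj1 = (X1 - X1.mean(axis=0)) @ U1[:, 0]          # first canonical variable of X1
proj2 = (X2 - X2.mean(axis=0)) @ U2[:, 0]          # first canonical variable of X2
print("canonical correlations:", rho)
print("Pearson r of first pair:", np.corrcoef(proj1, proj2)[0, 1])
```

Because both toy datasets contain the same latent signal `Z`, the first canonical correlation is essentially 1, and the Pearson correlation of the first pair of projections equals the first singular value of the whitened cross-covariance.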

2.2 Deep CCA

Deep CCA was proposed by Andrew et al. [14] to detect nonlinear cross-data correlations. As illustrated in Fig. 1(a), deep CCA passes the data through deep networks before applying the CCA framework. Unlike linear CCA, which seeks the optimal loading matrices \(U_1,U_2\), deep CCA seeks the optimal network representations \(f_1(X_1),\ f_2(X_2)\), as shown in Eq. (2).

$$\begin{aligned}&(f_1^{*},f_2^{*}) = \mathop {\text{ argmax }}\limits _{f_1,f_2} \big \{ \mathop {\text{ max }}\limits _{U_1,U_2} \frac{U_1'f'_1(X_1)f_2(X_2)U_2}{\Vert f_1(X_1)U_1 \Vert _2 \Vert f_2(X_2)U_2 \Vert _2} \big \} \end{aligned}$$
(2)

where \(f{_1}\), \(f{_2}\) are two deep networks as illustrated in Fig. 1(a).
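The benefit of a nonlinear representation can be seen on a toy problem in which \(X_2\) depends on \(X_1\) only through its square. The sketch below evaluates the inner maximum of Eq. (2), i.e., the largest canonical correlation between \(f_1(X_1)\) and \(f_2(X_2)\), for two fixed choices of \(f_1\). The feature map \(x \mapsto [x, x^2]\) is a hypothetical stand-in for a trained network; deep CCA would instead learn \(f_1, f_2\) by gradient-based optimization of a correlation objective of this kind. The helper name and the ridge term `reg` are our assumptions.

```python
import numpy as np

def top_canonical_corr(H1, H2, reg=1e-8):
    """Inner maximum of Eq. (2): largest canonical correlation between H1 and H2."""
    H1 = H1 - H1.mean(axis=0)
    H2 = H2 - H2.mean(axis=0)

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S + reg * np.eye(S.shape[0]))
        return V @ np.diag(w ** -0.5) @ V.T

    T = inv_sqrt(H1.T @ H1) @ (H1.T @ H2) @ inv_sqrt(H2.T @ H2)
    return np.linalg.svd(T, compute_uv=False)[0]

rng = np.random.default_rng(1)
x1 = rng.uniform(-1.0, 1.0, size=(500, 1))
x2 = x1 ** 2                                   # dependence that is purely nonlinear

# f1, f2 = identity: Eq. (2) reduces to linear CCA.
rho_linear = top_canonical_corr(x1, x2)

# Hypothetical fixed nonlinear map standing in for a trained network f1.
def f1(x):
    return np.hstack([x, x ** 2])

rho_nonlinear = top_canonical_corr(f1(x1), x2)
print(f"linear: {rho_linear:.3f}  nonlinear: {rho_nonlinear:.3f}")
```

With identity maps the correlation is near zero, because \(x\) and \(x^2\) are uncorrelated when \(x\) is symmetric about zero; the nonlinear map recovers a correlation of essentially 1.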

Fig. 1.

A figure showing the workflows of deep CCA and deep collaborative learning. Data \(X_1\), \(X_2\) are input to deep networks \(f_1\), \(f_2\), which yield outputs \(H_1\), \(H_2\); CCA (for deep CCA) or collaborative regression (for deep collaborative learning) is then applied to these outputs. For deep CCA, the optimization problem is to find the optimal networks \(\hat{f_1}\), \(\hat{f_2}\) that give the highest canonical correlation. For deep collaborative learning, the optimization problem is to find the optimal networks \(\hat{f_1}\), \(\hat{f_2}\) that give both the highest canonical correlation and the smallest prediction error.

Introducing a deep network representation makes it possible to detect both linear and nonlinear correlations. In experiments on speech data and handwritten digit data [14], deep CCA produced representations that were more highly correlated than those of other correlation analysis methods, e.g., linear CCA and kernel CCA.

2.3 Deep Collaborative Learning (DCL)

CCA, like deep CCA, is a data representation method. However, CCA-based methods have not found applications as wide as those of PCA-based methods. As a dimension reduction method, CCA produces outputs (canonical variables) that have no connection to label information, so the detected correlations may be difficult to interpret. To address this limitation, Gross et al. [13] proposed a model called collaborative regression. Specifically, given label data \(Y \in \mathbb {R} ^{n\times 1}\), collaborative regression maximizes the following objective function

(3)

Collaborative regression addresses CCA's limitation by taking advantage of label information, so that it can detect canonical correlations that are label related. However, according to the simulations in [13], collaborative regression may lead to poor classification accuracy and therefore may not be suitable for brain connectivity studies. This may be due to the coupled restriction on the coefficient vectors \(u_1,u_2\), which requires the projection used for correlation and the projection used for regression to point in the same direction.
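The coupling of \(u_1, u_2\) can be made concrete with a small least-squares sketch. It assumes a penalty-free form of collaborative regression that minimizes \(\Vert X_1u_1 - X_2u_2\Vert ^2 + \lambda _1\Vert Y - X_1u_1\Vert ^2 + \lambda _2\Vert Y - X_2u_2\Vert ^2\); this form, its sign convention (a minimization), and the weights \(\lambda _1, \lambda _2\) are our assumptions and may differ from Eq. (3). Because the same \(u_1, u_2\) enter both the agreement term and the regression terms, setting the gradient to zero gives one coupled linear system.

```python
import numpy as np

def collaborative_regression(X1, X2, y, lam1=1.0, lam2=1.0):
    """Minimize ||X1 u1 - X2 u2||^2 + lam1 ||y - X1 u1||^2 + lam2 ||y - X2 u2||^2.

    Assumed penalty-free form. u1 and u2 appear in both the agreement term and
    the regression terms, so they are solved jointly (they are coupled).
    """
    p = X1.shape[1]
    A11, A22, A12 = X1.T @ X1, X2.T @ X2, X1.T @ X2
    # Zero gradient => block linear system in the stacked vector (u1, u2).
    lhs = np.block([[(1 + lam1) * A11, -A12],
                    [-A12.T, (1 + lam2) * A22]])
    rhs = np.concatenate([lam1 * (X1.T @ y), lam2 * (X2.T @ y)])
    u = np.linalg.solve(lhs, rhs)
    return u[:p], u[p:]

def objective(u1, u2, X1, X2, y, lam1=1.0, lam2=1.0):
    return (np.sum((X1 @ u1 - X2 @ u2) ** 2)
            + lam1 * np.sum((y - X1 @ u1) ** 2)
            + lam2 * np.sum((y - X2 @ u2) ** 2))

rng = np.random.default_rng(2)
n = 100
X1, X2 = rng.normal(size=(n, 6)), rng.normal(size=(n, 4))
y = X1[:, 0] + X2[:, 1] + 0.1 * rng.normal(size=n)

u1, u2 = collaborative_regression(X1, X2, y)

# With very large weights the agreement term is negligible, so u1 should be
# close to the ordinary least-squares fit of y on X1 alone.
u1_big, _ = collaborative_regression(X1, X2, y, lam1=1e6, lam2=1e6)
u1_ols = np.linalg.lstsq(X1, y, rcond=None)[0]
print("coupled u1:", np.round(u1, 3))
print("max |u1_big - u1_ols|:", np.abs(u1_big - u1_ols).max())
```

For moderate weights, the coefficients used to predict \(Y\) are pulled toward directions that also make \(X_1u_1\) and \(X_2u_2\) agree, which is the coupling the text identifies as a possible cause of poor prediction.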

To address the limitations of both CCA and collaborative regression, we propose a novel model, deep collaborative learning (DCL), which incorporates regression into CCA in an uncoupled way via deep networks. Suppose we have data from two modalities, \(X_1 \in \mathbb {R} ^{n\times p}\) and \(X_2 \in \mathbb {R} ^{n\times q}\), and label data \(Y \in \mathbb {R} ^{n\times 1}\), where \(n\) is the sample size (number of subjects) and \(p\), \(q\) are the feature dimensions of \(X_1\), \(X_2\), respectively. The formulation of deep collaborative learning is given in Eqs. (4) and (5), and its framework is illustrated in Fig. 1(b).

(4)
(5)

where \(H_1 = f_1(X_1) \in \mathbb {R} ^{n\times r}\) and \(H_2 = f_2(X_2) \in \mathbb {R} ^{n\times s}\) are the outputs of the two deep networks \(f_1,\ f_2\) illustrated in Fig. 1(b); \( \varSigma _{ij} := H_i'H_j\); \(\Vert A \Vert _{tr} := \text {Trace} (\sqrt{A'A}) = \sum _i \sigma _i\) is the trace norm, with \(\sigma _i\) the singular values of \(A\); and \( U_1,\; U_2 \) in Eq. (4) are subject to \(U_1'\varSigma _{11}U_1=U_2'\varSigma _{22}U_2 = \mathbf {I}\).
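The trace norm and the constraint on \(U_1, U_2\) defined above fit together as follows. Writing \(T = \varSigma _{11}^{-1/2}\varSigma _{12}\varSigma _{22}^{-1/2}\) (our notation), the maximum of \(\text {Trace}(U_1'\varSigma _{12}U_2)\) subject to \(U_1'\varSigma _{11}U_1=U_2'\varSigma _{22}U_2 = \mathbf {I}\), with \(\min(r,s)\) columns, equals \(\Vert T \Vert _{tr}\). The NumPy sketch below checks this numerically, and also checks that the two expressions in the definition of \(\Vert A \Vert _{tr}\) agree. The random matrices standing in for network outputs are illustrative assumptions.

```python
import numpy as np

def trace_norm_two_ways(A):
    """||A||_tr as Trace(sqrt(A'A)) and as the sum of singular values of A."""
    w, V = np.linalg.eigh(A.T @ A)                       # A'A is symmetric PSD
    sqrt_AtA = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    return np.trace(sqrt_AtA), np.linalg.svd(A, compute_uv=False).sum()

def inv_sqrt(S):
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

rng = np.random.default_rng(3)
n, r, s = 300, 4, 3
H1 = rng.normal(size=(n, r))                             # stand-in for f1(X1)
H2 = H1[:, :s] @ rng.normal(size=(s, s)) + rng.normal(size=(n, s))  # stand-in for f2(X2)

S11, S22, S12 = H1.T @ H1, H2.T @ H2, H1.T @ H2         # Sigma_ij := H_i' H_j
W1, W2 = inv_sqrt(S11), inv_sqrt(S22)
T = W1 @ S12 @ W2

tr_sqrt, tr_svd = trace_norm_two_ways(T)

# Loadings built from the SVD of T satisfy the constraint of Eq. (4) and
# attain Trace(U1' S12 U2) = ||T||_tr, the maximum under that constraint.
A, sv, Bt = np.linalg.svd(T, full_matrices=False)
U1, U2 = W1 @ A, W2 @ Bt.T
print(tr_sqrt, tr_svd, np.trace(U1.T @ S12 @ U2))
```

This identity is what allows a maximization over constrained \(U_1, U_2\) to be replaced by a closed-form trace norm of \(T\), a quantity that depends only on \(H_1\) and \(H_2\).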

As shown in Eqs. (4) and (5), deep collaborative learning seeks the optimal network representations \(H_1 = f_1(X_1), H_2=f_2(X_2) \) rather than the optimal projection vectors \(u_1,u_2,\beta _1,\beta _2\), so the coupled restriction is relaxed. Relaxing this restriction can lead to better performance in both prediction/classification and correlation analysis compared with linear collaborative regression.
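The uncoupling can be illustrated with a loss that scores a pair of representations \(H_1, H_2\) by a correlation term plus two separate regression terms. The sketch below assumes a loss of the form \(-\Vert T\Vert _{tr} + \lambda _1\min _{\beta _1}\Vert Y - H_1\beta _1\Vert ^2 + \lambda _2\min _{\beta _2}\Vert Y - H_2\beta _2\Vert ^2\), where \(T\) is the whitened cross-covariance of the centered \(H_1, H_2\). This form, the weights, the centering, and the function name `dcl_style_loss` are hypothetical and are not claimed to be Eqs. (4) and (5). In training, \(H_1, H_2\) would be the outputs of \(f_1, f_2\), and such a loss would be minimized by backpropagation in a deep learning framework.

```python
import numpy as np

def inv_sqrt(S, reg=1e-8):
    w, V = np.linalg.eigh(S + reg * np.eye(S.shape[0]))
    return V @ np.diag(w ** -0.5) @ V.T

def dcl_style_loss(H1, H2, y, lam1=1.0, lam2=1.0):
    """Hypothetical loss: -(total canonical correlation) + weighted regression errors."""
    H1c, H2c, yc = H1 - H1.mean(0), H2 - H2.mean(0), y - y.mean()
    T = inv_sqrt(H1c.T @ H1c) @ (H1c.T @ H2c) @ inv_sqrt(H2c.T @ H2c)
    corr = np.linalg.svd(T, compute_uv=False).sum()      # ||T||_tr
    # beta1 and beta2 are fitted on their own, independently of the canonical
    # directions, unlike the shared u1, u2 of collaborative regression.
    beta1 = np.linalg.lstsq(H1c, yc, rcond=None)[0]
    beta2 = np.linalg.lstsq(H2c, yc, rcond=None)[0]
    err1 = np.sum((yc - H1c @ beta1) ** 2)
    err2 = np.sum((yc - H2c @ beta2) ** 2)
    return -corr + lam1 * err1 + lam2 * err2

rng = np.random.default_rng(4)
n = 200
H1 = rng.normal(size=(n, 3))
H2 = H1 @ rng.normal(size=(3, 3))              # perfectly correlated with H1
y = H1 @ np.array([1.0, -2.0, 0.5]) + 5.0      # exactly predictable from H1

# About -3: three unit canonical correlations and zero regression error.
loss_ideal = dcl_style_loss(H1, H2, y)
loss_noisy = dcl_style_loss(H1, H2 + rng.normal(size=H2.shape), y)
print(loss_ideal, loss_noisy)
```

Because \(\beta _1, \beta _2\) are fitted separately from the canonical directions, a representation can be both highly correlated across modalities and predictive of \(Y\) without forcing the correlation and regression projections to coincide.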

3 Application to Brain Connectivity Study

3.1 Introduction to Brain Connectivity

We next apply the DCL model to the study of brain connectivity and development. Brain connectivity depicts the anatomical or functional associations between different brain regions or nodes [1]. It is of interest to investigate how brain connectivity changes during adolescence and how it differs between age groups, e.g., children and young adults, which may further contribute to the study of normal and pathological brain development. DCL is a network representation based model that can detect signals with both strong correlations (reflecting brain connectivity) and good discriminative power (reflecting differences between age groups), and it is therefore well suited to the study of brain connectivity and development.

3.2 Brain Connectivity Data

Several fMRI modalities from the Philadelphia Neurodevelopmental Cohort (PNC) [15] were used in the experiments. The PNC is a large-scale collaborative study between the Brain Behavior Laboratory at the University of Pennsylvania and the Children's Hospital of Philadelphia. It contains multi-modal neuroimaging data (e.g., fMRI, diffusion tensor imaging) and genetic data (e.g., single nucleotide polymorphisms, SNPs) from subjects aged 8 to 21 years. The PNC includes three types of fMRI data, collected during different task states: resting-state fMRI (rs-fMRI), emotion task fMRI (emoid t-fMRI), and nback task fMRI (nback t-fMRI). Two labels were used for classification and correlation analysis: age and the Wide Range Achievement Test (WRAT) score [16], a comprehensive measure of cognitive ability.

3.3 Results

We compared the performance of the DCL model with that of CCA, deep CCA (DCCA), and collaborative regression (CR) in classifying both age groups and cognitive ability groups. For age, the oldest 20% of subjects were extracted as the young adult group (aged 18 to 22) and the youngest 20% as the children group (aged 8 to 11). For cognitive ability, the top 20% of subjects (assessed by WRAT score) were extracted as the high cognition group (WRAT 114–145) and the bottom 20% as the low cognition group (WRAT 55–89).

Data were split into a training set (60%) and a testing set (40%). The training set was used to train DCL's networks, and the trained networks were then applied to the testing set for classification. All preprocessing steps, including data augmentation and data standardization, were performed on the training and testing sets separately. All hyper-parameters, including momentum, activation function type, learning rate, decay rate, batch size, maximum number of epochs, number of layers, number of nodes in each layer, and the dimensionality of the canonical variables, were chosen by grid search on the training data.

Because CCA-based methods require at least two datasets as input, different data pairs were used as input: rs-fMRI and nback t-fMRI (rest-nback); rs-fMRI and emoid t-fMRI (rest-emoid); and nback t-fMRI and emoid t-fMRI (nback-emoid). For each data pair, we tested the performance of deep CCA, CR, and DCL; the results are shown in Fig. 2 (classifying age groups) and Fig. 3 (classifying WRAT groups). We report only accuracy as the classification criterion because the two groups had balanced numbers of subjects (top 20% versus bottom 20%).
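The group construction and evaluation protocol described above can be sketched as follows. The data here are synthetic stand-ins (not PNC data), and the random, unstratified split, the rounding of group sizes, and all function names are our assumptions. Following the text, each split is standardized using its own statistics.

```python
import numpy as np

def extreme_groups(scores, frac=0.2):
    """Indices and 0/1 labels for the bottom and top `frac` of subjects by score."""
    order = np.argsort(scores)
    k = int(round(frac * len(scores)))
    idx = np.concatenate([order[:k], order[-k:]])
    labels = np.concatenate([np.zeros(k, dtype=int), np.ones(k, dtype=int)])
    return idx, labels

def zscore(A):
    return (A - A.mean(axis=0)) / A.std(axis=0)

def split_and_standardize(X, labels, train_frac=0.6, seed=0):
    """Random 60/40 split; each split is standardized with its own statistics."""
    perm = np.random.default_rng(seed).permutation(len(labels))
    n_train = int(round(train_frac * len(labels)))
    tr, te = perm[:n_train], perm[n_train:]
    return zscore(X[tr]), labels[tr], zscore(X[te]), labels[te]

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

rng = np.random.default_rng(5)
ages = rng.uniform(8, 21, size=1000)          # synthetic ages, not PNC data
X = rng.normal(size=(1000, 10))               # synthetic connectivity features
idx, y = extreme_groups(ages)
Xtr, ytr, Xte, yte = split_and_standardize(X[idx], y)
print(len(ytr), len(yte), ytr.mean())
```

Because the two extreme groups have equal size by construction, accuracy is a reasonable single summary of classification performance in this setting.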

Fig. 2.

A figure comparing the performance of different methods in classifying age groups (young adults, aged 18–22, vs. children, aged 8–11). The methods are deep CCA (DCCA), collaborative regression (CR), and deep collaborative learning (DCL). The numbers in the figure are classification accuracies (%).

Fig. 3.

A figure comparing the performance of different methods in classifying high vs. low WRAT scores (cognitive ability). The full names of the methods are given in the caption of Fig. 2. The numbers in the figure are classification accuracies (%).

As Figs. 2 and 3 show, the proposed DCL model achieved higher classification accuracies than the CCA-based models and collaborative regression in classifying both age groups and cognition groups, possibly owing to the nonlinear representations of the deep networks and the combination of prediction and correlation detection. Collaborative regression outperformed deep CCA in classification, which may be due to its incorporation of label information, but it still performed worse than DCL. The high classification accuracy (over 90%) indicates that different age groups (young adults and children) and different cognition groups (high and low WRAT scores) may exhibit different functional connectivity patterns, and that functional brain connectivity might be used as a fingerprint to identify different subjects. In addition, Figs. 2 and 3 show that age groups were classified more accurately than cognition groups, which might be because age is a fixed phenotype, whereas the cognition score is a rough measure that is less accurate and consistent than age.

4 Discussion and Conclusion

In this work, we proposed a new model, DCL, which captures label-related correlations and performs well on classification by combining correlation analysis and regression using deep networks. In our experiments, DCL outperformed deep CCA and collaborative regression, which suggests that relaxing the restriction on projections by using deep networks helps achieve higher classification accuracy. The ability of DCL to perform both correlation detection and classification makes it suitable for brain connectivity studies, which focus on analyzing correlations among functional networks and on how different subject groups exhibit different brain connectivity patterns. The results suggest that both age groups and cognition groups differ in their brain connectivity, and that brain connectivity is more discriminative for classifying age groups than for classifying WRAT (cognition) groups. The DCL framework can be readily extended to integrate more than three datasets, as in [17], and may become better suited to brain imaging data if the fully connected networks are replaced with convolutional neural networks.