Elsevier

NeuroImage

Volume 186, 1 February 2019, Pages 557-569
Decentralized temporal independent component analysis: Leveraging fMRI data in collaborative settings

https://doi.org/10.1016/j.neuroimage.2018.10.072

Highlights

  • djICA enables temporal independent component analysis (tICA) of decentralized data.

  • Analyses of simulated and real fMRI perform well compared to the pooled case.

  • djICA provides for analysis of extremely large data sets using tICA.

  • Temporal components from djICA compare well to components from previous work.

Abstract

The field of neuroimaging has recently witnessed a strong shift towards data sharing; however, current collaborative research projects may be unable to leverage institutional architectures that collect and store data in local, centralized data centers. Additionally, though research groups are willing to grant access for collaborations, they often wish to maintain control of their data locally. These preferences may stem from research culture as well as from privacy and accountability concerns. In order to leverage the potential of these aggregated larger data sets, we require tools that perform joint analyses without transmitting the data. Ideally, these tools would have performance and ease of use similar to those of their current centralized counterparts. In this paper, we propose and evaluate a new algorithm, decentralized joint independent component analysis (djICA), which meets these technical requirements. djICA shares only intermediate statistics about the data, plausibly retaining privacy of the raw information to local sites, thus making it amenable to further privacy protections, for example via differential privacy. We validate our method on real functional magnetic resonance imaging (fMRI) data and show that it enables collaborative large-scale temporal ICA of fMRI, a rich vein of analysis as of yet largely unexplored, and which can benefit from the larger-N studies enabled by a decentralized approach. We show that djICA is robust to different distributions of data over sites, and that the temporal components estimated with djICA show activations similar to the temporal functional modes analyzed in previous work, thus solidifying djICA as a new, decentralized method oriented toward the frontiers of temporal independent component analysis.

Introduction

The benefits of collaborative analysis on fMRI data are deep and far-reaching. Research groups studying complex phenomena (such as mental disorders) often gather data with the intent of performing specific kinds of analyses. However, researchers can often leverage the data gathered to investigate questions beyond the scope of the original study. For example, a study focusing on the role of functional connectivity in mental health patients may collect a brain scan using magnetic resonance imaging (MRI) from all enrolled subjects, but may only examine one particular aspect of the data. The scans gathered for the study, however, are often saved to form a data set associated with that study—they therefore remain available for use in future research. This phenomenon often results in the accumulation of vast amounts of data, distributed in a decentralized fashion across many research sites. In addition, since technological advances have dramatically increased the complexity of data per measurement while lowering their cost, researchers hope to leverage data across multiple research groups to achieve sufficiently large sample sizes that may uncover important, relevant, and interpretable features that characterize the underlying complex phenomenon.

The standard industry solution to data sharing involves each group uploading data to a shared-use data center, such as a cloud-based service like the OpenfMRI data repository (Poldrack et al.) or the more recently proposed OpenNeuro service (Gorgolewski et al., 2017). Despite the prevalence of such frameworks, centralized solutions may not be feasible for many research applications. For example, since neuroimaging uses data taken from human subjects, data sharing may be limited or prohibited due to issues such as (i) local administrative rules, (ii) local desire to retain control over the data until a specific project has reached completion, (iii) a desire to pool together a large external dataset with a local dataset without the computational and storage cost of downloading all the data, or (iv) ethical concerns of data re-identification. The last point is particularly acute in scenarios involving genetic information, patient groups with rare diseases, and other identity-sensitive applications. Even if steps are taken to ensure patient privacy in centralized repositories, the repository maintainers are often forced to deal with monumental tasks of centralized management and standardization. This can require many hours of additional processing, occasionally reducing the richness of some of the contributed data (Poldrack and Gorgolewski, 2014).

In lieu of centralized sharing techniques, a number of practical decentralization approaches have recently been proposed by researchers looking to perform privatized analyses. For example, the “enhancing neuroimaging genetics through meta analysis” (ENIGMA) consortium (Thompson et al., 2014) allows groups to share local summary statistics rather than gathering all the original imaging data at a single site for a centralized analysis. This method has proven very successful when using both mega- and meta-analysis approaches (Thompson et al., 2014; Jack et al., 2008; Thompson et al., 2017; van Erp et al., 2016). In particular, the meta-analysis at work in ENIGMA has been used for large-scale genetic association studies, with each site performing the same analysis, the same brain measure extraction, or the same regressions, and then aggregating local results globally. Meta-analyses can summarize findings from tens of thousands of individuals, so the summaries of aggregated local data need not be subject to institutional firewalls or even require additional consent from subjects (van Erp et al., 2016; Hibar et al., 2015). This approach represents one proven, widely used method for enabling analyses on otherwise inaccessible data.

Although ENIGMA has spurred innovation through massive international collaborations, there are some challenges which complicate the approach. Firstly, the meta-analyses at work in ENIGMA are effectively executed manually, which is a very time-consuming process. For each experiment, researchers have to write analysis scripts, coordinate with personnel at all participating sites to make sure these scripts are implemented there, adapt and debug scripts at each site, and then gather the results through the use of proprietary software. In addition, an analysis using the ENIGMA approach described above is typically “single-shot,” i.e., it does not iterate among sites to compute results holistically, as informed by the global data. From a statistical and machine learning perspective, single-shot model averaging performs well asymptotically in the number of subjects for some types of analysis (McDonald et al., 2009; Zinkevich et al., 2010). However, simple model averaging does not account for variability between sites driven by small sample sizes and cannot leverage multivariate dependence structures that might exist across sites. Furthermore, the ability to iterate over local site computations allows not only continuous refinement of the solution at the global level but also greater algorithmic complexity, enabling multivariate approaches like group ICA (Calhoun and Adalı, 2012) and support vector machines (Plis et al., 2016), and increased efficiency due to parallelism, facilitating the processing of images containing thousands of voxels.
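
The contrast drawn above between single-shot model averaging and iterating among sites can be illustrated with a toy problem. The sketch below is an assumption-laden illustration and is not taken from the paper: it swaps in ordinary least squares as the model, and all names (`local_ols`, `single_shot_average`, `iterative_aggregate`) are hypothetical. Each site either fits its own model once and the coefficients are averaged, or each site repeatedly sends a gradient computed on its local data to an aggregator.

```python
import numpy as np

def local_ols(X, y):
    """Fit least squares on one site's data only."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def local_gradient(X, y, w):
    """Unnormalized gradient of 0.5 * ||X w - y||^2 on one site's data."""
    return X.T @ (X @ w - y)

def single_shot_average(sites):
    """Single-shot scheme: every site fits locally, coefficients are averaged once."""
    return np.mean([local_ols(X, y) for X, y in sites], axis=0)

def iterative_aggregate(sites, n_iter=500, step=0.5):
    """Iterative scheme: sites send gradients, the aggregator updates a shared w."""
    n_total = sum(len(y) for _, y in sites)
    w = np.zeros(sites[0][0].shape[1])
    for _ in range(n_iter):
        grad = sum(local_gradient(X, y, w) for X, y in sites) / n_total
        w = w - step * grad
    return w

# Toy data (hypothetical): 8 sites with 20 samples each, 5 features.
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
sites = []
for _ in range(8):
    X = rng.normal(size=(20, 5))
    y = X @ w_true + 0.1 * rng.normal(size=20)
    sites.append((X, y))

# Pooled reference: what a centralized analysis would compute.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
w_pooled = local_ols(X_all, y_all)

w_avg = single_shot_average(sites)
w_iter = iterative_aggregate(sites)
```

In this toy setting the summed site gradients equal the pooled gradient, so the iterative estimate approaches the pooled solution, whereas the averaged estimate generally differs from it when per-site sample sizes are small. Only gradients, not raw samples, leave each site.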

These limitations, together with the significant amount of manual labor required for single-shot approaches to decentralization, motivate decentralized analyses which favor more frequent communication. For example, sites running a global optimization algorithm can communicate following each iteration or after a number of iterations. In this paper, we extend previous work in this direction (Baker et al., 2015) to develop iterative algorithms for collaborative, decentralized feature learning. Namely, we implement a real-data application of a successful algorithm for decentralized independent component analysis (ICA), a widely used method in neuroimaging applications. Specifically, we show that our decentralized implementation can help further advance the as-of-yet mostly unexplored domain of temporal ICA of functional magnetic resonance imaging (fMRI) data. The resulting method is a ready fit for decentralized collaboration frameworks, such as the COINSTAC neuroimaging analysis platform (Plis et al., 2016), which promises innovation in privacy-sensitive decentralized analysis.

Decentralized approaches such as ENIGMA allow research sites to maintain control over data access, thus providing plausible privacy protection at the cost of additional labor in implementing and updating a distributed architecture. For many applications, keeping data stored on sites without transfer of entire data samples may provide substantial privacy. These decentralized methods, however, are amenable to quantifiable measures of privacy, such as differential privacy (Dwork et al., 2006). In this work, we leave the addition of differential privacy aside and focus first on the presentation of djICA as a separate algorithm with plausible privacy; we have pursued the addition of differential privacy to djICA elsewhere (Imtiaz et al., 2016).

One widespread analysis which stands to benefit from decentralization is temporal independent component analysis (tICA). In resting-state fMRI studies, we can assume that the overall spatial networks remain stable across subjects and experiment duration, while the activation of certain neurological regions varies over time and across subjects. Temporal ICA, first utilized for fMRI by Biswal and Ulmer (1999), locates temporally independent components corresponding to independent activations of a subject’s intrinsic common spatial networks (Stone et al., 1999). Both spatial and temporal ICA evidently provide reliable estimates of these intrinsic networks from fMRI data (Calhoun et al., 2001a; Gao et al., 2011; McKeown et al., 2003; Smith et al., 2012), but, unlike its spatial counterpart, temporal ICA allows spatial correlation between them (i.e., overlaps in the spatial maps) (Friston, 1998). Spatial and temporal ICA can result in similar estimated networks (Dodel et al., 2000; Calhoun et al., 2001a; Petersen et al.; Gao et al., 2011), while temporal ICA provides estimates not otherwise available to spatial ICA (Smith et al., 2012; Calhoun et al., 2001b), specifically for task-related data. Temporal ICA has also proven particularly useful for extracting information from high-resolution fMRI scans with overlapping spatial activations, a feature not available to spatial ICA (Boubela et al., 2013). Beyond estimation of novel temporal components, temporal ICA can also aid in isolating and removing noise from fMRI signals (Glasser et al., 2017; Beall and Lowe, 2007).

While useful, the existing literature for temporal ICA is limited. This can be partially attributed to computational complexity and dependence on statistical sample size, since temporal ICA requires more data points in the time dimension than the typical fMRI time series can offer (Calhoun et al., 2001a; Gao et al., 2011). Specifically, the temporal dimension often needs to be at least comparable to the voxel (spatial) dimension. This often motivates the temporal aggregation of datasets composed of many temporally concatenated subjects. This temporal aggregation is also a key feature of the well-established group spatial ICA in the fMRI literature (Calhoun et al., 2001c; Correa et al., 2007; Calhoun et al., 2009). Beyond accumulation of subjects, other studies implementing temporal ICA for fMRI utilize higher-resolution scans to perform temporal ICA with fewer subjects (Boubela et al., 2013). Further methods reduce the spatial dimension to make a temporal ICA tractable: Seifritz et al. (2002) use an initial spatial ICA to reduce the spatial dimension by locating a region of interest on which to perform temporal ICA, and van de Ven et al. (2009) restrict the temporal analysis to a predetermined region of voxels deemed relevant to their particular problem of speech pattern monitoring.
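
As a point of reference, the temporal concatenation described above can be written compactly. The notation below (V voxels, T_m time points for subject m, M subjects, r components) is an assumption introduced here for illustration and is not taken from the paper's own equations.

```latex
\begin{aligned}
X_m &\in \mathbb{R}^{V \times T_m}, \qquad m = 1, \dots, M, \\
X &= \begin{bmatrix} X_1 & X_2 & \cdots & X_M \end{bmatrix} \in \mathbb{R}^{V \times T},
  \qquad T = \sum_{m=1}^{M} T_m, \\
X &\approx A S, \qquad A \in \mathbb{R}^{V \times r}, \quad S \in \mathbb{R}^{r \times T}.
\end{aligned}
```

In this reading, the rows of S are the temporally independent time courses and the columns of A are the shared spatial maps. Concatenating more subjects increases T while V stays fixed, which is how aggregation helps the temporal dimension approach the voxel dimension.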

Although temporal ICA would benefit tremendously from increasing the temporal frequency of scanners, or analyzing a large number of subjects at a central location, as mentioned above, this is not always feasible. To overcome the challenges of centralized temporal ICA, we present a novel method, decentralized joint Independent Component Analysis (djICA), which allows for the computation of aggregate spatial maps and local independent time courses across decentralized data stored at different servers belonging to independent labs. Our approach combines individual computations performed locally with global processes to obtain both local and global results. The resulting method for temporal ICA produces results with similar performance to the pooled-data case and provides estimated components in line with previous literature, demonstrating the effectiveness of decentralized collaborative algorithms for this difficult task.
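
To make the idea of combining local computations with a global process concrete, the following is a minimal sketch, not the paper's Algorithm 1. It assumes an Infomax-style natural-gradient update with a logistic nonlinearity, omits the bias term and the PCA preprocessing, and uses a shared orthogonal mixing matrix so that no whitening is needed. All function names and the simulated Laplace sources are hypothetical. Each site computes a gradient statistic from its own data and the current global unmixing matrix; the aggregator sums these statistics and updates the matrix.

```python
import numpy as np

def local_gradient(W, X):
    """Site-side step: Infomax natural-gradient statistic summed over local samples."""
    Z = W @ X
    Y = 1.0 / (1.0 + np.exp(-Z))          # logistic nonlinearity
    n = X.shape[1]
    return (n * np.eye(W.shape[0]) + (1.0 - 2.0 * Y) @ Z.T) @ W

def aggregate_and_update(W, local_grads, n_total, lr=0.1):
    """Aggregator-side step: sum the site statistics and update the global W."""
    return W + lr * sum(local_grads) / n_total

def decentralized_infomax(site_data, n_iter=1000, lr=0.1):
    """Iterate local gradient computation and global aggregation."""
    r = site_data[0].shape[0]
    n_total = sum(X.shape[1] for X in site_data)
    W = np.eye(r)
    for _ in range(n_iter):
        grads = [local_gradient(W, X) for X in site_data]  # raw data stays on site
        W = aggregate_and_update(W, grads, n_total, lr)
    return W

# Simulated setting (assumption): 3 sites share one mixing matrix A, and each
# holds its own independent, super-Gaussian time courses with unit variance.
rng = np.random.default_rng(0)
r = 3
A, _ = np.linalg.qr(rng.normal(size=(r, r)))   # shared orthogonal mixing matrix
site_sizes = [1000, 2000, 3000]
sources = [rng.laplace(scale=1 / np.sqrt(2), size=(r, n)) for n in site_sizes]
site_data = [A @ S for S in sources]

W = decentralized_infomax(site_data)

# If separation succeeded, |W A| is close to a scaled permutation matrix.
P = np.abs(W @ A)
P = P / np.linalg.norm(P, axis=1, keepdims=True)
```

Because the statistic is a sum over samples, adding the site-level statistics gives the same update as computing it on pooled data, which is the property that lets a scheme of this kind track the centralized solution. Each site can then recover its own time courses locally as `W @ X`.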

In sum, the contributions of this paper are as follows:

  • In Section 2, we present decentralized joint independent component analysis (Algorithm 1, Section 2.2), which is closely related to Infomax ICA (Section 2.1) with decentralized PCA preprocessing (Section 2.3).

  • In Section 3 we include experiments and evaluation of djICA over different subject and site distributions for simulated data sets, including simulated fMRI data, thus providing a baseline result and proper motivation for real-data experiments.

  • In Section 4, we perform experiments which evaluate djICA on a real set of fMRI data in a simulated decentralized environment, using a novel pseudo-ground-truth evaluation scheme to compare our results with the pooled case.

  • Finally, in Section 5, we discuss the performance of djICA as a novel method for performing temporal ICA in decentralized settings, comparing our results with previously estimated results from the pooled temporal ICA literature.
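
The decentralized PCA preprocessing named in the contributions above can be illustrated with one common one-shot scheme. This is a sketch under stated assumptions and not necessarily the procedure of Section 2.3: each site sends its top-k left singular vectors scaled by their singular values, and the aggregator takes an SVD of the stacked summaries. The function names and the simulated low-rank data are hypothetical.

```python
import numpy as np

def local_pca(X, k):
    """Site-side step: top-k left singular vectors scaled by singular values (d x k)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def global_pca(local_summaries, k):
    """Aggregator-side step: SVD of the stacked site summaries, keep top-k directions."""
    stacked = np.hstack(local_summaries)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :k]

# Simulated data (assumption): d = 10 shared row dimension, rank-3 signal plus
# small noise, with a different number of columns (time points) at each site.
rng = np.random.default_rng(1)
d, k = 10, 3
B = rng.normal(size=(d, k))
site_data = [
    B @ rng.normal(size=(k, n)) + 0.01 * rng.normal(size=(d, n))
    for n in (200, 300, 400)
]

U_dec = global_pca([local_pca(X, k) for X in site_data], k)

# Pooled reference computed on the concatenated data.
U_pool = np.linalg.svd(np.hstack(site_data), full_matrices=False)[0][:, :k]

# Compare subspaces through their projectors, which ignores sign and rotation.
gap = np.linalg.norm(U_dec @ U_dec.T - U_pool @ U_pool.T)
```

Each site transmits only a d x k summary instead of its full d x n data matrix. When the local truncation discards little variance, the stacked summaries reproduce the pooled second-moment matrix closely, so the global subspace is close to the pooled one.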

Section snippets

Materials and methods

In this section, we provide the details of our method for decentralized joint independent component analysis and provide a basis for its evaluation. We first review Independent Component Analysis for the pooled case (where all samples are located on a single site) in Section 2.1, which provides the basis for our presentation of the djICA algorithm in Section 2.2. In Section 2.3 we discuss performing PCA preprocessing in a decentralized setting, and finally, in Section 2.4, we discuss our methods

Experiments with simulated data

First, we test djICA in a simulated environment where we can manufacture a known ground-truth and use djICA to reconstruct this ground-truth under different mixing and site configurations. For this simulated case, we explicitly construct the signal matrices, S, and the mixing matrix A (using the methods described in Section 3), such that the source matrices are statistically independent, thus providing the assurance that a solution to the underlying BSS problem exists. If djICA performs

Experiments with real data

The simulated experiments illustrate the clear benefit djICA provides by enabling the joint analysis of large decentralized data sets. In this section, we describe the methods utilized for real-data experiments with resting-state fMRI datasets. These experiments are intended to illustrate the effectiveness of djICA (Algorithm 1) in the particular domain of exploratory analysis of fMRI data. As mentioned earlier, the benefits of using this algorithm for fMRI analysis are numerous, and the

Discussion

In contrast to systems optimized for processing large amounts of data by making computation more efficient (Apache Spark, H2O and others), we focus on a different setting common in research collaborations: data are expensive to collect, are spread across multiple sites, and are possibly not shareable directly. To that end, we proposed a distributed data joint ICA algorithm that, in synthetic experiments, finds underlying sources in decentralized data nearly as accurately as its centralized

Conclusions & future work

We have presented djICA, a novel method for decentralized temporal Independent Component Analysis, which represents a step toward facilitating large, collaborative analyses of data in a decentralized fashion. We evaluated djICA on simulated and real fMRI data, with both experiments illustrating the benefits of djICA, namely the increased availability of a larger, otherwise inaccessible, subject pool shared across multiple sites. Additionally, since djICA does not communicate subject data across

Funding

This work was supported by NIH grants R01DA040487, P20GM103472, and R01EB020407, as well as NSF grants 1539067 and 1631838. The author(s) declare that there was no other financial support or compensation that could be perceived as constituting a potential conflict of interest.

References (70)

  • M. Lindquist et al.

    Evaluating dynamic bivariate correlations in resting-state fMRI: a comparison study and a new approach

    Neuroimage

    (2014)
  • M.J. McKeown et al.

    Independent component analysis of functional MRI: what is signal and what is noise?

    Curr. Opin. Neurobiol.

    (2003)
  • J. Sui et al.

    An ICA-based method for the identification of optimal fMRI features and components using combined group-discriminative techniques

    Neuroimage

    (2009)
  • M. Svensén et al.

    ICA of fMRI group study data

    Neuroimage

    (2002)
  • P.M. Thompson et al.

    ENIGMA and the individual: predicting factors that affect the brain in 35 countries worldwide

    Neuroimage

    (2017)
  • V. van de Ven et al.

    Neural network of speech monitoring overlaps with overt speech production and comprehension networks: a sequential spatial and temporal ICA study

    Neuroimage

    (2009)
  • E. Allen, E. Erhardt, E. Damaraju, W. Gruner, J. Segall, R. Silva, M. Havlicek, S. Rachakonda, J. Fries, R. Kalyanam,...
  • S.-i. Amari et al.

    A new learning algorithm for blind signal separation

    Adv. NIPS

    (1996)
  • Z.-J. Bai et al.

    Principal component analysis for distributed data sets with updating

  • B.T. Baker et al.

    Large scale collaboration with autonomy: decentralized data ICA

  • R. Balan

    Estimator for number of sources using minimum description length criterion for blind sparse source mixtures

  • A.J. Bell et al.

    An information-maximization approach to blind separation and blind deconvolution

    Neural Comput.

    (1995)
  • B.B. Biswal et al.

    Blind source separation of multiple signal sources of fMRI data sets using independent component analysis

    J. Comput. Assist. Tomogr.

    (1999)
  • R.N. Boubela et al.

    Beyond noise: using temporal ICA to extract meaningful information from high-frequency fMRI signal fluctuations during rest

    Front. Hum. Neurosci.

    (2013)
  • Z. Boukouvalas et al.

    Sparsity and independence: balancing two objectives in optimization for source separation with application to fMRI analysis

    J. Franklin Inst.

    (2017)
  • V.D. Calhoun et al.

    Unmixing fMRI with independent component analysis

    IEEE Eng. Med. Biol. Mag.

    (2006)
  • V.D. Calhoun et al.

    Multisubject independent component analysis of fMRI: a decade of intrinsic networks, default mode, and neurodiagnostic discovery

    IEEE Rev. Biomed. Eng.

    (2012)
  • V. Calhoun et al.

    Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms

    Hum. Brain Mapp.

    (2001)
  • V.D. Calhoun et al.

    A method for making group inferences from functional MRI data using independent component analysis

    Hum. Brain Mapp.

    (2001)
  • V. Calhoun et al.

    Method for multimodal analysis of independent source differences in schizophrenia: combining gray matter structural and auditory oddball functional data

    Hum. Brain Mapp.

    (2006)
  • V. Calhoun et al.

    Independent component analysis for brain fMRI does indeed select for maximal independence

    PLoS One

    (2013)
  • I. Daubechies et al.

    Independent component analysis for brain fMRI does not select for independence

    Proc. Natl. Acad. Sci. Unit. States Am.

    (2009)
  • S. Dodel et al.

    Comparison of temporal and spatial ICA in fMRI data analysis

  • C. Dwork et al.

    Calibrating noise to sensitivity in private data analysis

  • E. Egolf et al.

    Group ICA of fMRI toolbox (GIFT)
