NeuroImage

Volume 199, 1 October 2019, Pages 351-365

Quantifying performance of machine learning methods for neuroimaging data

https://doi.org/10.1016/j.neuroimage.2019.05.082

Highlights

  • The choice of machine learning algorithm influenced prediction accuracy.

  • Sample size was important: prediction accuracy generally increased once N ≥ 400.

  • The Elastic Net performed well at a range of effect sizes, relative to other methods.

  • Random Forest performed well at small effect sizes.

  • Gaussian Process Regression performed well at large effect sizes.

Abstract

Machine learning is increasingly being applied to neuroimaging data. However, most machine learning algorithms have not been designed to accommodate neuroimaging data, which typically have many more data points than subjects, in addition to multicollinearity and low signal-to-noise ratios. Consequently, the relative efficacy of different machine learning regression algorithms for different types of neuroimaging data is not known. Here, we sought to quantify the performance of a variety of machine learning algorithms for use with neuroimaging data across various sample sizes, feature set sizes, and predictor effect sizes. The contribution of additional machine learning techniques – embedded feature selection and bootstrap aggregation (bagging) – to model performance was also quantified. Five machine learning regression methods – Gaussian Process Regression, Multiple Kernel Learning, Kernel Ridge Regression, the Elastic Net, and Random Forest – were examined with both real and simulated MRI data, and compared to standard multiple regression. The different machine learning regression algorithms produced varying results, which depended on sample size, feature set size, and predictor effect size. When the effect size was large, the Elastic Net, Kernel Ridge Regression, and Gaussian Process Regression performed well at most sample sizes and feature set sizes. However, when the effect size was small, only the Elastic Net made accurate predictions, and only in analyses with sample sizes greater than 400. Random Forest also produced moderate performance for small effect sizes, and did so across all sample sizes. Machine learning techniques also improved prediction accuracy for multiple regression. These data provide empirical evidence for the differential performance of various machines on neuroimaging data, dependent on sample size, number of features, and effect size.

Introduction

An increasing number of projects and consortia are now collecting large neuroimaging datasets. These include IMAGEN (Schumann et al., 2010), the Alzheimer's Disease Neuroimaging Initiative (ADNI, Jack et al., 2008), the Human Connectome project (Van Essen et al., 2012), ENIGMA (Thompson et al., 2017), the 1000 Functional Connectomes project (Biswal et al., 2010) and the Adolescent Brain Cognitive Development study (ABCD; https://abcdstudy.org/, see Vol. 32 of Developmental Cognitive Neuroscience, which is dedicated to the ABCD study). In addition, there are data-sharing facilities such as NeuroVault (neurovault.org, Gorgolewski et al., 2015), OpenNeuro (openneuro.org, Gorgolewski et al., 2017), and the Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC; Kennedy et al., 2016). These sources of high-dimensional imaging data offer exciting opportunities to produce generalizable and reproducible research findings in arenas such as predicting disease trajectories, or linking behavioral and personality factors to functional and structural imaging data.

As large samples become more commonplace in neuroimaging, analytical tools developed for data science, such as machine learning, are more frequently applied to neuroimaging data (Jollans and Whelan, 2017; Woo et al., 2017). A wide variety of studies have used machine learning algorithms to classify individuals based on structural or functional imaging data using, among other algorithms, Support Vector Machines (e.g. Costafreda et al., 2009; Davatzikos et al., 2011; Koutsouleris et al., 2012), Random Forest (e.g. Ball et al., 2014; Ramirez et al., 2010), and Naïve Bayes classifiers (e.g. Adar et al., 2016; Wang et al., 2016; Zhou et al., 2015). There have also been successful efforts to predict continuous outcome variables, mostly using relevance vector or support vector regression, such as age (Dosenbach et al., 2010; Franke et al., 2010; Mwangi et al., 2013), cognitive ability (Stonnington et al., 2010), language ability (Formisano et al., 2008), and disease severity in patients with major depression (Mwangi et al., 2012). While these algorithms have been increasingly used in neuroimaging research, none of them was specifically developed for neuroimaging data, which have high dimensionality, inherent multicollinearity, and typically small signal-to-noise ratios. Below, we briefly review important considerations when analysing large neuroimaging datasets and how machine learning methods may address these issues.

Several authors have emphasized the importance of moving away from explanatory and univariate analysis procedures and towards multivariate outcome prediction in psychology and neuroscience (Gabrieli et al., 2015; Jollans and Whelan, 2016; Poldrack, 2011; Westfall and Yarkoni, 2016). In regression approaches, effective outcome prediction requires that accurate outcome estimates can be achieved for new (i.e., unseen) cases. Prediction models exploit between-subject heterogeneity to make individual-level predictions rather than relying on differences in group means (Lo et al., 2015). Embracing machine learning for outcome prediction would significantly contribute to the generalizability and reproducibility of neuroimaging research, and improve the ability of neuroimaging to explore individual differences (Dubois and Adolphs, 2016). There are several methods for estimating and improving the generalizability of a regression model. Most common among these is cross-validation (CV), in which the data are split into ‘training’ and ‘test’ sets. Models are developed using only the training set, and model performance is assessed using the test set. The training and test sets must be kept separate for all analysis steps (Cawley and Talbot, 2010). Typically, this split is carried out multiple times, alternating the data that are included in the test set. While some neuroimaging studies use test sets comprised of only one observation (leave-one-out CV; e.g. Brown et al., 2012; Clark et al., 2014; Duff et al., 2012; Niehaus et al., 2014), larger test sets (leave-k-out, where k typically equals 5 or 10; e.g. Wang et al., 2013; Whelan et al., 2014) are preferable as they provide more accurate estimates of model performance (Kohavi, 1995). CV can also be used to provide an out-of-sample estimate of model performance within the analysis pipeline itself, in order to optimize parameters for the regression model. When multiple layers of CV are used for internal and external validation of model performance, this is referred to as ‘nested’ CV (see Fig. 1). Embedding a layer of CV within the training set of the ‘outer layer’ of CV makes it possible to train and validate a model within the training set itself; the test set remains a separate dataset used to carry out a final validation of model performance, removing the need for a separate validation set. The reader is referred to Varoquaux et al. (2017) for an empirical investigation of CV on neuroimaging data. In large datasets comprised of data from multiple sites, generalizability can be further quantified using leave-site-out CV, where data from one site are withheld as a test set and the model is developed using data from the remaining sites (Dwyer et al., 2018). Assessing model performance on the withheld test set is then a more rigorous test of generalizability, and the obtained measure of generalizability can be further optimized using nested leave-site-out CV (Dwyer et al., 2018). Such complex CV techniques have been used to build generalizable and accurate prediction models of treatment outcomes in psychosis using multisite psychosocial, sociodemographic, and psychometric data (Koutsouleris et al., 2016) and a combination of clinical and neuroimaging data (Koutsouleris et al., 2018).
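The logic of nested CV can be made concrete in a few lines of code. The sketch below uses Python's scikit-learn purely for illustration (the analyses in this paper were implemented in MATLAB); the simulated dataset, fold counts, and hyperparameter grid are assumptions for the example, not the settings used in this study.

```python
# A minimal sketch of nested cross-validation, assuming scikit-learn.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_regression(n_samples=400, n_features=100, noise=10.0, random_state=0)

inner_cv = KFold(n_splits=10, shuffle=True, random_state=1)  # tunes hyperparameters
outer_cv = KFold(n_splits=10, shuffle=True, random_state=2)  # estimates generalizability

# Inner layer: hyperparameter search confined to each outer-layer training fold
model = GridSearchCV(
    ElasticNet(max_iter=10000),
    param_grid={"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]},
    cv=inner_cv,
)

# Outer layer: performance is assessed only on data never seen during tuning
scores = cross_val_score(model, X, y, cv=outer_cv, scoring="r2")
print(f"Nested CV R^2: {scores.mean():.3f} +/- {scores.std():.3f}")
```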

Depending on the voxel size, a single MRI image can contain from 100,000 to a million voxels. As sample sizes in neuroimaging are often modest, even very large studies will have more voxels than participants. A higher ratio of features to cases increases the tendency of the model to fit to noise in that sample (i.e., overfitting; see Whelan and Garavan, 2014 for a discussion specific to neuroimaging), and an overfitted model will perform poorly when applied to a new dataset. Even when using a smaller number of regions of interest (ROIs) instead of voxels, combining multiple data sources (such as neuroimaging data and cognitive or demographic data), imaging modalities, or conditions will result in a large number of features. Feature selection and regularization are two approaches commonly adopted for dealing with high-dimensional data. A further potentially useful method for neuroimaging data is bootstrap aggregation (bagging).

Reducing the number of features in a regression model (i.e., dimension reduction) will almost always be beneficial for attenuating overfitting when working with neuroimaging data. There exists a wide array of methods for reducing the number of input variables in neuroimaging data (Mwangi et al., 2014). These methods work by selecting a subset of features or by summarizing features in new variables. Some of these methods, such as principal and independent component analysis (PCA and ICA), have long been standard tools in neuroscience. Dimension reduction techniques can be distinguished by whether they preserve the original values of features (this is not the case for ICA and PCA), whether they consider each feature in isolation or not, and whether they are unsupervised (using only the feature values) or supervised (using both the feature and dependent variable values). Feature selection is a dimension reduction technique that is often favored in neuroimaging studies that use machine learning approaches; it is an umbrella term for supervised methods that do not alter the original feature values. Feature selection methods can broadly be categorized into ‘filter’ methods, ‘wrapper’ methods, and embedded methods (see Chandrashekar and Sahin, 2014). Filter methods are univariate, considering each feature individually: each feature is scored with some statistical test (e.g., a t-test or Pearson's correlation with the outcome variable), and only the features with the highest values are retained. A key benefit of filter methods is their low computational cost compared to the much more expensive wrapper methods (Nnamoko et al., 2014), which are multivariate and consider subsets of features. Popular wrapper methods include forward selection, backward elimination, and recursive feature elimination, all of which carry out step-wise search procedures, including or excluding features in each step, to arrive at the feature set that maximizes algorithm performance. Wrapper methods lend themselves well to embedding within optimization of the regression model (e.g., an adaptive forward-backward greedy algorithm integrated within a model; Jie et al., 2015). Embedded methods integrate feature selection directly into optimization of the regression model by choosing the feature selection criterion through hyperparameters; the most widely used embedded feature selection methods use regularization (discussed further below). A key advantage of embedded methods is that they eliminate the researcher input regarding minimum effect sizes or desired feature set size that is typically necessary in filter and wrapper methods, reducing the bias that can be introduced to the model at this step. Sophisticated pipelines can combine feature selection techniques with other dimension reduction methods. For example, Koutsouleris et al. (2018) implemented principal components analysis to reduce the dimensionality of MRI data and then used a wrapper method, specifically a greedy sequential backward elimination algorithm, to identify the principal components that optimally predicted the outcome. A simple contrast between a filter and an embedded approach is sketched below.
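To make the filter/embedded distinction concrete, the following sketch contrasts a univariate filter screen with embedded selection via L1 regularization, using scikit-learn as a stand-in for the MATLAB tooling used in this study; the feature counts and regularization strength are arbitrary illustrative choices.

```python
# Hedged illustration: a filter method versus an embedded method.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=500, n_informative=20, random_state=0)

# Filter: score each feature in isolation, keep the top k (k set by the researcher)
filt = SelectKBest(score_func=f_regression, k=50).fit(X, y)
print("Filter kept:", filt.get_support().sum(), "features")

# Embedded: sparsity falls out of the regularization hyperparameter, which can
# itself be tuned by cross-validation rather than set by hand
lasso = Lasso(alpha=1.0, max_iter=10000).fit(X, y)
print("Embedded kept:", np.sum(lasso.coef_ != 0), "features")
```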

In neuroimaging, good outcome predictions may rely on large feature sets, as any cognitive or behavioral variable of interest will most likely be best explained by a network of spatially correlated brain regions. Good regression models with neuroimaging data may therefore include interaction effects between features. To account for this, feature selection methods used with neuroimaging data should consider feature sets rather than individual features. Accordingly, previous work has shown that both wrapper methods (Tangaro et al., 2015) and embedded methods (Tohka, Moradi, Huttunen & ADNI, 2016) are preferable to filter methods for neuroimaging data. However, wrapper methods are sometimes prone to overfitting and are typically more computationally intense than embedded methods (Saeys et al., 2007). Furthermore, as neuroimaging data have an inherently low signal-to-noise ratio, the individual predictive power of each voxel or ROI is likely to be quite small. It may therefore be advantageous to consider complex regression models that allow for the inclusion of some predictors with low effect sizes. Given the number of unknown factors relevant to choosing a feature selection method (such as the ideal number of features and the optimal threshold for including features with low effect sizes), the focus of this paper with regard to dimension reduction is on embedded methods, which can be implemented with little researcher input.

Regularization attenuates overfitting by penalizing the size of the regression weights as model complexity increases. Regularization is often achieved through the L1- or the L2-norm. The L1-norm, as implemented in the Least Absolute Shrinkage and Selection Operator (LASSO), penalizes regression weights based on their absolute size and results in sparse models (i.e., some regression weights can be set to zero). The L2-norm (also known as Ridge Regression or Tikhonov regularization) penalizes regression weights based on their squared values and does not result in sparse models. However, with highly multicollinear data (such as neuroimaging data) neither L1- nor L2-norm regularization is ideal: the large number of non-zero coefficients retained under the L2-norm precludes parsimonious solutions, and the L1-norm does not adequately account for highly correlated groups of predictors (Ogutu et al., 2012; Mwangi et al., 2014). The Elastic Net (EN; Zou and Hastie, 2005) combines L1-norm and L2-norm regularization and has the advantage of being an embedded feature selection algorithm, producing a sparse solution in which groups of correlated features are included or excluded together. The Elastic Net has gained popularity among neuroimaging researchers in recent years and has been successfully used in several studies with large samples (e.g. Chekroud et al., 2016; Whelan et al., 2014).
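For reference, the (naïve) Elastic Net criterion of Zou and Hastie (2005) combines the two penalties in a single objective:

$$\hat{\beta} = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda_2 \lVert \beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1,$$

where the L1 term (weighted by $\lambda_1$) induces sparsity and the L2 term (weighted by $\lambda_2$) encourages correlated predictors to enter or leave the model together; setting $\lambda_2 = 0$ recovers the LASSO and $\lambda_1 = 0$ recovers Ridge Regression.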

The low signal-to-noise ratio of neuroimaging data calls for a tool to increase the stability of findings and reduce error in outcome estimates. Stability can be estimated using bootstrapping (Efron and Tibshirani, 1997), in which the dataset is randomly sampled with replacement many times to minimize the effect of outliers and estimate the true population mean (Hall and Robinson, 2009). Like CV, bootstrapping serves a purely descriptive purpose when used to estimate population metrics. However, a related approach, termed bootstrap aggregation (bagging; Breiman, 1996), uses bootstrapping to improve stability within the model optimization framework itself. Bagging uses bootstrapped samples to generate multiple estimates of a calculation or metric, which are then aggregated; these aggregated estimates can be used in place of single estimates at every step of the analysis. An important application of this method is in unsupervised learning, where the stability of clustering applications with neuroimaging data can be greatly improved through bagging (Bellec et al., 2010). Bagging has also been used for embedded feature selection with large genetic datasets, showing significant improvements over standard non-bagged embedded methods in terms of model accuracy and stability (Abeel et al., 2010). Bagging is an effective way to decrease error, particularly with datasets that have a low signal-to-noise ratio and high multicollinearity (Zahari et al., 2014).
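As a concrete illustration, the sketch below wraps a base regressor in bootstrap aggregation using scikit-learn (assumed here for illustration; the bagging in this study is built into the authors' own MATLAB pipeline). Each base model is fit to a bootstrap resample of the training data, and predictions are averaged, damping the influence of outliers and noise.

```python
# A minimal sketch of bagging around a base regressor, assuming scikit-learn.
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score

# Noisy data with many features, loosely mimicking ROI-level imaging data
X, y = make_regression(n_samples=300, n_features=200, n_informative=20,
                       noise=20.0, random_state=0)

# 100 Elastic Net models, each fit to a bootstrap resample; predictions averaged
bagged = BaggingRegressor(ElasticNet(alpha=0.5, max_iter=10000),
                          n_estimators=100, bootstrap=True, random_state=0)

print("Bagged EN, 10-fold CV R^2:",
      cross_val_score(bagged, X, y, cv=10, scoring="r2").mean())
```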

Another important consideration in neuroimaging is that flexible or ‘exploratory’ analysis introduces a high risk of false positive results or overestimated effect sizes (Button et al., 2013). Predetermined analysis pipelines and analytical decisions therefore aid in producing reproducible results. The tendency for researchers to screen data before data collection is completed, to carry out multiple iterations of analyses without reporting the findings (e.g., with and without covariates), or to tweak parameters for group inclusion to better represent the problem has been termed ‘researcher degrees of freedom’ (Simmons et al., 2011; Loken and Gelman, 2017; Westfall and Yarkoni, 2016). In machine learning frameworks, researcher input can be greatly reduced, limiting the room for subjectivity and the researcher degrees of freedom. To enhance objectivity, the role of the researcher should be confined to collecting and preparing the best data possible to describe the problem of interest, based on domain knowledge (Dubois and Adolphs, 2016). Ideally, dimension reduction, model building, and parameter optimization should all be data-driven.

The choice of which machine learning methods to use, or whether to use them at all, is an important consideration, but is not clearly defined in the literature. One parameter that must be defined before commencing any machine learning analysis is the CV framework. While leave-one-out CV yields more accurate predictions from neuroimaging data than split-half or two-fold CV (Price et al., 2013), it also generates more variable and biased estimates of out-of-sample accuracy (Varoquaux et al., 2017). Ten-fold CV produces more stable accuracy estimates and is therefore recommended (Kohavi, 1995).

The use of feature selection, and the particular method chosen, can also impact model performance with neuroimaging data. Some feature selection techniques have little impact on model performance and may only increase computational expense (e.g., classification of individuals with mild cognitive impairment or Alzheimer's disease in Chu et al., 2012). The Elastic Net (an embedded method) yields more accurate predictions than filter and wrapper methods for some classification problems (Tohka, Moradi, Huttunen & ADNI, 2016). When using embedded methods like the Elastic Net, additional prior dimension reduction steps routinely employed with neuroimaging data (such as initial ROI selection or PCA) likely also become redundant, although this remains to be empirically investigated.

Bagging has been used with neuroimaging data for Alzheimer's disease detection (Shen et al., 2012), for discriminating between Alzheimer's disease and mild cognitive impairment (Ramirez et al., 2018), and between Parkinson's disease and atypical Parkinsonian syndrome (Garraux et al., 2013). Bagging has also been shown to outperform boosting algorithms, another class of ensemble technique (Ramírez et al., 2016). However, as these studies did not specifically investigate the effect of bagging on the analysis results, or how bagging interacts with algorithm and dimension reduction choices, the use of bagging with neuroimaging data remains to be further examined. Moreover, Munson and Caruana (2009) demonstrated that while bagging can continue to improve model performance with increasing feature set size, performance does eventually plateau for most data. The relationship between feature set size and model performance with bagging has not been formally tested using neuroimaging data. Additionally, Munson and Caruana (2009) reported that feature selection can reduce model performance when combined with bagging. Whether this negative interaction between feature selection and bagging holds for neuroimaging data is also unclear.

A final crucial parameter known to impact model performance is sample size, which greatly affects prediction accuracy in machine learning models using neuroimaging data (Arbabshirani et al., 2017). The accuracy of models predicting age (Franke et al., 2010) or identifying schizophrenia (Schnack and Kahn, 2016) from neuroimaging data increases with training set size, likely because smaller training sets are more heterogeneous (Schnack and Kahn, 2016). While some models, such as the Elastic Net (Zou and Hastie, 2005), are relatively robust to smaller sample sizes in which the number of predictors far exceeds the number of observations, it is unclear how changes in the ratio of features to observations impact the performance of the Elastic Net and other algorithms with neuroimaging data.

Here we selected a number of machine learning algorithms (see Bzdok et al., 2018 for a treatment of the overlap between statistics and machine learning) as the target of a structured quantitative examination of their performance on the same neuroimaging datasets. The selected algorithms have been applied to linear regression problems in neuroimaging research to date and are implemented in machine learning toolboxes intended for use with neuroimaging data. The statistical tool historically used most often for linear regression and prediction problems in psychological and biological science – multiple regression (MR) – is used as a ‘baseline’ against which to compare the machine learning algorithms. In MR, it is assumed that the output variable is a linear combination of all input variables, and regression weights are determined for each variable based on this assumption. MR is a non-sparse method and may thus not be suitable for very high-dimensional data. A non-sparse machine learning algorithm evaluated here is Gaussian Process Regression (GPR). GPR is a non-parametric probabilistic Bayesian method that uses a predefined covariance function (‘kernel’) to infer the function relating the input values to the output. While GPR has been applied with some success to various prediction problems using MRI data (e.g. Monté-Rubio et al., 2018), choosing an appropriate kernel for neuroimaging data may prove challenging. The Multiple Kernel Learning (MKL) approach implemented here uses the L1 norm to create a sparse combination of multiple kernels (Rakotomamonjy et al., 2008). MKL was previously found to outperform support vector machine models when using fMRI data to classify stimulus types (Schrouff et al., 2018). Another kernel method is Kernel Ridge Regression (KRR), which uses a kernel to make ridge regression (regularization via the L2 norm) non-linear (Shawe-Taylor & Cristianini, 2004). KRR can be thought of as a specific case of GPR, but lacks the ability to give confidence bounds. KRR has been used to predict treatment outcomes in children with autism spectrum disorders based on fMRI responses to biological motion stimuli (Yang et al., 2016). The Elastic Net (EN) combines the L1 and L2 penalties to arrive at a linear solution, and has been used to predict substance use outcomes in a large sample of adolescents based on functional and structural MRI (Whelan et al., 2014). For MR, GPR, MKL, KRR, and EN, each input feature is assigned a weight, which may be shrunk to zero when sparse regularization is used (as in the EN). This is not the case for Random Forest (RF) models: instead, a number of decision trees are grown based on the input features and the output, and the predicted outcomes from multiple trees are aggregated using bootstrap aggregation, which greatly attenuates RF's tendency to overfit. RF has been used in many neuroimaging studies for a variety of applications, such as classification of patients (e.g. Fredo et al., 2018; Zhu et al., 2018).
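For orientation, the sketch below runs rough scikit-learn analogues of most of these algorithms side by side on synthetic data. This is purely illustrative: the study itself used PRoNTo and custom MATLAB implementations, MKL has no direct scikit-learn counterpart and is omitted, and the hyperparameters shown are untuned assumptions.

```python
# Hedged side-by-side of scikit-learn stand-ins for the compared algorithms.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=400, n_features=100, n_informative=30,
                       noise=20.0, random_state=0)

models = {
    "MR (baseline)": LinearRegression(),
    "EN": ElasticNet(alpha=0.5, max_iter=10000),
    "KRR": KernelRidge(alpha=1.0, kernel="rbf"),
    "GPR": GaussianProcessRegressor(normalize_y=True),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=10, scoring="r2").mean()
    print(f"{name:14s} 10-fold CV R^2 = {r2:.3f}")
```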

Here, the efficacy of the five machine learning tools outlined above for use with large neuroimaging datasets was assessed, with standard multiple regression serving as a baseline against which to evaluate the added value of each machine learning algorithm. We conducted an empirical evaluation of the extent to which feature selection and resampling procedures influenced results. The effect of data dimensionality on accuracy was quantified by varying both sample size and number of features. Using simulated neuroimaging data with varying predictor effect sizes, as well as real neuroimaging data, this study first compared the performance of the Elastic Net, standard multiple regression, a state-of-the-art machine learning toolbox for imaging data (PRoNTo, Schrouff et al., 2013), and an implementation of the Random Forest method available in MATLAB. Furthermore, we examined how the addition of bagging and feature selection affected the accuracy of results from simulated and real data, using an embedded feature selection approach developed with the intention of minimizing researcher degrees of freedom. Based on previous work, it was anticipated that both feature selection and regularization would improve predictions for datasets with large feature sets by creating less complex models, and that bagging would reduce overfitting for small samples by reducing the effect of outliers.

Section snippets

Machine learning protocol

The analysis steps outlined below were implemented in MATLAB 2016b using custom analysis scripts for EN, MR, and RF, and the PRoNTo Toolbox for GPR, MKL, and KRR. Analysis scripts used are available at github.com/ljollans/RAFT. Specific aspects of the steps are described below.

Machine comparison

Median out-of-sample model performance (i.e., the correlation between predicted and true values in the test set) for all regression algorithms is shown in Figs. 4 and 5. There was a clear effect of predictor effect size on prediction accuracy, with more accurate predictions for both SimulatedLarge and ImagingLarge, relative to SimulatedSmall and ImagingSmall, for all analysis methods.

RF had the least amount of variation between data types, although it produced poorer predictions for datasets with

Discussion

Analytical tools developed for data science have become frequently used in neuroimaging (Woo et al., 2017), but none of these tools were specifically developed for neuroimaging data. With the small samples, large feature sets, and low signal-to-noise that are characteristic of neuroimaging data, prediction models built using neuroimaging data are at a high risk of overfitting. In this paper, the merit of six different regression approaches for prediction analysis was empirically evaluated and

Conclusion

A number of recommendations for future machine learning studies using neuroimaging data can be made based on these findings. Datasets with at least 400 observations have the highest likelihood of uncovering meaningful findings. When at least 400 observations and 400 or more predictor variables are included in the analysis, regularized regression via the Elastic Net was shown to be the best analysis approach for ROI data. When the sample or feature set size is smaller, standard Multiple

Disclosures

Dr. Banaschewski has served as an advisor or consultant to Bristol-Myers Squibb, Desitin Arzneimittel, Eli Lilly, Medice, Novartis, Pfizer, Shire, UCB, and Vifor Pharma; he has received conference attendance support, conference support, or speaking fees from Eli Lilly, Janssen McNeil, Medice, Novartis, Shire, and UCB; and he is involved in clinical trials conducted by Eli Lilly, Novartis, and Shire; the present work is unrelated to these relationships. The other authors report no biomedical

Funding Acknowledgements

LJ and RB are supported by the Irish Research Council under Grant Number GOIPG/2014/418 and EPSPG/2017/277 respectively. This work received support from the following sources: the European Union-funded FP6 Integrated Project IMAGEN (Reinforcement-related behaviour in normal brain function and psychopathology) (LSHM-CT- 2007-037286), the Horizon 2020 funded ERC Advanced Grant ‘STRATIFY’ (Brain network based stratification of reinforcement-related disorders) (695313), ERANID (Understanding the

References (94)

  • K. Franke et al.

    Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: exploring the influence of various parameters

    Neuroimage

    (2010)
  • J.D.E. Gabrieli et al.

    Prediction as a humanitarian and pragmatic contribution from human cognitive neuroscience

    Neuron

    (2015)
  • G. Garraux et al.

    Multiclass classification of FDG PET scans for the distinction between Parkinson's disease and atypical parkinsonian syndromes

    NeuroImage: Clinical

    (2013)
  • D.N. Kennedy et al.

    The NITRC image repository

    Neuroimage

    (2016)
  • N. Koutsouleris et al.

    Multisite prediction of 4-week and 52-week treatment outcomes in patients with first-episode psychosis: a machine learning approach

    Lancet Psychiatry

    (2016)
  • G.C. Monté-Rubio et al.

    A comparison of various MRI feature types for characterizing whole brain anatomical differences using linear pattern recognition methods

    Neuroimage

    (2018)
  • B. Mwangi et al.

    Prediction of individual subject's age across the human lifespan using diffusion tensor imaging: a machine learning approach

    Neuroimage

    (2013)
  • R.A. Poldrack

    Inferring mental states from neuroimaging data: from reverse inference to large-scale decoding

    Neuron

    (2011)
  • J.D. Power et al.

    Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion

    Neuroimage

    (2012)
  • C.J. Price et al.

    Predicting IQ change from brain structure: a cross-validation study

    Dev. Cogn. Neurosci.

    (2013)
  • J. Ramírez et al.

    Computer aided diagnosis system for the Alzheimer's disease based on partial least squares and random forest SPECT image classification

    Neurosci. Lett.

    (2010)
  • J. Ramirez et al.

    Ensemble of random forests One vs. Rest classifiers for MCI and AD prediction using ANOVA cortical and subcortical feature selection and partial least squares

    J. Neurosci. Methods

    (2018)
  • K. Shen et al.

    Detecting global and local hippocampal shape changes in Alzheimer's disease using statistical shape models

    Neuroimage

    (2012)
  • X. Shen et al.

    Groupwise whole-brain parcellation from resting-state fMRI data for network node identification

    Neuroimage

    (2013)
  • C.M. Stonnington et al.

    Predicting clinical scores from magnetic resonance scans in Alzheimer's disease

    Neuroimage

    (2010)
  • P.M. Thompson et al.

    ENIGMA and the individual: predicting factors that affect the brain in 35 countries worldwide

    Neuroimage

    (2017)
  • N. Tzourio-Mazoyer et al.

    Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain

    Neuroimage

    (2002)
  • D.C. Van Essen et al.

    The Human Connectome Project: a data acquisition perspective

    Neuroimage

    (2012)
  • G. Varoquaux et al.

    Assessing and tuning brain decoders: cross-validation, caveats, and guidelines

    Neuroimage

    (2017)
  • X. Zhu et al.

    Random forest based classification of alcohol dependence patients and healthy controls using resting state MRI

    Neurosci. Lett.

    (2018)
  • T. Abeel et al.

    Robust biomarker identification for cancer diagnosis with ensemble feature selection methods

    Bioinformatics

    (2010)
  • N. Adar et al.

    Feature selection on MR images using genetic algorithm with SVM and naive Bayes classifiers

  • T.M. Ball et al.

    Single-subject anxiety treatment outcome prediction using functional neuroimaging

    Neuropsychopharmacology

    (2014)
  • B.B. Biswal et al.

    Toward discovery science of human brain function

    Proc. Natl. Acad. Sci. U.S.A.

    (2010)
  • L. Breiman

    Bagging predictors

    Mach. Learn.

    (1996)
  • K.S. Button et al.

    Power failure: why small sample size undermines the reliability of neuroscience

    Nat. Rev. Neurosci.

    (2013)
  • D. Bzdok et al.

    Statistics versus machine learning

    Nat. Methods

    (2018)
  • G.C. Cawley et al.

    On over-fitting in model selection and subsequent selection bias in performance evaluation

    J. Mach. Learn. Res.

    (2010)
  • V.P. Clark et al.

    Reduced fMRI activity predicts relapse in patients recovering from stimulant dependence: prediction of Relapse Using fMRI

    Hum. Brain Mapp.

    (2014)
  • S.G. Costafreda et al.

    Prognostic and diagnostic potential of the structural neuroanatomy of depression

    PLoS One

    (2009)
  • C. Davatzikos et al.

    Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification

    Neurobiol. Aging

    (2011)
  • I.J. Deary et al.

    The neuroscience of human intelligence differences

    Nat. Rev. Neurosci.

    (2010)
  • A. Di Martino et al.

    Enhancing studies of the connectome in autism using the autism brain imaging data exchange II

    Sci. Data

    (2017)
  • N.U.F. Dosenbach et al.

    Prediction of individual brain maturity using fMRI

    Science

    (2010)
  • D.B. Dwyer et al.

    Machine learning approaches for clinical psychology and psychiatry

    Annu. Rev. Clin. Psychol.

    (2018)
  • B. Efron et al.

    Improvements on cross-validation: the .632+ bootstrap method

    J. Am. Stat. Assoc.

    (1997)
  • A.J. Fredo et al.

    Diagnostic classification of autism using resting-state fMRI data and conditional random forest

    Age

    (2018)