Hippocampal unified multi-atlas network (HUMAN): protocol and scale validation of a novel segmentation tool

In this study we present a novel fully automated Hippocampal Unified Multi-Atlas-Networks (HUMAN) algorithm for the segmentation of the hippocampus in structural magnetic resonance imaging. In multi-atlas approaches atlas selection is of crucial importance for the accuracy of the segmentation. Here we present an optimized method based on the definition of a small peri-hippocampal region to target the atlas learning with linear and non-linear embedded manifolds. All atlases were co-registered to a data driven template, resulting in a computationally efficient method that requires only one test registration. The optimal atlases identified were used to train dedicated artificial neural networks whose labels were then propagated and fused to obtain the final segmentation. To quantify data heterogeneity and protocol inherent effects, HUMAN was tested on two independent data sets provided by the Alzheimer's Disease Neuroimaging Initiative and the Open Access Series of Imaging Studies. HUMAN is accurate and achieves state-of-the-art performance (Dice_ADNI = 0.929 ± 0.003 and Dice_OASIS = 0.869 ± 0.002). It is also a robust method that remains stable when applied to the whole hippocampus or to sub-regions (patches). HUMAN also compares favorably with a basic multi-atlas approach and a benchmark segmentation tool such as FreeSurfer.

Keywords: hippocampus segmentation, machine learning, multi-atlas

Online supplementary data available from stacks.iop.org/PMB/60/8851/mmedia

Introduction
The hippocampus is a brain structure of great importance for the pathogenesis of a number of neurodegenerative diseases. Hippocampal atrophy is an established primary biomarker in Alzheimer's disease (Sabuncu et al 2011). The gold standard for hippocampal segmentation is manual tracing, which is time consuming and subject to protocol and rater variability. This, along with the intrinsic difficulty of the task, has generated the need for automated segmentation techniques. Several methodologies have been put forward, including the state-of-the-art multi-atlas approaches, which are based on the non-linear co-registration of the target image with expert-segmented examples (atlases).
Several studies have demonstrated that multi-atlas accuracy is significantly related to the 'similarity' between the target image and the training atlases (Aljabar et al 2009, Lötjönen et al 2010, Kwak et al 2013), but an objective definition of the optimal similarity is lacking. In the initial studies such similarity was based on demographic and intensity based criteria after a linear (Leung et al 2010) or a non-linear registration (Klein et al 2008). More recently, non-parametric manifold strategies, such as Isomap or Laplacian Eigenmaps, were investigated for atlas selection (Wolz et al 2010, Duc et al 2013). However, in some cases, parametric techniques, such as Stochastic Neighbor Embedding, perform better than non-parametric ones (Van der Maaten and Hinton 2008). Overall, it is fair to say that an optimal atlas selection strategy is yet to be established, which is why we compared different strategies to evaluate their effectiveness.
Multi-atlas approaches have some intrinsic drawbacks. First of all, errors during the registration phase, in the warp estimation or in the label resampling, can limit the reliability of the results (Pipitone et al 2014). In general, registration strategies incorporating tissue classification information can limit these issues, at the expense of increased processing times (Heckemann et al 2010). In addition, as multi-atlas accuracy depends on the similarity between training and test sets, large training sets ('complete' in a mathematical sense), requiring a vast amount of computational resources, are necessary to avoid poor performance. In principle, machine learning approaches can overcome these issues by generalizing the models learned from training samples. However, so far classification-based approaches (Morra et al 2010, Tangaro et al 2013, Maglietta et al 2015) have not attained performances comparable to multi-atlas methods. An effective combined strategy would seem a natural and elegant solution, as suggested by recent work (Wang et al 2011, Hao et al 2014). Interestingly, these studies show how voxel-wise learning can effectively introduce shape or context information in the segmentation process, improving its accuracy. Nonetheless, they have focused on label fusion, a particular aspect of multi-atlas approaches. In general, the comparison of different segmentation algorithms is arduous, because most studies have different data sources or validation techniques. Also, when dealing specifically with hippocampal segmentation, the differences in segmentation protocols represent a particularly limiting constraint (Bellotti and Pascazio 2012, Nestor et al 2013). Although in recent years a considerable effort has been invested in the creation of a unified segmentation protocol (Frisoni and Jack 2011, Frisoni et al 2015), consensus has not been reached. Therefore, a segmentation algorithm with the ability to adapt to different protocols is very desirable.
Based on the previous considerations, in this paper we present a novel and fully automated hippocampal segmentation algorithm, named HUMAN (Hippocampal Unified Multi-Atlas-Networks), which combines in a unified framework the accuracy of multi-atlas methods with the robustness of artificial neural network classification. The performance of this methodology was assessed with two independent test sets, segmented with two independent protocols. The first set was provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the second one by the Open Access Series of Imaging Studies (OASIS). Different manifold strategies for atlas selection were explored in order to identify an optimal setup for learning. The performance of HUMAN when applied to sub-regions (patches) of the hippocampus was then investigated. Finally, HUMAN was compared to the publicly available segmentation tool FreeSurfer (Fischl 2012) and to a basic multi-atlas pipeline, consisting of registration and label fusion.

Materials
A data set of 100 T1 MRI scans from the ADNI database (1.5 T and 3.0 T), including normal control (NC), mild cognitive impairment (MCI) and Alzheimer's disease (AD) subjects, was used in the preparation of this article. The corresponding hippocampal labels were provided by the EADC-ADNI harmonized segmentation protocol (Boccardi et al 2015a, 2015b). The ADNI images were divided into two data sets matched for size and demographic features. The first one, ADNI_T, consisting of 45 images, was used for training and parameter tuning. The second one, ADNI_D, of 55 images, was used as a test set. ADNI_T and ADNI_D shared common acquisition characteristics and the same harmonized segmentation protocol. A further evaluation was performed on an independent set, OASIS_D, consisting of 35 T1 MRI scans (1.5 T) provided by the OASIS initiative (Marcus et al 2007) on the occasion of the MICCAI SATA challenge workshop 2013, with the corresponding labels provided by the brain-COLOR protocol (Klein et al 2010). Both the ADNI and OASIS_D sets consisted of MPRAGE MRI brain scans with a resolution of 1 × 1 × 1 mm^3 (in the following paragraphs voxels and mm^3 are used interchangeably).
Data size, clinical status, age and gender information for the three sets ADNI_T, ADNI_D and OASIS_D are summarized in table 1. Left and right hippocampal volume averages are reported with the corresponding standard deviations. The age range for OASIS_D is broader than that of ADNI_D, as the OASIS project was not limited to elderly subjects. However, this difference does not affect the reliability of the results.
Clinical and gender information for the OASIS_D set was not available. ADNI_T and ADNI_D were matched in terms of demographic and clinical composition. The volume distributions in the training set and test set were also matched, thus excluding any volume-based bias in the analysis. The image processing and the learning phases were carried out blind to subject status.

Methods
The rationale underlying the HUMAN approach is to emulate the manual segmentation of a human expert within a multi-atlas framework. It cannot be considered a machine learning segmentation method, as its goal is not the generalization of models learned from training examples, nor a label fusion strategy, as the core of the method is the generation of putative segmentations and not the fusion of propagated labels. The novel algorithm combines multi-atlas and classification approaches and involves three main phases:
• Nonlinear registration. MRI scans are intensity normalized and non-linearly registered to a data driven template. The goal of this processing step is to increase the similarity among the scans as far as possible. Volumes of Interest (VOIs) are extracted from each warped scan to define a peri-hippocampal region of interest.
• Atlas selection. The VOIs and the displacement fields resulting from non-linear registration are used to perform linear and non-linear similarity measurements between the test image and the training scans. Accordingly, this step defines which atlases should be used as the base of knowledge for subsequent learning and classification.
• Classification. VOIs of selected atlases undergo a feature extraction process; the resulting statistical and textural features are then used to train a voxel-based classifier for each VOI. A test VOI undergoes the same feature extraction process, then the selected classifiers are used to estimate whether or not a voxel belongs to the hippocampus. The hippocampal segmentation in the test images is finally obtained by label fusion.
Figure 1 shows a synthetic overview of the algorithm. The full method is illustrated in the following and further methodological aspects are discussed in the supplementary material (stacks.iop.org/PMB/60/8851/mmedia).

Nonlinear registration
Since registration is sensitive to the initial conditions, the intensities of the brain scans were normalized and the bias field removed with the improved N3 MRI bias field correction algorithm (Tustison et al 2010). After pre-processing, one image $v_a$ was repeatedly extracted from the ADNI_T set to perform a leave-one-out analysis. The healthy controls from the remaining training set $D_t$ were used to build a data driven template $M_t$ (see figure 2) to facilitate data registration using the Advanced Normalization Tools (ANTs) (Avants et al 2009, 2011). Leave-one-out was adopted for template construction in order to faithfully reproduce the segmentation process of test scans and to maximize the computational efficiency of the method, which does not require a dedicated template for each test scan. For each cross-validation round, the $D_t$ brain scans and the validation image $v_a$ were linearly registered to the $M_t$ template with FSL-FLIRT (Jenkinson et al 2012). Then, a non-linear registration (Klein et al 2009) was performed with ANTs, and the warp fields $F_i$ were stored for later use.
After registration, a gross peri-hippocampal region $\mathrm{VOI}(\omega_i)$ and the corresponding field were extracted from both training and test scans using FAPoD, a fully automated hippocampal shape analysis algorithm (Amoroso et al 2012, Tangaro et al 2014). The $\mathrm{VOI}(\omega_i)$ contained a probable hippocampal region of about 17 000 voxels and lay in a rectangular region of interest of dimensions 50 × 70 × 70. The extracted VOIs and fields were used for the subsequent atlas selection.

Atlas selection
Figure 1. Synthetic overview of the proposed method. Healthy subjects of the training set are used to build a data driven template, then all training scans are non-linearly registered and hippocampal volumes of interest (VOI) extracted. Warped atlases and warping fields are stored for later use. A test MRI scan is warped to the template and the most similar examples are selected according to a similarity metric. Each optimal atlas is used to train a dedicated classifier and to obtain a putative segmentation. Finally, the test segmentation is obtained by averaging the putative segmentations according to the adopted similarity metric.

Two different strategies were adopted in order to select the optimal atlases. In the first strategy, as suggested by previous studies (Gerber et al 2010), we used embedding techniques to project the peri-hippocampal $\mathrm{VOI}(\omega_i)$ voxel intensities, and the related voxel-wise warp displacements $\mathrm{VOI}(F_i)$ (accounting for ∼17 000 voxels), into low dimensional manifolds. Subsequently, the $k$ atlases nearest to the volume of interest of the validation image $\mathrm{VOI}(\omega_v)$ were selected as optimal. Several parametric and non-parametric techniques were explored for this task: Sammon mapping (SAM) (Sammon 1969), Isomap (ISO) (Tenenbaum et al 2000), locally linear embedding (LLE) (Roweis and Saul 2000), Laplacian Eigenmaps (LAP) (Belkin and Niyogi 2001), stochastic neighbor embedding (SNE) (Hinton and Roweis 2002) and its improved version (t-SNE) (Van der Maaten and Hinton 2008). The second strategy consisted in using Pearson's correlation to measure directly the similarity among the peri-hippocampal VOIs.

For each dimensionality reduction technique different parameter configurations were considered, resulting in the selection of different atlases. In particular, for each manifold we explored the number of atlases to be selected, ranging from 1 to 30, and the embedding manifold dimension. This range was chosen based on the fact that multi-atlas performances usually degrade when using more than ∼15–20 atlases (Aljabar et al 2009). The Dice similarity index (see section 3.3) was used to evaluate the best leave-one-out configuration.
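The correlation-based selection strategy can be illustrated with a minimal sketch. It assumes that each peri-hippocampal VOI has already been warped to the template and flattened into a list of intensities; the function names, the dictionary layout and the toy values are hypothetical and are not taken from the HUMAN implementation.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


def select_atlases(test_voi, atlas_vois, k=10):
    """Return the k (atlas_id, correlation) pairs most correlated with test_voi.

    test_voi   -- flattened intensities of the test peri-hippocampal VOI
    atlas_vois -- hypothetical dict mapping an atlas id to its flattened VOI
    """
    scores = [(atlas_id, pearson(test_voi, voi)) for atlas_id, voi in atlas_vois.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Toy example: three "atlases" of four voxels each (illustrative values only).
atlases = {
    "atlas_a": [1.0, 2.0, 3.0, 4.0],
    "atlas_b": [4.0, 3.0, 2.0, 1.0],
    "atlas_c": [1.0, 2.0, 2.5, 4.5],
}
selected = select_atlases([1.1, 2.1, 2.9, 4.2], atlases, k=2)
```

In this toy case the anti-correlated atlas_b is discarded and the two remaining atlases are ranked by decreasing correlation. The returned correlations could also serve as fusion weights, although that choice is an assumption of this sketch.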

Classification and segmentation
The hippocampal $\mathrm{VOI}(\omega_i)$ belonging to the $k$ selected optimal atlases underwent a statistical and textural feature extraction process. For each voxel, Haralick, Haar-like and statistical features, such as average, standard deviation, kurtosis, skewness and gradients, were computed. The relationships between each voxel and the voxels surrounding it were taken into account using windows of varying size centered on the examined voxel, with dimensions ranging from 3 × 3 × 3 voxels to 9 × 9 × 9 voxels, for a whole set of 315 features. Thus each scan was described as a matrix of approximate dimensions 17 000 × 315.

Subsequently, a $k$-tuple of neural network classifiers $C_i$ was trained, one for each selected atlas. Since the aim of this approach is to use warping to increase as much as possible the similarity between the test scan and the training images, we trained the networks $C_i$ to exactly represent the corresponding training data $\mathrm{VOI}(\omega_i)$. We also investigated whether the classification performance was locally robust when training the networks on hippocampal sub-regions (patches).

The best classification results were obtained with artificial neural networks, trained with the backpropagation algorithm, consisting of one hidden layer with ten neurons and standard sigmoid activation functions. With this design, the networks achieved Dice indexes ranging from 0.98 to 1.00. Networks trained with a lower number of neurons could not achieve such performances, while no significant improvement could be obtained with higher numbers of neurons. Each atlas was used simultaneously as a training and testing scan, in order to build a model of the atlas itself; these trained models were then used to generate putative segmentations of the test scans. The same configuration was maintained for the networks trained on the hippocampal patches.

In the HUMAN approach the test images are processed to increase the similarity with the training atlases. Therefore, the training of the classifier was aimed at modeling the atlases rather than generalizing to unseen data samples. This is why we chose to adopt a more versatile classifier (artificial neural networks) instead of a more robust classifier, such as Random Forests. The trained models were finally stored.
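As an illustration of the window-based feature extraction, the following sketch computes only the simple statistical descriptors (mean, standard deviation, skewness and kurtosis) on cubic windows around one voxel. It does not reproduce the Haralick, Haar-like or gradient features, and it does not claim to match the 315-feature set of the paper. The nested-list volume layout, the border clipping and all function names are assumptions made for the example.

```python
import math


def window_values(volume, center, half_width):
    """Collect intensities in a cubic window centered on a voxel.

    volume is a nested list indexed as volume[z][y][x]. The window is clipped
    at the volume borders (an assumption of this sketch).
    """
    cz, cy, cx = center
    values = []
    for z in range(max(0, cz - half_width), min(len(volume), cz + half_width + 1)):
        for y in range(max(0, cy - half_width), min(len(volume[0]), cy + half_width + 1)):
            for x in range(max(0, cx - half_width), min(len(volume[0][0]), cx + half_width + 1)):
                values.append(volume[z][y][x])
    return values


def statistical_features(values):
    """Mean, standard deviation, skewness and kurtosis of a list of intensities."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    # Higher moments are undefined for a constant window; 0.0 is used by convention here.
    skewness = m3 / std ** 3 if std > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0
    return {"mean": mean, "std": std, "skewness": skewness, "kurtosis": kurtosis}


def voxel_features(volume, center, half_widths=(1, 2, 3, 4)):
    """Statistical features on windows from 3x3x3 up to 9x9x9 voxels around one voxel."""
    features = {}
    for h in half_widths:
        stats = statistical_features(window_values(volume, center, h))
        for name, value in stats.items():
            features[f"{name}_{2 * h + 1}"] = value
    return features


# Toy 5x5x5 volume whose intensity equals the x coordinate.
toy = [[[float(x) for x in range(5)] for _ in range(5)] for _ in range(5)]
feats = voxel_features(toy, center=(2, 2, 2), half_widths=(1, 2))
```

Applying `voxel_features` to every voxel of a VOI would yield one row per voxel, which is the kind of voxels-by-features matrix a voxel-wise classifier can be trained on.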
For each validation scan, the segmentation was obtained by propagating the putative segmentations, as obtained from each network, onto the native target image space through the inverse displacement field $F_t^{-1}$ and finally fusing the putative labels. In more detail, for each voxel the label was calculated as a weighted average of the $k$ predicted labels, the weights being given by the pairwise distance between each selected atlas and the target image.
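The fusion step can be sketched as a voxel-wise weighted average followed by thresholding. This is a minimal example under stated assumptions: the putative segmentations are already in the target space and flattened to lists of 0/1 labels, the weights are taken to grow with atlas/target similarity (how the weights are derived from the pairwise distance is not specified here), and the function name `fuse_labels` is hypothetical.

```python
def fuse_labels(putative, weights, threshold=0.5):
    """Weighted-average label fusion.

    putative  -- list of k putative segmentations, each a flat list of 0/1 labels
    weights   -- k non-negative weights, one per selected atlas (assumed here to
                 grow with atlas/target similarity)
    threshold -- either one number or a per-voxel list of decision thresholds
    """
    if len(putative) != len(weights) or not putative:
        raise ValueError("need one weight per putative segmentation")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_voxels = len(putative[0])
    thresholds = threshold if isinstance(threshold, list) else [threshold] * n_voxels
    fused = []
    for v in range(n_voxels):
        # Normalized weighted vote for the hippocampus label at voxel v.
        score = sum(w * seg[v] for seg, w in zip(putative, weights)) / total
        fused.append(1 if score >= thresholds[v] else 0)
    return fused


# Three putative segmentations of a four-voxel image (illustrative values only).
segmentations = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]
fused = fuse_labels(segmentations, weights=[0.9, 0.5, 0.8])
```

Accepting a per-voxel list of thresholds keeps the sketch compatible with an adaptive, spatially varying decision rule.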
Several studies have shown that majority voting strategies for label fusion can yield hippocampal volumes significantly smaller than those obtained by manual segmentation (Sabuncu et al 2010, Khan et al 2011, Wang et al 2011). This is mainly caused by the monotonic decrease of the signal to noise ratio when moving from inner to outer hippocampal regions and consequently by an unbalanced error rate in favor of false negatives. To overcome this systematic error we used an adaptive threshold in the voxel classification phase, based on the Bayes theorem. We used the probability assigned by FAPoD (Tangaro et al 2014) to each voxel of belonging to the hippocampus as the a priori probability $P(H)$. For a two class problem, let $S = P(h \mid H)$ be the average training sensitivity, i.e. the classifier probability to correctly label a hippocampal voxel, and $s = P(\neg h \mid \neg H)$ the specificity, i.e. the classifier probability to correctly label a background voxel. The probability of a voxel to be assigned to the hippocampus is then:

$$P(h) = P(h \mid H)\,P(H) + P(h \mid \neg H)\,P(\neg H) = S\,P(H) + (1 - s)\,\bigl(1 - P(H)\bigr)$$

Following the Bayes theorem, the a posteriori probability for a voxel to belong to the hippocampus when positively labeled, $P(H \mid h)$, is given by:

$$P(H \mid h) = \frac{P(h \mid H)\,P(H)}{P(h)} = \frac{S\,P(H)}{S\,P(H) + (1 - s)\,\bigl(1 - P(H)\bigr)}$$

This probability is then used as decision threshold. Accordingly, inner voxels, which have a higher a priori probability of belonging to the hippocampus, are assigned lower thresholds than outer ones; as a consequence, the probability of false negatives and the statistical error in the hippocampal volume evaluation are both reduced.
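The Bayesian quantities involved can be computed directly from the prior, the sensitivity and the specificity. The sketch below applies the law of total probability and Bayes' theorem for a two class problem; the function names and the numerical values are illustrative assumptions, and the way the posterior is turned into a per-voxel threshold in HUMAN is not reproduced here.

```python
def positive_rate(prior, sensitivity, specificity):
    """P(h): probability that the classifier labels a voxel as hippocampus.

    P(h) = P(h|H) P(H) + P(h|not H) P(not H), with P(h|not H) = 1 - specificity.
    """
    return sensitivity * prior + (1.0 - specificity) * (1.0 - prior)


def posterior(prior, sensitivity, specificity):
    """P(H|h) from Bayes' theorem for a two class problem."""
    for name, value in (("prior", prior), ("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    evidence = positive_rate(prior, sensitivity, specificity)
    if evidence == 0.0:
        raise ValueError("posterior is undefined when P(h) = 0")
    return sensitivity * prior / evidence


# Illustrative values only (not taken from the paper): an inner voxel with a
# high prior and an outer voxel with a low prior, same classifier performance.
inner = posterior(prior=0.9, sensitivity=0.95, specificity=0.98)
outer = posterior(prior=0.2, sensitivity=0.95, specificity=0.98)
```

With equal sensitivity and specificity, the posterior depends only on the spatial prior, which is what makes the resulting decision rule adaptive across the hippocampus.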

Computational infrastructure
This method requires a complex software framework, involving processing tools developed with different languages and in different environments. This could hinder its diffusion in clinical or research settings lacking a strong technological background. To overcome these challenges we developed a user friendly environment exposing HUMAN as a Service; a schematic overview is presented in figure 3. The computational resources for this study were provided by the ReCaS computer center (Bari, Italy), a computing infrastructure consisting of about 5000 CPUs and allowing up to 2.2 PB of storage. The data processing and monitoring were performed by exploiting a dynamic job submission tool (JST) facility. JST is a job management tool particularly useful to manage the submission and monitoring of applications when a large number of independent executions are needed to solve the required tasks. Different distributed infrastructures are suitable for HUMAN analyses, such as computer grids or clouds.
Hence this approach is flexible and easy to implement on dedicated destination machines. Moreover, the necessary software can be easily interfaced with web portals or common workflow manager tools. The HUMAN pipeline is also available as a web service/cloud solution at the following link: https://recasgateway.ba.infn.it.

Results for atlas selection
To identify the best atlas selection method, the embedding techniques listed in section 3.2 were compared to Pearson's correlation with a leave-one-out analysis. The best results, in terms of Dice index, were achieved by HUMAN using Pearson's correlation between the training and the validation $\mathrm{VOI}(\omega_i)$. For the left hippocampi the median Dice index was 0.910 ± 0.004, and for the right hippocampi 0.914 ± 0.004. The analysis was also performed, with the optimal configuration found, using a basic multi-atlas approach, previously defined as a multi-atlas pipeline consisting of just registration and label fusion.
The performance of the basic multi-atlas approach was never as good as that of HUMAN. The best results, obtained in this case too with Pearson's correlation, gave a median Dice index of 0.869 ± 0.006 for the left hippocampi and 0.873 ± 0.005 for the right hippocampi. Having established that the best method for atlas selection was the use of Pearson's correlation for the similarity measurements, we explored how the number of selected atlases would affect the segmentation performance.
In figure 4, the Dice index is represented as a function of the number of atlases. The best outcome was achieved with ∼10 atlases, after which a plateau was reached, with no significant difference (Wilcoxon p > .05). As a consequence further analyses were carried out considering only the best ten atlases.
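For reference, the Dice index between an automated and a manual binary mask is twice the overlap divided by the sum of the two mask sizes. A minimal sketch follows, assuming flattened 0/1 masks; the function name and the convention for two empty masks are choices made for this example.

```python
def dice_index(segmentation, reference):
    """Dice index 2|A ∩ B| / (|A| + |B|) between two binary masks (flat 0/1 lists)."""
    if len(segmentation) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(segmentation, reference) if a and b)
    size = sum(1 for a in segmentation if a) + sum(1 for b in reference if b)
    if size == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * overlap / size


# Illustrative six-voxel masks: two voxels overlap, each mask has three voxels.
automatic = [1, 1, 1, 0, 0, 0]
manual = [0, 1, 1, 1, 0, 0]
score = dice_index(automatic, manual)
```

Here the score is 2 · 2 / (3 + 3) = 2/3; identical masks give 1 and disjoint masks give 0.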

Segmentation results for ADNI scans
The ADNI_D test set shared appearance features and segmentation protocol with the training set ADNI_T. Moreover, they were matched in terms of demographic and clinical composition. To assess the performance of the method and the corresponding segmentation quality, a Bland-Altman analysis (Bland and Altman 1995) was performed (see figure 5) along with the Dice index measurements.
Segmented volumes and manual tracings showed a very high correlation (0.95 and 0.96 for left and right hippocampi, respectively). The 95% confidence interval limits were almost the same for both left and right hippocampi, [−400, 400]. These values, along with the high correlation, suggest good agreement between segmented and manual volumes. Moreover, we evaluated the distribution of the differences between manual and segmented volumes, obtaining a mean value of −12.6 ± 207 voxels, which did not significantly differ from zero. Hence no significant bias affected the segmented volumes. The segmentation agreement was also measured in terms of Dice index: 0.926 ± 0.003 and 0.931 ± 0.002 for left and right hippocampi respectively (0.929 ± 0.003 on average). The performance on the test set ADNI_D was also compared with the multi-atlas segmentation procedure described in section 4.1 and with FreeSurfer segmentations. Also in this case the optimal configuration determined in training was adopted. The results of this comparison are presented in table 2.
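A Bland-Altman analysis summarizes the paired volume differences by their mean (the bias) and by the interval bias ± 1.96 standard deviations. The sketch below uses this standard definition; the exact computation adopted in the study is not detailed here, so the function name, the use of the sample standard deviation and the toy volumes are assumptions.

```python
import statistics


def bland_altman(manual_volumes, segmented_volumes):
    """Bias and 95% limits of agreement of the differences manual - segmented."""
    if len(manual_volumes) != len(segmented_volumes) or len(manual_volumes) < 2:
        raise ValueError("need two paired lists with at least two subjects")
    diffs = [m - s for m, s in zip(manual_volumes, segmented_volumes)]
    bias = statistics.mean(diffs)
    spread = statistics.stdev(diffs)  # sample standard deviation of the differences
    return {
        "bias": bias,
        "lower": bias - 1.96 * spread,
        "upper": bias + 1.96 * spread,
    }


# Hypothetical volumes in voxels (1 voxel = 1 mm^3); not data from the study.
manual = [3100.0, 2800.0, 3500.0, 2600.0, 3000.0]
segmented = [3000.0, 2900.0, 3400.0, 2700.0, 3000.0]
limits = bland_altman(manual, segmented)
```

If the reported spread of 207 voxels is read as a standard deviation, the same formula gives limits of about −12.6 ± 406 voxels, which is in line with the interval of roughly [−400, 400] quoted above.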

The segmentation protocol effect on MICCAI scans
The performance of HUMAN was evaluated with the protocol independent test set OASIS_D provided, as previously mentioned, by OASIS on the occasion of the MICCAI SATA challenge workshop 2013. As described in section 4.2, the method was assessed with a correlation and a Bland-Altman analysis. The results for both left and right segmentations are shown in figure 6.
The correlation between the volumes segmented with HUMAN and those segmented manually was high (0.83 for the left and 0.79 for the right hippocampi), even if lower than in the former case. Bland-Altman analysis showed a quite broad 95% confidence interval, with similar values for both hippocampi, ∼[−500, 500]; nevertheless, satisfactory levels of accuracy in terms of median Dice index were achieved (0.856 ± 0.002 and 0.862 ± 0.002 respectively for left and right hippocampi; on average 0.869 ± 0.002). For the distribution of the differences (manual − segmented volumes) we found on average −82 ± 222 voxels for the left hippocampi and −26 ± 296 voxels for the right hippocampi; in both cases no significant bias was detected.
HUMAN performances were also significantly better than those achieved by multi-atlas and FreeSurfer. A summary of the results is presented in table 3.

HUMAN scale robustness
In the final phase of this work, we explored the performance of HUMAN within a patch-based segmentation framework. In this case, the goals of the analysis were twofold: to investigate whether the classification would be affected by local hippocampal shape effects, and to investigate whether HUMAN performances would be uniformly distributed over the whole hippocampal shape. This involved segmenting each test patch $\alpha$ with the most correlated models $C^{\alpha}_{1,\ldots,k}$. Also in this case, the final prediction was obtained by averaging the scores obtained by the $k$ optimal classifiers, i.e. those trained on the patches best correlated with the patch to be segmented. The final segmentation was obtained by merging the patch segmentations.
The results confirmed the robustness of HUMAN throughout the whole hippocampus for both ADNI_D and OASIS_D. This is illustrated in figure 7, where each patch is color-coded according to the Dice index obtained by averaging the ADNI_D and OASIS_D patch-based results. The poorest results were obtained for patches situated at the head of the hippocampus. Figure 8 shows a qualitative comparison among segmentations for 12 randomly sampled subjects (from ADNI_D): 4 NC, 4 MCI and 4 AD subjects.

Discussion and conclusion
In this study we presented a novel segmentation algorithm, HUMAN, based on a combined multi-atlas and machine learning strategy. HUMAN produced accurate segmentations on two independent test sets. In the first test set, ADNI_D, with manual labels traced with the same protocol as ADNI_T, the segmentation results were excellent, with a median Dice index of 0.929 ± 0.003 averaged over left and right hippocampi. In the second test set, OASIS_D, with tracings obtained with a different protocol, the performance of HUMAN (Dice index 0.869 ± 0.002) was less impressive, yet satisfactory if compared with other recently reported studies (Cardoso et al 2013, Kwak et al 2013, Pipitone et al 2014). While the best performing methods of the MICCAI SATA challenge reported median Dice indexes approaching 0.90, HUMAN, based on its performance on OASIS_D, would still have placed among the five best performing algorithms. Besides, one should take into account that the performances reported (Iglesias et al 2012, Wang and Yushkevich 2013, Zikic et al 2013) were achieved with training and test sets sharing the same segmentation protocol, while HUMAN was trained with a different hippocampal segmentation protocol.

In addition, we tried to tackle the following questions: (i) can machine learning strategies bring a substantial improvement to state-of-the-art segmentation strategies, such as the multi-atlas approaches? (ii) to what degree are machine learning strategies affected by the use of different segmentation protocols? We demonstrated that multi-atlas segmentation could be significantly improved if combined with a machine learning strategy. Moreover, this improvement was robust to segmentation protocol differences between training and test. In fact, for both test sets ADNI_D and OASIS_D, a significant improvement (of about 5.5%) was found. The differences in protocol, as expected, affected the accuracy of the method. In particular, this could be observed as a loss of correlation between manual and HUMAN segmentations.
As expected, FreeSurfer performance was lower than that achieved by HUMAN; however, FreeSurfer is trained with a different segmentation protocol and so this comparison is biased against it. Nonetheless, it is interesting to note that HUMAN performance showed greater stability, with a variation of about 6.4% against the 11.4% of FreeSurfer. Finally, the use of local (patch-based) training demonstrated the robustness of the method when dealing with sub-hippocampal regions. With this patch-based method the segmentation performance was slightly improved and almost uniformly distributed over the hippocampus. The results also confirmed that the heads of the hippocampus are the regions presenting the most difficulties in terms of segmentation.
In our previous work we discussed an entirely machine learning based segmentation procedure. In particular, we used a peri-hippocampal region to actively determine a set of training images, which were then used to train a unique classifier. In this work we propose a complete change of paradigm. HUMAN exploits intensity and spatial normalization techniques to best fit the test data to the training set and to define an optimal base of knowledge (putative segmentations) to be combined in a multi-atlas framework.
As previously remarked, other recent work has already shown interesting results on this approach (Hao et al 2014). However, the fundamental difference between the method here described and the above mentioned combined label fusion strategies, is that they introduce voxel-wise machine learning strategies for label fusion, while our method uses machine learning to determine new putative atlases, obtaining the label fusion through a weighted majority voting procedure.
The slight performance deterioration on OASIS_D confirmed that segmentation protocols play a key role when it comes to multi-center studies. To the best of our knowledge, this was the first study directly addressing the effects of two independent segmentation protocols on fully automated segmentation techniques.
It is worthwhile to note that the proposed method is fully automated (it does not require user intervention) and computationally efficient, requiring a processing time of about 10 min per test image. The possible exploitation of cloud infrastructures implies that it could be adopted for large clinical trials. A limitation of the study lies in the absence of clinical evaluation, which was outside the goals of this work. Recent literature has shown that hippocampal sub-regions could be important as quantitative biomarkers for a number of neurodegenerative diseases, and especially Alzheimer's disease. Bland-Altman plots, Dice index and correlation measurements all confirm that HUMAN segmentations are consistent with the manual tracings. The adoption of a Bayesian strategy for adaptive thresholding has consistently improved the segmentation performance; nonetheless, a further improvement of the presented method could come from the exploration of more refined label fusion strategies. Another refinement of the method could investigate which features contribute the most to an optimal label combination.
In terms of future developments, we plan to apply HUMAN to a clinical data set with the aim of assessing its validity as a tool aiding the diagnosis of Alzheimer's disease, as we did in our previous work on pattern recognition (Sensi et al 2014, Bron et al 2015). Both the high Dice index and correlation values, especially those obtained for ADNI_D scans, suggest that HUMAN is applicable both to studies based on changes of volume due to disease progression and to group-wise comparisons with fixed reference volumes, for example for MCI-AD transition.

The initial goal of ADNI was to recruit 800 subjects, but ADNI has been followed by ADNI-GO and ADNI-2. To date these three protocols have recruited over 1500 adults, ages 55 to 90, to participate in the research, consisting of cognitively normal older individuals, people with early or late MCI, and people with early AD. The follow up duration of each group is specified in the protocols for ADNI-1, ADNI-2 and ADNI-GO. Subjects originally recruited for ADNI-1 and ADNI-GO had the option to be followed in ADNI-2. For up-to-date information, see www.adni-info.org. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer's Association;