Head-to-Head Comparison of Two Popular Cortical Thickness Extraction Algorithms: A Cross-Sectional and Longitudinal Study

Background and Purpose The measurement of cortical shrinkage is a candidate marker of disease progression in Alzheimer’s disease. This study evaluated the performance of two pipelines: Civet-CLASP (v1.1.9) and Freesurfer (v5.3.0).
Methods Images from 185 ADNI1 cases (69 elderly controls (CTR), 37 stable MCI (sMCI), 27 progressive MCI (pMCI), and 52 Alzheimer (AD) patients) scanned at baseline, month 12, and month 24 were processed using the two pipelines and two interconnected e-infrastructures: neuGRID (https://neugrid4you.eu) and VIP (http://vip.creatis.insa-lyon.fr). The vertex-by-vertex cross-algorithm comparison was made possible by applying the 3D gradient vector flow (GVF) and closest point search (CPS) techniques.
Results The cortical thickness measured with Freesurfer was systematically about one third lower than that measured with Civet. Cross-sectionally, Freesurfer’s effect size was significantly different in the posterior division of the temporal fusiform cortex. Both pipelines were weakly or mildly correlated with the Mini Mental State Examination score (MMSE) and the hippocampal volumetry. Civet differed significantly from Freesurfer in large frontal, parietal, temporal and occipital regions (p<0.05). In a discriminant analysis with cortical ROIs having an effect size larger than 0.8, the two pipelines showed no significant differences in the area under the curve (AUC). Longitudinally, effect sizes were not significantly different in any of the 28 ROIs tested. Both pipelines weakly correlated with MMSE decay, showing no significant differences. Freesurfer mildly correlated with hippocampal thinning rate and differed from Civet in the supramarginal gyrus, temporal gyrus, and lateral occipital cortex (p<0.05). In a discriminant analysis with ROIs having an effect size larger than 0.6, the two pipelines yielded no significant differences in the AUC.
Conclusions Civet appears slightly more sensitive to the typical AD atrophic pattern at the MCI stage, but both pipelines can accurately characterize the topography of cortical thinning at the dementia stage.


Introduction
Structural imaging has long played a role as a biomarker of progression among the entry criteria for AD trials [1]. The advent of disease-modifying therapies has led to interest in the use of magnetic resonance imaging (MRI) as a possible "surrogate" measure of outcome. The two most established markers of progression on MRI are the hippocampal and the whole brain atrophy rates [2]. However, the first study assessing the effects of β-amyloid immunotherapy reported surprising findings, i.e. greater hippocampal and whole-brain atrophy rates in patients treated with AN1792 vaccination [3]. By contrast, cortical thickness might be a promising "global" measure of disease progression, as it could represent a marker more specifically related to the evolution of AD [4,5] and might be useful to evaluate the efficacy of new disease-modifying therapies [6].
Several tools for the automatic extraction of cortical thickness have been developed, each based on different levels of complexity, robustness, and automation. Among others, the Civet-CLASP pipeline [7] and Freesurfer [8] are the two most widely used algorithms within the neuroscientific community. Obtaining an accurate thickness measurement requires the explicit reconstruction of the outer boundary on the basis of the inner boundary [9], which can be done following two different approaches: (I) a skeleton method or (II) a model-based deformation of the inner surface. Civet makes use of the skeleton mesh-based approach called constrained Laplacian anatomic segmentation using proximity (CLASP). The pial surface is expanded from the white surface up to the boundary between gray matter and CSF, along a Laplacian map [10]. Terms for stretch and self-proximity are included to regularize the deforming mesh and avoid mesh self-intersection inside sulci. In contrast, Freesurfer makes use of iterative and adaptive deformation and segmentation methods, deforming the mesh to reconstruct the inner and the pial surfaces. Freesurfer uses a dedicated routine to find and correct the topological defects in the initial inner surface. The deformable model is constrained by a second-order smoothing term [11] and by a mesh self-intersection prevention routine [8], which both help to correctly establish the boundaries between adjacent banks in tight sulci. Unfortunately, some relevant problems hamper the use of these techniques. Both tools measure the cortical thickness from two 3D cortical sheets, each of which is composed of thousands of vertices and faces, making the reconstruction of the cortical mantle a complex and time-consuming procedure [12].
Although several methods have been proposed in the past decades, little work has been done to compare their performances on real clinical datasets [13]. The aim of this study was to perform a head-to-head comparison between Civet-CLASP and Freesurfer. This can be considered a mandatory step toward the standardization of cortical thickness biomarkers, which in turn will pave the way to effectively translating a three-dimensional cortical marker to innovative disease-modifying trials.

Materials and Methods

Subjects
The sample we selected consisted of 185 subjects (69 normal elderly controls (CTR), 37 stable MCI (sMCI), 27 progressive MCI (pMCI), and 52 Alzheimer (AD) patients) belonging to the Alzheimer's Disease Neuroimaging Initiative (ADNI1). Demographics and clinical data are summarized in Table 1. MMSE and CDR scores differed significantly among the four groups (P<0.001), while age and educational levels were not significantly different. There was a significant difference in sex (P < 0.002), with a higher prevalence of males. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. The ADNI1 study is conducted in accordance with the Good Clinical Practice guidelines, the Declaration of Helsinki, and U.S. 21

Research infrastructures and pipelines
The evaluation of the cortical thickness is a computationally demanding task. We used two online e-infrastructures, namely neuGRID (https://neugrid4you.eu) [14] and VIP (http://vip.creatis.insa-lyon.fr) [15], to massively distribute job analyses, thus reducing the overall processing time. Civet's and Freesurfer's main features are summarized as follows:
• Civet-CLASP uses an iterative morphing method and comprises the following steps: intensity non-uniformity correction; spatial normalization to stereotaxic space; tissue classification; cortical surface extraction; cortical thickness measurement. The correspondence among subjects is ensured by the nonlinear registration of the sulcal geodesic depth map with an average sulcal depth sphere surface [10].
• Freesurfer uses iterative adaptive morphing/segmentation methods and relies on similar preprocessing steps, although differently arranged. The white matter surface derives from segmentation and topology correction. The gray matter surface is derived along the T1 intensity gradient. Correspondence among subjects is obtained through surface registration to the Freesurfer reference atlas. In this study, we used the longitudinal processing stream, where the variability is reduced using repeated measures from the same subject (i.e.: baseline, month 12 (data not shown), and month 24 cross-sectional analyses) as common information to initialize the process [16].
Table 2 reports the main features of the two pipelines.

Study design
The workflow of the study is reported as supplementary figure (see S1 Fig.).

MRI acquisition
The Alzheimer's Disease Neuroimaging Initiative (ADNI) has a specific protocol for the acquisition and harmonization of MR images. The ADNI 3D T1-weighted structural images are acquired using selected systems from GE Healthcare, Philips Medical Systems and Siemens Medical Solutions, with an eye toward minimizing cross-platform differences. The Magnetization Prepared RApid Gradient Echo (MPRAGE) acquisition sequence has nominal TI = 1000 ms, TR = 2400 ms and TE = 5 ms. The back-to-back (B2B) acquisition set in ADNI1 is composed of an MPRAGE scan and an MPRAGE-repeat scan.

Table 2 (cross-sectional rows, fragment): 3) Higher, but not significant, AUC to discriminate CTR versus pMCI or AD; 4) Sensitive in expected but also scattered unexpected cortical regions affected by disease neuropathology.

Table 2 (longitudinal rows). LONGITUDINAL
Civet:
1) Higher disease effect in pMCI and AD
2) More sensitive to significant atrophic patterns in frontal-parietal regions (especially in pMCI)
3) Sensitive to detect statistically significant atrophic differences between: AD vs CTR; AD vs sMCI; pMCI vs CTR
4) Sensitive enough to detect statistically significant atrophic differences in many temporal ROIs between: sMCI vs pMCI
Freesurfer:
1) Higher disease effect trend in CTR
2) Better correlation with hippocampal volumetric atrophy
3) Sensitive to detect statistically significant atrophic differences between: AD vs CTR; AD vs sMCI
4) Higher, but not significant, AUC to discriminate pMCI due to AD in a time span of 2 years

Visual quality control
All the post-processed scans output by neuGRID and VIP were quality controlled by an expert evaluator, who visually inspected them using the Matlab Imaging toolbox for 3D surfaces, which enables the user to rotate the cortical surface and to zoom in and out along all possible orientations. A reconstructed mesh was judged accurate when all the following 23 sulci were visible and correctly reconstructed: (

Hybrid Template Generation enabling head-to-head (H2H) comparison
Cortex surfaces as extracted by Civet and Freesurfer are morphologically and topographically different. For an accurate comparison to be possible, it was necessary to deform the surface morphology of at least one algorithm. To map each point of one surface onto the other, we adopted an elastic non-rigid registration to obtain the correct displacement vector. To our knowledge, Gradient Vector Flow (GVF) has not been used before to control 3D free-form deformation. The vector field computed via GVF provided the directions along which each vertex of the source surface could evolve to match a corresponding point on the target surface. Once the surfaces are registered, the vertices are spatially aligned and their coordinates coincide. Subsequently, in order to compare the correct cortical index value at each vertex, we adopted the Closest Point Search (CPS) technique, which is essential to establish the correct topographical match between the morphological points aligned with 3D GVF. For each point, CPS returned the mutual match between Civet's and Freesurfer's cortical thickness arrays. The entire process enabling the head-to-head comparison is illustrated in Fig. 1. The procedure was implemented using Matlab (v2009b). The data generated in this study are made publicly available to promote the evaluation of cortical thickness tools (https://neugrid4you.eu/datasets).
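The closest point search step described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation: the function names, the brute-force search, and the toy coordinates are assumptions made for illustration, and the sketch only performs a one-directional match (source to target) on surfaces that are assumed to be already registered.

```python
import math


def closest_point_search(source_vertices, target_vertices):
    """For each source vertex, return the index of the nearest target vertex.

    Brute-force search over (x, y, z) tuples; hypothetical helper, not the
    routine used in the study.
    """
    matches = []
    for sx, sy, sz in source_vertices:
        best_index, best_dist = -1, math.inf
        for j, (tx, ty, tz) in enumerate(target_vertices):
            # Squared Euclidean distance is enough to rank candidates.
            dist = (sx - tx) ** 2 + (sy - ty) ** 2 + (sz - tz) ** 2
            if dist < best_dist:
                best_index, best_dist = j, dist
        matches.append(best_index)
    return matches


def paired_thickness(source_thickness, target_thickness, matches):
    """Pair each source thickness value with that of its matched target vertex."""
    return [(source_thickness[i], target_thickness[j])
            for i, j in enumerate(matches)]


# Toy example with made-up coordinates and thickness values (mm).
source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
target = [(0.9, 0.1, 0.0), (0.1, 0.9, 0.0), (0.05, 0.0, 0.0)]
matches = closest_point_search(source, target)
pairs = paired_thickness([3.1, 2.9, 3.0], [2.2, 2.0, 2.1], matches)
```

With meshes of tens of thousands of vertices, a spatial index such as a k-d tree would normally replace the double loop; the brute-force form is kept here only because it makes the matching rule explicit.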

Atlases and ROIs Definition
The head-to-head comparison and the ROI analyses between pipelines were done using the Harvard-Oxford cortical structural atlas. We chose 28 of the 48 cortical areas provided [17], consistent with those used by other reference work groups [18][19][20][21]. For a complete list of the selected ROIs, see Table 3.

Statistical analysis to compare Cortical Thinning patterns
Cortical thinning within the same diagnostic groups was assessed using paired-samples t-tests. P-maps were corrected for multiple comparisons using the False Discovery Rate (FDR; α = 0.01) method [22]. Tukey-Kramer post-hoc testing of ANOVA (α = 0.05 in the cross-sectional comparison and α = 0.01 in the longitudinal analysis) was used to test thinning differences among the diagnostic groups and the different ROIs analyzed. Effect sizes were computed as Hedges' g, and Z-tests were performed to assess significant discrepancies between the performances of the two pipelines. Correlations of cortical thickness with MMSE scores and hippocampal volumes were investigated, and Steiger's Z was used to assess significant differences between Pearson's r values. Logistic regressions were applied to pre-selected thickness ROIs, and Receiver Operating Characteristic (ROC) curves were used to assess the discriminative accuracy of the two pipelines. AUCs were statistically compared using the method of Hanley and McNeil [23], setting the threshold for significance at a p value of 0.05. Kendall's tau coefficients were calculated and the derived z-test converted into the Pearson's correlation coefficient. Statistical analysis was performed with Matlab (v2009b).
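As an illustration of the AUC comparison mentioned above, the following Python sketch implements the commonly cited Hanley and McNeil formulas: the standard error of a single AUC and the z statistic for two AUCs estimated on the same cases. It is a sketch under stated assumptions rather than the study's code: the function names are hypothetical, the correlation r between the two AUC estimates is taken as an input supplied by the caller, and the numbers in the example are invented.

```python
import math


def auc_standard_error(auc, n_pos, n_neg):
    """Hanley-McNeil standard error of an AUC.

    n_pos: number of diseased cases; n_neg: number of non-diseased cases.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)


def compare_correlated_aucs(auc_a, auc_b, n_pos, n_neg, r):
    """z statistic and two-sided p value for two AUCs from the same subjects.

    r is the correlation between the two AUC estimates (assumed known here).
    """
    se_a = auc_standard_error(auc_a, n_pos, n_neg)
    se_b = auc_standard_error(auc_b, n_pos, n_neg)
    z = (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b)
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Invented example: two pipelines evaluated on 30 patients and 70 controls.
z, p = compare_correlated_aucs(0.90, 0.93, n_pos=30, n_neg=70, r=0.3)
```

A positive r shrinks the denominator, so the test becomes more sensitive when the two pipelines rank the same subjects similarly.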

Cortical Metrics
Both pipelines define thickness as the Euclidean distance and both can produce maps not restricted to the original MRI voxel resolution: thus, they can detect sub-millimeter differences between and within groups [8,24]. For the sake of this article, we defined the concept of "disease effect" as the relative predominance of one pipeline over the other to detect atrophy when comparing two groups (G) or two time-points (T): The values of the disease effect are mapped vertex by vertex on the hybrid template previously created (see Figs. 2 and 3 panel b).

Comparison of cortical metrics
The reconstruction of cortical thickness from B2B scans provided identical outcomes within the same pipeline (see S2 Fig.).
Compared to Civet, Freesurfer provided absolute values systematically lower by about 30% (see S3 Fig.). The difference between Civet and Freesurfer with respect to between-subjects variability (CoV) [25] ranges from 17% to 26% in the different diagnostic groups. The whole cortical thickness value at baseline and at month 24 is reported in S2 Table; both Civet and Freesurfer showed increasing thinning rates with the progression of the pathology. The relative percentage of thinning in paired diagnostic groups at baseline is reported in S3 Table; no statistical differences among the groups were detected in either pipeline. The percentage of longitudinal thinning rate across the four diagnostic groups is reported in S4 Table; both pipelines detected differences between AD and CTR, and between AD and sMCI; moreover, Civet was able to detect a significant longitudinal thinning difference between pMCI and CTR.
Cross-sectional and longitudinal thinning differences between Civet and Freesurfer
Fig. 2 compares CTR with sMCI, pMCI, and AD at baseline, and shows the details of the differences between Civet and Freesurfer at the individual vertex level. Fig. 3 compares, for each diagnostic group, the longitudinal (2 years) cortical thinning rate at the individual vertex level as computed by Civet and Freesurfer. Table 3 reports the comparison of the cross-sectional thickness differences at baseline, while Table 4 reports the longitudinal thinning rates with respect to the 28 selected ROIs. Cross-sectionally, the multiple comparison procedure highlighted small differences. Civet indicated the temporal planum ROI as significant, while Freesurfer identified the superior parietal lobe as significant. Longitudinally, Civet appeared to be much more sensitive in detecting significant thinning rate differences between CTR and AD, in all the 28 ROIs considered, as opposed to only 22 ROIs detected by Freesurfer (check symbol ¥). Comparing sMCI to AD, Civet was able to detect significant longitudinal thinning rate changes in all the 28 ROIs, compared to only 7 ROIs in Freesurfer (check symbol •). Again, Civet was able to detect significant longitudinal thinning rate changes between CTR and pMCI in 18 ROIs, as opposed to only 10 ROIs in Freesurfer (check symbol ¢). Lastly, Civet detected significant longitudinal thinning rate changes also between sMCI and pMCI in 10 ROIs (check symbol X), while Freesurfer could not find any variations. P values for multiple comparisons were always more significant in Civet (P < 0.0001).

Effect sizes
The effect sizes were derived as Hedges' g (Fig. 4). In the cross-sectional analysis, we decided to represent only CTR versus pMCI and CTR versus AD, these being the combinations of highest interest when defining populations for disease-modifying and clinical trials. The effect size was always above 0.8 in those cortical regions expected to be heavily affected by the disease neuropathology. In CTR versus pMCI, Freesurfer's effect size was always higher. Only the posterior division of the temporal fusiform cortex was found to be statistically different (p<0.05) between the two pipelines. In CTR versus AD, the Hedges' g values followed the same trend for both algorithms without any statistical difference. Longitudinally, Hedges' g trends were similar for the two algorithms and increased with disease progression. No statistical differences were found in any ROIs or groups.
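For readers unfamiliar with the effect size reported here, the sketch below computes Hedges' g for two independent groups as Cohen's d with the usual small-sample correction factor. The function name and the thickness values are hypothetical and serve only to illustrate the calculation; they are not data from this study.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Hedges' g: standardized mean difference with small-sample correction."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance from the unbiased sample variances of the two groups.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    cohen_d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    # Approximate bias correction: J = 1 - 3 / (4 * (n_a + n_b) - 9).
    correction = 1.0 - 3.0 / (4.0 * (n_a + n_b) - 9.0)
    return cohen_d * correction


# Invented ROI thickness values (mm) for a control and a patient group.
controls = [2.9, 3.1, 3.0, 3.2, 2.8]
patients = [2.5, 2.7, 2.6, 2.8, 2.4]
g = hedges_g(controls, patients)
```

Under this convention a positive g means the first group has the thicker cortex, and values above 0.8 are conventionally read as large effects, which is the threshold used for the cross-sectional ROI selection in this study.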

Cortical thickness versus cognitive impairment and hippocampal volumetry
Pearson's r correlation coefficients of regional cortical thickness with MMSE scores and quantitative hippocampal volume measurements (NeuroQuant [26]) were investigated in each ROI (see Fig. 5 panels A and B) within the CTR and pMCI groups, which represent the most appropriate population for innovative clinical trial designs.

Fig. 2 caption (fragment): There is a consistent delta (±0.3 mm) among the compared groups. Negative value means higher disease effect for Freesurfer (i.e.: parietal-temporal and precuneus areas); positive value means higher disease effect for Civet (i.e.: association areas and limbic parts of the cortex). C) Statistical difference maps (p<0.01 FDR-corrected). No significant voxels were found comparing CTR to sMCI. Atrophic areas were found contrasting pMCI with CTR (i.e.: the posterior cingulate, temporal lobe and frontal gyrus) with both tools. Comparing CTR versus AD the [...]
In the CTR group, the relationship between the pipelines' cortical thickness and cognitive function or hippocampal atrophy was generally weak (-0.2 < r < 0.2), both cross-sectionally and longitudinally. This was expected, given the absence of the disease in these completely asymptomatic subjects. However, significant differences between Civet and Freesurfer were found in a few areas (i.e.: frontal, parietal, occipital, and temporal).
In pMCI, the product-moment correlations rose to medium and high levels (-0.27 < r < 0.64), especially for some expected ROIs, such as the precuneus cortex and the cingulate and parahippocampal gyri. Significant differences between Civet and Freesurfer were found in a number of ROIs (i.e.: frontal, parietal, occipital, limbic, and temporal). Both Civet and Freesurfer cortical thickness measurements correlated better with hippocampal atrophy measurements than with neuropsychological tests.
Fig. 6 shows the Receiver Operating Characteristic (ROC) curves used to discriminate pMCI and AD patients from the CTR group at baseline, together with the longitudinal cortical pattern used to discriminate pMCI. Identifying the most informative ROIs was mandatory to reduce the dimensionality problem. In order to maximize the discriminatory power, we adopted a sequential forward search strategy (i.e., adding successive ROIs to the target set) as the feature selection criterion. The goal was to find, for both tools, the combination of ROIs with the highest discriminatory power. The best ROIs used to generate the final ROCs were different in each curve and for each algorithm. We started by selecting the ROIs with the highest effect size; at each further step, we assessed other ROIs with a medium-large effect size (d > 0.8 in the cross-sectional analysis; d > 0.6 in the longitudinal analysis). This process reduced the inherent noise of high-resolution data, as well as the risk of over-fitting. Logistic regressions on regional cortical thickness in the selected combinations of ROIs were performed to build ROC curves, AUCs and the relative confidence intervals (CI). No statistical difference (p>0.05) was found between the AUCs derived with Civet and those derived with Freesurfer. At baseline, CTR versus pMCI yielded 0.8953 and 0.9313, respectively (z = -0.46, r = 0.31), while CTR versus AD yielded 0.9568 and 0.9677, respectively (z = -0.38, r = 0.46).
In the longitudinal framework, pMCI yielded 0.7503 and 0.7874 (z = -0.34, r = 0.21). Freesurfer performed slightly better in terms of classification accuracy, both on cross sectional and longitudinal analyses.
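The sequential forward search used to build the ROC curves can be sketched in Python as follows. This is a simplified illustration, not the study's procedure: the stopping rule (AUC gain) is an assumption, the composite score (negative mean thickness over the selected ROIs) is a deliberate stand-in for the logistic regression used in the paper, and all names and thickness values are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (patient, control) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


def composite_score(subject, rois):
    """Stand-in classifier score: thinner cortex gives a larger score."""
    return -sum(subject[r] for r in rois) / len(rois)


def forward_select(patients, controls, candidate_rois, min_gain=1e-9):
    """Greedily add the ROI that most improves the AUC; stop when none does."""
    selected, best_auc = [], 0.5
    remaining = list(candidate_rois)
    while remaining:
        trials = []
        for roi in remaining:
            rois = selected + [roi]
            trial_auc = auc([composite_score(s, rois) for s in patients],
                            [composite_score(s, rois) for s in controls])
            trials.append((trial_auc, roi))
        top_auc, top_roi = max(trials, key=lambda t: t[0])
        if top_auc - best_auc <= min_gain:
            break
        selected.append(top_roi)
        remaining.remove(top_roi)
        best_auc = top_auc
    return selected, best_auc


# Invented ROI thickness values (mm) for three patients and three controls.
patients = [{"precuneus": 2.1, "fusiform": 2.4, "cuneus": 2.0},
            {"precuneus": 2.3, "fusiform": 2.2, "cuneus": 2.4},
            {"precuneus": 2.0, "fusiform": 2.5, "cuneus": 2.1}]
controls = [{"precuneus": 2.4, "fusiform": 2.6, "cuneus": 2.1},
            {"precuneus": 2.2, "fusiform": 2.7, "cuneus": 2.3},
            {"precuneus": 2.5, "fusiform": 2.3, "cuneus": 2.2}]
selected, best = forward_select(patients, controls,
                                ["precuneus", "fusiform", "cuneus"])
```

In a real analysis the candidate list would be restricted beforehand to ROIs above the chosen effect size threshold, and the AUC would be estimated with cross-validation to limit over-fitting on small groups.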

Discussion
This study can be considered a first attempt to verify the mutual strengths and weaknesses of Civet and Freesurfer in a real head-to-head challenge, at the precision level of the single voxel. In the literature, only phantom-based validation methods have been used [27,28], but this kind of approach does not take into consideration every aspect of real data. We investigated and compared the performances of Civet and Freesurfer when applied to the same ADNI1 groups, which included subjects across the entire disease spectrum, as monitored in a 2-year time frame. The analyses showed commonalities and differences.
Civet and Freesurfer are characterized by specific and distinctive procedures, making it difficult to compare their outputs. This problem was solved by adopting a combined approach, applying both GVF and CPS to ensure a robust comparison of meshes with completely different morphometry and topography. Thanks to the direct vertex-by-vertex cross-algorithm comparison, the differences between the two algorithms, in both the cross-sectional and the longitudinal analysis, were analytically mapped.
No differences appeared between the thickness evaluation of the first test (MPRAGE) and that of the retest (MPRAGE-Repeat), suggesting high repeatability. Both Civet's and Freesurfer's performances changed according to the disease stage, indicating that neither algorithm can be considered better than the other. Freesurfer systematically underestimated the absolute thickness by about 1 mm compared to Civet. Explanations for this evidence are not trivial. However, the restriction of Freesurfer to 1.0 mm as the resolution of the volumes to be processed could be one possible reason. Civet, relying on the volumetric Laplacian approach, can use the higher resolutions (e.g.: 0.8 or 0.9 mm) often adopted in ADNI1. An important role might also be played by the different mathematical procedures used by the two tools when reconstructing the gray matter sheet. Moreover, the skeleton reconstruction method adopted by Civet to build the GM sheet tends to overestimate the cortical thickness in blurred regions (i.e.: regions affected by noise where the CSF volume is small); on the other hand, Freesurfer relies on the inner white surface deformation approach, which can be strongly influenced by the anatomical accuracy of the surface reconstruction at both the inner and outer boundaries, and may thus yield a partially biased assessment of the cortical thickness.
Cross-sectionally, both algorithms were sensitive to cortical thinning in those cortical regions heavily affected by the neuropathology. Comparing CTR to pMCI, the regions of significance found by both tools overlapped with those found comparing CTR and AD, albeit smaller, indicating that the differences in cortical thinning are progressive and well detectable even before a formal diagnosis of AD. This means that both tools can detect the characteristic signature of AD. Both Civet and Freesurfer were able to efficiently differentiate CTR from AD and pMCI. All the ROIs granting such a good discrimination rate belonged to the temporal lobe. An interesting consideration for future work is the possibility of using Civet and Freesurfer to differentiate AD into particular subclasses, namely familial AD, early onset AD, and late onset AD [29,30].
Longitudinally, both pipelines showed more statistically significant atrophic clusters in CTR than in sMCI, but this should be considered a confounding phenotypic effect due to demographic, sample size, clinical and other genetic characteristics. Further analyses with a larger sample will be conducted to clarify this particular behaviour. In pMCI, Civet was able to highlight a characteristic atrophic pattern involving expected temporal areas, such as the inferior margin of central gyrus and extended lateral frontal-parietal areas. Civet's higher effect size and its more representative cortical signature suggest that this tool can more efficiently detect the typical atrophic patterns in subjects who will convert to AD within 2 years. In the discriminant analysis, Civet produced an AUC slightly lower than that produced by Freesurfer, but this was probably due to random noise that confuses classifiers, producing changes that are hard to predict and control. An additional explanation may be that longitudinally, on a vertex-by-vertex basis, Civet showed a more extensive effect than Freesurfer, while on a ROI basis the differences between the pipelines were not significant. In the AD cohort both Freesurfer and Civet were analogously sensitive to the thinning patterns. As far as the correlation between cortical thinning and hippocampal atrophy rate is concerned, Freesurfer showed a better trend, probably due to the exploitation of the longitudinal stream. Given its progressive alteration along the MCI-to-AD course, cortical thickness seems to be a promising candidate neuroimaging marker. With few exceptions, the two algorithms showed robust multi-ROI correlation patterns fairly consistent with the usual clinical and regional neuroimaging biomarkers, thus producing new, 3D, global profiles of the disease progression.

Fig. 3 caption (fragment): In CTR and sMCI, both pipelines report a very mild and widespread cortical thinning rate in the motor, somatosensory, verbal and visual association cortex. In pMCI, the atrophy peaks at rates around 0.3 mm in the medial temporal cortex, temporal-parietal-frontal neocortices, with sparing of the sensorimotor strip and of the visual cortex. In AD, the atrophy in the same areas accelerates beyond 0.4 mm. B) Disease effect maps. The mean estimate of the longitudinal disease effect in CTR and sMCI as computed by Freesurfer is greater, although Civet shows higher results in few scattered areas. Furthermore, in the entire disease spectrum, Freesurfer exhibited higher disease effect in the motor cortex. In pMCI, Civet exhibits a greater disease effect except for the cingulate gyrus, while in the AD group the exception is represented by the precuneus. C) Statistical difference maps (p<0.01 FDR-corrected). In CTR, Civet detects an atrophic cluster in the angular gyrus, while Freesurfer detects clusters in the precuneus and in the temporo-occipital lobe. The pattern in sMCI was more reduced than in CTR. In pMCI Freesurfer was not able to find many regions detected by Civet with the same significance and extension (i.e.: orbital, triangular, and opercular portion of the inferior frontal gyrus, transverse-temporal and mesial part of the superior frontal cortex, inferior parietal cortex, the superior temporal gyrus). Freesurfer was more sensitive in few scattered expected and unexpected regions. For both pipelines, the longitudinal AD shrinkage showed significant areas throughout the temporal, frontal and parietal lobes, consistently with the progression of the disease. Some shrinkage differences were detected in the anterior division of the cingulate, in the limbic lobe and in the cuneus. D) Overlapping and not-overlapping atrophic regions are shown. Significant voxels detected by both pipelines are in yellow; voxels detected only by Civet are in blue; voxels detected only by Freesurfer are in red. CV: Civet; FS: Freesurfer; L: Left hemisphere; R: Right hemisphere; CTR: Normal elderly controls; sMCI: stable MCI; pMCI: progressive MCI; AD: Alzheimer's Disease.
doi:10.1371/journal.pone.0117692.g003

Table 4. Longitudinal ROI-based analysis.

Fig. 5 caption (fragment): In the CTR group, no significant differences between ROIs were detected in the two pipelines at BSL. At M24, significant differences between the two pipelines were found in the: middle frontal gyrus; inferior frontal gyrus-pars triangularis; superior parietal lobule; anterior division of the supramarginal gyrus; anterior and posterior division of the superior temporal gyrus. Longitudinally, no significant differences between ROIs were detected in the two pipelines. In the pMCI group, a significant difference between the two pipelines was found at BSL in the: anterior division of the superior temporal gyrus. At M24, a significant difference between the two pipelines was found in the: superior division of the lateral occipital cortex. Longitudinally, no significant differences between ROIs were detected in the two pipelines. Pearson's r coefficient of cortical thickness versus NeuroQuant hippocampal volume (panel B): In the CTR group, a significant difference between the two pipelines at BSL was found in the: anterior division of the parahippocampal gyrus. At M24, significant differences between the two pipelines were found in the: inferior frontal gyrus-pars opercularis; anterior and posterior division of the parahippocampal gyrus; anterior division of the temporal fusiform cortex. Longitudinally, significant differences between the two pipelines were found in the: Heschl's gyrus and temporal planum. In the pMCI group, a significant difference between the two pipelines was found at BSL in the: precuneus cortex. Longitudinally, significant differences between the two pipelines were found in the: anterior division of the supramarginal gyrus, superior division of the lateral occipital cortex, posterior division of the superior temporal gyrus, posterior division of the inferior temporal gyrus, temporo-occipital part of the inferior temporal gyrus. In panels A and B, * symbol stands for p<0.05 (Steiger's z-test).
Ultimately, having reliable 3D diagnostic markers would enable clinicians to identify and treat, in a timely manner, MCI patients who will progress to AD, once disease-modifying treatments become available.
Future studies, including the MR 3.0 Tesla field strength, additional time points, an extended age range of subjects, and larger and additional groups, might be helpful to further address the spatial and temporal atrophic pattern of Alzheimer's changes.
Freesurfer and Civet have been validated against either histological analysis or manual measurements [31][32][33][34], but neither has been contrasted against different stages of the Alzheimer's pathology. Future work should focus on further validating both pipelines against a database of cortical thickness derived from a population of normal and abnormal cadaveric brains, such as those recently defined in the BigBrain initiative (https://bigbrain.loris.ca/). Some limitations should be considered in the interpretation of the present results. First, the tools described here need to be further compared with other recently available techniques, such as: Toads-Cruise [35], ARCTIC [36], MILXCTE [37], DiReCT [38], or CLADA [39]. Second, as expert manual rating in neuroimaging represents the gold standard, independent evaluators should compare the performance and accuracy of each automatic pipeline. Third, each tool should be validated against harmonized MR datasets, such as: standardized ADNI analysis dataset [40], WW-ADNI [41], AddNeuroMed [42] and OASIS [43]. Fourth, computational time is worth consideration: the extensive use of Civet or Freesurfer to analyse large volumes of data necessarily requires HPC, Grid or Cloud resources, due to the protracted processing time needed. Additional development and programming can make these algorithms more reliable, faster and lighter.

Conclusion
Both Civet and Freesurfer demonstrated high sensitivity to cortical gray matter changes cross-sectionally and longitudinally. Additional efforts are needed to clarify the ability of these tools to address particular clinical and research questions concerning the future use of cortical thickness as a biomarker, and in particular their ability to: (I) predict cortical decline along different time points, (II) reduce the number of patients needed for future clinical trials, (III) help monitor the efficacy of disease-modifying drugs.

S2 Table. Whole brain absolute mean cortical thickness (mm) ± standard deviation (σ) for each diagnostic group at baseline and month 24. (TIF)
S3 Table. Cross-sectional thinning percentages (%) ± standard deviation (σ) in paired diagnostic groups at baseline. (TIF)
S4 Table. Longitudinal thinning percentage (%) ± standard deviation (σ) in each diagnostic group in a time span of two years. (TIF)

investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgment_List.pdf