Significance of Normalization on Anatomical MRI Measures in Predicting Alzheimer's Disease
The Scientific World Journal

This study establishes a new approach for combining neuroimaging and neuropsychological measures into an optimal decisional space for classifying subjects with Alzheimer's disease (AD). The approach relies on a multivariate feature selection method applied with different MRI normalization techniques. Subcortical volume, cortical thickness, and surface area measures were obtained from the MRIs of 189 participants (129 normal controls and 60 AD patients). Statistically significant variables were selected for each combination model to construct a multidimensional space for classification. Different normalization approaches were explored to gauge their effect on classification performance using a support vector machine classifier. Results indicate that the mini-mental state examination (MMSE) is the most discriminative among single-measure models, while subcortical volume combined with MMSE is the most effective multivariate model for AD classification. The study demonstrates that subcortical volumes need not be normalized, whereas cortical thickness should be normalized by either intracranial volume or mean thickness, and surface area is a weak indicator of AD both with and without normalization. Among the significant brain regions, a nearly perfect left-right symmetry is observed for subcortical volumes and cortical thickness, and a significant reduction in thickness is seen particularly in the temporal lobe, which is associated with the brain deficits characterizing AD.


Introduction
Alzheimer's disease (AD) is a neurodegenerative disease and the most common form of dementia. Estimates from the Alzheimer's Association as of March 2012 indicate that 5.4 million Americans are diagnosed with AD, and over 95% of this population are 65 years of age or older. Also, nearly half of the population over 85 years of age is affected by AD [1]. The worldwide societal cost of dementia is enormous, estimated at 315.4 billion USD on the basis of 29.3 million people diagnosed with dementia [2]. AD patients display disease-related regional cerebral atrophy, which can be distinguished from normal aging [3,4]. In AD, atrophy is often observed in regions that are closely related to neurodegeneration. Various studies have shown that atrophy in regions like the hippocampus [5][6][7][8], amygdala [5,9] and ventricles [10,11] is correlated with AD. Moreover, the key atrophied regions across the entire brain, once determined, could be used as parameters for the delineation of AD patients from cognitively normal (CN) subjects.
Freesurfer is a popular, highly automated MRI processing software package widely used to generate regional measures from MRI scans. Its advantages over traditional manual segmentation and measurement are its high degree of automation and its independence from operator subjectivity. Freesurfer is also accurate and precise, and it has been tested on large cohorts in AD classification research [12][13][14][15].
The mini-mental state examination (MMSE) is a neuropsychological test that is most often administered to screen patients for cognitive impairment and dementia [16]. MMSE is used to judge the severity of cognitive impairment by administering 30 questions aimed at testing the subject's orientation to time and place, attention, and calculation capabilities, as well as response to recall, language, and complex commands. The frequent use of MMSE in clinical environments makes it interesting to investigate its discriminative power in classifying AD subjects as compared to MRI-based measures.
Important tasks to be considered in AD classification studies include the choice of parameters, the way these parameters ought to be combined, and determining the preprocessing techniques to be employed in order to enhance the prospects of classification. Two essential questions that need to be addressed for AD classification studies are (1) which regional MRI measures produced by Freesurfer are statistically significant for classification of AD subjects? and (2) which normalization approach should be employed to minimize bias due to differences in head size and brain structure in order to enhance the classification performance?
Westman and his colleagues have investigated some aspects of the aforementioned issues using supervised multivariate data analysis based on the orthogonal projections to latent structures (OPLS) model [15]. OPLS is similar to principal component analysis (PCA) in that both are linear decomposition techniques that project the original data onto latent variables. The approach of this study is an extension of a previous study [14], which proposes constructing, for each classification model, an optimal decisional space using the most statistically significant variables. The number of dimensions in the classifier is determined by an incremental error analysis, which in turn defines and ranks variables on their statistical significance to be used as input to an SVM-based classification process.
In this study, single-measure models and hierarchical models, with and without normalization, are examined to find the optimal model. Single-measure models include one of the regional MRI measures (subcortical volume, cortical thickness, and surface area) or the neuropsychological test, the mini-mental state examination (MMSE). A hierarchical model combines two or more of the single-measure models to examine whether their interaction improves the classification process. The specific aims of this study are, thus, to determine (1) the impact of the neuropsychological test (MMSE) on the classification; (2) the combination of regional measures and MMSE that yields the best classification performance; and (3) which normalization scheme should be employed to achieve a better classification performance.

Materials and Methods
All subjects had (1) a neurological and medical evaluation by a physician and (2) a full battery of neuropsychological tests [17], according to the National Alzheimer's Coordinating Center protocol, and the following additional [...]
[...] [12][13][14][15][20], was applied to all the MRI scans to produce 55 volumetric variables, including 45 volumetric measures of subcortical parcellation and 10 morphometric statistics. For cortical thickness, 34 regional variables were determined for each hemisphere, resulting in 68 variables for cortical thickness measures. Also, surface area was estimated from 35 regions of the brain for each hemisphere, resulting in 70 measures for the entire brain.

Feature Extraction and Incremental Error Analysis.
All the variables in a given model are first ranked based on the statistical significance of their difference between AD and CN. Following this ranking, an incremental error analysis is performed in which the SVM classifier is trained and tested while adding a single variable at a time, in order to determine the combination of top-ranked variables that yields the optimal classification outcome. This rigorous blind feature selection technique differs from others in that it does not rely on prior assumptions about regions of interest (ROIs) and thus assigns equal weights to all the variables. The above process was performed on all models to compare their discriminative power and consequently identify the optimal model for AD classification. It should be noted that although regional atrophy among AD patients is what is generally sought, the statistical test considers both atrophy and enlargement of these specific brain regions, since volumetric enlargement can occur in regions like the ventricles, which has been shown to be important in differentiating AD and its prodromal stages [7,11,21].
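The ranking and incremental error analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the absolute Welch t statistic is used as a proxy for ordering by statistical significance, a nearest-centroid classifier stands in for the SVM so that the sketch needs only the Python standard library, and all function names (`welch_t`, `rank_features`, `centroid_error`, `incremental_error_analysis`) are hypothetical. The ranking is computed once on all subjects, which is a simplification.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t statistic between two samples (larger = better separated)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(X, y):
    """Feature indices ordered from most to least separating AD (1) from CN (0)."""
    scores = []
    for j in range(len(X[0])):
        ad = [row[j] for row, label in zip(X, y) if label == 1]
        cn = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(welch_t(ad, cn))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def centroid_error(X, y, cols, k=5, seed=0):
    """k-fold error rate of a nearest-centroid classifier (assumed stand-in for the SVM)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    wrong = 0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        centroids = {}
        for label in (0, 1):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [mean([r[c] for r in rows]) for c in cols]
        for i in test:
            point = [X[i][c] for c in cols]
            pred = min(centroids, key=lambda lab: math.dist(point, centroids[lab]))
            wrong += pred != y[i]
    return wrong / len(X)


def incremental_error_analysis(X, y, error_fn=centroid_error):
    """Add top-ranked variables one at a time; keep the prefix with the lowest error."""
    ranking = rank_features(X, y)
    errors = [error_fn(X, y, ranking[:m]) for m in range(1, len(ranking) + 1)]
    best = min(range(len(errors)), key=errors.__getitem__)
    return ranking[: best + 1], errors
```

Any classifier can be substituted through `error_fn`, as long as it returns an error rate for a given list of column indices. Because variables are only ever added in ranked order, no region is favored a priori, which mirrors the blind selection described in the text.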

Normalization and Classification Experiment.
To explore the effect of normalization on classification performance, MRI measures are normalized by widely accepted morphometric measures: intracranial volume (ICV) for regional subcortical volumes, ICV and the mean cortical thickness of the subject for regional cortical thickness, and ICV and the total surface area of the subject for regional surface area. A summary of the normalization measures is presented in Table 2. ICV is derived from the MRI and is one of the 10 morphometric statistics obtained by the Freesurfer pipeline. Mean cortical thickness is estimated by averaging the thickness of all 68 regions of the brain for each subject. Similarly, total surface area is the sum of all regional surface area measures for a given subject.
Classification was performed using a support vector machine (SVM) classifier, which has been shown to be effective as a classification tool for AD [22][23][24]. The kernel function of the SVM used in this study is the Gaussian radial basis function (RBF) kernel with a scaling factor (σ) of 3. All the classification results reported here are based on 5-fold cross-validation. Each classification experiment was run 50 times, and the results were averaged to evaluate performance in terms of accuracy, sensitivity, specificity, and precision.
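As an illustration of the normalization schemes listed above, the following sketch computes the five normalized variants for one subject. It assumes that "normalized by" means a simple division (the proportional approach), which the text does not state explicitly; the function name `normalize_subject` and the output keys are hypothetical labels chosen for this example.

```python
from statistics import mean


def normalize_subject(volumes, thickness, area, icv):
    """Return normalized copies of one subject's regional measures.

    volumes, thickness, area: dicts mapping region name -> raw value.
    icv: intracranial volume of the same subject.
    Assumption: normalization is a plain ratio (proportional approach).
    """
    mean_ct = mean(list(thickness.values()))  # mean thickness over all cortical regions
    total_sa = sum(area.values())             # total surface area over all regions
    return {
        "SV/ICV": {r: v / icv for r, v in volumes.items()},
        "CT/ICV": {r: v / icv for r, v in thickness.items()},
        "CT/Mean": {r: v / mean_ct for r, v in thickness.items()},
        "SA/ICV": {r: v / icv for r, v in area.items()},
        "SA/Total": {r: v / total_sa for r, v in area.items()},
    }
```

With this definition, the surface areas normalized by the total sum to 1 for each subject, and the thicknesses normalized by the mean average to 1, so both schemes remove a subject-level scale factor while leaving the regional pattern intact.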
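The evaluation protocol above (Gaussian kernel, 5-fold cross-validation repeated 50 times, four performance metrics) can be sketched as below. The kernel parameterization exp(-||x - z||^2 / (2σ^2)) is an assumption, since the text gives only the scaling factor of 3; the helper names `rbf_kernel`, `performance`, and `repeated_kfold` are hypothetical, and the splits here are not stratified by class.

```python
import math
import random


def rbf_kernel(x, z, sigma=3.0):
    """Gaussian RBF kernel, assuming the form exp(-||x - z||^2 / (2 * sigma^2))."""
    return math.exp(-math.dist(x, z) ** 2 / (2.0 * sigma ** 2))


def performance(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, and precision from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # fraction of AD subjects detected
        "specificity": ratio(tn, tn + fp),  # fraction of CN subjects detected
        "precision": ratio(tp, tp + fp),    # fraction of AD predictions that are correct
    }


def repeated_kfold(n, k=5, repeats=50, seed=0):
    """Yield (train_idx, test_idx) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition for each repetition
        for f in range(k):
            test = idx[f::k]
            held_out = set(test)
            yield [i for i in idx if i not in held_out], test
```

Averaging the dictionaries returned by `performance` over the repetitions, and recording their minimum and maximum, gives the kind of summary reported for each model.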

Classification Performance and Model Selection.
Single-measure models using only one type of regional measure or MMSE were created for subcortical volume, cortical thickness, surface area, and neuropsychological data (MMSE), for both raw and normalized data. Hierarchical models were also created by combining two or more of the single-measure models, again for both raw and normalized data. Feature selection based on statistical testing was performed for all the models created. The results of models with raw data are shown in Table 3, and the results for models with normalized data are shown in Table 4. All the results are averages over 50 runs, with minimum and maximum values shown in parentheses.
Results of the different models are highly consistent, as the results of the 50 independent repetitions of classification fall within a small range, as shown by the minimum and maximum values in Tables 3 and 4. This small range is a clear indication of the consistency and replicability of the results, both essential attributes in any classification process. These results also indicate that MMSE is an important factor that should be included in the classification process. Inclusion of MMSE with other measures significantly improves the classification results. For example, in the case of the optimal model, the hierarchical model using subcortical volumes (SV) with the inclusion of MMSE resulted in an improvement of 9.2% as compared to using SV alone. Overall, an average improvement of 13.3% is seen when comparing analogous models with and without MMSE using raw data, and 12.8% using normalized data.
The classification results given in Tables 3 and 4 show that cortical thickness should be normalized by either the mean thickness of all the measured regions or ICV, while normalizing subcortical volumes to ICV does not have any significant effect. In a recent study, Westman et al. explored the effect of normalizing regional MRI measures using orthogonal projections to latent structures (OPLS) models and concluded that neither cortical thickness nor subcortical volumes should be normalized [15]. Both studies, thus, suggest that subcortical volumes should not be normalized to ICV. The divergence lies in the normalization of cortical thickness. This could potentially be explained by the difference in the techniques used: Westman and his colleagues used a model that includes all variables (OPLS), whereas the proposed method is based on feature selection. The cause might be that normalization of cortical thickness reduces the variation across all regions in general, on which the OPLS model relies, but enhances the variation in some specific regions that the feature selection method might have selected. Thus, whether cortical thickness should be normalized depends on the processing technique used. The divergence could also be due to subtle differences in the data used in the two studies.
Since some models have very close performance in terms of the 4 recorded performance metrics (accuracy, sensitivity, specificity, and precision), models that give more than 90% accuracy are considered good models and are italicized in Tables 3 and 4. Inclusion of additional measures does not guarantee a significant performance enhancement. A tradeoff exists between models, with some displaying better accuracy at the cost of sensitivity and vice versa. In terms of accuracy, the "MMSE + SV" model is the best, whereas, in terms of sensitivity, the "MMSE + CT (Mean)" model is more appropriate.
A comparison of classification performance with recent studies in the literature is provided in Table 5. The results indicate that the proposed technique using MMSE and MRI can yield classification performance competitive with that of techniques using two or more imaging modalities or biomarkers. Westman et al. raised the concept of cost-benefit, noting the increased cost of combining biomarkers as a potential limitation [25]; in that respect, the proposed approach has the advantage of low cost yet high accuracy. In addition, the results in this study are based on a larger cohort than most other studies in the table.

Univariate Analysis of Anatomical Measures.
This section investigates how normalization affects the statistical significance of the variables that are used in the classification model. The effect of normalization can be determined by observing the change in the significance of the MRI measures when normalization is carried out. To illustrate the effect of the normalization approaches on the statistical significance of regions of interest (ROIs), univariate analysis was performed on subcortical volumes, as shown in Table 6, on surface area for the left and right hemispheres, as shown in Table 7, and on cortical thickness for the left and right hemispheres, as shown in Table 8. The univariate analysis was carried out for the two hemispheres separately for both cortical thickness and surface area in order to inspect possible pattern differences between the left and right hemispheres. In Tables 6-8, the regions of the brain for which the significance of the variable differs between raw and normalized data are bolded. Note that in Tables 7 and 8 only those regions that show such behavior for both normalization techniques are highlighted. Table 6 shows that ICV normalization of the subcortical volumes does not change the statistical significance of the variables, particularly for the top-ranked variables. This suggests that normalizing subcortical volumes with ICV might not be necessary, consistent with the earlier conclusion from Tables 3 and 4 that normalizing subcortical volumes to ICV is not recommended. More importantly, subcortical volumes and cortical thickness show symmetry between the left and right hemispheres for the top-ranked variables, as shown in Tables 6 and 8. In other words, regions of the brain that are significant for the classification of AD subjects are symmetrically located in both hemispheres of the brain. A typical example is seen in the top 5 ranked regions according to subcortical volumes, which include both the right and left hippocampus and the right and left inferior lateral ventricles.
However, Table 7 shows that for surface area there is almost no symmetry at all between the left and right hemispheres, for both the raw and normalized data. This could possibly be explained by the fact that all variables found to be significant using surface area have a p-value close to the significance threshold (0.05). Another point to note is that, for both raw and normalized data, surface area has a smaller number of significant variables and relatively high p-values, indicating that surface area may generally be regarded as a weaker biomarker of AD atrophy than the other two measures, SV and CT.
The regions of the brain that are determined to be statistically significant are displayed in Figures 1-4. Figure 1 represents the top 5 significant subcortical volumes based on raw data. Figures 2 and 3 represent the cortical regions of the brain that are found to be significant for AD classification using cortical thickness (CT) and surface area (SA), respectively, on raw data. Figure 4 illustrates the change in the significant regions of the brain when surface area normalized to the total surface area is used as a measure, as compared to the raw data shown in Figure 3.
One interesting finding about cortical thickness in Figure 2 is that most of the significant regions belong to the temporal lobe, suggesting that the temporal lobe undergoes the most significant thickness change. This is consistent with the results of other studies [26,27], particularly the finding that a large degree of thinning of the temporal cortex is seen in AD, while thinning is relatively limited in normal aging [27]. The nonsymmetric atrophy pattern of surface area can be easily observed anatomically in Figures 3 and 4.
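The two checks discussed in this section, namely which regions change significance status after normalization and whether the top-ranked regions are left-right symmetric, can be sketched as follows. This is only an illustration: the helper names are hypothetical, the region labels follow the "Left-"/"Right-" prefix convention of Freesurfer subcortical labels, and the p-values in the usage example are made up rather than taken from Tables 6-8.

```python
def significance_changes(p_raw, p_norm, alpha=0.05):
    """Regions whose significant / non-significant status differs between raw and normalized data."""
    return sorted(
        r for r in p_raw
        if r in p_norm and (p_raw[r] < alpha) != (p_norm[r] < alpha)
    )


def symmetric_structures(ranked_regions, top=5):
    """Structures whose left and right variants both appear among the top-ranked regions."""
    top_regions = ranked_regions[:top]
    left = {r[len("Left-"):] for r in top_regions if r.startswith("Left-")}
    right = {r[len("Right-"):] for r in top_regions if r.startswith("Right-")}
    return sorted(left & right)


# Usage with made-up values (not results from the study):
p_raw = {"Left-Hippocampus": 1e-9, "Left-Caudate": 0.04, "Right-Putamen": 0.20}
p_norm = {"Left-Hippocampus": 1e-8, "Left-Caudate": 0.07, "Right-Putamen": 0.30}
changed = significance_changes(p_raw, p_norm)

ranked = ["Left-Hippocampus", "Right-Hippocampus", "Right-Inf-Lat-Vent",
          "Left-Inf-Lat-Vent", "Left-Amygdala"]
paired = symmetric_structures(ranked)
```

Variables with p-values near the 0.05 threshold, such as the hypothetical caudate entry here, are the ones most likely to flip status under normalization, which is in line with the explanation given for the lack of symmetry in surface area.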

Spatial Distribution of Subjects under the "Best Model".
The "MMSE + SV" model without normalization gives the highest classification accuracy, using the top 3 variables found within the model (i.e., MMSE, right-hippocampus volume, and left-inferior-lateral-ventricle volume). One typical distribution of the data points for this classification model is plotted in Figure 5 to show the clustering characteristics of the data when MMSE and subcortical volumes are employed. Using this optimal decisional space, it can be observed that all the normal subjects are grouped into a very compact cluster, whereas AD subjects are more sparsely distributed in the context of these dimensions. This indicates the complex pattern of atrophy among AD patients, which renders the classification task extremely difficult.

Model Efficiency Estimation and Normalization.
Variation in measures can come from many sources, including variation due to AD atrophy ($\sigma^2_{\mathrm{ADa}}$), which is of primary interest for classification purposes, as well as other variation, or noise ($\sigma^2_{\varepsilon}$), such as individual differences in brain size, structure of brain regions, MRI measurement error, region segmentation error, atrophy due to normal aging, and resistance to brain atrophy (e.g., cognitive reserve). Generally, the total variance can be described as follows:

$$\sigma^2_{\mathrm{total}} = \sigma^2_{\mathrm{ADa}} + \sigma^2_{\varepsilon}, \tag{1}$$

where $\sigma^2_{\mathrm{total}}$ is the total variance of the dataset, $\sigma^2_{\mathrm{ADa}}$ stands for the variance due to AD atrophy, and $\sigma^2_{\varepsilon}$ is the variance due to what is termed here an overall source of noise. Also, the discriminative power of a model depends on the amount of variance due to AD atrophy captured by the model, in contrast to the variance due to noise. A relevant term called discriminative power (Dp) can be estimated using

$$\mathrm{Dp} = \frac{\hat{\sigma}^2_{\mathrm{ADa}}}{\hat{\sigma}^2_{\varepsilon}}, \tag{2}$$

where $\hat{\sigma}^2_{\mathrm{ADa}}$ is an estimate of the variance due to AD atrophy captured by the model and $\hat{\sigma}^2_{\varepsilon}$ stands for the estimated variance due to noise captured by the model. Our results, thus, show that normalization in general does not enhance the classification performance significantly. This could be explained through (2): normalization does bring down the correlated noise ($\hat{\sigma}^2_{\varepsilon}$) arising from differences in brain size, but it also lowers the correlated variance due to AD atrophy ($\hat{\sigma}^2_{\mathrm{ADa}}$). A finding that supports this assumption is that proportional volumes of the superior temporal cortex, expressed as a proportion of total cerebral volume, were significantly different between females and males [28], which exemplifies the fact that normalization may be intrinsically biased. A similar finding by Barnes et al. is that normalization of all volumes by head size is not adequate due to their nonproportional relationship [29]. Also, Ross et al. found that males generally have a larger overall brain size than females, and that males have larger cerebral cortical volumes than females except in the left parietal region [30]; thus, normalization will at least introduce noise into the left parietal regions, since for males these regions are smaller yet are normalized to a larger head size. However, the Dp value could still serve as a measure of a model's performance if the relevant sources of variance are known and quantifiable, which is not the case in most practical scenarios.
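The argument about normalization and discriminative power can be made concrete with a small numerical sketch. It assumes that Dp is the ratio of the estimated AD-atrophy variance to the estimated noise variance; the function names and the variance values are hypothetical and serve only to show the arithmetic.

```python
def discriminative_power(var_ad, var_noise):
    """Dp under the assumed ratio form: AD-atrophy variance over noise variance."""
    return var_ad / var_noise


def after_normalization(var_ad, var_noise, ad_removed, noise_removed):
    """Variances left after normalization removes the given fraction of each component."""
    return var_ad * (1.0 - ad_removed), var_noise * (1.0 - noise_removed)


# Hypothetical variances, for illustration only.
raw_dp = discriminative_power(4.0, 2.0)

# Normalization that removes half of both components leaves Dp unchanged.
same_dp = discriminative_power(*after_normalization(4.0, 2.0, 0.5, 0.5))

# Dp improves only if proportionally more noise than AD-related variance is removed.
better_dp = discriminative_power(*after_normalization(4.0, 2.0, 0.25, 0.5))
worse_dp = discriminative_power(*after_normalization(4.0, 2.0, 0.5, 0.25))
```

Under this assumed form, normalization helps only when it removes a larger share of the noise variance than of the AD-related variance, which is one way to read the observation that normalization did not enhance performance in general.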

Conclusion
This paper studied the effect of normalization on the classification performance of the proposed statistical feature selection approach, using ROIs segmented by Freesurfer and a neuropsychological test. The results show that subcortical volumes should not be normalized and that surface area does not carry much discriminative information compared to subcortical volumes or cortical thickness. Also, brain maps of significant regions based on subcortical volumes and cortical thickness show symmetry between the two hemispheres, which is not seen in the brain maps generated using surface area. Moreover, the feature selection method applied to cortical thickness measures shows that normalization to either ICV or mean thickness enhances the classification performance, and the most pronounced AD-related changes in cortical thickness are seen in the temporal lobe of the brain, which has been shown to be related to symptoms in AD patients regarding organization, language, understanding, and so forth. A comparison of results using the optimal model, which combines MMSE with subcortical volumes, shows that the proposed study achieved a competitive accuracy of 92.3% using fewer biomarkers, which makes it cost-effective and convenient.