Multiple Visual Rating Scales Based on Structural MRI and a Novel Prediction Model Combining Visual Rating Scales and Age Stratification in the Diagnosis of Alzheimer's Disease in the Chinese Population

Objective: To explore the value of multiple visual rating scales based on structural MRI in the diagnosis of Alzheimer's disease (AD) in the Chinese population. Materials and Methods: One hundred patients with AD and 100 age- and gender-matched cognitively normal controls were enrolled in this study. All participants underwent neuropsychological tests and a structural MRI scan of the brain; among them, 42 AD cases and 47 cognitively normal controls also underwent a 3D T1-weighted sequence for voxel-based morphometry (VBM) analysis. The AD cases were divided into mild and moderate–severe groups according to the Mini-Mental State Examination (MMSE). Each participant was evaluated by two trained radiologists, blinded to the clinical information, using six visual rating scales: medial temporal lobe atrophy (MTA), posterior atrophy (PA), anterior temporal (AT), orbitofrontal (OF) cortex, anterior cingulate (AC), and fronto-insula (FI). Finally, we estimated the relationship between the visual rating scales and the volume of the corresponding brain regions using correlation analysis, and evaluated the sensitivity and specificity of each single scale and of combinations of multiple scales in the diagnosis of AD using receiver operating characteristic (ROC) curves and a logistic regression model. Results: The optimal cutoff of all six visual rating scales for distinguishing AD cases from normal controls was 1.5. Using automated classification based on all six rating scales, the accuracy for distinguishing AD cases from healthy controls ranged from 0.68 to 0.80 for mild AD and from 0.77 to 0.90 for moderate–severe AD. A diagnostic prediction model combining MTA and OF results was established as follows: Score = B_MTA(score) + B_OF(score) − 1.58 (age <65 years); Score = B_MTA(score) + B_OF(score) − 4.09 (age ≥65 years). The model was superior to any single visual rating scale in the diagnosis of mild AD (P < 0.05).
Conclusion: Each of the six visual rating scales alone could be applied to the diagnosis of moderate–severe AD in the Chinese population. A prediction model combining MTA, OF, and age stratification for the early diagnosis of AD was preliminarily established.

INTRODUCTION
Alzheimer's disease is a neurodegenerative disorder mainly characterized by an insidious but progressive loss of memory, accompanied by personality changes and behavioral disorders. It is the most common type of dementia in the elderly, with a prevalence of 11% in those over the age of 65 years and as high as 32% in those over the age of 85 years (1). Early diagnosis of AD is of great importance for treatment, management, and prognosis (2,3). Molecular biomarkers contributing to the diagnosis of AD are becoming available but are not widely used in clinical practice. As a common screening tool for AD, structural magnetic resonance imaging (MRI) plays a key role in the diagnosis of AD and has been included in the diagnostic guidelines (4–6). Although a number of sophisticated analysis methods are available to quantify global and regional atrophy from MRI, visual rating scales are highly efficient, rapid, and practical tools in clinical practice.
Scheltens et al. first put forward a visual rating scale for evaluating medial temporal lobe atrophy (MTA) in 1992 (7). The sensitivity and specificity of MTA were 81 and 67%, respectively, so it was considered one of the imaging markers of AD. The MTA rating scale was subsequently applied in many studies, and it has recently been included in the diagnostic guidelines for AD (4,6,8). In 2011, Koedam et al. put forward another evaluation method, posterior atrophy (PA), focusing on the structural changes of the posterior cingulate sulcus, precuneus, parieto-occipital sulcus, and the parietal cortex (9). The sensitivity and specificity were 58 and 95%, respectively. In addition, several other visual rating scales, including anterior temporal (AT), orbitofrontal cortex (OF), anterior cingulate (AC), and fronto-insula (FI), were successively put forward (10–14). Recently, through evaluation of T1-weighted imaging in 184 post-mortem confirmed dementia patients, Harper et al. found that the combination of six visual rating scales was better than any single rating scale in the diagnosis and differential diagnosis of AD (15). The sensitivity and specificity of the established equation based on six visual rating scales were 94 and 89%, respectively, in distinguishing AD patients from normal controls.
However, no study has evaluated the value of multiple visual rating scales in the diagnosis of AD in China. To address this gap, we conducted a study to explore the value of multiple visual rating scales based on structural MRI in the diagnosis of AD in the Chinese population and to combine the aforementioned visual rating scales into a simple and effective prediction model for the early diagnosis of AD.

Subjects
This study included 102 AD cases (mild AD: 43, moderate and severe AD: 59).
Inclusion criteria for cognitively normal controls:
1) No complaint of cognitive impairment.
2) The score of the MMSE test was in the normal range.
3) Age, sex, and years of education were matched with those of the AD cases.
Exclusion criteria for cognitively normal controls:
1) Subjects who could not complete the MRI scan due to embedded metal objects in the body (dentures, stents, pacemaker, metal fixtures).
2) Pregnant or lactating women.
3) Subjects with severe systemic diseases (patients with severe hepatic disease, or with a long history of chronic hepatic disease and alanine aminotransferase (ALT) and aspartate aminotransferase (AST) exceeding 1.5 times the upper limit; patients with renal dysfunction; with uncontrolled hypertension; with uncontrolled hyperglycemia; with severe cardiac, pulmonary, or hematological diseases).

Data Collection of Visual Rating Scales Evaluation and Voxel-Based Morphometry (VBM)
First, two radiologists with at least 10 years of working experience in neuroimaging were trained to rate the visual rating scales consistently. Subsequently, visual rating of the T1-weighted sequence of all included participants was performed by the two trained radiologists, who were blinded to all clinical and pathological information. Six brain regions were rated according to existing scales. Detailed rules for the six visual rating scales are described in previous studies (7, 9–14) and in Figure 1.
To improve the consistency of rating, the two radiologists were trained several times, and the slice selection of the structural MRI was specified.
To explore the relationship between each rating scale and the pattern of gray matter volume loss, VBM was performed using SPM8 (Statistical Parametric Mapping, Version 8) and MATLAB 2010a (uk.mathworks.com/products/matlab). In total, 89 individuals, including 42 AD cases and 47 normal controls, were included in the VBM analysis. Because the preprocessing and analysis of the original images varied with the sequence, the original images had to be classified before the MRI data were processed. In this study, Dcm2AsiszImg software was used to classify the original images. Before statistical analysis, the classified data were preprocessed using MATLAB 2010a and SPM8. The image processing pipeline included motion correction, spatial normalization, segmentation, and smoothing of the brain tissue images. The realignment for head movement aimed to reduce the impact of motion-related noise on the signal. We used the EPI template of SPM8 to normalize the image data of all subjects and resampled the original images to the template at a voxel size of 3 × 3 × 3 mm. Subsequently, correction for local nonlinear deformation was performed to eliminate subtle local differences and to register the data of all subjects into the Montreal Neurological Institute (MNI) space. Gray matter, white matter, and cerebrospinal fluid (CSF) maps were obtained using the unified segmentation approach (16). Finally, the images were smoothed with a 4 × 4 × 4 full-width at half-maximum (FWHM) kernel to further reduce spatial noise and the inter-individual error introduced by spatial normalization.
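The FWHM convention used for smoothing can be related to the standard deviation of the kernel. The following is a minimal sketch, not the SPM8 implementation. It assumes that the kernel is Gaussian and that the 4 × 4 × 4 FWHM is expressed in millimetres (the unit is not stated above); the function names are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full-width at half-maximum to its standard deviation."""
    # For a Gaussian kernel, FWHM = 2 * sqrt(2 * ln 2) * sigma.
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def sigma_in_voxels(fwhm_mm, voxel_size_mm):
    """Express the kernel standard deviation in voxel units."""
    return fwhm_to_sigma(fwhm_mm) / voxel_size_mm


# Assumed values: 4 mm FWHM (unit assumed) on the 3 mm voxel grid given in the text.
sigma_mm = fwhm_to_sigma(4.0)
sigma_vox = sigma_in_voxels(4.0, 3.0)
print(round(sigma_mm, 3), round(sigma_vox, 3))
```

Under these assumptions the kernel spans well under one voxel in standard deviation, which is one way to check that a chosen smoothing width is sensible for a given voxel size.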
We selected the brain regions corresponding to MTA, PA, OF, AC, AT, and FI, and then used the REST (Resting-State fMRI Data Analysis Toolkit) software to create a mask. The mask was then used to extract the gray matter signal of the corresponding brain regions. The maps of the gray matter signal of the corresponding brain regions are presented in Figure 1.

Consistency Evaluation
The Assessment of Consistency Between Raters
The intra-class correlation coefficient (ICC) is a reliability coefficient used to evaluate interobserver reliability and test-retest reliability. The value of ICC ranges from 0 to 1. An ICC lower than 0.4 indicates poor reliability, a value from 0.4 to 0.75 indicates moderate reliability, and a value higher than 0.75 indicates good reliability. It is generally acknowledged that the value of ICC should be higher than 0.70 (17,18).
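The ICC bands above can be written as a small lookup. This is only an illustrative sketch: the function name is hypothetical, the middle band (0.4 to 0.75) is labelled "moderate" here, and assigning the boundary values 0.4 and 0.75 to the middle band is an assumption, since the text does not say which side they fall on.

```python
def interpret_icc(icc):
    """Map an ICC value to the reliability bands described in the text."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    if icc < 0.4:
        return "poor"
    if icc <= 0.75:
        # Boundary handling (0.4 and 0.75 inclusive) is an assumption.
        return "moderate"
    return "good"


for value in (0.30, 0.50, 0.83):
    print(value, interpret_icc(value))
```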

Correlation Analysis Between the Score of Each Visual Rating Scale and Gray Matter Signal of Corresponding Brain Region
To ascertain whether the visual rating scales can really reflect the atrophy of corresponding brain regions, we took the intracranial volume, age, and gender as control variables, and performed partial correlation analysis to estimate the correlation between the score of each visual rating scale and the gray matter signal of the corresponding brain region.
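The partial correlation step can be sketched in a few lines. The analysis in this study was run in SPSS, so the code below is not the authors' implementation; it is a minimal illustration that assumes a Pearson-based partial correlation computed with the standard recursive formula. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(x, y, controls):
    """Correlation of x and y after removing the linear effect of each control."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy data: one control variable drives both measures in the same
# direction, while an independent component drives them in opposite directions.
covariate = [1, 2, 3, 1, 2, 3]
component = [1, 1, 1, -1, -1, -1]
score = [c + e for c, e in zip(covariate, component)]
volume = [c - e for c, e in zip(covariate, component)]

print(pearson(score, volume))                    # raw correlation
print(partial_corr(score, volume, [covariate]))  # after controlling for the covariate
```

In this toy case the raw correlation is weak, but the negative association becomes clear once the shared covariate is controlled for, which is the motivation for adjusting for intracranial volume, age, and gender.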

Exploration of the Value of a Single Visual Rating Scale in the Diagnosis of AD
Based on the ratings from each visual rating scale, a receiver operating characteristic (ROC) curve was drawn to determine the optimal cutoff for diagnosing AD, and the sensitivity, specificity, and area under the ROC curve (AUC) of each visual rating scale were calculated.
To build the prediction model, the individuals were divided into two groups according to age (≥65 years; <65 years). The scores (rounded to the nearest integer) of the six visual rating scales and age were entered as covariates of the regression equation. Because these variables were ordinal, each visual rating scale was coded as a dummy variable, and diagnosis was set as the dependent variable. A model was established through stepwise selection in binary logistic regression. Finally, the optimal model for distinguishing mild AD cases from normal controls was determined according to the change in −2 log likelihood.
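Selecting a cutoff from the ROC curve can be sketched as follows. The text does not state which criterion defined the "optimal" cutoff, so this sketch assumes the Youden index (sensitivity + specificity − 1); the function names and the toy ratings are hypothetical and are not study data.

```python
def roc_points(scores, labels):
    """Return (cutoff, sensitivity, specificity) for each candidate cutoff.

    Candidate cutoffs are the midpoints between adjacent distinct scores;
    a subject is classified as AD when the score is >= cutoff.
    """
    values = sorted(set(scores))
    cutoffs = [(a + b) / 2 for a, b in zip(values, values[1:])]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        points.append((c, tp / n_pos, tn / n_neg))
    return points


def best_cutoff(scores, labels):
    """Pick the cutoff that maximizes the Youden index (an assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


# Hypothetical toy ratings: label 0 = control, label 1 = AD.
toy_scores = [0, 0, 1, 1, 1, 2, 1, 2, 2, 3, 3, 4]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

cutoff, sensitivity, specificity = best_cutoff(toy_scores, toy_labels)
print(cutoff, sensitivity, specificity)
```

Because the rating scores are integers, candidate cutoffs fall halfway between adjacent scores, which is why a cutoff such as 1.5 can arise from integer ratings.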

Statistical Analysis
All data processing and analyses were performed using SPSS v.21.0 (IBM, West Grove, Pennsylvania, USA) software. Measurement data were presented as means and standard deviations (SDs), and categorical data were presented as proportions. Differences in the measurement variables were tested using the two-sample Student's t-test or analysis of variance. Differences in the categorical variables were tested using the χ² test. Differences in the ranked data were compared using the Wilcoxon rank sum test. The correlation between two groups of measurement data was assessed using partial correlation analysis. The correlation analysis of categorical variables was performed using logistic regression analysis. For all statistical tests, p < 0.05 was considered significant.

Demographic and Clinical Data
In the early quality control of the imaging data sets, three subjects (2 patients and 1 normal control) were excluded due to excessive head movements during the MRI scan. Ultimately, a total of 200 subjects were enrolled in this study, including 100 patients meeting the diagnostic criteria of "clinical probable AD" and 100 age- and gender-matched healthy controls. The demographic and clinical data of the enrolled subjects are described in Table 1.

Assessment of Consistency Between Raters
The value of ICC in this study ranged from 0.70 to 0.83, which indicates a good consistency of rating between raters. The detailed information is shown in Table 2.

Correlation Analysis Between Each Visual Rating Scale and MMSE
All six visual rating scales were negatively correlated with MMSE scores. The correlation coefficients ranged from −0.35 to −0.48 and were statistically significant (p < 0.05). The detailed information is shown in Table 3.

VBM and Correlation Analysis Between the Score of Each Visual Rating Scale and the Volume of Gray Matter of Corresponding Brain Regions
In total, 89 individuals, including 42 AD cases and 47 normal controls, were included in the VBM analysis. An overview of the participants' clinical data is given in Table 4. There was a significant negative correlation (p < 0.05) between each visual rating scale and the gray matter volume of the corresponding brain region. Detailed information is given in Table 5.
The age- and gender-matched cognitively normal subjects were selected randomly as the controls. The clinical data are described in Tables 6 and 7.
The optimal cutoff of all six visual rating scales for distinguishing mild AD cases from normal controls was 1.5. The sensitivity, specificity, and AUC of the six visual rating scales ranged from 0.51 to 0.72, 0.56 to 0.97, and 0.68 to 0.80, respectively. Among them, the AUCs of MTA and OF were the highest, both at 0.80. The detailed data are given in Table 8.
The optimal cutoff of all six visual rating scales for distinguishing moderate–severe AD cases from normal controls was 1.5. The sensitivity, specificity, and AUC of the six visual rating scales ranged from 0.78 to 0.87, 0.68 to 0.95, and 0.77 to 0.90, respectively. Among them, the AUCs of MTA, AC, and OF were the highest, all at 0.90. The detailed data are given in Table 9.

The Value of a Prediction Model Combining Multiple Visual Rating Scales in the Diagnosis of Mild AD
In the binary logistic regression analysis, three variables were retained: two visual rating scales (MTA and OF) and age. The diagnostic prediction model was established as follows: Score = B_MTA(score) + B_OF(score) − 1.58 (age <65 years); Score = B_MTA(score) + B_OF(score) − 4.09 (age ≥65 years). When the value of the model is ≥0, the person is estimated to have AD; when the value is <0, the person is estimated to be cognitively normal. The parameters are described in detail in Tables 10 and 11. The sensitivity, specificity, and AUC of this model in distinguishing mild AD cases from normal controls were 0.74, 0.93, and 0.92, respectively. The model outperformed the most effective single visual rating scale, MTA (AUC: 0.79, sensitivity: 0.62, specificity: 0.95), and the difference was statistically significant (p < 0.05).
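The decision rule of the prediction model can be sketched as follows. The intercepts (−1.58 and −4.09) and the ≥0 threshold come from the text; the coefficients B_MTA(score) and B_OF(score) are reported in Tables 10 and 11, which are not reproduced here, so the coefficient tables in this sketch are hypothetical placeholders, as are the score ranges and function names.

```python
# Intercepts taken from the model stated in the text.
INTERCEPT_UNDER_65 = -1.58
INTERCEPT_65_PLUS = -4.09

# HYPOTHETICAL placeholder coefficients, one per rating level (dummy coding).
# The real values are in Tables 10 and 11 and must replace these before use.
PLACEHOLDER_B_MTA = {0: 0.0, 1: 1.0, 2: 2.5, 3: 3.5, 4: 4.0}
PLACEHOLDER_B_OF = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}


def model_score(mta, of, age, b_mta=PLACEHOLDER_B_MTA, b_of=PLACEHOLDER_B_OF):
    """Score = B_MTA(score) + B_OF(score) + age-specific intercept."""
    intercept = INTERCEPT_65_PLUS if age >= 65 else INTERCEPT_UNDER_65
    return b_mta[mta] + b_of[of] + intercept


def classify(score):
    """Apply the decision rule from the text: score >= 0 is estimated as AD."""
    return "AD" if score >= 0 else "cognitively normal"


# Same hypothetical ratings, two different ages.
for age in (60, 70):
    s = model_score(mta=2, of=1, age=age)
    print(age, round(s, 2), classify(s))
```

With identical ratings, the more negative intercept for the older group means more atrophy is needed before the score reaches zero, which matches the age stratification built into the model.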

DISCUSSION
As one of the common screening tools, structural MRI of the brain plays a key role in the diagnosis and differential diagnosis of dementia (20–22). Visual rating scales based on structural MRI could increase the accuracy of imaging assessment and provide radiologists and other clinical researchers with a framework to describe the structural image. Evaluation with visual rating scales is inevitably subjective; however, two studies (23,24) indicated that two trained radiologists could achieve good consistency. Our study showed similar results. Consequently, we can conclude that the visual rating scales have good repeatability and can be applied in clinical practice.
One of the most widely used screening tools for AD, the MMSE, was established by Folstein in 1975 (25). The MMSE covers multiple cognitive domains, including orientation, memory, attention and calculation, executive functioning, language, and visuospatial functioning. The results of our study showed that all six visual rating scales were negatively correlated with the MMSE score. Our findings align with the fact that the MMSE is a comprehensive rating scale covering multiple cognitive domains, and that atrophy of a given brain region can lower the score of the corresponding cognitive domain. A previous study indicated that MTA and PA were independently and negatively correlated with MMSE scores in AD cases (9), which was consistent with our study.
The visual rating scale of MTA was first put forward by Scheltens et al., and the sensitivity and specificity of MTA in distinguishing AD from normal controls were 81 and 67%, respectively (7). In our study, the sensitivity and specificity of MTA were 62 and 95% in mild AD cases and 86 and 89% in moderate and severe AD cases. The difference between the two studies may be due to the stratification of AD according to disease severity.
A previous study indicated that the sensitivity and specificity of FI in distinguishing early-onset AD from normal controls were 74 and 94%, respectively (15), close to our results (84 and 84%). AT was mainly used to estimate the atrophy of the temporal lobe in frontotemporal dementia (FTD) cases. A previous study indicated that AT scores were related to the extent of atrophy at autopsy (11). However, AT could effectively differentiate AD from normal controls in our study, indicating that atrophy of the anterior temporal lobe can be found in AD cases. The anterior cingulate plays a key role in executive function. One study indicated that the volume of the anterior cingulate cortex was negatively correlated with two items of executive function in amnestic mild cognitive impairment (aMCI) cases, and that it was significantly decreased compared with normal controls (26). The significant difference in AC between AD cases and normal controls in our study was similar to the results of that study. The visual rating scale PA, focusing on the posterior cingulate, precuneus, parieto-occipital sulcus, and parietal cortex, was put forward by Koedam et al. (9). To date, as the only visual rating scale targeting the posterior portion of the brain, PA can be used to differentiate AD from normal controls and other dementias. The sensitivity and specificity of PA in distinguishing AD from normal controls were 58 and 95%, respectively (9), which was superior to the performance in mild AD cases and inferior to the performance in moderate and severe AD cases in our study. Atrophy of multiple brain regions, including the insular lobe, anterior hippocampus, temporal pole, and orbitofrontal cortex, was found in AD cases compared to normal controls (12). This finding indicates that there may be diffuse atrophy in AD brains, which is consistent with the high sensitivity and specificity of each single visual rating scale in distinguishing moderate and severe AD from normal controls in our study.
In our study, six visual rating scales and age were entered into the logistic regression equation to explore a prediction model that could distinguish mild AD cases from normal controls. The final prediction model included three variables: MTA, OF, and age stratification. The sensitivity and specificity of this model in distinguishing mild AD cases from normal controls were 0.74 and 0.93, respectively, superior to the most effective single visual rating scale, MTA (sensitivity: 0.62, specificity: 0.95), and the difference between them was statistically significant (p < 0.05). To date, many studies have confirmed that medial temporal atrophy is related to AD (27,28). MTA was verified as practical and repeatable in clinical practice, and was related to the volume of the hippocampus (29,30). In addition, increasing evidence indicates that the orbitofrontal cortex is involved in the early stage of AD (31,32). These studies showed corresponding morphological changes of the medial temporal lobe and orbitofrontal cortex in the early stage of AD. Therefore, the combined use of MTA and OF may be superior to a single visual rating scale in the early diagnosis of AD, and the higher the score of the model, the greater the possibility of AD.
Age is one of the factors influencing brain atrophy, and visual rating scales may be more accurate after age stratification (15,33,34). A previous study indicated that the MTA score in normal controls under the age of 70 years ranged from 0 to 1, and the MTA score in normal controls aged 70-80 years ranged from 0 to 2 (35). Given that a previous study used 65 years as the cutoff point when combining six visual rating scales to diagnose AD (15), our study divided AD cases into two groups (early-onset AD and late-onset AD) and entered age as a candidate variable in the model. The results showed that age stratification was retained in the final prediction model, and the regression coefficient of age stratification was negative, which indicates that some degree of brain atrophy in older individuals may reflect normal age-related atrophy.
Our study found that each single visual rating scale could effectively distinguish AD cases from normal controls and was repeatable in the Chinese population, especially for moderate–severe AD cases. For mild AD cases, the prediction model combining MTA, OF, and age stratification was better than any single visual rating scale. Our study has some limitations: the sample size was relatively small, and the enrolled cases were mainly clinically probable typical AD cases. Consequently, a larger sample size, more detailed age stratification, and the enrollment of more AD cases supported by positron emission tomography-computed tomography (PET-CT), CSF biomarkers, and autopsy are necessary to obtain more accurate results.

ETHICS STATEMENT
All procedures performed in this study involving human participants were approved by the Ethics Committee of Xiangya Hospital, Central South University in China, and were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. Written informed consent was obtained from all subjects.

AUTHOR CONTRIBUTIONS
LS and TX were involved in the study design. ZY and CP were responsible for the enrollment of the participants. ZY, CP, ML, and WZ were responsible for the estimation of the visual rating scales. LS, BT, and XY were responsible for the confirmation of the participants. ZY wrote the manuscript. LS and BJ modified and revised the manuscript. All authors have read and approved the final version of the manuscript.