A Novel Dementia Scale for Alzheimer’s Disease

Objective: We established the diagnostic accuracy of the “ABC Dementia Scale” (ABC-DS) for Alzheimer’s disease (AD), which concurrently assesses activities of daily living (“A”), behavioral and psychological symptoms of dementia (“B”), and cognitive function (“C”), using a novel scoring approach called the three-dimensional distance (TDD). Methods: The ABC-DS has 13 items with nine ordered categorical levels. Caregivers were interviewed using a semi-structured interview. The construct validity, concurrent validity, test-retest reliability, and responsiveness (score changes over 12 weeks) were assessed. Results: We enrolled 63 participants with probable AD as well as 88, 106, and 55 patients with mild, moderate, and severe AD, respectively. The construct and concurrent validities of each domain score were determined. The TDD accurately discriminated the AD stages and detected score changes indicating disease progression over 12 weeks. Conclusion: The ABC-DS is stable, accurately stages AD severity, and monitors disease progression. The TDD is a useful algorithm for detecting disease progression. Citation: Kikuchi T, Mori T, Wada-Isoe K, Umeda-Kameyama Y, Kagimura T, et al. (2018) A Novel Dementia Scale for Alzheimer’s Disease. J Alzheimers Dis Parkinsonism 8: 429. doi: 10.4172/2161-0460.1000429

Introduction
Robert et al. [1] stated that an ideal scale for assessing Alzheimer's disease (AD) must be quickly administered, validated in the context of AD, include multiple AD characteristics, be applicable to all AD severity stages, monitor disease progression, and be sensitive to therapeutic responses. However, the authors concluded that no currently available scale satisfies all of these criteria.
The Relevant Outcome Scale for Alzheimer's disease (ROSA) was designed to assess cognitive function, activities of daily living (ADL), behavioral and psychological symptoms of dementia (BPSD), communication skills, and quality of life with 16 items and 21 levels [2]. Unfortunately, a critical limitation of the scale is that the total scores must be compared directly within a patient or between patient groups with the same disease severity; that is, if a patient's disease severity changes over time, the data must be excluded from statistical analyses. Although a multicenter clinical study statistically validated the ROSA [3], since then, only one report on this scale has been published [4]. Moreover, the clinical relevance of the ROSA and the degree to which it is used in daily practice remain unclear.
In clinical trials, therapeutic efficacy is often evaluated by comparing the total scores of a multi-domain scale, but total scores may not monitor disease progression or severity with sufficient accuracy. Indeed, various changes in ADL, BPSD, and cognitive function occur across the clinical stages of AD (Figure 1); specifically, ADL and cognitive function levels at stage S2 (severe stage) are lower than those at stage S1 (mild stage), while the activity of BPSD is the same at S1 and S2. This is caused by a bell-shape activity curve of BPSD, with levels first increasing and then decreasing as AD progresses [5]. This change in BPSD may cause inaccuracies in total score-based assessments. Additionally, as a mathematical rule, domain scores should not be summed if they have different qualities. Therefore, we need a new mathematical algorithm that can generate a new type of total score from otherwise incompatible domain scores.
To address this issue, we hypothesized that by plotting a patient's ADL, BPSD, and cognitive function scores in a three-dimensional coordinate plane (Figure 2), we would be able to calculate the mathematical distance of the scores (ADL score, BPSD score, cognitive score) from the origin (0, 0, 0) and use this distance as a type of total score. This distance, referred to as the three-dimensional distance (TDD) score, is calculated as follows:
$$\mathrm{TDD} = \sqrt{(\text{ADL score})^2 + (\text{BPSD score})^2 + (\text{Cognitive function score})^2}$$
where ADL, BPSD, and cognitive function scores increase with symptom improvement and decrease towards 0 with symptom deterioration.
In 2014, we began developing a multi-dimensional scale for AD that initially consisted of 17 items regarding ADL ("A"), BPSD ("B"), and cognitive function ("C"). Since then, we revised the descriptions and deleted items based on a factor analysis and other analyses using Item Response Theory (IRT) [6]. We finally produced a "test version" of the scale consisting of 13 items. We will report the details of the study in a future publication. Although we obtained statistical profiles of the test version using factor and IRT analyses during the development phase, the scale profiles require further confirmation in a different sample population. Accordingly, this study aimed to evaluate the diagnostic accuracy of the "ABC-DS." We also discuss the utility of the TDD for longitudinal studies and clinical trials.

Ethical approval
This multicenter observational study was conducted in accordance with the World Medical Association Declaration of Helsinki 1964 and its amendments and subsequent clarifications. The institutional review board approved the study protocol, and all caregivers and participants provided written informed consent. The study was completed as per the ethical guidelines for clinical studies set by the Ministry of Health, Labor, and Welfare of Japan. The trial was registered with the University Hospital Medical Information Network (www.umin.ac.jp/; No.: UMIN000021134).

Inclusion criteria
We enrolled patients who were diagnosed with AD using the criteria outlined in the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision, or with probable AD, as per the National Institute on Aging-Alzheimer's Association workgroup, National Institute of Neurological and Communicative Disorders and Stroke, and/or Alzheimer's Disease and Related Disorders Association guidelines. Patients with dementias other than AD, unstable central nervous system disorders, or psychiatric diseases were not eligible for this study.

Participants
Participants were enrolled from 22 clinics and hospitals in Japan (see Appendix), based on the above inclusion criteria. Of the 327 participants enrolled, 15 were excluded due to protocol violations of inclusion criteria (three) or measurement procedures (12). Thus, 312 participants were eligible for the baseline assessment. The cohort included 63 participants with probable AD, 88 with mild AD, 106 with moderate AD, and 55 with severe AD.

The ABC dementia scale
The ABC-DS consists of 13 test items, each with nine levels of ordered categorical scales. The 13 items evaluate ADL, BPSD, and cognitive function (Table 1). A sample item for evaluating ADL is shown in Figure 3. For scale administration, evaluators interviewed caregivers about their patients' recent episodes using a semi-structured interview [7]. Eligible caregivers were required to spend more than 3 days per week with their patients.
Figure 1: Activities of daily living (ADL) and cognitive function scores are different at stages S1 (mild) and S2 (severe), but the behavioral and psychological symptoms of dementia (BPSD) exhibit the same level α at both stages. The BPSD increase and then decrease as Alzheimer's disease progresses, which may cause inaccuracies when determining the disease stage using a simple total score.
Participants were assessed with the ABC-DS at baseline, week 1, and week 12 by the same evaluators. For statistical analysis, we treated the nine ordered categories of the ABC-DS as numeric values [8].
A PDF version of the ABC-DS in English, French, Chinese, and Korean can be downloaded from the homepage of the Mapi Research Trust at: https://eprovide.mapi-trust.org/instruments/abc-dementiascale.

Other measurements
As standard scales, we also administered the Mini-Mental State Examination (MMSE), Disability Assessment for Dementia (DAD), Neuropsychiatric Inventory-Caregiver Distress Scale (NPI-D), and Clinical Dementia Rating (CDR) (sum of boxes (SOB) and global scores) scales at baseline and week 12.

Evaluators
The evaluators who administered the standard scales did not administer the ABC-DS. All evaluators recorded the administration duration. Evaluators for the ABC-DS and standard scales were clinicians (13.5%), certified psychologists (3.5%), nurses (41.7%), and medical clerks (41.3%).

TDD scoring
The TDD for the ABC-DS is the mathematical distance (Euclidean distance) from the origin (0, 0, 0) to the score position of each patient (Domain A score, Domain B score, Domain C score). Domains A, B, and C refer to ADL, BPSD, and cognitive function, respectively. Here, we defined the TDD as follows:
$$\mathrm{TDD} = \sqrt{(\text{Domain A score})^2 + (\text{Domain B score})^2 + (\text{Domain C score})^2}$$
where Domain A score = Q1 + Q2 + Q3 + Q4 + Q11 + Q12, Domain B score = Q7 + Q8 + Q9, and Domain C score = Q5 + Q6 + Q10 + Q13. For example, if a patient's score changed from (42, 21, 28) to (42, 19, 26), the TDD would change from 54.7 to 52.9. The difference over time was thus -1.8, indicating disease progression. The TDD has been submitted for application for an international patent (PCT/JP2017/035149).
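As an illustration of the definition above, the following minimal Python sketch sums the item scores into the three domain scores and takes the Euclidean distance from the origin. The function and variable names (`domain_scores`, `tdd`, `DOMAIN_ITEMS`) are hypothetical and chosen only for this example; the item-to-domain assignment and the two score positions are those quoted in the text.

```python
import math
from typing import Mapping, Tuple

# Item-to-domain assignment as stated in the text (question numbers Q1..Q13).
DOMAIN_ITEMS = {
    "A": (1, 2, 3, 4, 11, 12),  # activities of daily living
    "B": (7, 8, 9),             # behavioral and psychological symptoms
    "C": (5, 6, 10, 13),        # cognitive function
}


def domain_scores(item_scores: Mapping[int, int]) -> Tuple[int, int, int]:
    """Sum item scores (keyed by question number) within Domains A, B, and C."""
    return tuple(
        sum(item_scores[q] for q in DOMAIN_ITEMS[domain])
        for domain in ("A", "B", "C")
    )


def tdd(a: float, b: float, c: float) -> float:
    """Euclidean distance from the origin (0, 0, 0) to the point (a, b, c)."""
    return math.sqrt(a ** 2 + b ** 2 + c ** 2)


# The two score positions used as the example in the text.
before = tdd(42, 21, 28)
after = tdd(42, 19, 26)
print(round(before, 1), round(after, 1))  # 54.7 52.9
```

A patient scoring 7 on all 13 items would have the domain scores (42, 21, 28), which is the first position in the example.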

Factor analyses (construct validity)
When we selected 13 items in the development phase, we expected that the items would assess the ADL, BPSD, and cognitive function domains. We performed factor analyses to confirm this assumption statistically and concluded that the results of a factor analysis would be reasonable if the values for the factor loadings were ≥ 0.4 and the cumulative proportion of the contribution was ≥ 0.5 [9]. We performed a factor analysis with a promax rotation using the statistical software R 3.1.0 with packages psych, psy, and polycor [10,11].

IRT analysis
IRT is a statistical approach for developing reliable assessment scales that measure various abilities, traits, or behavioral characteristics [12]. Here, we used this approach to inspect the difficulty parameters (locations) and discrimination parameters (steepness) of the item response category characteristic curves (IRCCCs) of graded response models [6,13,14]. Sample curves and IRCCC interpretations are shown in Supplementary Figure 4. We accepted the models when the locations and steepness of each curve were < 4.0 (absolute value) and ≥ 0.2, respectively [14].
We separately applied a graded response model for each domain, assuming "unidimensionality" within a domain, whereby the items in the domain would measure a single common trait or concept of ADL, BPSD, or cognitive function. If an item displays excess covariation or dependence on all other items in the domain, the discrimination parameter may be high relative to other items in the domain [15]. We considered that this assumption of "local independence" was not violated if the discrimination parameter of the item was < 4.0. The statistical analyses were performed using R 3.1.0 with the ltm and irtoys packages for the graded response model [14]. Since these R packages are unable to treat nine levels of ordered categorical scales, we converted the original nine levels into five levels as follows: Levels 1 & 2, Levels 3 & 4, Level 5, Levels 6 & 7, and Levels 8 & 9 were converted into Levels 1, 2, 3, 4, and 5, respectively, accepting a loss of information.
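The nine-to-five level conversion described above can be written as a simple lookup. The sketch below is in Python purely for illustration (the study itself used R), and the names `NINE_TO_FIVE` and `collapse_level` are hypothetical.

```python
# Mapping stated in the text: Levels 1 & 2 -> 1, 3 & 4 -> 2, 5 -> 3, 6 & 7 -> 4, 8 & 9 -> 5.
NINE_TO_FIVE = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 4, 7: 4, 8: 5, 9: 5}


def collapse_level(level: int) -> int:
    """Convert an original nine-level response into the five-level scale."""
    if level not in NINE_TO_FIVE:
        raise ValueError(f"level must be an integer from 1 to 9, got {level!r}")
    return NINE_TO_FIVE[level]


print([collapse_level(level) for level in range(1, 10)])
# [1, 1, 2, 2, 3, 4, 4, 5, 5]
```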

Test-retest reliability (intra-rater reliability)
We used the ABC-DS scores at baseline and week 1 to evaluate the intra-rater reliability for each item with weighted kappa coefficients [16]. This test assessed the degree of agreement between the two scores given by the same evaluator for each item. If the 95% confidence interval (CI) band of the coefficients included 0.6, then we accepted the result. We also calculated intra-class correlation coefficients for the TDD to evaluate the consistency between two measurements.
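For readers unfamiliar with the statistic, the following Python sketch computes a weighted kappa coefficient for one item rated twice on a nine-level scale. The text does not state which weighting scheme was used, so the linear disagreement weights used by default here (and the optional quadratic weights via `power=2`) are an assumption, as is the function name `weighted_kappa`. The confidence interval used for the acceptance rule is not computed in this sketch.

```python
def weighted_kappa(first, second, n_levels=9, power=1):
    """Weighted kappa for two ratings of the same subjects on levels 1..n_levels.

    Disagreement weights are (|i - j| / (n_levels - 1)) ** power:
    power=1 gives linear weights, power=2 quadratic weights (both assumptions).
    """
    n = len(first)
    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(first, second):
        observed[a - 1][b - 1] += 1.0 / n
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    weighted_observed = 0.0  # observed weighted disagreement
    weighted_expected = 0.0  # disagreement expected under independence
    for i in range(n_levels):
        for j in range(n_levels):
            w = (abs(i - j) / (n_levels - 1)) ** power
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings for one item at baseline and week 1 (not study data).
baseline = [1, 3, 5, 7, 9]
week1 = [1, 3, 5, 7, 9]
print(weighted_kappa(baseline, week1))  # 1.0 (perfect agreement)
```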

Concurrent validity
Baseline MMSE, NPI-D, DAD, CDR, and ABC-DS scores were used to evaluate concurrent validity. We calculated the correlation coefficients and 95% CIs for Domains A, B, and C corresponding to the DAD, NPI-D, and MMSE, respectively. We also calculated the correlation coefficients of Domains A, B, and C to the global CDR and the CDR-SOB, as well as the correlation coefficients between the TDD and the other standard scales. We used Spearman coefficients for pairs of continuous variables and polyserial coefficients for pairs of continuous and categorical variables [16].

Receiver operating characteristic (ROC) curve for the TDD to discriminate the global CDR score
An ROC analysis [17] was used to investigate the sensitivity and specificity of the TDD for discriminating the severity of AD defined by the global CDR (categorical score). We analyzed the ROC curves to identify the most suitable TDD thresholds for discriminating AD severity using the R 3.1.0 statistical software with the ROCR package [18].
We calculated the sensitivities and specificities at the thresholds for "CDR 0/0.5 vs. others," "CDR 0/0.5 & CDR 1 vs. others," and "CDR 0/0.5, CDR 1 & CDR 2 vs. CDR 3." If the values of the TDD were above the threshold, we defined the test as positive, indicating a better stage.
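The positive-test rule above (TDD above the threshold indicates the better stage) can be sketched in Python as follows. The function name `sensitivity_specificity` and the example scores are hypothetical and are not study data; how the most suitable threshold was selected from the ROC curve is not stated in the text and is not modeled here.

```python
def sensitivity_specificity(tdd_scores, is_better_stage, threshold):
    """Sensitivity and specificity of the rule 'TDD > threshold means better stage'.

    is_better_stage[i] is True when the reference standard (global CDR) places
    participant i in the better-stage group of the comparison.
    """
    tp = fn = tn = fp = 0
    for score, better in zip(tdd_scores, is_better_stage):
        positive = score > threshold
        if better and positive:
            tp += 1
        elif better:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical TDD scores and reference-standard groups for six participants.
scores = [60, 55, 52, 48, 45, 40]
better = [True, True, True, False, True, False]
print(sensitivity_specificity(scores, better, threshold=50))  # (0.75, 1.0)
```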

Changes in scores over the study period (responsiveness)
We used the ABC-DS scores and standard scale scores measured at baseline and 12 weeks for this analysis. Letting Δ or responsiveness denote the difference in the scores between the two evaluations, we stratified the Δ values by the baseline global CDR and calculated the mean difference, standard deviation, 95% CI, coefficient of variation (CV: standard deviation/mean), and P values testing the null hypothesis H0: Δ = 0. The accuracy and repeatability of the measurements are high if the absolute value of the CV is small.
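The summary statistics for Δ can be sketched with the Python standard library as below. Δ is taken as the week 12 score minus the baseline score, following the sign convention of the TDD example in the text. The text does not name the test used for H0: Δ = 0, so the one-sample t statistic returned here is an assumption, and the 95% CI and P value are omitted because they need a t distribution that the standard library does not provide. The function name and the example scores are hypothetical.

```python
import math
import statistics


def responsiveness_summary(baseline, week12):
    """Mean, SD, CV, and a one-sample t statistic for paired score changes."""
    deltas = [later - earlier for earlier, later in zip(baseline, week12)]
    n = len(deltas)
    mean = statistics.mean(deltas)
    sd = statistics.stdev(deltas)  # sample standard deviation (n - 1 denominator)
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,  # CV as defined in the text: standard deviation / mean
        "t": mean / (sd / math.sqrt(n)),  # assumed test statistic for H0: delta = 0
    }


# Hypothetical TDD scores for four participants (not study data).
summary = responsiveness_summary([50, 48, 52, 47], [48, 47, 49, 46])
print(summary["mean"])  # -1.75
```

A negative mean Δ corresponds to a decrease in score over the study period, which indicates disease progression under the scoring direction described in the text.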

Administration duration
The mean administration durations for the ABC-DS and the CDR were 9.96 (4.79) min and 26.4 (9.8) min, respectively.

Factor analysis
In the development phase, we found that the 13 items of the ABC-DS constructed the ADL, BPSD, and cognitive function domains. We confirmed this result in the present study (Supplementary Table 1). Factors 1, 2, and 3 in Supplementary Table 2 correspond to Domains A, C, and B, respectively. Each bold value in the table indicates the factor that gives the item the largest factor loading in the three factors. The bold factor loadings were > 0.4, and the cumulative proportion of the contribution by the three factors was 0.585.
We also tested four-factor and five-factor models. The four-factor model was not reasonable, because the fourth domain did not have the item (component) that had the largest factor loading in four domains. The five-factor model was also not practical, because the fourth and fifth domains contained only one component item each. Accordingly, we concluded that more than three factors were redundant. The present results are consistent with those for the development phase (data not shown).

IRT analysis
The IRCCCs for each domain are shown in Supplementary Figures 1-4. The difficulty (location) parameters of the IRCCCs were within reasonable ranges (between -4 and 4) for the standardized AD severities. The discrimination (steepness) parameters were > 0.2 and < 4.0 for all items in each domain, except for Q1 (4.1), which had a parameter that was similar to that observed in the development phase. We confirmed that the parameter values were comparable to those determined in the development phase (data not shown).

Test-retest reliability
For the test-retest reliability evaluation, data were available from 219 of 312 participants; however, one participant was interviewed by different evaluators and therefore excluded from this analysis. We summarized the results in Supplementary Table 2. The mean kappa coefficient values for Q8 and Q9 were < 0.6, but the 95% CI bands included 0.6. The kappa coefficient values for the other items showed moderate or strong similarity between the two measurements. The TDD intra-class correlation coefficient was 0.964.

Concurrent validity
Correlation coefficients and 95% CIs were calculated to compare Domains A, B, and C, along with the TDD, with the corresponding standard scales (Table 3). The correlation coefficients for "Domain A vs. DAD," "Domain B vs. NPI-D," and "Domain C vs. MMSE" were moderate. The correlations of Domains A and C with the CDR-SOB were strong. The TDD was also strongly correlated with the CDR-SOB and the global CDR.

Score changes over the study period (responsiveness)
For the responsiveness evaluation, data were available from 222 of the 312 participants; however, four participants were excluded from this analysis owing to missing paired values for baseline and week 12. We stratified the Δ values by the baseline CDR (Table 3). Bold values indicate statistically significant differences (P < 0.05) between the baseline and week 12 scores. There were no statistically [...] (Table 3). The Δ of the total score of 13 items (arithmetic sum) was significant only at CDR 0/0.5 (Table 3). The CV values indicated that the measurement variation in TDD was smaller than that found for the arithmetic sum at CDR 1, 2, and 3 (Table 3).
The Δ in the CDR-SOB was statistically significant at CDR 1 and 2; however, the results were inferior to those for the TDD, which successfully detected the differences at CDR 2, 1, and 0/0.5 (Table 3).

Discussion
The present study confirms the construct validity of all 13 items comprising the three domains, and the concurrent validity of the ABC-DS domain scores with their corresponding standard scale scores as well as with the CDR. The scale's intra-rater reliability, as determined by weighted kappa coefficients, was acceptable, and it detected statistically significant changes in AD severity over the 12-week study period. The correlation coefficients between the TDD and the CDR were strong, and the TDD accurately discriminated AD severity in our ROC analysis.
Collectively, these results indicate that the TDD is similarly informative to the CDR for staging AD and measuring disease progression.
The IRT markedly aided in the ABC-DS development. During the development phase, we repeatedly revised descriptions of the items by examining the parameters of the IRCCCs until the location and steepness of the curves became reasonable. Here, we repeated these IRT analyses and checked that the estimated parameters were similar across the two different sample populations. The results suggested that our item descriptions in the ABC-DS were stable and accurate for assessing AD. It should be noted that the discrimination parameter of Q1 was larger than the conventional threshold of 4.0; hence, this item may have violated local independence. However, this should not seriously affect the scale as a whole because the value was not much larger than the threshold.
We think that the staging and monitoring of AD progression would be most accurate if we concurrently evaluate ADL, BPSD, and cognitive function. To conduct this concurrent evaluation, we introduced the TDD approach. The present study identified three main advantages of the TDD. First, the correlation coefficients between the TDD and the standard scale scores were better than those between the domain scores and the standard scale scores, except for "Domain B vs. NPI-D." Second, the TDD discriminated the severity of AD, as diagnosed by CDR, with satisfactory sensitivity and specificity. Third, the total score (arithmetic sum) of the 13 items in the ABC-DS did not detect significant changes in disease progression over 12 weeks when baseline severity was CDR 1 or 2, whereas the TDD did detect significant changes in these cases.
Most standard dementia assessment scales only evaluate total scores (i.e., a simple sum or an arithmetic sum). However, the sum of item scores can fail to detect true changes in disease progression, as was observed here. Using the ABC-DS as an example, let us consider a patient whose ADL domain score rises by three points between assessments, while the BPSD domain score falls by three points. If we calculate the simple sum of the 13 items, these changes cancel out, and this approach would thus not detect a change in the patient's progression. Total score-based assessments may, therefore, increase the risk of false-negative results. Our TDD scoring system detects otherwise obscured changes by depicting a change in the score positions for ADL, BPSD, and cognitive function. For example, if the score position moves from (27, 22, 26) to (30, 19, 26), which represents the same simple total score of 75, then the TDD changes from 43.46 to 44.01. This rationale, in tandem with our present observations, underscores the utility of the TDD for determining or comparing longitudinal treatment effects in clinical research.
This study has some limitations. Several aspects of the ABC-DS require further examination in future research, including scale responsiveness. While we evaluated responsiveness in an observational cohort over a 12-week study period, we did not specify medical treatments and instead observed the natural progression of the participants. Future work should further examine responsiveness in a comparative clinical trial setting, to confirm the utility of the TDD approach and the ABC-DS.
In conclusion, the present study confirms the construct validity, concurrent validity, intra-rater reliability, and responsiveness of the ABC-DS over a 12-week period. Importantly, the TDDs calculated from the ABC-DS represent a useful approach for evaluating patient responses in clinical trials of AD.