Evaluation of a research diagnostic algorithm for DSM-5 neurocognitive disorders in a population-based cohort of older adults

There is little information on the application and impact of revised criteria for diagnosing dementia and mild cognitive impairment (MCI), now termed major and mild neurocognitive disorders (NCDs), respectively, in the DSM-5. We evaluate a psychometric algorithm for diagnosing DSM-5 NCDs in a community-dwelling sample, and characterize the neuropsychological and functional profile of expert-diagnosed DSM-5 NCDs relative to DSM-IV dementia and International Working Group criteria for MCI. A population-based sample of 1644 adults aged 72–78 years was assessed. Algorithmic diagnostic criteria used detailed neuropsychological data, medical history, longitudinal cognitive performance, and informant interview. Those meeting all criteria for at least one diagnosis had data reviewed by a neurologist (expert diagnosis) who achieved consensus with a psychiatrist for complex cases. The algorithm accurately classified DSM-5 major NCD (area under the curve (AUC) = 0.95, 95% confidence interval (CI) 0.92–0.97), DSM-IV dementia (AUC = 0.91, 95% CI 0.85–0.97), DSM-5 mild NCD (AUC = 0.75, 95% CI 0.70–0.80), and MCI (AUC = 0.76, 95% CI 0.72–0.81) when compared to expert diagnosis. Expert diagnosis of dementia using DSM-5 criteria overlapped with 90% of DSM-IV dementia cases, but resulted in a 127% increase in diagnosis relative to DSM-IV. The additional cases had less severe memory, language, and instrumental activities of daily living (IADL) impairments than cases meeting DSM-IV criteria for dementia. DSM-5 mild NCD overlapped with 83% of MCI cases and resulted in a 19% increase in diagnosis. These additional cases had a subtly different neurocognitive profile to MCI cases, including poorer social cognition. DSM-5 NCD criteria can be operationalized in a psychometric algorithm in a population setting.
Expert diagnosis using DSM-5 NCD criteria captured most cases with DSM-IV dementia and MCI in our sample, but included many additional cases suggesting that DSM-5 criteria are broader in their categorization.


Background
Revised criteria for diagnosing dementia and mild cognitive impairment (MCI), now termed major and mild neurocognitive disorders (NCDs), respectively, in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) [1], have the potential to significantly affect clinical and research settings. Recent reviews [2,3] note the increased clarity and structure in DSM-5 NCD for assessing cognitive impairment, decline, and functional impact when compared to DSM-IV dementia or International Working Group (IWG) criteria for MCI [4]. The clearer criteria and greater emphasis on objective measures mean that the DSM-5 NCD categories should be easier to operationalize in large-scale studies of ageing using a psychometric algorithm. Algorithmic approaches to diagnosing NCDs are particularly valuable in resource-intensive population studies [5] and in settings where there is limited access to biomarkers and clinical services. Globally, most dementia cases occur in such settings [6]. Algorithmic approaches to DSM-IV and DSM-III-R dementia diagnosis have been previously published with agreement ranging from κ (Cohen's kappa) = 0.63 to 0.84 [5,7,8]. No study has as yet examined the algorithmic diagnosis of DSM-5 NCD. The present study fills this gap.
Given that both major and mild categories of NCD are designed to be age- and etiology-independent syndromes, it is expected that, when applied to older adults, the prevalence estimates would be higher than for the more 'Alzheimer's-centric' DSM-IV dementia category [2,9], whereas MCI criteria [4,10] are much broader and are not age- or Alzheimer's disease (AD)-specific. Field trials of DSM-5 suggested a similar prevalence of DSM-IV dementia and DSM-5 major NCD [11]. However, a number of recent studies [12-14] report differences between the DSM-5 and existing diagnostic systems, with one reporting increased prevalence of diagnosis with DSM-5 criteria relative to DSM-IV and MCI [14], and others reporting decreased diagnosis relative to systems such as 10/66 criteria [12], Petersen MCI criteria [13], and IWG-MCI criteria [14,15]. The variance in findings may reflect differences in the diagnostic systems used for comparison, sensitivity of different cognitive batteries, as well as the samples studied (e.g., memory clinic [14], population-based cohort [12,13,15], middle-income nations [12,14]). In the context of these mixed findings, it is important to better understand the implications of applying DSM-5 NCD criteria to existing epidemiological studies with well characterized samples that have been followed longitudinally with neurocognitive diagnoses.
The aims of the present study were twofold. The first aim was methodological and sought to develop and evaluate a psychometric algorithm to assess participant data against criteria for the following diagnoses: DSM-5 major NCD, DSM-5 mild NCD, DSM-IV dementia, and IWG MCI. Algorithmic classification was compared to diagnosis of the same categories by experienced clinicians (expert diagnosis). The second aim was to examine the overlap between expertly diagnosed DSM-5 NCDs, DSM-IV dementia, and MCI, and characterize the groups in terms of their neuropsychological and functional profiles.

Participants
The participants were from the Personality and Total Health Through Life Project (PATH) which has been previously described [16]. Briefly, we recruited participants who were residents of the city of Canberra and adjacent town of Queanbeyan, Australia. Participants aged within three narrow cohorts (20-24, 40-44, and 60-64 years) were sampled randomly from the electoral roll and invited to participate in a study on the risk and protective factors for common mental disorders. Enrolment to vote is compulsory for all Australian citizens. The study protocol was approved by the Australian National University's Human Research Ethics Committee (Protocols: 2009/039; 2009/308; 2012/074; 2006/0314; 2002/0189) and participants provided written informed consent after receiving a complete description of the study. A total of 7485 consented to participate. The present study focuses on the older age cohort whose sample size at wave 1 (data collection 2001-2002) was 2551 (58.3% of the cohort's random sample). Participants were re-assessed every 4 years on a broad range of sociodemographic, health, lifestyle, and neuropsychological measures. Sample retention has been high at each wave (between 85.4% and 88.8%). This study reports data from the 12-year follow-up of the older cohort who were aged 72-78 at wave 4 (data collection 2014-2015).

Interview and assessment
Of the 2048 participants contacted for follow-up at wave 4, 116 were deceased, 259 refused, and 14 were not found (Fig. 1). Data were obtained from individual face-to-face or telephone interviews conducted with 1644 participants by trained research personnel, including demographic, general health, anthropometric, physiological, and neurocognitive measures.

Demographics, depression and general health survey
An interviewer-administered survey collected data on the level of education, psychological measures, substance and medication use, psychiatric and medical history, including recent major surgery, activities of daily living, housing, home or personal care, and non-English speaking background. Depressive symptoms were screened using the self-report screen for DSM-IV criteria for depression, the Patient Health Questionnaire (PHQ-9) [17].

Cognitive assessment
A battery of neurocognitive measures was developed to address each of the domains described in the DSM-5 [1] (see Additional file 1: Table S1), and administered by trained research interviewers. Measures were selected on the basis of sensitivity to dementia and age-related cognitive impairment as well as efficiency of administration and scoring. Data on behavioral changes were obtained through the informant interview (see later). Briefly, the following measures were used to assess each of the domains: complex attention (Symbol Digit Modalities Test [18], Trail Making Test A [19], Reaction Time Test [20]); executive function (Digit Span Backwards [21], Trail Making Test B [19], Stroop Color Word Test [22], Zoo Map Test [23], Game of Dice Test [24]); learning and memory (California Verbal Learning Test [25], Benton Visual Retention Test (Administration B) [26]); language (Letter Fluency [19], Boston Naming Test-15 item [27], Spot The Word Test [28]); perceptual motor (Purdue Pegboard [29], Ideomotor Apraxia Test (IAT) [30], Benton Visual Retention Test (Administration C) [26]); social cognition (Reading the Mind in the Eyes [31]). Details on test measures are provided in a supplementary methods section (see Additional file 1). Scores were converted to z scores by normalizing relative to the whole wave 4 PATH sample data stratified by gender and education (low: 5-10 years; medium: 10-15 years; high: 15+ years).
Fig. 1 Flow of participants through the PATH study and through wave 4. Diagnosis refers to DSM-5 neurocognitive disorders, IWG MCI, and DSM-IV dementia
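As an illustration of the stratified normalization described in this section, the following minimal sketch computes z scores within each gender-by-education stratum. The record layout (`gender`, `education`, `score`) and the function name are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def stratified_z_scores(records, score_key="score"):
    """Return one z score per record, normalized within its own stratum."""
    # Group raw scores by (gender, education band).
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["gender"], rec["education"])].append(rec[score_key])

    # Mean and sample SD per stratum. Each stratum needs at least two
    # scores with non-zero spread for the z score to be defined.
    stats = {key: (mean(vals), stdev(vals)) for key, vals in strata.items()}

    z_scores = []
    for rec in records:
        mu, sd = stats[(rec["gender"], rec["education"])]
        z_scores.append((rec[score_key] - mu) / sd)
    return z_scores


# Made-up scores for illustration only (not PATH data).
example = [
    {"gender": "F", "education": "low", "score": 10},
    {"gender": "M", "education": "high", "score": 1},
    {"gender": "F", "education": "low", "score": 20},
    {"gender": "F", "education": "low", "score": 30},
    {"gender": "M", "education": "high", "score": 3},
]
print(stratified_z_scores(example))
```

Normalizing within strata means that a given raw score can map to different z scores depending on the participant's gender and education band, which is the intended effect of stratified norms.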

Screen 1
The data for the 1644 participants assessed at wave 4 were screened for signs of decline based on the criteria detailed in Additional file 1. Briefly, this included either a previous PATH diagnosis of dementia or a mild cognitive disorder, or evidence of current objective cognitive impairment (based on performance ≤6.7th percentile on at least one cognitive measure, or Mini-Mental State Examination (MMSE) ≤24), and evidence of subjective decline on the Memory and Cognition Questionnaire (MAC-Q) [32] or decline on the MMSE of >3 points since wave 3, or consistent MMSE ≤24 at waves 3 and 4. Of the participants meeting criteria for any of the above (n = 623), the majority (n = 426) had a detailed informant interview. Of the remaining 1021 participants not meeting the criteria, most (n = 746) received a basic informant interview (Fig. 1).
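The screening rule can be sketched as a boolean function. The prose summary leaves the grouping of the "and"/"or" conditions open, so the grouping below is one possible reading and an assumption; the authoritative criteria are those in Additional file 1. The function and argument names are hypothetical, and subjective decline on the MAC-Q is passed in as a precomputed flag because its threshold is not given in the text.

```python
def screen_positive(previous_diagnosis, percentiles, mmse_wave3, mmse_wave4,
                    subjective_decline):
    """Return True if a participant is flagged by screen 1 (assumed grouping)."""
    # Objective impairment: any cognitive measure at or below the
    # 6.7th percentile, or a current MMSE of 24 or less.
    objective = any(p <= 6.7 for p in percentiles) or mmse_wave4 <= 24

    # Decline on the MMSE of more than 3 points since wave 3.
    mmse_decline = (mmse_wave3 - mmse_wave4) > 3

    # Consistently low MMSE at both waves.
    persistent_low = mmse_wave3 <= 24 and mmse_wave4 <= 24

    # ASSUMPTION: impairment must be accompanied by subjective or
    # MMSE-measured decline; the other two routes stand alone.
    return bool(
        previous_diagnosis
        or (objective and (subjective_decline or mmse_decline))
        or persistent_low
    )


# Made-up participant: one low test score plus subjective decline.
print(screen_positive(False, [5.0, 40.0], 29, 29, True))
```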

Informant interview
Participants (n = 1438) consented to have an informant (spouse, friend, neighbor or relative) interviewed by telephone regarding the participant's changes in cognition and activities of daily life. The basic informant interview comprised the Bayer instrumental activities of daily living (IADL) questionnaire [33] and the Informant Questionnaire of Cognitive Decline in the Elderly 16-item Short Version (IQCODE) [34]. The detailed informant interview comprised the Bayer IADL, IQCODE, Dysexecutive Questionnaire (DEX-Q) [23], and Neuropsychiatric Inventory (NPI) [35], as well as questions on medical history (Parkinson's disease, Alzheimer's disease, other dementia, stroke, psychiatric diagnoses, memory complaints), recent behavior including symptoms of delirium, psychosis, hallucinations, alertness and physical function, sensory or motor loss, and onset and progression of cognitive difficulties. The DEX-Q [23] collected data on executive difficulties affecting social and daily activity. The NPI [35] collected data on non-cognitive symptoms of MCI and dementia.

Psychometric algorithm
Those identified by screen 1 (n = 623) had all interview and informant data entered into a case file spreadsheet. To minimize effects of non-response bias, case files with missing informant data (n = 59) were also screened by the algorithm. The algorithm combined the neurocognitive assessment data with the informant and survey data on medical history to operationalize criteria (criterion met/not met) for each diagnostic category: DSM-5 major NCD, mild NCD, DSM-IV dementia, and MCI (see Tables 1 and 2). Details of the neuropsychological battery are provided in Additional file 1. Cognitive scores were standardized relative to the gender- and education-stratified norms (from the whole PATH 60s sample at wave 4) and converted to z scores. Severe cognitive impairment was defined as a z score < -2.0. Given a lack of consensus in the literature regarding appropriate cutoffs for defining mild cognitive impairment, separate algorithmic categories were created using z score > -2.0 and ≤ -1.0, and > -2.0 and ≤ -1.5. In addition to the diagnostic categories of interest to the current study, the algorithm also classified participants according to other categories (e.g., age-associated memory impairment [36], age-associated cognitive decline [37], DSM-IV mild NCD, etc.). Participants not meeting criteria for any diagnostic category were classified as "normal". Those meeting criteria for at least one diagnosis (n = 368) had their data reviewed by the research neurologist (Fig. 1).
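The z score cutoffs described above can be expressed as a small helper that is run once per mild-impairment definition. This is a sketch only: the function name and labels are hypothetical, and it covers the cognitive cutoff alone, not the full set of criteria in Tables 1 and 2.

```python
def impairment_level(z, mild_upper=-1.0):
    """Classify a z score as 'severe', 'mild', or 'none'.

    mild_upper is the upper bound of the mild band: -1.0 or -1.5,
    matching the two alternative definitions in the text.
    """
    if z < -2.0:
        return "severe"
    if -2.0 < z <= mild_upper:
        return "mild"
    # Anything above the mild band is unimpaired. A z score of exactly
    # -2.0 also lands here, because the stated thresholds (< -2.0 and
    # > -2.0) leave that single value unassigned.
    return "none"


# The same score can be mild under one cutoff and unimpaired under the other.
print(impairment_level(-1.2, mild_upper=-1.0))
print(impairment_level(-1.2, mild_upper=-1.5))
```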

Expert diagnosis and consensus
Case files (n = 368) were reviewed by an experienced research neurologist (CM); these included neuropsychological test data, informant data, structural brain magnetic resonance imaging (MRI) scans to aid differential diagnosis of dementia subtypes (n = 54), a self-reported medication list, and contact details of the participant for further clarification of details relevant to diagnosis (n = 21). The neurologist based her decisions on all available data, guided by the DSM-5 NCD, DSM-IV, and MCI diagnostic criteria, and used clinical judgement to determine whether each criterion was supported by the data. Inter-rater reliability with an experienced psychiatrist (RK) independently reviewing a subsample of 29 cases indicated high agreement for dementia (DSM-IV and DSM-5 major NCD: κ = 0.79, 95% confidence interval (CI) 0.54-1.0, p < 0.01), and moderate agreement for mild cognitive disorders (MCI and DSM-5 mild NCD: κ = 0.47, 95% CI 0.13-0.73, p < 0.01), which are within the ranges reported in field trials [7,11,38].
Further to estimating inter-rater reliability, consensus diagnosis was conducted by the two physicians and a neuropsychologist (RE) on complex cases identified as meeting at least one of the following criteria: (1) comorbid depression (moderate to severe on PHQ-9); (2) other comorbid psychiatric conditions; (3) stroke; (4) dementia or DSM-5 major NCD without memory impairment. A total of n = 60 met the above criteria and diagnoses were reviewed for consensus.

Statistical analysis
To evaluate the accuracy of algorithmic classification relative to the expert diagnoses, we used the binary algorithmic criteria (equally weighted) as predictors of expert diagnosis in logistic regression models, saving the model predicted probabilities. We then conducted receiver operating characteristic (ROC) analyses of each probability variable against the corresponding binary diagnosis variable. Cross-tabulation and kappa (κ) statistics were used to evaluate agreement between algorithmic and expert diagnosis, with bootstrapping of 1000 samples to estimate 95% CIs on the kappa. Overlap between the different diagnostic criteria when used by clinicians was examined using crosstabs. Generalized linear models (GLM) were used to examine mean differences in each cognitive domain between diagnostic groups identified by the clinicians.
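The two agreement measures described above can be sketched in plain Python. The example below is illustrative rather than a reproduction of the analysis: the predicted probabilities are assumed to come from an already fitted model (the logistic regression step is not shown), the AUC is computed through its rank-based (Mann-Whitney) equivalent, the percentile method is an assumed choice of bootstrap interval, and all names and data are hypothetical.

```python
import math
import random
from collections import Counter


def auc(probabilities, diagnoses):
    """AUC as the chance a random case outranks a random non-case (ties = 0.5)."""
    pos = [p for p, y in zip(probabilities, diagnoses) if y == 1]
    neg = [p for p, y in zip(probabilities, diagnoses) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cohens_kappa(a, b):
    """Chance-corrected agreement between two equal-length label lists."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        return float("nan")  # undefined when only one category occurs
    return (observed - expected) / (1 - expected)


def bootstrap_kappa_ci(a, b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for kappa, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(a)
    kappas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        k = cohens_kappa([a[i] for i in idx], [b[i] for i in idx])
        if not math.isnan(k):  # skip degenerate resamples
            kappas.append(k)
    if not kappas:
        raise ValueError("kappa undefined in every resample")
    kappas.sort()
    lower = kappas[int((alpha / 2) * len(kappas))]
    upper = kappas[max(int((1 - alpha / 2) * len(kappas)) - 1, 0)]
    return lower, upper


# Made-up labels: algorithmic vs expert diagnosis for ten cases.
algorithm = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
expert = [1, 1, 0, 0, 0, 0, 1, 0, 0, 1]
print(cohens_kappa(algorithm, expert))
print(bootstrap_kappa_ci(algorithm, expert))
print(auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0]))
```

Fixing the random seed keeps the bootstrap interval reproducible between runs.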

Predictive value of individual algorithmic criteria for identifying algorithm and expert diagnosis
Positive (PPV) and negative predictive values (NPV) of individual criteria (see Additional file 1: Table S2) are presented as functions of source of diagnosis (i.e., algorithm or expert). Predictive values were obtained using crosstabs of observed frequencies of those meeting each criterion against those achieving diagnosis. In general, the pattern of PPV for individual criteria was similar for algorithmic and expert diagnosis.
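A minimal sketch of how PPV and NPV follow from a cross-tabulation of a single criterion against diagnosis is given below. The function name and the example data are hypothetical and are not taken from Table S2.

```python
def predictive_values(criterion_met, diagnosed):
    """Return (PPV, NPV) for one binary criterion against a binary diagnosis.

    A value is None when its denominator is zero, i.e. when nobody met
    the criterion (PPV) or nobody failed it (NPV).
    """
    pairs = list(zip(criterion_met, diagnosed))
    tp = sum(1 for c, d in pairs if c and d)          # met, diagnosed
    fp = sum(1 for c, d in pairs if c and not d)      # met, not diagnosed
    fn = sum(1 for c, d in pairs if not c and d)      # not met, diagnosed
    tn = sum(1 for c, d in pairs if not c and not d)  # not met, not diagnosed

    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv


# Made-up data: eight participants, one criterion, one diagnosis.
criterion = [1, 1, 1, 0, 0, 0, 0, 1]
diagnosis = [1, 1, 0, 0, 0, 0, 1, 1]
print(predictive_values(criterion, diagnosis))
```

Running the same function once with the algorithmic diagnosis and once with the expert diagnosis as the second argument gives the two sets of predictive values that are compared in this section.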

Overlap between expert diagnosed DSM-5 NCDs and DSM-IV dementia and MCI
Cross-tabulation of expert-diagnosed DSM-5 major NCD against DSM-IV dementia showed a moderate level of overlap (κ = 0.49, standard error (SE) = 0.06, p < 0.001) (Table 3). Of the 30 cases meeting criteria for DSM-IV dementia, 27 (90%) also met criteria for DSM-5 major NCD. The three cases meeting DSM-IV dementia but not DSM-5 major NCD all received AD etiological specifiers and met criteria for DSM-5 mild NCD. The DSM-5 identified 41 additional cases as dementia, representing a 127% increase in dementia diagnosis in the sample relative to DSM-IV, and a high positive predictive value (PPV = 0.88; NPV = 0.90). These additional cases included a few with vascular, fronto-temporal, and Parkinson's specifiers. They also had a higher rate of previous diagnoses (36.6%) relative to cases without any expert-diagnosed dementia (3.4%) (p < 0.001), and a similar rate to those meeting criteria for both DSM-5 and DSM-IV dementia diagnoses (40%) (p > 0.05). Cases qualifying for both DSM-5 major NCD and DSM-IV dementia were also more likely to carry at least one APOE e4 allele (55.2%) compared to those meeting only the DSM-5 major NCD diagnosis (14.6%) (p < 0.001), with the latter not differing significantly from the APOE e4 allele frequency in cognitively normal participants (25.8%) (p > 0.05).
There was a moderate level of overlap (κ = 0.58, SE = 0.04) between DSM-5 mild NCD and MCI diagnosis. Of the 144 cases qualifying for MCI, 119 (82.6%) were also given DSM-5 mild NCD diagnosis. The 25 MCI cases missed by DSM-5 mild NCD did not qualify for a diagnosis of DSM-5 major NCD or any other diagnostic category. They were mostly of the amnestic multi-domain (n = 9) and non-amnestic single domain (n = 9) subtypes. An additional 52 cases also received mild NCD diagnosis, representing an overall 19% increase in mild cognitive disorder diagnoses in our sample (PPV = 0.78; NPV = 0.82).
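The overlap and increase percentages in this section can be checked against the reported counts. The sketch below assumes that "increase" refers to the net change in the total number of cases (cases gained minus cases lost) relative to the older criteria; this reading is an assumption, although it matches the reported figures. The function name is hypothetical.

```python
def overlap_summary(n_reference, n_both, n_new_only):
    """Return (% of reference cases also captured, % net change in case count).

    n_reference: cases under the older criteria (DSM-IV dementia or MCI)
    n_both:      cases meeting both the older and the DSM-5 criteria
    n_new_only:  cases meeting the DSM-5 criteria only
    """
    pct_overlap = 100.0 * n_both / n_reference
    n_new_total = n_both + n_new_only
    pct_net_change = 100.0 * (n_new_total - n_reference) / n_reference
    return pct_overlap, pct_net_change


# Counts as reported in the text.
dementia = overlap_summary(n_reference=30, n_both=27, n_new_only=41)
mild = overlap_summary(n_reference=144, n_both=119, n_new_only=52)

# Dementia: about 90% overlap and a 127% net increase (68 vs 30 cases).
print(dementia)
# Mild disorders: about 83% overlap and a 19% net increase (171 vs 144 cases).
print(mild)
```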

Characterization of neuropsychological profiles as a function of expert diagnosis overlap
A series of GLMs compared neurocognitive profile as a function of diagnosis. GLM analysis revealed that cases diagnosed with only DSM-5 major NCD had significantly better language (p < 0.01), memory encoding (p < 0.001), and IADL function (p < 0.05) compared to cases that also met DSM-IV dementia criteria (Fig. 3a). Figure 3b presents neuropsychological profiles as a function of DSM-5 mild NCD and MCI. Relative to
Table 3 Overlap between expert diagnoses using DSM-5 criteria and DSM-IV for dementia and MCI

Algorithm accuracy
We report the first algorithmic approach to classifying DSM-5 NCDs. The algorithm had good accuracy when classifying major NCD (κ = 0.72, AUC = 0.95) and DSM-IV dementia (κ = 0.64, AUC = 0.91) and was reasonably accurate when classifying MCI (κ = 0.42, AUC = 0.76) and mild NCD (κ = 0.43, AUC = 0.75). The findings indicate that a psychometric algorithm is capable of predicting clinical diagnosis in a population-based sample of older adults, and are consistent with previous work suggesting better algorithmic prediction of more severe diagnoses compared to milder diagnoses [5,7]. Our findings also support field trials of the DSM-5 NCD [11] which found that the reliability of mild NCD was generally lower and less consistent than that of major NCD, which was very good. The algorithm for DSM-5 criteria produced slightly more accurate prediction of expert diagnosis compared to DSM-IV dementia criteria or IWG MCI criteria, supporting our hypothesis that the clearer, more structured DSM-5 criteria may be easier to operationalize. Agreement between algorithmic and expert diagnosis ranged between κ = 0.42 and κ = 0.72, consistent with previously published algorithms [5,7,8].
We also found that the cognitive cut-off used to define mild impairment (either 1.0 or 1.5 SD) had minimal impact on the rate of diagnosis of either DSM-5 mild NCD or IWG MCI diagnosis.
The individual diagnostic criteria that were predictive of expert-diagnosed major NCD and DSM-IV dementia were similarly predictive of algorithm-defined major NCD and dementia, with cognitive impairment and IADL impact having the highest PPV. Individual criteria were less predictive for the mild diagnoses, but those with highest PPVs included cognitive impairment, subjective concern, and exclusion of dementia (in the case of MCI). The lower predictive value of algorithmic criteria for delirium and other disorders for expert diagnoses suggests greater reliance on clinical judgement when determining their likely impact.

DSM-5 overlap with DSM-IV and MCI, and comparison of neurocognitive profiles
We also found that expert diagnosis of dementia according to DSM-5 had excellent overlap with DSM-IV (90%); however, a large number of additional cases were identified by DSM-5, resulting in a 127% increase in diagnosis. This is consistent in direction with the findings of Tay et al. [14] in a memory clinic sample (n = 234), where DSM-5 major NCD criteria captured all cases of DSM-IV dementia, but with an additional 39.7% cases. In our sample, these additional cases had a similar rate of previous diagnoses (either MCI or dementia) to cases that also met DSM-IV dementia criteria, and a significantly higher rate than those without dementia, suggesting the more inclusive criteria captured additional cases with similarly chronic deficits.
Aside from the different populations, our higher rate of additional diagnosis may reflect our use of more detailed neurocognitive measurement, detailed informant report, and inclusion of etiological specifiers and structural MRI evidence. In the absence of sufficient data on the degree of impairment or biological evidence of change, cases not meeting DSM-IV dementia are more likely to be labeled as mild. While Tay et al. [14] labeled as MCI most of those who were DSM-5 major NCD but not DSM-IV dementia, none of our additional DSM-5 major NCD cases met criteria for MCI. Instead, they were more likely to receive a vascular specifier, frontotemporal or Parkinson's dementia. Although memory impairment was less severe for the group with only DSM-5 major NCD, the relative severity of impairment in other cognitive domains, as well as reported impact on IADLs, show that this group should be considered as dementia. Thus, our findings suggest that additional dementia cases identified by DSM-5 are not necessarily at a milder stage but present with a different neuropsychological profile, and possibly different etiologies, compared to cases meeting dementia criteria for both DSM-5 and DSM-IV where the pattern of impairment and APOE e4 allele distribution is more supportive of AD. Future research including additional biomarkers will enable evaluation of this finding.
Although the mild NCD criteria were not developed as an explicit replacement for IWG MCI, in the context of ageing-associated progressive NCDs, clinicians may consider them as an alternative. Accordingly, diagnosis of DSM-5 mild NCD was highly sensitive to MCI (83%) and showed a moderate agreement with MCI diagnosis (κ = 0.58), albeit with an overall 19% increase in the rate of diagnosis. This contrasts with Tay et al. [14] who reported a decrease of 54% using DSM-5 mild NCD criteria, and attributed this to difficulties defining the level of IADL impairment appropriate for mild NCD. Population-based samples are more likely to contain individuals with very little functional impairment but sufficient cognitive deficits and decline to warrant a mild NCD diagnosis.
Luck et al. [15] reported a much higher agreement between MCI and DSM-5 mild NCD, but assessed each neurocognitive domain with a single test. Our use of a range of tests and obtaining average performance across the domain is likely more sensitive to true impairment but more variable. In fact, in our sample, 17.4% of MCI cases failed to be captured by DSM-5, and there were differences in neuropsychological profile, such that cases meeting only DSM-5 mild criteria had poorer social cognition and memory, supporting previous findings [15], but better performance on planning and decision-making. This suggests the inclusion of a greater range of neurocognitive domains in DSM-5, and particularly the inclusion of social cognition as a criterion, may help capture impaired individuals not detected by MCI criteria. Follow-up studies are required to examine the progression and predictive value of these cases.
Our study is limited by expert diagnosis based on case file review rather than clinical interview; however, this meant that our clinical diagnoses were based on the same data as those operationalized in the algorithm. Nevertheless, further work is required to validate these findings in independent data sets. Strengths include the large, population-based sample, detailed neurocognitive assessment, comparison of different cognitive cut-offs, and a systematic approach to collecting and analyzing evidence for impairment. The findings suggest that clinicians, trialists, and epidemiologists using the DSM-5 criteria should expect higher estimates of disease prevalence and incidence, and the ability to capture a broader range of etiologies and severities compared to DSM-IV and MCI. The findings also suggest that while MCI and mild NCD do overlap, MCI is not fully captured within the mild NCD construct. A similar pattern may be apparent for the forthcoming ICD-11 criteria if they adopt an approach analogous to DSM-5 [39].

Conclusions
In summary, an algorithm-based approach to DSM-5 diagnosis of NCD is feasible in cohort studies. This approach is more accurate at identifying major NCD than mild NCD. DSM-5 is more inclusive of the variety of clinical profiles of major NCD, resulting in higher rates of diagnosis but with good negative predictive power. The findings have implications for understanding the impact on rates of diagnosis when using the revised diagnoses.