Identifying dementia cases with routinely collected health data: A systematic review

Introduction Prospective, population-based studies can be rich resources for dementia research. Follow-up in many such studies is through linkage to routinely collected, coded health-care data sets. We evaluated the accuracy of these data sets for dementia case identification. Methods We systematically reviewed the literature for studies comparing dementia coding in routinely collected data sets to any expert-led reference standard. We recorded study characteristics and two accuracy measures—positive predictive value (PPV) and sensitivity. Results We identified 27 eligible studies, with 25 estimating PPV and eight estimating sensitivity. Study settings and methods varied widely. For all-cause dementia, PPVs ranged from 33% to 100%, but 16 of 27 estimates were >75%. Sensitivities ranged from 21% to 86%. PPVs for Alzheimer's disease (range 57%–100%) were generally higher than those for vascular dementia (range 19%–91%). Discussion Linkage to routine health-care data can achieve a high PPV and reasonable sensitivity in certain settings. Given the heterogeneity in accuracy estimates, cohorts should ideally conduct their own setting-specific validation.


Introduction
The increasing burden of dementia is a cause for major public health concern worldwide [1]. Dementias develop as the result of a complex interplay between genetics, lifestyle, and environmental factors. The effect of any single risk factor is therefore likely to be modest, meaning that very large study populations are required to generate sufficient cases to study associations of exposures with incident dementia. Furthermore, because the pathological processes underlying dementia begin many years before symptom onset [2], prospective, population-based studies that recruit participants in midlife or earlier will be crucial in understanding natural history and in identifying risk factors and causal exposures.
For prospective, population-based studies to be used for research into the determinants of dementia, participants developing dementia (the "cases" in nested case-control or case-cohort studies) must be identified. One method of doing so is through linkages to routinely collected, coded health-care data sets, which are administrative data sets collected primarily for health-care purposes, rather than to address specific research questions (e.g., hospital admissions or national mortality data) [3]. Such data sets potentially provide a cost-effective means of identifying disease cases in prospective studies while minimizing loss to follow-up [4].
Participants who develop dementia during follow-up must be identified with a high positive predictive value (PPV); that is, a high proportion of those identified as having dementia in routinely collected data sets should be true dementia cases. Ideally, to maximize statistical power and minimize selection bias in the ascertainment of cases, these sources would also have a high sensitivity, so that a high proportion of all true cases are identified. Specificity and negative predictive values are less relevant metrics, as specificity will be high when precise diagnostic codes are used and negative predictive value, which is related to disease prevalence, will be high in population-based studies where most individuals do not develop the disease of interest.
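The measures above follow directly from a standard 2×2 confusion matrix. As a minimal sketch, the counts below are hypothetical (not taken from any study in this review) and are chosen only to illustrate why negative predictive value remains high in a low-prevalence, population-based cohort even when sensitivity is modest.

```python
# Accuracy measures for coded dementia status against a reference standard.
# Counts are hypothetical, for illustration only.
def accuracy_measures(tp, fp, fn, tn):
    return {
        "ppv": tp / (tp + fp),          # proportion of coded cases that are true cases
        "sensitivity": tp / (tp + fn),  # proportion of true cases that are coded
        "specificity": tn / (tn + fp),  # proportion of non-cases correctly uncoded
        "npv": tn / (tn + fn),          # proportion of uncoded people who are non-cases
    }

# Hypothetical cohort of 100,000 with 1,000 true cases (1% prevalence):
m = accuracy_measures(tp=400, fp=100, fn=600, tn=98900)
print(m)  # PPV 0.80, sensitivity 0.40, but specificity and NPV both >0.99
```

Because most participants in a population-based cohort never develop dementia, the true-negative count dominates, which is why specificity and NPV carry little information compared with PPV and sensitivity.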
Hence, a key focus for population-based prospective studies worldwide is to understand the accuracy of dementia codes in routinely collected health-care data sets for identifying dementia cases during follow-up. We therefore sought to systematically identify, evaluate, and summarize all relevant studies of the accuracy of dementia coding within these data sources.

Study protocol
We prospectively published the protocol for this review on PROSPERO (www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42015027232).

Search strategy
We searched the databases MEDLINE (Ovid), EMBASE (Ovid), Web of Science (Thomson Reuters), CENTRAL (Cochrane Library), and PsycINFO (Ovid) for potentially relevant studies published between 1/1/1990 and 14/09/2017. We developed the search strategies with assistance from an information specialist (Supplementary Appendix A). We also identified relevant studies through personal communication and reference list searching.

Study selection
We included studies that compared the presence of codes for dementia and/or its subtypes in any routinely collected health-care data set to any expert-derived reference standard for dementia. We excluded studies that only validated one routinely collected data set against another. Studies had to report either PPV and/or sensitivity or provide data from which either could be calculated. We included relevant studies published in full and as abstracts. We excluded studies that only assessed Creutzfeldt-Jakob disease because it is a notifiable disease in many countries. Where two studies appeared to have overlapping patient populations, we included the study with the largest sample size, and where two different coding systems were investigated separately, we selected the most recent version. We did not impose language restrictions on the search, and translated articles when necessary. We excluded studies with <10 coded events, as we considered these to have limited precision. Studies assessing sensitivity had to be population based (as opposed to hospital or clinic based) and to have made comprehensive attempts to ascertain all dementia cases within that population. We did not impose this restriction on studies reporting PPV because to investigate PPV, the cases are obtained from a routinely collected data set, and the population depends on the data source (for example, for hospital admissions data, all cases will have been admitted to hospital). Two authors (T.W. and A.L. or K.B.) independently screened all abstracts and full-text articles, resolving any discrepancies through discussion and the assistance of a third, senior author (C.L.M.S.).

Data extraction
Two authors (A.L. and T.W.) independently extracted data from the full-text articles of included studies using a pretested standardized template. We extracted information on the following: year of publication; year(s) from which coded data were obtained; country; study population; mean or median age of dementia cases or, if neither was available, the age range of participants at recruitment; study size; the health-care data sets investigated; coding system; coding position; the reference standard to which the routinely collected data sets were compared; and the dementia subtypes (such as Alzheimer's disease [AD] or vascular dementia) investigated. We defined the study size for studies investigating PPV as the total number of participants with a dementia code (i.e., true positives and false positives) and for studies investigating sensitivity as the total number of dementia cases in the population according to the reference standard (i.e., true positives and false negatives). We contacted study authors to obtain key data items that were not reported in publications (e.g., sample size or coding system).

Risk of bias and applicability assessment
We assessed the risk of bias and applicability for included studies using an adapted Quality Assessment of Diagnostic Accuracy Studies 2 form [5]. The Quality Assessment of Diagnostic Accuracy Studies 2 form requires the risk of bias and applicability to be graded (low, unclear, and high) across four categories: patient selection, routine data set used ("index test"), reference standard, and study participant flow (Supplementary Appendix B). Two authors (A.L. and T.W.) independently performed the assessments and resolved discrepancies through consensus. To minimize the risk of study selection bias, we decided not to exclude studies based on the quality ratings (which are inherently subjective), but instead to aid interpretation of results by highlighting those studies we considered to be at high risk of bias or of applicability concerns.

Data synthesis
We did not perform a meta-analysis given the heterogeneity between study settings and methodologies. Instead, we performed a descriptive analysis of the study results, displaying the range of values in forest plots for visual interpretation. We calculated 95% confidence intervals by the Clopper-Pearson (exact) method. We also reported any relevant within-study analyses that evaluated the effects on PPV or sensitivity of changing a single variable (e.g., selecting people with a dementia code in the primary position compared with those with a code in any position). We performed analyses in R (www.r-project.org).
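The Clopper-Pearson interval used above can be computed directly from binomial tail probabilities. The analyses in this review were performed in R (where `binom.test` returns this interval); as an illustrative sketch only, a pure-Python bisection version (function names are ours) is:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05, tol=1e-9):
    """Exact (Clopper-Pearson) 100*(1-alpha)% CI for a proportion k/n."""
    def solve(f):
        # Bisection for the root of a decreasing function f on (0, 1).
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: smallest p with P(X >= k | p) = alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound: largest p with P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(16, 27)  # roughly (0.39, 0.77)
```

The exact interval is conservative (coverage at least 95%), which suits the small validation samples common in this literature.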

Study characteristics
We included 27 studies, of which 26 had full publications [6][7][8] and one a published conference abstract [9]. We obtained further details required for analysis from the lead author of the abstract. Fig. 1 outlines the selection process and reasons for study exclusion. Of the 27 included studies, 25 reported PPV, and eight reported sensitivity [6,8,9,18,25,26,31,32] (five reported both). Characteristics of studies reporting PPV and sensitivity estimates are displayed in Tables 1 and 2, respectively.
NOTE. Ampersand (&) between data sets indicates that >1 data set was combined for the analysis, and commas (,) indicate data sets were analyzed separately, producing separate PPV figures. Drug codes were not provided in either study that assessed medication data sets.
NOTE. Studies ordered by routine data set type. *Any information given regarding the ages of dementia cases or age at recruitment. Study period: years from which coded data were obtained. Study size corresponds to the number of coded dementia cases (true positives and false positives).
NOTE. Drug codes not available for the study that assessed a medications data set.
NOTE. Some studies used clinically modified versions of the ICD coding system, which extend code length to provide extra detail (i.e., ICD-9-CM); however, for the purposes of dementia coding up to four digits, these are identical to the original versions.

Quality assessment
Only five studies [15,17,18,23,24] were judged as having a low risk of bias and applicability concerns across all categories (Supplementary Appendix D). Most studies had one or more "unclear" ratings across categories, either because information was not provided or was unclear in the publication. Eight studies that assessed PPV had a high risk of bias or applicability concerns in one or more areas [7,14,16,19,20,22,29,30], but no studies of sensitivity were so affected.

Sensitivity-all-cause dementia
The 12 estimates of sensitivity for all-cause dementia ranged from 21% to 86%, with only three studies reporting estimates >60% (Fig. 4). The only study investigating insurance data reported the highest sensitivity (86%), likely reflecting the comprehensive coverage of this data source [26]. The lowest sensitivity (21%) came from a study which only selected codes in the primary position on the death certificate [8]. There were no clear overall differences in sensitivity of hospital and mortality data, but two studies demonstrated higher sensitivities from combining hospital admissions and mortality data compared with either source alone, increasing from 48% and 28% in mortality data and from 40% and 43% in hospital admissions data to 62% and 52% in both sources combined [9,31].

Within-study analyses
Supplementary Appendix F shows results of 10 within-study analyses from seven studies [6,8,15,17,19,21,32]. In general, sample sizes were small, resulting in broad confidence intervals. Selecting codes only in the primary versus any other position gave a higher PPV but, unsurprisingly, at a cost to sensitivity, with fewer cases identified [6,21,32]. The results of two studies suggested that relying on codes that refer to dementia subtypes (such as AD and/or vascular dementia) to identify any dementia case (not necessarily that subtype) produced a higher PPV than using general dementia codes but with fewer cases identified [6,15]. In keeping with the positive association between PPV and disease prevalence, one study demonstrated a lower PPV for patients <65 versus ≥65 years of age (PPV 68% and 96%, respectively) [17]. One study reported that death certificates identified moderate or severe dementia with a higher sensitivity than mild dementia [8]. Finally, one study found that patients with ≥2 dementia codes in hospital admissions data were more likely to have dementia than those with only one code (PPV 94% vs. 68%, respectively) [19].

Fig. 3. PPV estimates for routinely collected coded health data to identify dementia subtype cases, stratified by type of routine data set. Study size: number of cases with ≥1 dementia code in the data set. *High risk of bias or applicability concerns in one or more areas. Abbreviations: AD, Alzheimer's disease; VaD, vascular dementia; PPV, positive predictive value; CI, confidence interval.

Summary of findings
In this systematic review, we found wide variation in the results of validation studies of dementia coding in routinely collected health-care data sets, at least partly reflecting the heterogeneity in study methodologies, settings, and the data sets they assessed. Importantly, however, we found that in some settings, these data sets can achieve high PPVs of >80%–90%. By contrast, the sensitivity of the data sets investigated to date is lower, with many data sources identifying <50% of all dementia cases.
For all-cause dementia, primary care data appears to identify cases with a high PPV [28][29][30]. Combining hospital and death data produces a reasonable sensitivity for all-cause dementia [9,31], and, of the data sources assessed, the US insurance data produces the highest sensitivity [26]. For identifying AD cases, PPV is reassuringly high across most studies and appears to be particularly high in medications data [25] and combined US hospital and mortality data [23].
There is no widely accepted minimum level of accuracy for disease case ascertainment in prospective studies. The level of accuracy that is considered acceptable is likely to differ according to the study setting, and there will inevitably be a trade-off between PPV and sensitivity. For example, large prospective studies are likely to be best served by data sources which achieve a high PPV even if these data sets have a lower sensitivity, as the number of false positives (controls misidentified as dementia cases) must be minimized to reduce bias and distortion of risk estimates [33]. A high sensitivity is less crucial because the effects of false negatives (cases misidentified as controls) would be diluted among the large control population. A reasonable sensitivity is still required, however, to ensure that the cases ascertained are representative and to maximize statistical power.

Variation in study methodologies may explain some of the wide variation in PPV estimates. For example, the two studies with the lowest PPV estimates investigated only a single code (F03-unspecified dementia) and an ethnic minority population, respectively [12,14]. In one of these, the PPV was likely to have been further lowered by high rates of "indeterminate" cases due to the particularly strict reference standard requirements [14]. By contrast, one study with the joint-highest PPV involved nursing home residents only, a population with a high prevalence of dementia, which will increase PPVs because of the positive association between PPV and disease prevalence [7]. Furthermore, the reference standards used varied widely with respect to which, if any, diagnostic criteria were employed and whether the diagnosis was made by screening followed by in-person evaluation, medical record review, General Practitioner questionnaire, or another method.
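The prevalence effect noted above follows directly from Bayes' theorem. In the sketch below, the sensitivity, specificity, and prevalence values are hypothetical (not estimates from any included study) and are chosen only to show the direction and magnitude of the effect when moving from a general population to a high-prevalence setting such as a nursing home.

```python
# PPV as a function of prevalence, for fixed sensitivity and specificity.
# All numbers are hypothetical, for illustration only.
def ppv(sens, spec, prev):
    true_pos = sens * prev            # coded and truly demented
    false_pos = (1 - spec) * (1 - prev)  # coded but not demented
    return true_pos / (true_pos + false_pos)

general_pop = ppv(sens=0.50, spec=0.995, prev=0.05)   # ~0.84
nursing_home = ppv(sens=0.50, spec=0.995, prev=0.30)  # ~0.98
print(general_pop, nursing_home)
```

The same coding behavior therefore yields a noticeably higher PPV in the high-prevalence setting, which is why PPV estimates cannot be transported between populations without adjustment.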
The sensitivity of routine data sets for identifying dementia cases appeared lower than that of some other neurodegenerative diseases, such as motor neurone disease [34]. Key differences between these conditions may explain the lower sensitivity of dementia coding. First, it is recognized that a significant proportion of dementia cases are undiagnosed and so missing from routine data sets [35,36]. This is less of an issue for conditions such as motor neurone disease that result in rapidly progressive physical symptoms. Second, for patients with a diagnosis of dementia, their dementia may not be the primary reason for admission to hospital, meaning it may not be mentioned in hospital discharge summaries and so omitted from hospital admissions data [37]. However, the sensitivity of routinely collected health-care data is changing over time. For example, a UK clinic-based study reported an improvement in the sensitivity of mortality data for dementia from 40% to 63% between 2006 and 2013, probably reflecting a changing awareness and desire to diagnose dementia in health professionals, patients, and caregivers over time [38].

Future directions-improving accuracy of dementia identification
Given that management of dementia is predominantly community based, primary care data sets may provide an opportunity to identify cases that do not appear in hospital admissions or mortality data. Three small studies reported on the PPV of primary care data sets, and these suggested that primary care data may identify dementia cases with a high PPV, in keeping with our previous findings that primary care can be an accurate data source for other neurodegenerative diseases [34]. This warrants further investigation. Our review also identified a need for studies of the accuracy of routinely collected health-care data to identify dementia subtypes other than AD or vascular dementia (e.g., frontotemporal dementia or dementia with Lewy bodies).
The use of medication prescription data to identify AD cases is an under-investigated area, but one small study reported a promising PPV of 97% [25]. Dementia drugs such as cholinesterase inhibitors are now commonly prescribed for patients with dementia with Lewy bodies as well as for AD, and therefore medication data alone may not be sufficiently accurate to identify dementia subtypes. Furthermore, although the indications for these medications are relatively specific to dementia, they may be used in other conditions, such as memantine for migraine [39]. Future studies with larger sample sizes would allow further evaluation of medication data to identify AD and all-cause dementia.
Cohorts may wish to link to several different data sets to increase sensitivity. To date, only hospital admissions and death registrations have been evaluated in combination. Studies investigating the accuracy of using combinations of data sets (e.g., primary care, hospital admissions, and death data together) are required to pursue this further.
Case detection algorithms need to achieve an appropriate balance between the proportion of cases that are true positives (high PPV) and comprehensive case ascertainment (high sensitivity). Results from the within-study analyses reported here provide some possible mechanisms through which cases can be identified with a high PPV. For example, we found higher PPVs when all-cause dementia cases were identified using codes for dementia subtypes compared with general dementia codes [6,15,19], by selecting dementia codes in the primary position rather than other positions [6,21], or by requiring a dementia diagnosis code to occur in ≥2 rather than only one hospital admission [19]. However, in each of these studies, the use of these techniques to increase PPV reduced the number of cases identified.
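A minimal sketch of such case-detection rules is below. The admission records are invented, and the code list is illustrative only (F03 is the ICD-10 code for unspecified dementia mentioned above; G30 is ICD-10 Alzheimer's disease); real algorithms would use validated, setting-specific code lists.

```python
from collections import Counter

# Hypothetical hospital admission records: (patient_id, ICD-10 code, code position)
admissions = [
    ("p1", "F03", 1), ("p1", "F03", 2),  # coded in two admissions, once in primary position
    ("p2", "F03", 3),                    # single admission, secondary position only
    ("p3", "G30", 1),                    # subtype (AD) code in primary position
]

DEMENTIA_CODES = {"F03", "G30"}  # illustrative code list only

def detect_cases(records, require_primary=False, min_admissions=1):
    """Stricter rules tend to raise PPV at the cost of identifying fewer cases."""
    counts = Counter(
        pid for pid, code, pos in records
        if code in DEMENTIA_CODES and (pos == 1 or not require_primary)
    )
    return {pid for pid, n in counts.items() if n >= min_admissions}

print(detect_cases(admissions))                        # broad rule: p1, p2, p3
print(detect_cases(admissions, require_primary=True))  # primary position only: p1, p3
print(detect_cases(admissions, min_admissions=2))      # >=2 coded admissions: p1 only
```

Each tightening of the rule shrinks the identified set, mirroring the PPV-sensitivity trade-off reported in the within-study analyses.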
One method of maximizing both PPV and sensitivity may be to use a broad code list to identify cases from routinely collected data, followed by an examination of the full-text medical records to select participants who truly have dementia. Although this would be time-consuming to do manually in a large study, the use of natural language processing to confirm diagnoses of dementia from free-text records holds promise [40]. One study found that combining natural language processing with coded data produced a PPV of 92% [41].

Strengths and limitations
We used rigorous systematic review methodology to maximize the validity of our results. This included prospective protocol publication; detailed search criteria; and duplication of study screening, quality assessments, and data extraction by two authors.
There were, however, some limitations. First, Quality Assessment of Diagnostic Accuracy Studies 2 assessment showed that studies were of variable quality with some risk of bias. Second, publication bias (with a possible tendency to publish results demonstrating good accuracy) may also have influenced our results. We did not attempt to quantify this due to the absence of a robust technique for doing so in test accuracy reviews [42]. Third, PPV increases with disease prevalence, and so studies in settings with a higher prevalence of dementia (older populations and care home residents) will inevitably result in higher PPVs. We could not formally adjust for the underlying prevalence of dementia in the study populations, but rather attempted to take this into account in interpreting the results. Fourth, we included all relevant studies published since 1990, but results from the older studies among these may be of less contemporary relevance because perceptions and diagnostic boundaries of dementia have changed over time. Fifth, many studies reported a relatively young average age of dementia cases (e.g., <80 years), limiting the generalizability of the findings to studies ascertaining dementia in the oldest old.
A major source of heterogeneity in validation studies, and therefore a limitation of our systematic review, is the variation in the reference standards to which the coded data were compared. This reflects the complexities of dementia diagnosis and the lack of a robust "gold standard" for confirmation of cases in dementia research [43]. Although we did not see a pattern in reported PPVs when stratifying by reference standard, it is highly likely that the method of case confirmation will affect study estimates. Similarly, studies differed on whether diagnostic criteria were applied during validation, and the use of strict diagnostic criteria is likely to affect the study estimates. Future studies will need to carefully consider the reference standard used and could consider reporting a "best case" and "worst case" PPV, based on how strictly diagnostic criteria are applied.

Conclusion
Although no replacement for in-person, comprehensive clinical assessment, routinely collected health-care data sets have the potential to be a cost-effective and comprehensive method of identifying dementia cases in prospective studies. Given the marked heterogeneity between existing validation studies, cohorts should ideally validate these data sets using their own data so that the accuracy is known for their specific study population and setting. Dementia subtypes, primary care, prescribing data, and the development of algorithms to maximize accuracy are potentially useful and under-investigated areas for further research.