A Computerized Cognitive Test Battery for Detection of Dementia and Mild Cognitive Impairment: Instrument Validation Study

Background Early detection of dementia is critical for intervention and care planning but remains difficult. Computerized cognitive testing provides an accessible and promising solution to address these current challenges. Objective The aim of this study was to evaluate a computerized cognitive testing battery (BrainCheck) for its diagnostic accuracy and ability to distinguish the severity of cognitive impairment. Methods A total of 99 participants diagnosed with dementia, mild cognitive impairment (MCI), or normal cognition (NC) completed the BrainCheck battery. Statistical analyses compared participant performances on BrainCheck based on their diagnostic group. Results BrainCheck battery performance showed significant differences between the NC, MCI, and dementia groups, achieving 88% or higher sensitivity and specificity (ie, true positive and true negative rates) for separating dementia from NC, and 77% or higher sensitivity and specificity in separating the MCI group from the NC and dementia groups. Three-group classification found true positive rates of 80% or higher for the NC and dementia groups and true positive rates of 64% or higher for the MCI group. Conclusions BrainCheck was able to distinguish between diagnoses of dementia, MCI, and NC, providing a potentially reliable tool for early detection of cognitive impairment.


Introduction
In proportion with the growth of the aging population, the incidence of dementia is on the rise and is projected to affect nearly 14 million people in the United States and upwards of 152 million people globally in the coming decades [1][2][3]. Current rates of undetected dementia are reported to be as high as 61.7% [4], and available treatments are limited to promoting quality of life rather than reversal or cure of the disease process. The ability to properly identify and treat dementia at this scale requires an active approach focused on early identification. Early detection of dementia provides access to timely interventions and knowledge to promote patient health and quality of life before symptoms become severe [5][6][7][8]. Early and accurate diagnosis also allows for proper preparation for patients, caregivers, and their families, resulting in improved caregiver well-being and delayed nursing home placements [9][10][11][12]. Further, it helps to characterize patients with early-stage dementia for clinical trials, exploring the latest therapeutics and validating biomarkers indicative of specific pathologies. Despite the benefits, early detection is a challenge with current clinical protocols, leaving many patients undiagnosed until symptoms become noticeable in later stages of the illness [13].
Considered an early symptomatic stage of dementia, mild cognitive impairment (MCI) signifies a level of cognitive impairment between normal cognition (NC) and dementia [14]. While not all MCI cases progress, the conversion rate of MCI to dementia has been observed to be approximately 5% to 10% [15]. This stresses the importance of identifying MCI in early detection and clinical intervention for dementia, which is included in recommendations from the National Institute on Aging and the Alzheimer's Association [16]. Detection of MCI has been successful when using brief cognitive screening assessments. The widely used Montreal Cognitive Assessment (MoCA) has demonstrated 83% sensitivity and 88% specificity in distinguishing MCI from NC, and 90% sensitivity and 63% specificity in distinguishing dementia from MCI [17][18][19]. Similar performance has also been observed for the Mini-Mental State Examination (MMSE) and the Saint Louis University Mental Status (SLUMS) exam [20][21][22]. While these screening tests do well in their ability to detect MCI, they have many limitations. First, these tests are time-and labor-intensive (ie, verbal administration by a physician or test administrator and hours for training, recording responses, scoring, and interpreting results). Second, these paper-based tests cannot allow for tracking of timing, which is an important indicator of an individual's cognitive health [23]. Also, there is a lack of detailed insight into different cognitive domains because their individual subtests are, by design, simple and suffer from ceiling effects [19,24,25].
Neuropsychological tests (NPTs) represent a more extensive and comprehensive class of cognitive evaluation [26]. They allow for research into certain cognitive domains (eg, attention, working memory, language, visuospatial skills, executive functioning, and memory), research that is used to support clinical diagnoses and further delineate specific neurocognitive disorders. NPTs can determine patterns of cognitive functioning that relate to normal aging, MCI, and dementia progression with a specificity of 67% to 99% [27]. A major strength of NPTs is their ability to characterize cognitive impairment, providing clues to underlying pathology, and thereby improving diagnostic accuracy to guide appropriate treatment. However, NPTs come with downsides, including financial cost, long appointment times, and high levels of training and expertise required to conduct and interpret tests. Prior studies have also shown that some NPTs demonstrate high accuracy in differentiating dementia patients from healthy participants, but do not have adequate psychometrics to distinguish MCI from dementia [28][29][30][31].
Computerized cognitive assessment tools have been developed to address the issues of accessibility and efficiency [31][32][33][34]. They are more comprehensive than screening tests but less expensive and quicker than clinical NPTs, and they aim to maximize accessibility to both patients and providers. They also yield multiple benefits, including maintaining testing standardization, alleviating the time pressures of modern clinical practice, and providing a comprehensive assessment of cognitive function to strengthen a clinical diagnosis. Importantly, in the new era of practicing amid the COVID-19 pandemic [35][36][37], increasing the accessibility of remote cognitive testing for vulnerable and high-risk patients is essential.
This study evaluated BrainCheck, a computerized cognitive test battery available on mobile devices, such as smartphones, tablets, and computers, making it portable and allowing it to be administered remotely. In addition to offering automated scoring and instant interpretation, BrainCheck requires short administration and testing times, comparable to traditional screening instruments, but provides detailed insight into multiple aspects of cognitive functioning that only comprehensive NPTs can. BrainCheck has previously been validated for its diagnostic accuracy in detecting concussion [38] and dementia-related cognitive decline [39]. Furthering its validation for dementia-related cognitive decline, we sought to assess BrainCheck's utility as a diagnostic aid to accurately assess the severity of cognitive impairment. We measured BrainCheck's ability to distinguish individuals with different levels of cognitive impairment (ie, NC, MCI, and dementia) based on their comprehensive clinical diagnoses. Our goal was to further demonstrate the utility of BrainCheck for cognitive assessment, specifically as a diagnostic aid in cases where NPT may be unavailable or when a comprehensive evaluation is not indicated.

Ethics Approval
This study was approved by the University of Washington (UW) Institutional Review Board (IRB) for human subject participation (review number STUDY00000790).

Recruitment
Participants were recruited from a research registry maintained by the Alzheimer's Disease Research Center associated with the UW Medicine Memory and Brain Wellness Center clinic [40]. This registry is a continually updated database of individuals who have expressed interest and signed an IRB-approved consent form to be contacted about participation in Alzheimer disease (AD) and related dementias research studies, many of whom have been recently evaluated at the clinic and, hence, have a clinical diagnosis or evaluation. Those with listed addresses within a 70-mile radius of Seattle, Washington, were contacted by phone or email, based on information provided within the registry. If the person was unable to physically use an iPad, if the person was too cognitively impaired to understand or follow instructions, or if the primary contact (eg, spouse) indicated that the person was unable to participate, they were not recruited for the study. When study procedures were modified from in-person to remote administration due to the COVID-19 pandemic (approximately March 2020), participants outside the initial geographical range were contacted to explore remote testing capabilities. We required that these participants have access to either an iPad with iOS 10 or later or a touchscreen computer and Wi-Fi connectivity to participate in the study.
Using the provided primary cognitive diagnosis within the registry, participants were divided into one of three groups: (1) NC, indicated by subjective cognitive complaint or no diagnosis of cognitive impairment, some of which were self-reported; (2) MCI, representing both amnestic and nonamnestic subtypes; or (3) dementia, which included dementia due to AD, frontotemporal dementia, vascular dementia, Lewy body dementia, mixed dementia, or atypical AD.
A total of 5 participants were not recruited from the registry but via snowball sampling from other participants. The recruitment of these participants was simply due to convenience, typically a family member or friend that was also available at the time of testing. Out of these 5 participants, 4 of them were placed into the NC group after their self-reports denied symptoms or a history of cognitive impairment; these 4 were not patients of the memory clinic. The remaining participant was a patient from the memory clinic, just not a part of the registry, and was placed in the AD group based on their most recent diagnosis retrieved from their medical records. Testing for these 5 participants was administered on-site.

On-site Administration
Data for on-site administration was collected from October 2019 to February 2020. A session was held either in the participant's home or in a well-lit, quiet, and distraction-free public setting. Consent forms were reviewed and signed by the participant or their legally authorized representative and an examiner, with both parties obtaining a copy. The study was designed for participants to complete one session with a moderator using a provided iPad (model MR7G3LL/A; Apple Inc) connected to Wi-Fi to complete the BrainCheck battery. Prior to testing, participants were briefed on BrainCheck, and moderator guidance was limited to questions and assistance requested by the participant during the practice portions. Participants received a gift card (US $20) for participation at the conclusion of the study session.

Remote Administration
Due to the COVID-19 pandemic and interest in preliminary data on remote cognitive testing, study procedures were modified to accommodate stay-at-home orders in Washington state. Data collection resumed from April to May 2020, with modified procedures using remote administration. These participants provided written and verbal consent and were administered the BrainCheck battery remotely over a video call with the moderator. Participants used their personal iPads or touchscreen computer browsers to complete the BrainCheck battery. The same method for on-site administration, as described above, was used for remote administration.

Measurements
A short description for each of the five assessments comprising the BrainCheck battery (V4.0.0) is listed in Table S1 in Multimedia Appendix 1. More detailed descriptions may be found in a previous validation study [38]. After completion of the BrainCheck battery, the score for each assessment was calculated using assessment-specific measurements by the BrainCheck software (Table S1 in Multimedia Appendix 1). The BrainCheck Overall Score is a single, cumulative score for the BrainCheck battery that represents general cognitive functioning. This score was calculated by taking the average of all completed assessment scores. If an assessment was timed out, a penalty was applied by setting this assessment score to zero. The normalized assessment scores and BrainCheck Overall Scores were corrected for participant age and device used (ie, iPad vs computer) using the mean and SD of the corresponding score from a normative database previously collected by BrainCheck [38,39]. The score generated followed a standard normal distribution, where a lower score indicates lower assessment performance and cognitive functioning.

Statistical Analysis
Statistical analyses were performed using Python (version 3.8.5; Python Software Foundation) and R (version 3.6.2; The R Foundation) programming languages. All tests were 2-sided, and significance was accepted at the 5% level (α=.05). Comparison of means of groups was made by an analysis of variance test for normally distributed data. The chi-square test was used to analyze differences in categorical variables.
To evaluate BrainCheck performance among participants in different diagnostic groups while adjusting for age, sex, and administration type, linear regression was used in which the outcome variables were duration to complete BrainCheck battery, individual BrainCheck assessment scores, and BrainCheck Overall Scores. P values were corrected using the Tukey method for multiple comparisons. To assess the accuracy of the BrainCheck Overall Score in the binary classification of participants in the different diagnostic groups (ie, dementia vs NC, MCI vs NC, and dementia vs MCI), receiver operating characteristic (ROC) curves with area under the curve (AUC) calculations were generated to determine diagnostic sensitivity and specificity. In these binary classifications, sensitivity (ie, true positive rate) and specificity (ie, true negative rate) are measured, with the more severe group as cases and the less severe group as controls. For example, the MCI group represents cases in the MCI versus NC classification, but it represents controls in the dementia versus MCI classification. In assessing BrainCheck for three-group classification, we used volume under the three-class ROC surface method from Luo and Xiong [41] to define optimal cutoffs for the BrainCheck Overall Score and find the maximum diagnostic accuracy.

Participant Characteristics and Demographics
A total of 241 individuals were contacted to participate, and 99 participants completed the study. Demographic details of the participants are provided in Table 1. The three groups did not differ to a significant degree in terms of education, administration type, or recruitment type, but there were differences in age and sex.

Completion of Assessments
We found that most participants in the NC group were able to complete the assessments, whereas the dementia group had a higher time-out rate, with the MCI group falling in between the two (Figure 1)

BrainCheck Performance
BrainCheck assessments were compared across the three groups using a linear regression model with age, sex, and administration type as regressors ( Figure 2 and Table 2). Individual scores, such as the BrainCheck Overall Score, were normalized for age and device. Overall, participants with greater cognitive impairment showed lower BrainCheck assessment scores. All individual assessments except Trails B showed significant differences in performance between the NC and dementia groups, whereas two of the seven assessments (ie, Immediate Recognition and Digit Symbol Substitution) showed significant differences in performance between all three groups ( Figure 2 and Table 2). Digit Symbol Substitution, Flanker, and Trails A/B assessments showed long tails in the scores of the dementia group because some participants in the dementia group only completed parts of the assessments or exhibited low accuracy ( Figure S1 in Multimedia Appendix 1).
The BrainCheck Overall Score is a composite of all individual assessments within the BrainCheck battery, representing overall performance (see details in the Measurements section). Using an existing normative population database, partly compiled from controls in previous studies [38,39], the BrainCheck Overall Score was adjusted for age and the device used to generate the normalized BrainCheck Overall Scores. The normalized BrainCheck Overall Scores differed significantly among these three groups (P<.001). Pairwise comparisons with Tukey adjustments for multiple comparisons show that the NC group scored significantly higher than the MCI group (P=.002) and the dementia group (P<.001), and the MCI group scored significantly higher than the dementia group (P<.001; Figure  3).

BrainCheck Diagnostic Accuracy
Using ROC analysis, BrainCheck Overall Scores achieved a sensitivity of 88% and a specificity of 94% for classifying between dementia and NC participants (AUC=0.95), a sensitivity of 86% and a specificity of 83% for classifying between MCI and NC participants (AUC=0.84), and a sensitivity of 83% and a specificity of 77% for classifying between dementia and MCI participants (AUC=0.79; Figure 4).
Using methods described by Luo and Xiong for three-group classification [41], the optimal lower and upper cutoffs of the normalized BrainCheck Overall Score in maximizing diagnostic accuracy were -3.64 and -0.06, respectively. This achieved true positive rates of 80% for the NC group, 64% for the MCI group, and 81% for the dementia group ( Figure 5).  Overall Scores, where the x-axis is the index of the participant, sorted by primary diagnosis (dementia: red, MCI: green, and NC: blue). The values t+ and t-, respectively, represent the optimal upper and lower cutoffs of the normalized BrainCheck Overall Score in maximizing diagnostic accuracy. B. Box plots of normalized BrainCheck Overall Scores for each diagnostic group. The normalized BrainCheck Overall Score follows a standard normal distribution. The dashed lines label the optimal cutoff scores for distinguishing the diagnostic groups. MCI: mild cognitive impairment; NC: normal cognition.

Principal Findings
Consistent with prior findings in concussion [38] and in dementia and cognitive decline [39] samples, this study demonstrated that BrainCheck is consistent in its capability to detect cognitive impairment and can reliably detect severity and differentiate between cognitive impairment groups (ie, NC, MCI, and dementia). As expected, participants with more severe cognitive impairment performed worse across the individual assessments and on BrainCheck Overall Scores. The BrainCheck Overall Scores separated participants of different diagnostic groups successfully with high sensitivity and specificity.
BrainCheck Overall Scores were more robust in distinguishing between these groups where participants in the dementia group had significantly lower scores than those in the NC group. The BrainCheck battery was able to distinguish between NC and dementia participants, with 94% sensitivity and 88% specificity. These findings show that the BrainCheck Overall Score demonstrates better accuracy for differentiating NC from dementia, compared to the MMSE, SLUMS, and MoCA screening measures [22,41,42]. People with MCI usually experience fewer cognitive deficits and preserved functioning in activities of daily living compared to those with dementia [43], and our findings of sensitivity and specificity with separating MCI from other groups were slightly lower than the NC versus the dementia differentiations ( Figure 5). Nonetheless, the BrainCheck Overall Score showed sensitivities and specificities greater than 80% in distinguishing MCI from NC and dementia groups, which is comparable to the MoCA, SLUMS, and MMSE [18][19][20][21][22]42]. Furthermore, a review of validated computerized cognitive tests indicated AUCs ranging from 0.803 to 0.970 for detecting MCI, and AUCs of 0.98 and 0.99 in detecting dementia due to AD [44], which were comparable with the results found in this study.
Although not all individual assessments in the BrainCheck battery differentiated between NC, MCI, and dementia, we observed a general trend for each assessment showing that dementia participants had the lowest scores, whereas the NC participants had the highest scores. Individual assessments that did show significant differences in the scores between NC and MCI groups and between dementia and MCI groups included Immediate Recognition and Digit Symbol Substitution. Notably, Digit Symbol Substitution showed significant differences in performance between all three diagnostic groups, whereas a previous study found that Digit Symbol Substitution did not show significant differences between cognitively healthy and cognitively impaired groups (n=18, P=.29) [39], likely due to this study having a larger sample size. Individual assessments with no significant differences between the MCI group and the NC and dementia groups were the Stroop and Trails A/B tests ( Figure 2 and Table 2). All of these tests include time-out mechanisms if participants are unable to complete the test, and time-out rates were higher in the more cognitively impaired groups ( Figure 2). Therefore, when calculating the BrainCheck Overall Score, we have introduced a penalty mechanism for timed-out assessments.
In comparison to comprehensive NPTs, which can typically last a few hours and sometimes require multiple visits [43], Another limitation was that although not all individual assessment scores could differentiate the three groups, the pattern of differences across these scores may contain useful diagnostic information. The use of the BrainCheck Overall Score as an average of all individual assessment scores appears to work effectively, but does not take into account the other relationships seen across individual scores. Furthermore, some individual scores may be more informative for detecting cases, whereas others may be informative for gauging severeness. Future studies recruiting a larger sample size in each group will allow for an investigation into whether machine learning methods can extrapolate these relationships and improve the diagnostic accuracy of BrainCheck.
When administration type was considered in linear regression model analyses, scores only showed significant differences among the three diagnostic groups instead of administration types. While remote administration was not designed into the original study, stay-at-home orders due to COVID-19 required modifications, and efforts were made to provide preliminary data for remote use. With preliminary outcomes indicating feasibility for remote administration, a more robust study and increased sample size will be needed to fully validate BrainCheck's cognitive assessment via its remote feature.

Conclusions
The use of computerized cognitive tests provides the opportunity to increase test accessibility for an aging population with an increased risk of cognitive impairment. The findings in this study demonstrate that BrainCheck could distinguish between three levels of cognitive impairment: NC, MCI, and dementia. BrainCheck is automated and quick to administer, both in person and remotely, which could help increase accessibility to testing and early detection of cognitive decline in an ever-aging population. This study paves the way for a comprehensive longitudinal study, exploring BrainCheck in early detection of dementia and monitoring of cognitive symptoms over time, including further comparison to gold-standard neuropsychological assessments.