Machine Learning Detection of Cognitive Impairment in Primary Care

Purpose: Routine cognitive screenings in primary care settings can benefit patient care and preventive medicine in multiple ways; however, their integration into the protocol of physical exams, as a standard of care, may be hampered by systemic considerations related to labor and cost. In an effort to decrease these impediments, the current study evaluated the validity of a screening procedure that had been specifically designed to impose minimal burden on the clinic. Method: The study examined the ability of a brief computerized cognitive test (the CNS Screen) to detect mild cognitive impairment in a cross-sectional design. Analyses employed a machine learning model of Support Vector Machines (SVM) to classify non-symptomatic subjects from a primary care clinic (n = 49) and hospitalized psychiatric patients with mild cognitive impairment (n = 26), based on the screening data. Results: The classifying algorithm correctly assigned participants to their respective groups at a probability of 0.945 through a ‘leave one out’ validation procedure. Conclusions: These findings suggest that instruments such as the CNS Screen may offer a pragmatic alternative to clinician-administered procedures, while maintaining the validity required for clinical practice. Implications for patient care in primary care settings are discussed.


Introduction
Although central to most aspects of human life, the brain receives relatively limited attention during physical exams. At present, routine monitoring of brain health is excluded from primary care practices, primarily due to practical considerations. In particular, biologically based procedures that can reveal brain pathology are too invasive, labor-intensive, and expensive to employ in community-wide screenings [1]. In light of such prohibitive burden, assessing brain functioning through a cognitive proxy may offer a unique pragmatic advantage and multiple benefits to preventive healthcare [2].
Most notably, setting the starting point of annual cognitive testing at midlife (e.g., age 45) may create longitudinal data sets that constitute an empirical metric of brain aging. These data would allow physicians to characterize the slope of cognitive aging and change for individual patients in comparison to both their personal baseline and the general population. Physicians could thereby detect departures from cognitive health, ideally before the development of significant psychosocial impairment [3]. Within the context of primary care, continuous feedback about cognitive decline can inform the treatment of medical risk factors for dementia. Together with effective behavioral interventions, it may also inspire lifestyle changes that are conducive to protecting brain health [4,5].
Beyond its potential contribution to the early detection of dementias, routine assessment of cognitive functioning in primary care can aid the treatment or management of other pathologies, especially psychiatric disorders. For example, most patients with schizophrenia and a substantial number of patients with bipolar disorder suffer from progressive cognitive impairment and reduction of brain volume over the course of the illness [6,7]. Studies indicate that cognitive decline and neurological degeneration in severe psychiatric disorders are associated with a more severe course of illness, poor functional adjustment, greater allostatic load, higher risk for dementia, and heavier medical burden [8,9]. Early detection of cognitive decline in psychiatric patients may therefore help to prevent medical illness, if the feedback leads to improvements in the management of the illness and related stresses in the community. Through routine monitoring of cognitive functioning, primary care may play an important role in this process. Beyond severe psychiatric disorders, primary care is also a major mental health provider for common mood and anxiety disorders, which are often accompanied by cognitive deficits [10,11]. Cognitive impairment in these conditions tends to co-vary with symptom severity. It therefore represents an important clinical domain that warrants assessment, especially in late-life depression [12]. For example, in major depression, recurrent assessment of cognitive functioning can help to evaluate the severity of a mood episode, potential functional impairments, and the remission process through objective, as opposed to self-report, measures. In many situations, the cognitive evaluation may capture residual symptoms or indicate deterioration in a manner that can guide clinical decisions and care more generally.
Moreover, in older people, repeated cognitive assessment in primary care may inform the differential diagnosis between depression and dementia, which can be difficult to resolve in a single evaluation [13].
In summary, incorporating a cognitive assessment into physical exams may provide several benefits. These include early detection of mild cognitive impairment due to brain pathology, improved care for psychiatric disorders, enhanced ability to monitor the effects of other medical conditions on functioning, and the potential to inspire preventive lifestyle changes in patients at risk for abnormal cognitive decline. These considerations warrant the development of cognitive measures that are tailored to the specific needs of primary care settings. In this regard, it seems important to design a valid instrument that is cost-effective and labor-efficient. The optimal measure should be brief, computerized, self-administered, multi-versioned, and highly sensitive to changes in cognitive functioning.
The present study evaluated the accuracy of the CNS Screen, an instrument designed according to the principles described above, in detecting mild cognitive impairment due to psychiatric illness across hospital and primary care settings. For this purpose, the study examined performance on the CNS Screen in two samples. One sample consisted of cognitively intact people who completed the CNS Screen during a regular visit to a primary care clinic. The second sample included psychiatric inpatients who exhibited cognitive impairment without dementia, assessed on the day of hospital discharge. Analyses aimed to determine the extent to which the CNS Screen was able to correctly identify cognitive impairment through a classification procedure. The overarching goal of the study was to achieve a high level of accuracy in a brief, self-administered procedure. A procedure with these attributes could serve as a practical tool for detecting cognitive impairment in primary care practice and thereby improve patient care.

Sample characteristics
The study compared two samples that were collected in the context of different projects. One was designed to examine the utility of the CNS Screen in psychiatric care, and the other collected normative data for this instrument in a primary care clinic. The first sample (n = 49; mean age = 61.30, SD = 10.15, range: 45-83) was recruited in a primary care clinic in Quincy, MA, USA, over the course of 9 months. This project aimed to evaluate the feasibility of employing a self-administered procedure for cognitive assessment in a primary care setting. The sample consisted of cognitively healthy patients (based on performance on the Mini-Mental State Examination, MMSE). Inclusion criteria consisted of age ≥ 45 and the ability to comprehend instructions in English. Participants were referred to the study by their primary care providers if they met the aforementioned criteria. Potential participants were excluded from the study if they lacked the proficiency to use the computer independently. They were also excluded if they presented with an acute medical condition that would adversely affect mental status (as opposed to chronic conditions managed in treatment). Four patients referred to the study were excluded due to possible early dementia, based on their MMSE scores. The primary care sample included twenty-six men and twenty-three women, of whom five indicated an affiliation with a minority ethnic group. With respect to formal education, fifty percent of this sample reported earning at least a bachelor's degree, and two participants did not obtain a high school diploma (mean years of education = 14.3, SD = 2.2).
The second sample was drawn from a larger data set that was collected in a previous study for the purpose of validating the CNS Screen in psychiatric patients [12]. The subsample consisted of twenty-six psychiatric inpatients at McLean Hospital (Belmont, MA, USA), who completed the test on the day of hospital discharge. Their data were selected for the present analysis based on two criteria: the inclusion criterion of age (≥ 45; mean = 63.38, SD = 10.58, range = 46-84) and a presentation of cognitive impairment without dementia. All of the participants selected for this analysis scored below the clinical cut-off score for cognitive impairment on the Montreal Cognitive Assessment (MoCA; mean = 22.75, SD = 1.25, range = 21-25). This sample included fourteen men and twelve women. Two participants indicated an affiliation with a minority ethnic group. Approximately thirty-five percent of this sample reported earning a bachelor's degree or higher, and one participant did not obtain a high school diploma (mean years of education = 13.65, SD = 31).
With respect to psychiatric diagnoses, nineteen percent of the sample carried a diagnosis of schizophrenia or schizoaffective disorder, twenty-seven percent were diagnosed with bipolar disorder, and fifty-four percent were hospitalized for a major depressive episode. At the time of testing, participants were psychiatrically stable and approved for discharge by the treatment team.

Measures
The Central Nervous System (CNS) Screen
The CNS Screen [14] is a computerized, self-administered assessment that presents cognitive challenges across multiple domains. These include verbal and visuospatial memory (immediate and delayed recall), attention, motor speed, and executive functioning. Each subtest has an unlimited number of versions, allowing repeated administration. Psychometric properties were established in the larger psychiatric sample described above. The measure co-varied with diagnosis, disability status, age, and MoCA scores. It also demonstrated acceptable test-retest reliability, as evidenced by a Pearson correlation coefficient of 0.78 across administrations [15].
The CNS Screen proceeds in the following sequence: (1) verbal memory acquisition; (2) verbal memory immediate recall; (3) visual memory acquisition; (4) visual memory immediate recall; (5) balloon-popping task; (6) number sequence task; (7) number switching; (8) verbal memory delayed recall; (9) visual memory delayed recall. In the verbal memory series, participants memorize five words over twenty-five seconds. They then recall the words amid ten distractors during immediate recall, and within pairs during delayed recall. The visual memory sequence consists of memorizing five playing cards for five seconds and later recalling the card locations on the screen. The immediate recall trial ends when the participant identifies all five cards in the sequence without error. The delayed recall trial repeats the same task without an additional exposure to the cards. To test motor speed, the battery includes a task requiring subjects to pop sixteen balloons as fast as possible. The balloons appear in a spatial sequence that corresponds to the subsequent tasks measuring attention and executive functioning. The balloon task controls for reaction time and mouse sensitivity, which affect the tasks that follow. These tasks assess attention and executive functioning through number sequencing and switching, based on the Trail Making Test paradigm [16]. On the attention subtest, subjects are required to click on non-consecutive numbers in ascending order. On the executive subtest, they are asked to advance through the ascending sequence while alternating between even and odd numbers. The test is available at http://www.cnshealth1.org/cnsger/index.jsp.

The Mini-Mental State Examination (MMSE)
The MMSE is currently the most widely used and studied brief screen for cognitive impairment [17,18]. It is clinician-administered, with normal cognition generally corresponding to a score of 27 or above out of 30 points, although the clinical cut-off score is often adjusted for age and education. Pooled estimates of sensitivity and specificity are 88.3% and 86.2%, respectively [17].

The Montreal Cognitive Assessment (MoCA)
The MoCA is another clinician-administered screen for the assessment of mild cognitive impairment and is widely used in clinical and research settings [19]. With a cut-off of 25/26 out of 30 points, sensitivity falls in the 80% to 100% range and specificity in the 50% to 76% range [17]. The MoCA has good convergent validity with the MMSE (r = 0.66) [20] and with the CNS Screen (r = 0.60) [15].

The Patient Health Questionnaire-4 (PHQ-4)
The PHQ-4 for depression and anxiety is a 4-item screen with wide distribution across healthcare settings and good reliability (Cronbach's α = 0.80) [21]. Scores of 9 or above out of 12 are considered severe, descending to moderate (6-8), mild (3-5), and normal (0-2).

The Quick Inventory of Depressive Symptomatology (Self-Rated) (QIDS-SR16)
The QIDS-SR16 is a 16-item inventory assessing the symptom domains of a major depressive episode according to the American Psychiatric Association Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision [22]. The QIDS-SR16

Subjective memory report
In addition to the aforementioned measures, participants were asked to provide a subjective assessment of their memory and concentration with three questions ("How would you rate your overall memory?"; "How would you rate your memory compared to 10 years ago?"; "How would you rate your ability to concentrate right now?"). Answers were rated on a Likert-type scale (range 0-3), with descriptors ranging from "Poor" to "Excellent" for the first and third questions, and from "Much worse" to "About the same" for the second question.

Procedure
In both samples, participants received a detailed explanation of the test procedures and the goals of the study before providing written informed consent. In the primary care sample, a trained member of the research team administered the PHQ-4, the MMSE, and the subjective memory questions, after which subjects self-administered the CNS Screen. The procedure in the psychiatric sample followed a similar sequence, employing the QIDS-SR16 and MoCA instead of the PHQ-4 and MMSE. All participants in the psychiatric sample were assessed on the day of hospital discharge.

Methodological considerations surrounding sample discrepancies
As mentioned above, the samples in this study were characterized with different dementia screens (i.e., MoCA versus MMSE) and mood screens (i.e., PHQ-4 versus QIDS-SR16), because they were collected in the contexts of separate projects. In this section, we examine the potential effects of these discrepancies on the study's outcome.
The current study employed a common cognitive classifier, uniformly applied across samples and based on CNS Screen data, that was independent of the procedures used for sample selection and group assignment. Blinding the classifier in this manner effectively narrows the potential biases of the aforementioned discrepancies to a single direction: skewing the outcome toward underestimating the performance of the CNS Screen. Stated differently, the variance in diagnostic measures across samples can decrease the power of the analysis due to error. However, it does not increase the probability of detecting false findings with a blind classifier. Consider diagnostic errors in either sample, for example instances in which the MoCA falsely diagnosed impairment in the psychiatric sample, or the MMSE failed to detect impairment in the primary care subjects. Regardless of source or proportion, such errors limit rather than enhance the empirical success of an independent classifier, because a mislabeled subject can only count against its accuracy. For this reason, the analysis is unlikely to overestimate the merit of the CNS Screen, at least not due to sample discrepancies in diagnostic measures. Hence, evidence for satisfactory performance should be considered valid. We proceeded with the analyses below based on this assumption.

Demographic and clinical data
Group comparisons revealed no differences in demographic variables. The psychiatric subjects were intentionally selected within the inclusion criteria of the study. Consistent with this selection, a Yuen test for trimmed means [24] failed to detect group differences in age (trimming set to 20%; trimmed mean difference (tmd) = 2.38, SE = 3.09, test statistic (ts) = 0.77, df = 28.2, p < 0.44). Similarly, group comparisons of categorical variables revealed no differences in the level of education, χ²(5) = 2.60, p < 0.76, or gender, χ²(1) = 0.11, p < 0.91. On a measure of mood symptoms, subjects in the primary care sample scored markedly below the clinical range (i.e., mean PHQ-4 depression items = 1.3, SD = 0.29). In contrast, the range of scores on the mood measure administered to the psychiatric sample indicated mild to moderate symptoms of depression (mean QIDS-SR16

Group comparison of CNS Screen data
Both parametric and non-parametric comparisons revealed significant differences between the primary care and psychiatric samples on all of the CNS Screen measures. Table 1 presents the Kolmogorov distances (KD) and differences in trimmed means between the groups. The analysis employed KD to assess the overlap between the group distributions of each cognitive measure. As Table 1 indicates, the Kolmogorov-Smirnov tests were significant, yet the degree of overlap between the groups varied across tasks, and no single measure produced mutually exclusive distributions. In other words, the KD did not identify a single CNS Screen measure that separated the groups completely, a psychometric property that would be desirable for clinical practice. In general, Table 1 shows that the overlap between the groups was lower on measures of completion time than on measures of errors. Type I error was adjusted for multiple comparisons using a modified Bonferroni procedure [25].
The results of the parametric analyses were consistent with the non-parametric findings. Specifically, Yuen tests of trimmed means were highly significant for most measures in the hypothesized direction. A few exceptions were noted on measures of errors during practice sessions, which were uniformly low across groups. Yuen's procedure [24] was deemed the preferred parametric analysis for the current data, as the observed distributions were skewed and contained numerous outliers across measures. Under these conditions, the Yuen test has higher power than conventional methods based on means. In addition, it maintains more adequate control over type I error and allows for a more meaningful interpretation of effect size [26]. Across measures and tests, Table 1 suggests that the even-odd numerical sequence (i.e., number switching) offers the largest effect size and the lowest overlap between the groups. It may therefore provide a good indication of impairment in a procedure that takes between 1 and 2 minutes to complete.
To assess the clinical utility of the tested procedure, a subsequent analysis employed a Support Vector Machine (SVM) method. SVM is a machine learning model that learns a classification function from labeled examples through supervised learning. Given a set of training examples, each labeled as belonging to one of two classes (categories), an SVM training algorithm builds a model that assigns new examples to one category or the other, based on multiple features (i.e., predictor variables). In this process, the model may map the data into a higher-dimensional space and search for the hyperplane that best separates them geometrically into the correct categories [27].
The validity of the model is tested by the extent to which the function can successfully classify new cases, that is, data not used during the training process that created the classifying function. Beyond linear classification, SVM can fit non-linear decision boundaries through different kernels, which are specialized functions for pattern analysis [28].
The current analysis performed the SVM procedure in R with the function comsvm.best, based on the CRAN package e1071 [29] (see Appendix A for the comsvm.best R code). This function runs the four most commonly used SVM kernels (linear, polynomial, radial, and sigmoid) and reports the outcome of the best-performing kernel, based on a 'leave one out' cross-validation procedure. In each iteration, the algorithm withholds the label of one observation in the dataset. It then attempts to classify that observation by applying its feature vector to the function derived from the remaining data. This process is repeated for all of the observations, so that each subject in the study is classified once by the rest of the sample. To maintain independence between the classified observation and the classifying function, the function is generated anew in every iteration. In the present analysis, comsvm.best used all the measures that appear in Table 1 as features, in addition to the average response time for each task (i.e., 23 CNS Screen features in total). The sigmoid kernel performed best on these data, with an estimated rate of correct prediction for new cases (i.e., as impaired or non-impaired) of 94.6%. Only 4 of the 75 subjects were misclassified. These comprised 3 misses (i.e., impaired patients classified as non-impaired) and 1 false positive (i.e., a non-impaired patient labeled as impaired).

Discussion
The current study provides preliminary support for assessing cognitive functioning in primary care settings through a brief, computerized, self-administered procedure. All unimpaired participants in the primary care clinic completed the cognitive testing portion of the study on a laptop computer in less than 15 minutes without assistance. Analyses revealed that an SVM procedure distinguished between these subjects and psychiatric patients who presented with mild forms of cognitive impairment at an estimated accuracy of ninety-four percent. This psychometric property meets clinical standards and therefore carries pragmatic significance. In general, the findings of this study indicate that effective protocols for detecting mild forms of cognitive impairment in primary care clinics are feasible, even without a major increase in cost or burden.
It is important to note, however, that the screening employed in the current study is not diagnostic. The SVM algorithm correctly identified psychiatric patients who achieved a MoCA score within the impaired range. However, it may not shed light on the origin or nature of impairment in uncharacterized patients. Instead, in the context of a physical exam, the tested protocol may be able to flag the presence of cognitive impairment more generally, or in a manner that justifies a referral for a specialized evaluation. To extend the generalizability of the current findings, future studies may seek to replicate the reported outcome in multiple patient groups assessed in memory clinics and other clinical settings. Recruiting patients in the early stages of neurodegenerative conditions, without psychiatric history or comorbidity, would be of particular interest to this line of research.
Future efforts should also invest in longitudinal designs. The current study provides evidence for the reliable detection of cognitive impairment; however, its cross-sectional design does not assess the sensitivity of the CNS Screen to cognitive change. Correctly identifying abnormal decline or clinically significant improvement relative to previous assessments (i.e., baseline) may expand the clinical utility of cognitive screening in the primary care setting. It may allow physicians to identify early neurodegenerative processes across various medical conditions that adversely affect cognition, or to monitor response to treatment.
Several limitations of the study deserve mention. First, the sample size was relatively small; thus, the reported psychometric properties of the CNS Screen for primary care use should be considered preliminary. Second, only a small number of participants in this study indicated an affiliation with a minority ethnic group, which limits the generalizability of the results. In particular, certain immigrant populations, or patients with lower exposure to the mainstream culture, might be flagged with impairment in higher proportions than Caucasians with an average level of education. To provide more accurate outcomes in these situations, the SVM analyses will need to be based on normative and clinical samples that are culturally relevant to the assessed population. Third, the present study does not assess the ability of the protocol to detect the early stages of neurodegenerative illness. It does, however, present a compelling first step toward a longitudinal investigation that could address this critical issue. Finally, an automated (i.e., self-administered, computerized) procedure can detect impairment, but it cannot replace a comprehensive cognitive evaluation for diagnostic purposes. In other words, it may effectively signal to the physician the need for a more elaborate assessment in certain cases, but it does not constitute a cognitive evaluation that is sufficient for clinical purposes.
Despite these limitations, the present study supports the notion of conducting cognitive assessment in primary care settings. Specifically, it demonstrated that it might be possible to meet satisfactory clinical standards with highly efficient procedures suitable for physical exams in primary care. Further advances in this area may enhance the treatment of various medical conditions that involve cognitive symptoms in general, and the primary care of geriatric populations in particular.
Attaining continuous, cost-efficient improvements to geriatric primary care can be highly consequential, given present estimates of demographic trajectories. Primary care and geriatrics intersect at the epicenter of one of the greatest modern-day challenges for national and global healthcare systems. The demographic "silver tsunami" is predicted to reach an estimated 379 million octogenarians worldwide by 2050 [30,31], of whom 115.4 million will have dementia [32]. This epidemic has led the Institute of Medicine to call for improvements in geriatric care delivery [33]. However, economic considerations hamper the warranted progress.
Rather than rewarding optimal value and low-cost care, the current fiscal structure remains entrenched in a volume-driven, fee-for-service paradigm [34]. The ramifications of this situation are manifold. Primary care physicians feel pressured to see numerous patients in very brief episodes of outpatient care. In addition, administrative burden related to patient care compounds work pressure [35]. This increases provider burnout [36] and decreases the interest of postgraduate medical trainees in the field [37,38]. The consequent workforce shortage prevents the significant workflow burden from being adequately distributed.
In this state of affairs, the imperative to screen for cognitive impairment early and with optimal sensitivity is sacrificed at the altar of expediency. Of note, in previous studies, only 16.3% of cognitively impaired patients who had the added vulnerability of living alone had been formally diagnosed with dementia [39]. The failure to diagnose cognitive impairment early in these patients results in increased medical costs and larger expenses related to community care supports and caregiver time.
These trends are noted even across the most internationally acclaimed Scandinavian healthcare systems [40].
Hence, there is an urgent need for a cost-effective, time-efficient, and optimally sensitive screening tool for cognitive impairment that can be seamlessly integrated into the high-intensity pace of the primary care workflow. Through this lens, the findings of this study, building upon others [41], are significant. They suggest possible technological advances that can improve the quality of primary care for both patients at risk for dementia and patients who suffer from psychiatric disorders.
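The leave-one-out procedure described in the analysis above can be sketched in a few lines. In that procedure, the classifier is refitted on all subjects but one, the held-out subject is classified, and the process repeats for every subject. The sketch below is a minimal, dependency-free Python illustration under several stated assumptions. It is not the authors' comsvm.best code (Appendix A, written in R on e1071). It replaces libsvm's kernelized solver with a linear SVM trained by Pegasos-style subgradient descent, so the sigmoid kernel reported in the study is not reproduced. It standardizes features using only the training rows of each fold. Labels are coded +1 (impaired) and -1 (non-impaired). The helper names (train_linear_svm, loocv_accuracy, best_by_loocv) are hypothetical. The demo data are synthetic and do not correspond to the study sample.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]
Classifier = Callable[[Sequence[float]], int]


def _fit_scaler(X: Sequence[Sequence[float]]) -> Tuple[Row, Row]:
    # Per-feature mean and SD, computed from the training rows only.
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant feature: avoid division by 0
    return means, sds


def _scale(x: Sequence[float], means: Row, sds: Row) -> Row:
    # Standardize, then append 1.0 so the last weight acts as an intercept.
    return [(v - m) / s for v, m, s in zip(x, means, sds)] + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0) -> Classifier:
    """Hypothetical helper: linear SVM fitted by Pegasos-style subgradient
    descent on the L2-regularized hinge loss. Labels must be +1 or -1."""
    means, sds = _fit_scaler(X)
    Z = [_scale(x, means, sds) for x in X]
    w = [0.0] * len(Z[0])
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Z)), len(Z)):  # one shuffled pass over the rows
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * zj for wj, zj in zip(w, Z[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 penalty
            if margin < 1.0:  # hinge loss active: step toward this row's label
                w = [wj + eta * y[i] * zj for wj, zj in zip(w, Z[i])]

    def predict(x: Sequence[float]) -> int:
        score = sum(wj * zj for wj, zj in zip(w, _scale(x, means, sds)))
        return 1 if score >= 0.0 else -1

    return predict


def loocv_accuracy(X, y, fit) -> float:
    """Leave-one-out: refit on n - 1 rows, classify the held-out row, repeat for all rows."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # fresh model every iteration
        correct += int(model(X[i]) == y[i])
    return correct / len(X)


def best_by_loocv(X, y, lams=(0.001, 0.01, 0.1)) -> Tuple[float, Dict[float, float]]:
    """Mimics 'try several settings, report the best' (here: regularization strengths)."""
    scores = {
        lam: loocv_accuracy(X, y, lambda a, b, lam=lam: train_linear_svm(a, b, lam=lam))
        for lam in lams
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic toy data (NOT study data): two well-separated clusters with 2 features.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
    X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(20)]
    y = [-1] * 20 + [1] * 20
    best_lam, scores = best_by_loocv(X, y)
    print("LOOCV accuracy by lambda:", scores, "best:", best_lam)
```

The model is refitted inside every iteration, so the held-out subject never influences the function that classifies it. This is the independence property the analysis relies on. One caveat applies when the winning setting is chosen by the same leave-one-out accuracy that is then reported. The setting here is a regularization strength; in the study it is a kernel. In that case, the reported figure can be somewhat optimistic. Wrapping the selection step in an outer leave-one-out loop (nested cross-validation) is one way to remove that bias.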