Introduction

The World Alzheimer Report estimated that approximately 131 million people worldwide will be living with dementia by 2050 [1, 2]. Despite this growing burden, no effective curative or disease-modifying treatment has been identified so far. Notably, enrolling persons with overt cognitive impairment in randomized controlled trials may have limited the efficacy of the tested treatments [3]. Hence, increasing efforts have been devoted to identifying putative modifiable risk factors and to detecting Alzheimer’s disease (AD) earlier [4,5,6]. Recent research has focused on developing tools able to increase the specificity of the diagnosis of prodromal AD [7, 8]. In this regard, neuropsychological tests have been deemed pivotal in both clinical and research contexts, as they are easy to administer, inexpensive, and non-invasive [9].

The hallmark of typical amnestic AD is a specific episodic memory impairment, characterized by diminished free recall that is only marginally improved by cueing [10]. The free and cued selective reminding test (FCSRT) has been used to maximize the differentiation between the genuine hippocampal deficit of AD and age-associated memory dysfunction due to impaired attention, inefficient information processing, and ineffective retrieval [11, 12]. The International Working Group (IWG) recommends the FCSRT as a reliable tool for assessing this specific cognitive deficit [7].

The FCSRT has been extensively studied in recent years, with promising results [13,14,15]. Previous studies have shown that the FCSRT is an appropriate tool to detect AD at early stages and to predict incident dementia in cognitively intact or mildly impaired people [11, 15]. However, some findings point in the opposite direction: a handful of studies failed to demonstrate the superiority of the FCSRT over other routine memory tests [16, 17]. These findings suggest that controlling encoding and testing cued recall does not necessarily yield better predictive power than free recall tests [16]. Those studies were conducted mainly in research settings, with limited comparisons between the FCSRT and other neuropsychological tests, and without estimating the time from test administration to AD diagnosis.

In the present study, we aimed to investigate the accuracy of the FCSRT for the diagnosis of AD in people with mild cognitive impairment (MCI) referred to a memory clinic.

Materials and methods

Study population

We consecutively recruited outpatients from the Center for Research and Treatment of Cognitive Dysfunctions, Luigi Sacco Hospital, University of Milan, from May 2009 to January 2016. Participants were evaluated at baseline and re-evaluated every 6 to 12 months as part of the clinical routine [18, 19]. During each visit, demographic, functional, and clinical information was gathered through a comprehensive assessment. Follow-up ended at the diagnosis of AD, dementia, death, or at the end of the cohort surveillance (July 2016), whichever came first.

The ethics committee of the “Luigi Sacco” Hospital approved the study protocol. Written informed consent was obtained from all subjects.

Participants were included if they met the diagnostic criteria for MCI [20], based on an extensive neuropsychological battery. MCI was operationalized as follows: (1) subjective cognitive complaint reported by the subject and/or corroborated by an informant; (2) objective cognitive impairment, according to age- and education-specific norms [21], on at least one task of the neuropsychological battery; (3) essentially preserved daily functioning, defined as no impairment in basic activities of daily living (ADL) [22] and unimpaired or minimally impaired (i.e., impairment in one activity) complex instrumental activities of daily living (IADL) [23]; and (4) absence of dementia according to the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, 4th edition) criteria. Based on the domains impaired on the neuropsychological battery, MCI participants were then classified as having “non-amnestic single domain MCI” (impairment in one cognitive domain other than memory), “amnestic MCI” (impairment only in memory tasks), or “multiple domain MCI” (impairment in at least two cognitive domains). Multiple domain MCI included people with both amnestic and non-amnestic multiple domain MCI.
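The inclusion rules and subtype classification above can be written as a short decision procedure. The sketch below is illustrative only: the function and parameter names (`meets_mci_criteria`, `classify_mci`, the domain label `"memory"`) are hypothetical, and it assumes that each domain has already been judged impaired or not against age- and education-specific norms.

```python
def meets_mci_criteria(complaint: bool, impaired_domains: set,
                       adl_impaired: bool, n_iadl_impaired: int,
                       dementia: bool) -> bool:
    """Hypothetical encoding of MCI criteria (1)-(4) described in the text."""
    return (
        complaint                        # (1) subjective cognitive complaint
        and len(impaired_domains) >= 1   # (2) objective impairment, >= 1 domain
        and not adl_impaired             # (3) basic ADL preserved ...
        and n_iadl_impaired <= 1         #     ... and at most one IADL impaired
        and not dementia                 # (4) no DSM-IV dementia
    )


def classify_mci(impaired_domains: set) -> str:
    """Assign the MCI subtype from the set of impaired cognitive domains.

    "memory" is an assumed label for the memory domain; any other label
    (e.g. "language", "executive") is treated as a non-memory domain.
    """
    if not impaired_domains:
        raise ValueError("MCI requires impairment in at least one domain")
    if len(impaired_domains) >= 2:
        # Includes both amnestic and non-amnestic multiple domain MCI.
        return "multiple domain MCI"
    if "memory" in impaired_domains:
        return "amnestic MCI"
    return "non-amnestic single domain MCI"


print(classify_mci({"memory"}))               # amnestic MCI
print(classify_mci({"language"}))             # non-amnestic single domain MCI
print(classify_mci({"memory", "executive"}))  # multiple domain MCI
```

Checking the multiple domain condition first mirrors the text, where any impairment in two or more domains defines multiple domain MCI regardless of whether memory is involved.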

Exclusion criteria were (1) severe psychiatric disorders (e.g., major depression, bipolar disorders, psychotic symptoms), (2) structural brain alterations (e.g., mass lesions and hydrocephalus) or organic illnesses affecting the brain, (3) history of severe traumatic brain injury, (4) major systemic illnesses or medical complications, with uncontrolled organ failure, (5) sensory disorders that could prevent the correct administration of the neuropsychological battery (i.e., blindness or deafness), and (6) history of substance or alcohol abuse.

Comorbidity burden was assessed using the Cumulative Illness Rating Scale (CIRS) comorbidity index [24]. Depressive symptoms were assessed using the 30-item version of the Geriatric Depression Scale, with a score ≥ 10 indicating clinically relevant depressive symptoms [25].

Subjects were also offered APOE genotyping, and the results of those who accepted (130 subjects; 70% of the whole sample) were included in a supplementary analysis. Those who declined APOE genotyping did not differ significantly from those who accepted in demographics, global cognitive performance as assessed with the Mini-Mental State Examination (MMSE), or chronic comorbidities as assessed with the CIRS.

Neuropsychological battery

Participants were assessed by trained neuropsychologists (RG, VC) following a standardized procedure. The neuropsychological battery covered the whole cognitive spectrum, including declarative long-term memory (story recall, the Rey Auditory Verbal Learning Test [RAVLT], and the Rey complex figure recall), language (letter and category fluency), visuospatial and visuo-perceptual abilities (the Rey complex figure copy and the clock-drawing test), and executive functions and attention (the Raven Colored Progressive Matrices 47, the Trail Making Test parts A and B, the Frontal Assessment Battery, and the Stroop Color-Word Test). The tests were administered in a standardized sequence, alternating verbal and non-verbal tests to minimize proactive and retroactive interference in memory tests. Where appropriate, all test scores were adjusted for age and educational level according to available normative data [21, 26, 27]. The Clinical Dementia Rating (CDR) [28] was used as a measure of global cognition.

FCSRT procedure

All participants were administered the Italian version of the FCSRT, whose normative data and cutoffs for each score have been previously published [12, 29]. Briefly, the test has two parts: a study phase and a memory phase. In the study phase, which ensures encoding, the examiner asks the participant to point to and name 12 stimuli presented on three different cards. The card is then removed and immediate recall is tested; when an item is not recalled, the card is shown again and immediate recall is tested again. The memory phase consists of three recall trials, each preceded by a non-semantic interference task (counting backwards for 20 s). First, participants are asked to recall as many items as possible in 2 min. Then, items not retrieved are prompted with their specific semantic (category) cue, one for each stimulus. If the participant fails to retrieve an item with the category cue, the examiner verbally reminds the participant of it. After 30 min, the same procedure is used to test delayed recall. Each trial is scored by the number of recalled items, and five scores are derived: (1) immediate free recall (IFR), (2) immediate total recall (ITR), (3) delayed free recall (DFR), (4) delayed total recall (DTR), and (5) index of sensitivity of cueing (ISC).
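The five scores can be illustrated with a small scoring function. This is a sketch assuming the conventional definitions for a 12-item version with three immediate trials: IFR is the sum of free recall over the three trials (0 to 36), ITR is IFR plus cued recall over the same trials, DFR and DTR are free and total recall on the delayed trial (0 to 12), and ISC = (ITR − IFR)/(36 − IFR), the proportion of items missed at free recall that were retrieved with cues. These formulas are not stated in the text above and should be checked against the published Italian norms [12, 29]; the function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

N_ITEMS = 12   # stimuli in the version described in the text
N_TRIALS = 3   # immediate recall trials


@dataclass
class FCSRTScores:
    ifr: int                # immediate free recall, 0-36
    itr: int                # immediate total recall (free + cued), 0-36
    dfr: int                # delayed free recall, 0-12
    dtr: int                # delayed total recall, 0-12
    isc: Optional[float]    # index of sensitivity of cueing, 0-1


def score_fcsrt(immediate: Sequence[Tuple[int, int]],
                delayed: Tuple[int, int]) -> FCSRTScores:
    """Score the FCSRT (assumed conventional definitions, see lead-in).

    immediate: (free, cued) item counts for each of the three immediate trials.
    delayed:   (free, cued) item counts for the 30-min delayed trial.
    """
    if len(immediate) != N_TRIALS:
        raise ValueError("expected three immediate recall trials")
    for free, cued in list(immediate) + [delayed]:
        if free < 0 or cued < 0 or free + cued > N_ITEMS:
            raise ValueError("per-trial counts must be in 0..12 and sum to <= 12")
    ifr = sum(free for free, _ in immediate)
    itr = ifr + sum(cued for _, cued in immediate)
    dfr, dtr = delayed[0], delayed[0] + delayed[1]
    max_immediate = N_ITEMS * N_TRIALS  # 36
    # ISC: share of items not freely recalled that were retrieved with cues;
    # undefined (None) if every item was freely recalled on every trial.
    isc = None if ifr == max_immediate else (itr - ifr) / (max_immediate - ifr)
    return FCSRTScores(ifr, itr, dfr, dtr, isc)


s = score_fcsrt([(6, 4), (7, 4), (8, 3)], delayed=(7, 4))
print(s.ifr, s.itr, s.dfr, s.dtr)  # 21 32 7 11
print(round(s.isc, 3))             # 0.733
```

In this hypothetical example, 15 of the 36 immediate recall opportunities were missed at free recall and 11 of those were recovered by cueing, giving ISC = 11/15.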

Dementia and AD diagnosis

In the present study, the outcome was the diagnosis of dementia according to the DSM-IV criteria [30], which require evidence of cognitive decline on the neuropsychological test battery and impairment in social or occupational functioning. The diagnosis was established by a consensus panel of a neurologist (SP, LM, IC, or GG) and a neuropsychologist (RG or VC). Dementia subtypes were defined as follows: AD according to the NIA-AA criteria [31], Lewy body dementia according to the McKeith criteria [32], frontotemporal dementia according to the Rascovsky criteria [33], and vascular dementia according to the NINDS-AIREN criteria [34].

The results of the present study are reported according to the STROBE recommendations (Appendix, Table S1).

Statistical analyses

Participants’ characteristics are reported as means and standard deviations (SD) or frequencies (%). Two-tailed Pearson’s χ² tests and analysis of variance (ANOVA) were used to compare sample characteristics according to incident dementia, followed by post-hoc multiple comparisons.

Diagnostic accuracy measures (i.e., sensitivity, specificity, positive and negative predictive values, positive and negative likelihood ratios, and area under the curve) and their 95% confidence intervals (95% CI) were obtained for each cognitive test, using the clinical diagnosis of AD as the reference standard. To assess the strength and consistency of the results, sensitivity analyses were also performed: (1) using the clinical diagnosis of dementia (all types) according to DSM-IV as the reference standard, (2) restricting the analyses to participants with amnestic impairment, and (3) including only participants with an MMSE score ≥ 24, using the MMSE as a measure of the severity of cognitive impairment [35].
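For a dichotomized test (score below vs. above cutoff) compared with the reference diagnosis, the measures listed above follow from a 2 × 2 table. The sketch below computes them from hypothetical counts. The Wilson score interval is used for the proportions as a common, easily coded choice; it is swapped in for illustration and is not necessarily the interval method used by the software in this study. Function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_ci(successes: int, n: int, z: float = Z95) -> tuple:
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy of a binary test against a reference standard (e.g. clinical AD).

    tp/fn: reference-positive participants testing positive/negative.
    fp/tn: reference-negative participants testing positive/negative.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": (sens, wilson_ci(tp, tp + fn)),
        "specificity": (spec, wilson_ci(tn, tn + fp)),
        "ppv": (tp / (tp + fp), wilson_ci(tp, tp + fp)),
        "npv": (tn / (tn + fn), wilson_ci(tn, tn + fn)),
        # Likelihood ratios: point estimates only in this sketch.
        "lr_pos": sens / (1 - spec) if spec < 1 else math.inf,
        "lr_neg": (1 - sens) / spec if spec > 0 else math.inf,
    }


# Hypothetical counts, not data from this study.
res = diagnostic_accuracy(tp=40, fp=20, fn=10, tn=80)
for name, value in res.items():
    print(name, value)
```

Note that predictive values, unlike sensitivity and specificity, depend on the proportion of converters in the sample, so they transfer poorly from a memory clinic to settings with lower AD incidence.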

To improve diagnostic accuracy, we combined in parallel the pair of tests with the highest sensitivity and the pair with the highest specificity. To increase sensitivity, the combination of the two most sensitive tests was considered positive when either or both tests were positive. To increase specificity, the combination of the two most specific tests was considered positive only when both tests were positive [36].
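The two combination rules can be sketched as follows. `combine` applies a rule to each participant's paired results, and `sens_spec` recomputes sensitivity and specificity against the reference diagnosis, which is how combined estimates are obtained empirically. The two closed-form functions give the values expected only under the simplifying assumption, ours and not the study's, that the tests are conditionally independent given disease status. All function names and the toy data are hypothetical.

```python
def combine(t1: list, t2: list, rule: str) -> list:
    """Combine paired binary results: 'either' (positive if at least one test
    is positive) or 'both' (positive only if both tests are positive)."""
    if rule == "either":
        return [a or b for a, b in zip(t1, t2)]
    if rule == "both":
        return [a and b for a, b in zip(t1, t2)]
    raise ValueError("rule must be 'either' or 'both'")


def sens_spec(test: list, reference: list) -> tuple:
    """Empirical sensitivity and specificity against a reference standard."""
    tp = sum(t and r for t, r in zip(test, reference))
    tn = sum((not t) and (not r) for t, r in zip(test, reference))
    n_pos = sum(reference)
    n_neg = len(reference) - n_pos
    return tp / n_pos, tn / n_neg


def either_rule_independent(se1, sp1, se2, sp2):
    """Expected (Se, Sp) of the 'either' rule if tests are conditionally independent."""
    return 1 - (1 - se1) * (1 - se2), sp1 * sp2


def both_rule_independent(se1, sp1, se2, sp2):
    """Expected (Se, Sp) of the 'both' rule if tests are conditionally independent."""
    return se1 * se2, 1 - (1 - sp1) * (1 - sp2)


# Toy paired results (hypothetical): True = test positive / has AD.
reference = [True, True, True, False, False, False]
t1 = [True, True, False, True, False, False]
t2 = [True, False, True, True, True, False]
print(sens_spec(combine(t1, t2, "either"), reference))  # sensitivity rises
print(sens_spec(combine(t1, t2, "both"), reference))    # specificity rises
```

Under the independence assumption, the specificities reported in the Results for the category fluency test (92.0%) and the DTR index (83.3%) would combine under the "both" rule to about 98.7%, and the sensitivities of the RAVLT delayed recall (75.4%) and the DFR index (80.8%) under the "either" rule to about 95.3%. The observed combined values (100% and 89.9%) were computed from participants' actual paired results and differ from these idealized figures, since cognitive tests are rarely conditionally independent.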

The association between the FCSRT and incident AD, with the development of other types of dementia as a competing event, was tested through proportional sub-distribution hazards (competing-risks) regression models. Adjusted sub-distribution hazard ratios (sHRs) and 95% CIs were obtained for each FCSRT index. Time to event was calculated from enrolment to the first of the events that ended follow-up (AD diagnosis, diagnosis of other dementia, death, or end of surveillance). To limit the possibility of reverse causation, we repeated the analyses considering only participants with an observation time of at least 24 months.
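A full sub-distribution hazards regression does not fit in a short example, but the quantity it models can be illustrated. The sketch below computes the nonparametric cumulative incidence function of AD (the Aalen–Johansen estimator) when other dementia types compete with it; the sub-distribution hazard, whose ratios are reported as sHRs, is the hazard associated with this cumulative incidence. The status coding (0 = censored, 1 = AD, 2 = other dementia), the function name, and the toy data are hypothetical.

```python
from collections import Counter


def cumulative_incidence(records, cause=1):
    """Aalen-Johansen cumulative incidence for one event type.

    records: iterable of (time, status) pairs, where status 0 = censored and
    any positive integer is an event type (competing events included).
    Returns [(time, CIF)] at each time where any event occurs.
    """
    records = sorted(records)
    at_risk = len(records)
    surv = 1.0  # probability of being free of any event just before time t
    cif = 0.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        counts = Counter()
        j = i
        while j < len(records) and records[j][0] == t:
            counts[records[j][1]] += 1
            j += 1
        d_any = sum(n for status, n in counts.items() if status != 0)
        if d_any:
            # Cause-specific increment, weighted by overall event-free survival.
            cif += surv * counts[cause] / at_risk
            surv *= 1 - d_any / at_risk
            curve.append((t, cif))
        at_risk -= j - i  # events and censorings at t leave the risk set
        i = j
    return curve


toy = [(1, 1), (2, 2), (3, 0), (4, 1), (5, 0)]
print(cumulative_incidence(toy))
```

For this toy data, the cumulative incidence of AD is 0.2 after the first event and 0.5 at t = 4. Treating the competing event at t = 2 as ordinary censoring would instead give 1 − Kaplan–Meier = 0.6 at t = 4, which shows why ignoring competing dementia types overstates AD incidence.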

Finally, we used Laplace regression to model time to AD diagnosis as a function of the FCSRT indexes. Because approximately 50% of participants in our cohort developed AD during a mean follow-up of 2 years, we examined the median time to AD diagnosis, that is, the time by which the first 50% of each group developed AD, and estimated its differences according to the FCSRT indexes [37].
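Laplace regression estimates covariate-adjusted percentiles of a censored time to event and is not reproduced here. As a simpler, unadjusted analogue of the 50th-percentile comparison described above, the sketch below computes the Kaplan–Meier median time to AD in two groups and takes their difference. It is a swapped-in illustration, not the model used in this study, and the function name and toy data are hypothetical.

```python
def km_median(records):
    """Kaplan-Meier median time to event.

    records: iterable of (time, event) pairs, event True for AD diagnosis and
    False for censoring. Returns the first time at which the estimated
    AD-free survival falls to 0.5 or below, or None if it never does.
    """
    records = sorted(records)
    at_risk = len(records)
    surv = 1.0
    i = 0
    while i < len(records):
        t = records[i][0]
        j = i
        events = 0
        while j < len(records) and records[j][0] == t:
            events += records[j][1]  # True counts as 1, False as 0
            j += 1
        if events:
            surv *= 1 - events / at_risk
            if surv <= 0.5:
                return t
        at_risk -= j - i  # everyone observed at t leaves the risk set
        i = j
    return None


# Hypothetical follow-up times in years: below vs. above the test cutoff.
below_cutoff = [(1, True), (2, True), (3, False), (4, True), (5, True)]
normal_score = [(2, False), (4, True), (6, False), (7, True), (8, True)]
difference = km_median(below_cutoff) - km_median(normal_score)
print(difference)  # -3
```

A negative difference means earlier diagnosis in the group scoring below the cutoff, the same sign convention used for the median differences in Fig. 2.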

All analyses were performed using Stata version 14 (StataCorp, College Station, TX, USA), with statistical significance set at p < 0.05.

Results

During a mean follow-up of 2.5 ± 1.3 years (interquartile range, IQR: 1.3–3.5 years), 87 (46%) of the 187 participants (mean age at baseline 76 years; 56% women) developed dementia, 73 (84%) of whom had AD. Baseline sample characteristics according to the development of dementia (of any type and of AD type) are shown in Table 1. In post-hoc multiple comparisons, participants who developed AD were more likely to be women and had lower MMSE and CIRS scores than non-converters (p < 0.05 for all comparisons). Those who developed dementia at follow-up (irrespective of type) were more likely to have multiple domain MCI than non-converters (p < 0.05 for all comparisons). Post-hoc analyses showed no statistically significant differences in age or education for the other group comparisons.

Table 1 Baseline characteristics of the whole mild cognitive impairment (MCI) sample and by outcome at follow-up

Table 2 shows the diagnostic accuracy measures for the entire neuropsychological battery. The sensitivity of the five FCSRT indexes ranged from 56% (DTR index) to 81% (DFR index), and their specificity from 67% (IFR index) to 83% (DFR index). Among the five FCSRT indexes, the DFR had the best AUC (0.76; 95% CI: 0.70–0.82). All sensitivity analyses led to similar results (Appendix, Table S2).

Table 2 Measures (with the corresponding 95% confidence intervals) of diagnostic accuracy of neuropsychological tests for the development of Alzheimer’s disease

Combining the category fluency test (specificity 92.0%; 95% CI: 85.4–96.3) with the DTR index of the FCSRT (specificity 83.3%; 95% CI: 75.2–89.7), with the combination considered positive only when both tests were positive, raised the overall specificity to 100%. Combining the delayed recall of the Rey Auditory Verbal Learning Test (sensitivity 75.4%; 95% CI: 63.5–84.9) with the DFR index of the FCSRT (sensitivity 80.8%; 95% CI: 69.9–89.1), with the combination considered positive when either test was positive, yielded an overall sensitivity of 89.9%.

As depicted in Fig. 1, MCI participants scoring below the cutoff on the FCSRT indexes had a two- to fivefold higher risk of incident AD than those scoring above the threshold. More specifically, those scoring below the cutoff on the DFR had a more than sixfold higher risk of developing AD at follow-up (sHR: 6.2; 95% CI: 3.4–11.3) than those with a normal score, considering the development of other dementia types as a competing event. The fully adjusted model yielded similar, although slightly attenuated, results.

Fig. 1

Association between free and cued selective reminding test (FCSRT) indexes and incident Alzheimer’s disease. For each test, the group testing negative was the reference group. Estimates come from separate models (one for each FCSRT index). Model 1: adjusted for age, sex, and education. Model 2: adjusted for age, sex, education, MCI subtype, MMSE score, and CIRS score

Similar results were also obtained when (1) repeating the analyses only in MCI participants with at least 1 year of follow-up and (2) including APOE genotype in the fully adjusted model (data not shown).

Figure 2 depicts the differences in time to AD diagnosis as a function of the FCSRT indexes. Participants scoring below the threshold on the DFR were diagnosed with AD approximately 3 years earlier than those with normal scores on the same index (median difference: −3.0 years; 95% CI: −4.0, −2.0). Participants scoring below the cutoff on the IFR, ITR, DTR, and ISC were also diagnosed with AD earlier than those with a normal score on the same index, with median differences of −2.2 (95% CI: −3.6, −0.7), −2.3 (95% CI: −3.6, −0.9), −1.7 (95% CI: −3.2, −0.2), and −1.7 (95% CI: −3.5, −0.2) years, respectively.

Fig. 2

Difference in time to Alzheimer’s disease diagnosis (years) between participants testing positive and negative on the FCSRT. For each test, the group testing negative was the reference group. Model adjusted for age, sex, education, MCI subtype, MMSE score, and CIRS score. Abbreviations: IFR, immediate free recall; ITR, immediate total recall; DFR, delayed free recall; DTR, delayed total recall; ISC, index of sensitivity of cueing

Discussion

According to our results, compared with the other tests of an extensive neuropsychological battery, the FCSRT showed the best predictive performance for the development of AD in people with MCI. Notably, combining the category fluency test with the DTR index further increased specificity. In addition, participants who tested positive on the FCSRT had a twofold to fivefold higher risk of developing AD than those who tested negative. Moreover, persons with MCI who tested positive on the FCSRT at baseline were diagnosed with AD 2–3 years earlier than those testing negative.

The present findings must be considered in the context of an extensive literature on memory tests as predictors of cognitive decline in the older population. The free and cued selective reminding procedure aims to maximize the differentiation between the genuine encoding and storage deficits that characterize AD and age-associated memory dysfunction secondary to impaired attention, inefficient information processing, and ineffective retrieval [38, 39]. Accordingly, the FCSRT has consistently been considered an appropriate tool to detect AD in both population-based and clinical settings [40, 41] and to distinguish AD from frontotemporal dementia in people with severe cognitive impairment [13, 42]. In the present study, we confirm the specificity of the FCSRT, for the first time through analyses that treated other dementia subtypes as competing events for incident AD.

In line with a growing body of literature on the FCSRT in at-risk populations [14, 15, 43, 44], our results further support the crucial role of this test in assessing cognitive function in persons with MCI. Notably, we report for the first time the differences in time to AD diagnosis as a function of FCSRT scores, showing that persons with MCI who tested positive on the FCSRT are diagnosed with AD 2–3 years earlier than those testing negative. This finding expands prior knowledge and strengthens the clinical and epidemiological value of the FCSRT.

The study by Sarazin and colleagues [15], the first conducted in people with MCI, identified immediate total recall as the best predictor of progression to dementia. The higher predictive value of this score compared with free recall suggests that “insensitivity” to cueing may be relevant for identifying a hippocampal memory deficit, namely a pure memory disorder due to defective storage of information rather than to poor encoding or retrieval. By controlling both encoding and retrieval with the same semantic cues, the FCSRT can isolate storage deficits linked to the early involvement of hippocampal structures typical of AD, distinguishing them from memory impairment due to attentional problems or inefficient strategic retrieval.

Several similar longitudinal studies in people with MCI [43, 44] yielded comparable results. In line with our findings, Lemos and colleagues showed a higher risk of developing AD in persons with amnestic MCI who tested positive on the FCSRT, with the total recall score outperforming the logical memory test in predicting such progression [14]. Likewise, free recall and total recall were slightly better than the CERAD word list in another study, which also suggested increased predictive value for the combined score [45]. In the present study, among the five FCSRT indexes, the ISC showed the poorest predictive power. This result may be surprising given the theoretical background of the FCSRT, and it contrasts with previous studies that described the ISC as a valid measure for differentiating MCI subtypes [46]. However, other studies did not confirm these results, particularly in identifying people who will subsequently develop dementia, where other measures performed better [16]. Our findings are in line with previous studies showing that free and total recall are impaired early and reliably in people with MCI and in older patients who will develop dementia. The use of composite scores may increase diagnostic reliability in dementia prediction. However, using composite scores in place of pure scores (i.e., free and total recall), although theoretically more appropriate, may attenuate the overall effect, because any weakness of the component scores used to build the index carries over into the index itself.

To summarize, there is ample evidence supporting the value of the FCSRT in predicting progression to dementia, particularly in at-risk populations. Our study adds evidence from a single-center clinical setting with extended follow-up, in a sample that can be considered representative of a specialized, hospital-based memory clinic. Moreover, our study allows a direct comparison between the FCSRT scores and other memory tests widely used in similar settings (logical memory, word list learning, complex figure recall), and supports the value of combined measures [47]. It is worth noting that the version of the FCSRT used in this study was similar in format to the original picture version [39] but used 12 rather than 16 stimuli. This shortens testing time without an apparent loss of predictive value.

Our findings need to be interpreted in light of the strengths and limitations of our study. Major strengths are the large MCI sample (both clinically and psychometrically defined) in a longitudinal design, and the use of robust outcomes such as the clinical diagnosis of AD and dementia. Moreover, to the best of our knowledge, this is the first study to assess the predictive performance of the FCSRT treating the development of other dementia subtypes as a competing risk, and to report differences in time to AD diagnosis as a function of FCSRT scores. Some limitations should be acknowledged. First, with a mean observation period of 2.5 years, reverse causation cannot be fully ruled out: individuals performing worse on the FCSRT may already have been in a pre-dementia phase. To partially overcome this issue, we conducted a sensitivity analysis including only participants with a longer (> 2 years) follow-up, and the results were consistent, although with less statistical power. Second, our sample consisted of people with MCI referred to a memory clinic, with a high incidence of dementia and AD, which might limit the generalizability of our results. Third, although many covariates were taken into account in the adjusted analyses, residual confounding and the effect of unmeasured confounders cannot be excluded.

In light of ongoing efforts to develop anti-dementia medications, identifying people at higher risk of developing AD remains a clinical priority. In this context, cognitive and memory tests may be effective in detecting mild AD, and the FCSRT appears consistently reliable. Being non-invasive and easy to administer, it can be used to recognize hippocampal-type memory impairment. From a research point of view, this might allow the identification of a more homogeneous population for ongoing clinical trials. From a clinical standpoint, our results might help physicians identify those people with MCI referred to a memory clinic who will benefit from more frequent and regular follow-up, and tailor appropriate treatments and preventive strategies.