The recency ratio assessed by story recall is associated with cerebrospinal fluid levels of neurodegeneration biomarkers

Recency refers to the information learned at the end of a study list or task. Recency forgetting, as tracked by the ratio between recency recall in immediate and delayed conditions, i.e., the recency ratio (Rr), has been applied to list-learning tasks, demonstrating its efficacy in predicting cognitive decline, conversion to mild cognitive impairment (MCI), and cerebrospinal fluid (CSF) biomarkers of neurodegeneration. However, little is known about whether Rr can be effectively applied to story recall tasks. To address this question, data were extracted from the database of the Alzheimer's Disease Research Center at the University of Wisconsin-Madison. A total of 212 participants were included in the study. CSF biomarkers were amyloid-beta (Aβ) 40 and 42, phosphorylated (p) and total (t) tau, neurofilament light (NFL), neurogranin (Ng), and α-synuclein (α-syn). Story recall was measured with the Logical Memory Test (LMT). We carried out Bayesian regression analyses with Rr and other LMT scores as predictors, and CSF biomarkers (including the Aβ42/40 and p-tau/Aβ42 ratios) as outcomes. Results showed that, with few exceptions, models including Rr consistently provided the best fits to the data. These findings demonstrate the applicability of Rr to story recall and its sensitivity to CSF biomarkers of neurodegeneration, and encourage its inclusion when evaluating risk of neurodegeneration with story recall.


Introduction
In memory research, serial position effects refer to better retrieval of information learned at the beginning of a study list or task than of information learned in the middle (primacy effect), and better retrieval of information learned at the end of the list or task than of information learned in the middle (recency effect; Murdock, 1962). Interestingly, while individuals with Alzheimer's disease (AD) typically present with poor primacy effects, a recency effect is often still observed, particularly when testing occurs right after the learning phase (Foldi et al., 2003). However, recency performance then tends to deteriorate if a delay is placed between learning and test (Carlesimo et al., 1995). To leverage this pattern in AD, whereby high recency is observed at immediate recall but low recency at delayed recall, the recency ratio (Rr) was proposed, operationally defined as the ratio of recency recall in the immediate condition to recency recall in the delayed condition (Bruno et al., 2016, 2018). Using Rr in list-learning tasks (e.g., the Rey Auditory Verbal Learning Test), it has been shown that higher (i.e., worse) scores predict cognitive decline in asymptomatic individuals (Bruno et al., 2016), progression from a healthy baseline to mild cognitive impairment (MCI) (Bruno et al., 2016; Egeland, 2021), and amyloid-β pathology in individuals with MCI (Bruno et al., 2019). Rr from list-learning tasks has also been found to correlate with cerebrospinal fluid (CSF) levels of neurogranin (Ng), a post-synaptic protein reflecting synaptic dysfunction (Bruno, Reichert Plaska, et al., 2021), and with both phosphorylated (p-) and total (t-) tau levels. Additionally, Rr scores have been found to discriminate between individuals diagnosed with AD and those with other types of dementia (Turchetta et al., 2018), and to successfully identify individuals with MCI who are more likely to convert to AD (Turchetta et al., 2020).
All in all, in list-learning tasks, Rr compares favourably to most conventional scores employed to estimate memory ability in older individuals (Bock et al., 2021).
However, little is known about how successful Rr may be in story recall tasks. Unlike list-learning tasks, story recall tasks present participants with a coherent story to learn, and typically ask them to recall it immediately after presentation and again after a delay. Owing to their semantic structure (compared with a list of semantically unrelated words), story recall tasks are often thought to be less sensitive to serial position effects, but evidence suggests otherwise. Hall and Bornstein (1991), for example, showed serial position effects in individuals with closed head injuries and in controls using the Logical Memory Test (LMT), a common story recall task (Wechsler, 1987). The LMT is also commonly used as a screening tool for dementia, as demonstrated by its use in the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Australian Imaging, Biomarker & Lifestyle Flagship Study of Ageing (AIBL). Accordingly, Bruno, Mueller, et al. (2021) examined LMT scores in late-middle-aged and older individuals with MCI and without cognitive impairment, and observed clear serial position effects. They also highlighted the role of primacy forgetting (termed there the primacy ratio, henceforth Pr) in predicting amyloid burden.
The aims of the present study were to examine the degree to which process scores such as Rr and Pr predict CSF biomarkers of brain amyloidosis, tau pathology and neurodegeneration in middle-aged and older individuals, and to establish whether these process scores perform better than traditional immediate and delayed recall scores in story recall. We achieved this by analysing data from the Wisconsin Alzheimer's Disease Research Center (WADRC), an ongoing longitudinal cohort study based at the University of Wisconsin-Madison, which included LMT data alongside CSF levels of biomarkers associated with AD and neurodegeneration: amyloid-beta (Aβ) 40 and 42, p- and t-tau, neurofilament light (NFL), Ng, and α-synuclein (α-syn).

Methods
We report how we determined our sample size, all data exclusions, all inclusion/exclusion criteria, whether inclusion/exclusion criteria were established prior to data analysis, all manipulations, and all measures in the study. The ethical regulations that govern the WADRC prevent unrestricted public archiving of anonymised study data. Data can be requested from the WADRC Executive Committee at: https://www.adrc.wisc.edu/apply-resources. Data will be released to internal and external investigators following confirmation of IRB approval together with an evaluation by the WADRC of scientific merit and resource availability.

Participants
Data were extracted from the WADRC database. After baseline, WADRC participants complete regular follow-up visits at 1- or 2-year intervals, including neuropsychological tests, clinical measurements (e.g., blood pressure, heart rate), and health history. Participants were selected on the basis of having completed at least two assessment visits: one for cognitive evaluation, including the LMT, and at least one lumbar puncture visit for CSF collection. Participants were classified as cognitively unimpaired, with MCI due to presumed AD (MCI-AD), or with dementia due to presumed AD, via consensus conference diagnosis. Diagnoses were determined by a team that included physicians, clinical neuropsychologists, and clinical nurse practitioners, based on the core clinical criteria developed by the National Institute on Aging and the Alzheimer's Association (Albert et al., 2011; McKhann et al., 2011), and without regard to AD biomarker status. From the total pool of 828 participants, 212 met the inclusion criteria. Of these, 156 were cognitively unimpaired, 26 had a diagnosis of MCI-AD, and 30 had a diagnosis of AD. Moreover, 16 (10%) cognitively unimpaired participants displayed biomarker-determined AD based on a CSF p-tau/Aβ42 ratio cut-off of .038: a ratio below the cut-off identifies individuals without a biomarker-based AD diagnosis, whereas a higher ratio identifies people with a biomarker-based AD diagnosis (Van Hulle et al., 2021). Similarly, 16 (62%) and 28 (93%) of MCI-AD and AD participants, respectively, displayed AD biomarkers. Only one participant did not identify as white Caucasian. Participants' cognitive data were taken from whichever visit was closest to the visit where the lumbar puncture was performed. All activities for this study were approved by the ethics committees of the authors' universities and conducted in accordance with the Declaration of Helsinki. All participants provided informed consent prior to testing.
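The two selection rules above, biomarker positivity from the p-tau/Aβ42 cut-off and pairing each lumbar puncture with the nearest cognitive visit, can be sketched as follows. This is a minimal illustration, not the WADRC pipeline: the function names are hypothetical, and classing a ratio strictly greater than .038 as positive (rather than greater than or equal to) is an assumption, since the text only states that a higher ratio identifies biomarker-based AD.

```python
from datetime import date

# Cut-off on the CSF p-tau/Abeta42 ratio (Van Hulle et al., 2021).
PTAU_AB42_CUTOFF = 0.038


def biomarker_positive(ptau: float, ab42: float, cutoff: float = PTAU_AB42_CUTOFF) -> bool:
    """Return True when the p-tau/Abeta42 ratio lies above the cut-off.

    Assumption (hypothetical): a ratio exactly equal to the cut-off is negative.
    """
    return ptau / ab42 > cutoff


def closest_visit(visit_dates: list[date], lp_date: date) -> date:
    """Pick the cognitive visit nearest in time to the lumbar puncture (LP).

    Visits before or after the LP are both eligible; ties go to the
    earlier-listed visit because min() keeps the first minimum it finds.
    """
    return min(visit_dates, key=lambda d: abs((d - lp_date).days))


def elapsed_years(visit_date: date, lp_date: date) -> float:
    """Absolute time between cognitive testing and LP, in years."""
    return abs((visit_date - lp_date).days) / 365.25


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(biomarker_positive(ptau=20.0, ab42=400.0))  # ratio 0.05
    visits = [date(2015, 1, 1), date(2016, 6, 1)]
    lp = date(2016, 1, 1)
    best = closest_visit(visits, lp)
    print(best, round(elapsed_years(best, lp), 2))
```

Taking the absolute difference matches the description of elapsed time in Table 1, where cognitive testing may precede or follow the lumbar puncture.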
No part of the study procedures or analyses was pre-registered prior to the research being conducted.

Cognitive assessment
The LMT was used to measure story recall performance. The LMT is a subtest of the Wechsler Memory Scale-Revised (WMS-R; Wechsler, 1987), comprising two stories, A and B, each with 25 items ("idea units") representing different semantic and lexical categories. Each story is read aloud to the participant, who is asked to recall it immediately and again after a 25-30 min delay. Scoring procedures from the WMS-R manual were applied. Although the scoring criteria permit some alteration from the original item (e.g., "slid off the table" is allowed instead of "fell off the table"), certain items, such as numerical expressions or proper names, must be recalled verbatim. To shorten the administered cognitive battery, story recall in the present study was measured only with story A of the LMT. Of note, story A and story B have been found to be of comparable memorability in a separate study. Immediate LMT and Delayed LMT scores were calculated by summing all correctly recalled items in the immediate and delayed recall trials, respectively. Scores for each trial range from 0 to 25, with higher scores reflecting more items recalled. Finally, following previous work (Bruno, Mueller, et al., 2021), primacy and recency were defined as the first and final eight idea units of the story, respectively, and middle as the middle nine units; the number of idea units assigned to each serial position is arbitrary. Immediate and delayed recency scores were calculated as the number of correctly recalled recency items in the immediate and delayed recall trials, respectively. Rr was obtained by dividing the recency score in the immediate recall trial by the corresponding score in the delayed recall trial. A correction was also applied, Rr = (immediate recency score + 1)/(delayed recency score + 1), to avoid missing data due to zero scores (Bruno et al., 2018).
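The serial position scoring and the corrected Rr described above can be sketched as follows. This is a minimal illustration assuming each participant's recall is stored as 25 correct/incorrect flags ordered by idea-unit position in story A; that data layout and all function names are hypothetical, not taken from the WMS-R manual or the WADRC database. The proportion helper divides each segment by its size (eight, nine and eight units) so that segments can be compared directly.

```python
from typing import Sequence

N_UNITS = 25
# Serial position segments (0-based indices into the 25 idea units),
# following the 8/9/8 split of Bruno, Mueller, et al. (2021).
SEGMENTS = {
    "primacy": range(0, 8),    # idea units 1-8
    "middle": range(8, 17),    # idea units 9-17
    "recency": range(17, 25),  # idea units 18-25
}


def segment_scores(recalled: Sequence[bool]) -> dict[str, int]:
    """Count correctly recalled idea units in each serial position segment."""
    if len(recalled) != N_UNITS:
        raise ValueError(f"expected {N_UNITS} idea units, got {len(recalled)}")
    return {name: sum(bool(recalled[i]) for i in idx) for name, idx in SEGMENTS.items()}


def segment_proportions(scores: dict[str, int]) -> dict[str, float]:
    """Express each segment count as a proportion of its number of idea units."""
    return {name: scores[name] / len(idx) for name, idx in SEGMENTS.items()}


def recency_ratio(immediate_recency: int, delayed_recency: int) -> float:
    """Rr with the +1 correction, so that a delayed score of 0 is still defined.

    Higher values indicate disproportionate loss of recency items after the delay.
    """
    return (immediate_recency + 1) / (delayed_recency + 1)


if __name__ == "__main__":
    # Hypothetical recall pattern: good immediate recency, none after the delay.
    immediate = [True] * 5 + [False] * 12 + [True] * 4 + [False] * 4
    delayed = [True] * 4 + [False] * 21
    imm, dly = segment_scores(immediate), segment_scores(delayed)
    print(imm, dly)
    print(recency_ratio(imm["recency"], dly["recency"]))
```

With the +1 correction, Rr is bounded between 1/9 and 9 for an eight-unit recency segment, which is compatible with the observed range reported for this sample.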
Pr was calculated following Bruno, Mueller, et al. (2021) by dividing delayed primacy by immediate primacy, with no correction. While this is inconsistent with the way Rr is computed, and should arguably be aligned, we opted here to maintain the original formula. Finally, to provide a forgetting index not based on serial position that would capture overall memory loss, we also computed a ratio score from Immediate LMT and Delayed LMT, (Immediate LMT + 1)/(Delayed LMT + 1), which we dubbed the total ratio (Tr).
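The two remaining ratio scores can be sketched in the same way. This is a minimal illustration with hypothetical function names: Pr is computed without any correction, as described above, so it is undefined when immediate primacy is zero; returning NaN in that case is an assumption about how such cases would be handled, not something stated in the text. Tr uses the same +1 correction as Rr.

```python
import math


def primacy_ratio(immediate_primacy: int, delayed_primacy: int) -> float:
    """Pr = delayed / immediate primacy, uncorrected (higher = less forgetting).

    Returns NaN when immediate primacy is 0, since the ratio is undefined
    (hypothetical handling; the text does not say how such cases were treated).
    """
    if immediate_primacy == 0:
        return math.nan
    return delayed_primacy / immediate_primacy


def total_ratio(immediate_total: int, delayed_total: int) -> float:
    """Tr = (Immediate LMT + 1) / (Delayed LMT + 1) (higher = more forgetting)."""
    return (immediate_total + 1) / (delayed_total + 1)


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    print(primacy_ratio(8, 6), total_ratio(14, 9), primacy_ratio(0, 0))
```

Because Pr places the delayed score in the numerator while Rr and Tr place it in the denominator, the three scores do not share a common direction, which is worth keeping in mind when comparing their coefficients.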

Biomarker determination
All CSF samples were assayed at the Clinical Neurochemistry Laboratory, University of Gothenburg, under strict quality control procedures. All CSF markers were measured using the exploratory Roche NeuroToolKit assays, a panel of robust, automated prototype immunoassays (Roche Diagnostics International Ltd, Rotkreuz, Switzerland) not currently approved for clinical use. The Elecsys® Aβ42, Aβ40, p-tau (181P), and t-tau assays were performed on a cobas e 601 analyzer; the α-syn, NFL, and Ng assays were performed on a cobas e 411 analyzer, as previously described (Van Hulle et al., 2021).

Genotyping
DNA was extracted from whole blood. Samples were aliquoted onto 96-well plates for determination of APOE genotypes. An APOE risk score was calculated based on the odds ratios of the ε2/ε3/ε4 genotypes, as previously reported (Darst et al., 2017).

Analysis plan
For each CSF outcome, we carried out Bayesian linear regressions with Pr, Rr, Immediate LMT, Delayed LMT, and Tr entered as candidate predictors in the same analysis. Age at lumbar puncture, time elapsed between lumbar puncture and memory assessment, sex, years of education, APOE risk score, number of overall cognitive assessment visits (to account for practice effects), and biomarker-determined AD status were used as control variables, which formed the null model in each analysis. Bayesian analyses allow for the estimation of model plausibility, which permits comparison of models with different combinations of predictors, and for the estimation of effect sizes with credible intervals (e.g., Teipel et al., 2021). In the Supplementary Materials we also report the outputs of frequentist analyses. Note that diagnosis was not included in the analyses to avoid circularity, since LMT scores are evaluated in the consensus diagnosis process. For all Bayesian analyses, the model prior was set to uniform, so that all models were a priori equally likely, and the prior on parameters was set to the default Jeffreys-Zellner-Siow (JZS) prior, which makes the Bayes factor invariant to the unit of measurement. Credible intervals were set to 95%. To address potential issues with non-normally distributed residuals, Markov chain Monte Carlo (MCMC) sampling (1,000 samples) was applied to each analysis. The outcome variables, analysed separately, were CSF levels of Aβ40, Aβ42, p-tau, t-tau, Ng, NFL, and α-syn. Additionally, we examined models with the Aβ42/Aβ40 and p-tau/Aβ42 ratios as outcomes, as these measures are commonly used as biomarkers of neurodegeneration (Campbell et al., 2021; Li et al., 2013). Analyses were conducted using JASP (version 0.16.2; https://jasp-stats.org/).
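Under a uniform model prior, the Bayes factors reported for each model translate directly into posterior model probabilities and inclusion statistics. The sketch below illustrates this bookkeeping under stated assumptions: the control variables are always present, so candidate models differ only in which of the five memory scores they add (2^5 = 32 models, the null included); each model's BF10 against the null is taken as given, for example from JASP output; and the inclusion Bayes factor is computed as posterior inclusion odds divided by prior inclusion odds. Function names and the toy numbers are hypothetical, and this is not a re-implementation of JASP's JZS computations or its MCMC sampling.

```python
import math
from itertools import combinations

PREDICTORS = ("Pr", "Rr", "Immediate LMT", "Delayed LMT", "Tr")


def candidate_models(predictors=PREDICTORS) -> list[frozenset]:
    """All subsets of the candidate predictors; frozenset() is the null model."""
    return [frozenset(c) for k in range(len(predictors) + 1)
            for c in combinations(predictors, k)]


def posterior_model_probs(bf10: dict) -> dict:
    """Posterior model probabilities under a uniform model prior.

    With equal prior odds, P(M_i | data) = BF_i0 / sum_j BF_j0,
    where the null model has BF = 1.
    """
    total = sum(bf10.values())
    return {model: bf / total for model, bf in bf10.items()}


def inclusion_stats(post: dict, predictor: str) -> tuple[float, float]:
    """Posterior inclusion probability and inclusion BF for one predictor."""
    models = list(post)
    # Uniform model prior: prior inclusion = share of models containing it.
    prior_incl = sum(predictor in m for m in models) / len(models)
    if not 0 < prior_incl < 1:
        raise ValueError("predictor must appear in some, but not all, models")
    post_incl = sum(p for m, p in post.items() if predictor in m)
    if post_incl >= 1.0:
        return post_incl, math.inf
    prior_odds = prior_incl / (1 - prior_incl)
    post_odds = post_incl / (1 - post_incl)
    return post_incl, post_odds / prior_odds


if __name__ == "__main__":
    print(len(candidate_models()))  # full model space for the five scores
    # Hypothetical BF10 values for a two-predictor toy model space.
    toy = {frozenset(): 1.0, frozenset({"Rr"}): 9.0,
           frozenset({"Tr"}): 0.5, frozenset({"Rr", "Tr"}): 4.5}
    post = posterior_model_probs(toy)
    print(inclusion_stats(post, "Rr"), inclusion_stats(post, "Tr"))
```

In the toy example, models containing Rr hold 90% of the posterior mass against a prior of 50%, so the inclusion Bayes factor is 9; the predictor-level inclusion statistics therefore pool evidence across every model in which a score appears, rather than relying on the single best model.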
Table 1 reports means, standard deviations and ranges for the demographic variables, age differences, APOE risk score, and Pr, Rr, Immediate LMT, Delayed LMT, and Tr scores, by cognitive status at the visit closest to the lumbar puncture. Rr scores ranged from .43 to 5 across participants. Note that while higher Pr scores indicate better performance, higher Rr scores indicate worse performance. Fig. 1a and b report serial position performance by delay in controls and in individuals with biomarker-determined AD, respectively. The values for primacy, middle and recency are proportions out of eight, nine and eight idea units, respectively, to allow direct comparison across serial positions. The plot displays a slightly more pronounced curve for immediate than for delayed recall in controls, and a substantial drop in delayed primacy in biomarker-determined AD. This pattern is analogous to that already reported by Bruno, Mueller, et al. (2021). It may also be noted that delayed recency in our LMT data is better than is traditionally expected with word-list tests. While it is beyond the scope of this paper to address theories of serial position, these findings do argue against the recency boost being solely a consequence of short-term memory processing in story recall.

Table 1. Demographics, CSF measures and memory test scores (mean and standard deviation) for the study participants. Elapsed time refers to the time between cognitive testing and lumbar puncture, and was calculated as an absolute value. Statistical tests were also conducted to check for differences across cognitively unimpaired, MCI-AD and AD groups: p values are reported. LP = lumbar puncture; CSF = cerebrospinal fluid; Rr = recency ratio; Tr = total ratio; Pr = primacy ratio; LMT = Logical Memory Test. *Aβ40 N = 211, p-tau N = 212, α-syn N = 212.

Cortex 159 (2023) 167-174

CSF Aβ42. The best fitting model was the null model. The second best model included Immediate LMT alone (see Supplementary information for full model comparisons and posterior summaries). The Bayes factor (BF10), which gives the relative predictive adequacy of this model compared with the null model, was .649, meaning that the observed data are .649 times as likely under this model as under the null model (which includes all the covariates). BF10 values below 1, as in this case, indicate that the null model fits the data better than the alternative model. Conventionally, BF10 values between 1 and 3 are also considered to provide only anecdotal evidence over the null model, and are therefore not sufficiently strong to draw firm conclusions.
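The verbal labels attached to BF10 values in this section (anecdotal, moderate, very strong, extreme) follow a commonly used classification with category boundaries at 3, 10, 30 and 100, applied symmetrically to 1/BF10 when the null model is favoured. The helper below makes that mapping explicit; it is a sketch with a hypothetical function name, and the boundaries beyond the "below 3 is anecdotal" rule stated above are the conventional ones rather than values given in this paper.

```python
def evidence_label(bf10: float) -> tuple[str, str]:
    """Map a Bayes factor BF10 to a conventional evidence category.

    Returns (strength, favoured_model). Values below 1 are inverted, so that,
    e.g., BF10 = .25 counts as moderate evidence for the null model.
    """
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 == 1:
        return "no evidence", "neither"
    favoured = "alternative" if bf10 > 1 else "null"
    strength = bf10 if bf10 > 1 else 1 / bf10
    # Lower bounds of each category, checked from strongest to weakest.
    for bound, label in ((100, "extreme"), (30, "very strong"),
                         (10, "strong"), (3, "moderate")):
        if strength >= bound:
            return label, favoured
    return "anecdotal", favoured


if __name__ == "__main__":
    # BF10 values reported in this section.
    for bf in (0.649, 5.350, 3754.173, 1552.309, 50.318, 0.250):
        print(bf, evidence_label(bf))
```

Reading BF10 = .250 through this lens, the comparison for the Aβ42/Aβ40 ratio amounts to moderate evidence in favour of the null model, not merely an absence of evidence for Rr.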
CSF Aβ40. In contrast, the best fitting model for CSF Aβ40 combined Delayed LMT performance with Rr (BF10 = 5.350; moderate evidence). Both Delayed LMT and Rr had positive posterior means, although both 95% credible intervals (CIs) included zero: Delayed LMT had a posterior mean of 64.438 (SD = 112.988; CIs from −84.286 to 327.865), and Rr had a posterior mean of 1067.043 (SD = 844.836; CIs from −47.522 to 2539.844). The inclusion probability was much higher for Rr (.762) than for Delayed LMT (.493), suggesting that Rr is the better predictor of CSF Aβ40 in these data.
CSF t-tau. The best fitting model for CSF t-tau included only Rr (BF10 = 3754.173; extreme evidence). Rr had a posterior mean of 56.214 (SD = 13.652; CIs from 31.435 to 83.779). The inclusion probability for Rr was >.999, and the inclusion Bayes factor indicated that the data supported including Rr by a factor of over 650 (BFinclusion = 658.873).
CSF p-tau. The best fitting model for CSF p-tau also included only Rr (BF10 = 1552.309; extreme evidence). Rr had a posterior mean of 5.490 (SD = 1.428; CIs from 2.719 to 8.241). The inclusion probability for Rr was .998 (BFinclusion = 231.593).
CSF α-syn. The best fitting model for CSF α-syn was the model with Rr alone (BF10 = 50.318; very strong evidence). Rr was positively associated with α-syn (posterior mean = 33.501, SD = 12.311, CIs from 12.009 to 59.343). The BFinclusion was 28.196.
CSF Aβ42/Aβ40. The best fitting model was the null model, followed by a model with Rr alone (BF10 = .250).

Discussion
The goal of this study was to establish whether process scores from story recall, such as Rr, are as sensitive to amyloid and tau proteinopathy, and to other biomarkers of neurodegeneration, as they have previously been shown to be in list-learning tasks. To test this, we analysed data from the WADRC, comprising 212 participants who were either cognitively unimpaired, with presumed MCI-AD, or with presumed AD, and we related performance on the LMT, a popular story recall test, to CSF levels of biomarkers associated with AD, including measures of amyloid, tau, and neurodegeneration. Our Bayesian analyses indicate that cross-sectional Rr levels are associated with several CSF biomarkers of neurodegeneration and AD when controlling for age, level of education and APOE risk, among other covariates. Rr also specifically outperformed a ratio score introduced to measure total memory loss from immediate to delayed story recall, which we termed here the total ratio. By and large, these results were mirrored by frequentist analyses (see Supplementary information, S1): Rr was the best predictor of t- and p-tau, Ng, α-syn and the p-tau/Aβ42 ratio, consistent with the Bayesian results. Also consistent with the Bayesian results, Rr was not associated with Aβ42, NFL or the Aβ42/Aβ40 ratio. The only difference concerned Aβ40, for which the Bayesian analysis, but not the frequentist analysis, found an association with Rr. A lack of association between Rr and CSF Aβ42 is broadly consistent with a recent report (within an overlapping cohort) using a list-learning task, in which Rr was not found to predict CSF Aβ42 levels. As CSF Aβ42 levels are thought to closely reflect brain amyloid deposition, the preeminent pathological hallmark of AD, these findings may suggest that Rr is not a specific cognitive marker of AD.
However, this notion is not consistent with the following observations: Rr is sensitive to CSF levels of Ng (as also reported in Bruno, Reichert Plaska, et al., 2021), a neuron-specific postsynaptic protein that has been linked specifically to AD neurodegeneration (Wellington et al., 2016); and Rr was also sensitive to the p-tau/Aβ42 ratio, which has been shown to be predictive of brain amyloid pathology (Campbell et al., 2021). A final point to consider is that Rr was also found to correlate with CSF α-syn levels, partially consistent with the results of Bruno, Reichert Plaska, et al. (2021). While α-syn, a pre-synaptic protein found in cortical and sub-cortical areas, is typically linked to Parkinson's disease and dementia with Lewy bodies (Selnes et al., 2017), elevated CSF α-syn levels have also been found in individuals on a trajectory to AD (Shim et al., 2020).
Both p- and t-tau were found to be associated with Rr, suggesting that Rr is sensitive to neurofibrillary tangle (tau) pathology and, in turn, to neurodegeneration in the medial temporal lobe (Maass et al., 2019; Tennant et al., 2021). These findings are consistent with a recent report using list-learning, in which Rr was also found to correlate positively with both p- and t-tau in individuals with MCI and unimpaired cognition. A link between higher Rr scores and lower hippocampal volume was also recently observed in overlapping participants, lending credence to the suggestion that higher Rr scores may reflect a loss of consolidation ability following atrophy of the medial temporal lobe (Wixted, 2004; Wixted & Cai, 2013), combined with relatively intact reliance on phonological/echoic short-term memory (Bruno et al., 2018; Turchetta et al., 2018). These observations may also help to explain the lack of association between Rr and CSF Aβ42 levels, as amyloid pathology, unlike neurofibrillary tangle pathology, does not specifically target regions in the medial temporal lobe.
The different sex distributions across consensus diagnoses should be noted. As per Table 1, 72% of unimpaired individuals were female, whereas that percentage dropped markedly in people with MCI (23%) or probable AD (37%). This finding is at odds with the common observation that most AD cases are women (Alzheimer's Association, 2017). While a thorough examination of this issue is beyond the scope of the present manuscript, we looked at story recall outputs across sexes to see how they might vary. Interestingly, while immediate and delayed LMT scores were significantly higher for unimpaired females than males, with both parametric and non-parametric tests, Rr tended not to vary with sex in this group. Finally, none of the memory scores differed across sexes in people with MCI or AD.
Although biomarker-determined AD status (CSF p-tau/Aβ42 ratio threshold of .038; Van Hulle et al., 2021) was already included as a covariate, we also ran post hoc regressions within the AD-positive subgroup only (see Supplementary information for full results). These additional analyses were consistent with those reported above: when evaluating inclusion probabilities, Rr was the best predictor for all outcomes except Aβ42, Aβ42/Aβ40, and NFL, as in the full sample. As these analyses were based on a smaller sample, the findings should be interpreted with increased caution.
Unlike Bruno, Mueller, et al. (2021), who showed that Pr was predictive of amyloid load as measured via Pittsburgh compound-B (PiB) positron emission tomography (PET), we did not observe an association between Pr and Aβ42 or the Aβ42/Aβ40 ratio. In this regard, we wish to make two observations. First, while the data were drawn in both cases from studies based at the University of Wisconsin-Madison, the actual samples were different: in the Bruno, Mueller, et al. (2021) paper, participants came from the Wisconsin Registry for Alzheimer's Prevention, whose volunteers are generally younger and include a higher proportion of cognitively unimpaired individuals than the WADRC. Second, in the Bruno, Mueller, et al. (2021) study, the outcome was discrete, based on relevant PiB PET cut-points, whereas in the present study we examined continuous CSF levels as outcomes. We have noted that CSF and PiB PET markers sometimes show different levels of sensitivity to different cognitive (process) scores, and we plan to pursue this observation further in the near future.
Limitations of this research should be noted. First, the sample sizes for MCI-AD and AD are considerably smaller than for the cognitively unimpaired participants. Sample sizes were dictated by availability, and future research with larger groups of individuals with cognitive impairment, possibly also including dementia pathologies other than AD, would be ideal to further these research questions. A second limitation is that the present sample comprised almost exclusively individuals who identified as white Caucasian. While this may be methodologically advantageous, as possible confounding variables related to race are limited, many studies have highlighted the importance of including a wider spectrum of ethnicities and backgrounds in AD research (Manly et al., 2021; Morris et al., 2019). As far as we are aware, at least with regard to published works, Rr has to date been tested primarily in white Caucasian populations; future research should examine whether the patterns observed here extend to a more heterogeneous sample.
To conclude, this study showed that Rr, the ratio between immediate and delayed recall scores at the recency position, is applicable to story recall and sensitive to CSF levels of Aβ40, p-tau, t-tau, Ng and α-syn, and to the p-tau/Aβ42 ratio. Higher Rr scores, reflecting disproportionate loss of recency recall from immediate to delayed testing, were associated with worse biomarker profiles when controlling for age, biomarker-determined AD status and APOE risk, and the best predictors of biomarker outcomes tended to be Rr combined with lower levels of immediate or delayed LMT performance. We therefore suggest that 1) Rr is a worthwhile measure to add to the clinician's battery (see also Egeland, 2021) when evaluating individuals suspected to be on a trajectory towards neurodegeneration; and 2) serial position values should be included in databases examining AD and other types of dementia. Future research should also compare the relative predictive power of Rr when derived from word lists versus story recall, and examine whether the neurocognitive basis of Rr differs between word-list and story recall tasks.