Stroop task performance across the lifespan: High cognitive reserve in older age is associated with enhanced proactive and reactive interference control

Susceptibility to interference increases with age but there is large inter-individual variability in interference control in older adults due to a number of biological and environmental factors. The present study aims at analyzing behavior and ERPs in a Stroop interference task with increasing difficulty in a sample of 246 young, middle-aged and healthy old participants. The old age group was divided into three subgroups based on performance scores. The results show a gradual performance reduction with increasing age and task difficulty. However, old high performers reached a performance level comparable to middle-aged subjects. The contingent negative variation (CNV) reflecting preparation and proactive task control and the target-locked P2/N2 complex associated with retrieval and implementation of S-R mappings during reactive task control were larger in the old high than low performers and similar to middle-aged or even young participants. High performance was limited to executive control tasks, while other cognitive functions were less affected. In addition, high performance was associated with higher level of education, usage of foreign languages and higher IQ. Thus, the performance differences in old age were discussed in the framework of cognitive reserve that constitutes individual differences in neural networks underlying task performance.


Introduction
Increasing age has been associated with deficits in cognitive functions (Salthouse, 2000). Specifically, executive functions orchestrating basic cognitive processes are vulnerable to aging (Salthouse et al., 2003; West, 1996). Executive functions are crucial for goal-directed behavior (Diamond, 2013). Two of the most important executive functions are response inhibition (i.e. inhibition of habitual, pre-potent responses) and interference control (selective attention toward relevant and away from irrelevant pieces of information). Inhibitory deficits and reduced interference control lead to increased conflict between competing stimuli and response representations (Lustig et al., 2001), and are particularly evident in older age (Aschenbrenner and Balota, 2015; Craik and Bialystok, 2006; Hasher and Zacks, 1988; Lustig et al., 2007; Verhaeghen and Cerella, 2002). Yet, a general inhibition deficit in older age has been questioned based on the results of meta-analyses (Rey-Mermet and Gade, 2018; Verhaeghen and De Meersman, 1998).

Sources of inter- and intra-individual differences in cognitive control in aging
There are large inter-individual differences in executive functioning in older age due to a number of biological (e.g. genetic) and environmental (e.g. lifestyle) factors constituting cognitive reserve (Stern, 2009; Whalley et al., 2004). The notion of cognitive reserve postulates that inter-individual differences in neural networks underlying task performance allow some people to perform better than others (Stern et al., 2018). This concept implies anatomic and functional variability at the level of brain networks. Higher cognitive reserve may be due to an enriched environment or cognitively stimulating aspects of life, such as education, physical activity, use of foreign languages, playing an instrument or training complex sensory-motor sequences by dancing, all skills that are mediated by neuronal plasticity (Ballesteros et al., 2015; Greenwood and Parasuraman, 2010). High cognitive reserve also allows compensation for cognitive decline and dementia in older age (Stern, 2009). In sum, inter-individual differences in cognitive reserve accrue from complex interactions between genetic and environmental factors and represent largely stable individual properties in old age.
On the other hand, intra-individual variability (consistency) provides information about the stability of cognitive functioning at different occasions within a person. Increasing behavioral fluctuations within an older individual suggest alterations in information processing (Hultsch et al., 2000, 2002; Kray et al., 2004; Rabbitt et al., 2001; West et al., 2002) and may reflect attenuated neuronal integrity, primarily in frontal brain areas that are most vulnerable to aging (Hultsch and MacDonald, 2004; West, 1996). The intra-individual variability in behavioral performance across the lifespan shows a U-shaped function, with young adulthood associated with low inconsistency and childhood and older age with high inconsistency. However, some older individuals show highly consistent performance similar to young adults, while others perform comparably to children (MacDonald et al., 2006). Some studies investigated intra-individual variability in speed performance as measured by individual standard deviations (ISD; Williams et al., 2005). ISD is generally larger in older than younger adults (Garrett et al., 2011; Hultsch et al., 2000, 2002), and larger under difficult task conditions (Gajewski et al., 2011). Inter- and intra-individual variability in interference processing has also been explained in terms of the proactive and reactive modes of processing proposed in the dual-mechanisms of cognitive control framework (DMC) by Braver (2012). According to this model, proactive control relies upon the anticipation and prevention of interference before it occurs, whereas reactive control relates to the detection and resolution of interference after its occurrence. The DMC framework proposes that a change in situational factors or in individual strategy will result in a shift of the weighting between proactive and reactive control settings in tasks with high control demands (Karayanidis et al., 2011).
Situational factors induce intra-individual variability, whereas strategies or stable individual traits lead to inter-individual differences in the preference for proactive or reactive control mode of processing.
Within the DMC model, we ask whether high-performing older adults show a preserved proactive or rather reactive mode of control, and which of the two is more strongly related to cognitive reserve.

Electrophysiology of cognitive reserve in executive control tasks
Interference control has frequently been measured using a color-word Stroop task (Stroop, 1935). In a computer-based version of this task, participants have to indicate either the meaning of presented color-words or their ink color, while ink color is either congruent (e.g., the word green is presented in green color) or incongruent with word meaning (e.g., the word green is presented in red color). Indicating the word meaning and ignoring the (incongruent) ink color reflects the low interference condition, whereas indicating an incongruent ink color requires inhibition of the overlearned response to read the word (high interference condition). Random intermixing of both tasks within an experimental block requires a trial by trial adjustment of cognitive control, which enhances task difficulty (e.g. Eppinger et al., 2007; Kalanthroff et al., 2015; Küper et al., 2017). The instruction can be varied in a block-wise manner or within a single block of trials.
Behavioral effects in executive tasks are usually accompanied by differences in event-related potentials (ERP). ERPs play an important role in distinguishing processing steps during task performance. Some of these processes, like interference control in the Stroop task, are impaired or delayed in older age (Mager et al., 2007; West and Alain, 2000) but there is also room for improvement or compensation. For example, the Contingent Negative Variation (CNV), a negative slow wave developing in advance of a critical event or response (Walter et al., 1964; Brunia and van Boxtel, 2001), varies with the effort invested in task preparation (Falkenstein et al., 2003; Wascher et al., 1996). The CNV is generally smaller in older than younger adults (Sterr and Dean, 2008; Wild-Wall and Falkenstein, 2010). However, an enhanced CNV for older compared to younger adults has also been observed under effortful task conditions, indicating that older adults may intensify preparation in order to maintain a reasonable level of performance (Berchicci et al., 2012; Kropotov et al., 2016; Wild-Wall et al., 2007).
There are also some hints for inter-individual performance differences in elderly associated with ERPs during target processing (Daffner et al., 2011; Getzmann et al., 2013; Hohnsbein et al., 1998; Thönes et al., 2018). However, as cross-sectional studies are less useful to evaluate compensatory or gain processes, it is important to consider randomized controlled trials (RCT) to track changes in performance and in the corresponding ERPs. To date, there exist a number of RCTs that offer some insights into the dynamics of behavioral gains and their electrophysiological correlates due to training regimes. Nearly all of them reported increased amplitudes and shorter latencies of different ERPs accompanying enhanced training-induced performance (Chen et al., 2019; Covey et al., 2019; Isbel et al., 2019; Pergher et al., 2018; Pozuelos et al., 2019). For example, our previous studies demonstrated that training-related performance gains in older age were accompanied mainly by an enhanced fronto-central N2 component that is associated with response selection and cognitive control in executive control tasks (Gajewski and Falkenstein, 2012, 2017; Küper et al., 2017; see also Gaál and Czigler, 2018, and Olfers and Band, 2018, for similar results), an increased P2 in visual search (Wild-Wall et al., 2012), and a pronounced frontal P3 in working-memory tasks (Gajewski and Falkenstein, 2018a). Thus, it can be assumed that higher performance in executive control tasks is related to more efficient neural networks underlying task performance (Greenwood and Parasuraman, 2010; Pergher et al., 2019; Stern, 2009; Stern et al., 2018). Corresponding differences in CNV, P2 and N2 components may reflect the electrophysiological underpinning of cognitive reserve in terms of efficiency, capacity or flexibility. Within the framework of the DMC account, proactive control has been associated with sustained and anticipatory activation of the lateral prefrontal cortex (PFC).
In contrast, the reactive control mode has been related to the activation of task-related action goals mediated by detection and resolution of interference in the anterior cingulate cortex (ACC). In line with this, the CNV as an index of proactive control was localized in frontal brain areas (Gómez et al., 2003; Kropotov et al., 2016; Leynes et al., 1998; Wild-Wall et al., 2007), whereas the fronto-central N2 has been localized to the ACC (Ullsperger and von Cramon, 2004; Van Veen and Carter, 2002) and has been interpreted as a correlate of reactive control (Colcombe et al., 2006) during response selection (Gajewski et al., 2008; Von Gunten et al., 2018; Yeung and Cohen, 2006).

The present study
The present study analyzes inter- and intra-individual differences in interference control in young, middle-aged and old participants using a Stroop task with three levels of difficulty (low, intermediate, and high interference). Task difficulty should increase the range of behavioral and ERP effects as some age-related effects are evident only if the tasks require high demands on the executive control system. Additionally, higher difficulty may enhance both inter- and intra-individual variability in performance.
The main aim of the present study is a systematic evaluation of behavioral and ERP effects between groups of older participants defined according to their general performance level and to compare old high performers with the younger participants to elucidate the neuronal underpinnings of cognitive reserve in older age. The findings will be interpreted within the dual mechanism of control framework. Additionally, some potential environmental and lifestyle factors contributing to the individual differences in cognitive reserve will be explored and discussed.
To this end, the sample of old participants was subdivided into low, mid and high performers based on the participants' inverse efficiency scores (IES; Townsend and Ashby, 1983). This procedure was additionally validated by the drift rate (v) obtained using a drift diffusion model (Ratcliff, 1978). Both parameters, IES and drift rate, reflect a compound of speed and accuracy. A particular focus of the study was the comparison between old high performers and middle-aged and young individuals to evaluate the real effect of aging not confounded with performance level. We administered a PC-based version of the Stroop task during EEG recording. Also, a paper version of the Stroop test has been administered to confirm validity of the group effects obtained in the computer-based Stroop test.
On the ERP level, we focused on preparatory activity as reflected in the CNV and the target-related P2/N2 complex associated with task-set retrieval and response selection that may explain performance differences between groups. This analysis may help to elucidate the relative contribution of proactive and reactive mode of cognitive control as proposed in the DMC account that has been used to explain intra- and inter-individual variability in cognitive performance.
Additionally, participants were tested with a number of standardized paper and pencil tests to document their cognitive status and evaluate group differences in further cognitive domains like processing speed, selective and sustained attention, short-term and working memory, verbal memory, and task switching. This aimed at investigating whether the obtained performance differences in elderly are specific to interference control or rather related to general cognitive ability.
Finally, we report potential relations to (sociodemographic) aspects, such as education, usage of foreign language, family status, occupation, IQ, personality, frequency of cognitive failures, quality of life and some life style factors, like physical activity etc. The data were used to explore the impact of life style and environmental factors on cognitive reserve.
We formulated the following hypotheses:
1. Interference control declines with increasing age and increasing task demands. Thus, we expect the largest performance impairment in the old group and in incongruent Stroop trials of the most demanding block that requires both word reading and color naming.
2. We predict substantial inter-individual variability in the distractibility of subjects in the older group. Accordingly, based on a general performance score, the old sample can be subdivided into three distinct performance groups of low, mid and high performers.
3. We hypothesize that performance and ERP activity (pronounced peak amplitudes) should be similar between old high performers and middle-aged or even young participants, reflecting high cognitive reserve in aging in some individuals, whereas performance decline and reduced ERP activity should be observed in others (old low performers), indicating low cognitive reserve.
4. Finally, according to the DMC model, differences in the CNV would indicate that the group variability in performance is primarily due to differences in proactive control, whereas effects in the P2/N2 would indicate individual differences in the reactive mode of processing.

Participants
A total of 246 healthy participants without any neurological or psychiatric impairments participated in the study. The group of young adults consisted of 36 individuals (range: 19-33 years; M: 25.1; SD: 2.7; 17 males, 19 females), the middle-aged group consisted of 58 participants (range: 40-53 years; M: 46.5; SD: 4.5, all male). The 152 older individuals were about 70 years old (range: 65-88 years; M: 70.6; SD: 4.5; 59 males, 93 females). Possible dementia and mild cognitive impairment were assessed using the Mini Mental State Examination (MMSE; Folstein et al., 1975). No participant of the old group had an MMSE score lower than 25, suggesting normal cognitive functioning (mean: 28.6, SD: 1.4). The old participants reported taking medication against hypertension, thyroid hormones and cholesterol-lowering drugs. 90.5% of the older participants were non-smokers, 6% had smoked previously, and 3.5% were active smokers. All reported normal or corrected-to-normal vision.
The data were collected in three studies and used pre-test data of two training studies with old (n = 152; Gajewski and Falkenstein, 2012, 2018) and middle-aged participants (n = 58; Gajewski et al., 2017), as well as a study including young participants only (n = 36; Gajewski and Falkenstein, 2014). The studies, in which the data were collected, were reviewed and approved by the Ethics committee of the Leibniz Research Centre of Working Environment and Human Factors, Dortmund, Germany. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The participants received payment for their participation.

Sociodemographic data, IQ, personality, lifestyle and quality of life
According to the notion proposed by Stern et al. (2018), a key proxy for cognitive reserve is the IQ. Thus, we examined differences between the low, middle and high performers in crystallized intelligence assessed by the multiple-choice word-test (MWT-B; Lehrl, 1995). This test consists of 37 items and each item contains five words. One of these is a meaningful word, whereas the other verbally similar words are meaningless. The subjects are required to mark the correct word. The difficulty of items increases with increasing item number. The number of correctly identified meaningful words allows assessment of the IQ. The level of education was assessed by the highest graduation: No degree (1), Primary (2), Secondary general (3), Intermediate secondary (4), High school diploma (5). Physical activity in the past two years was measured by a standardized questionnaire (Lüdenscheid Activity Questionnaire, Höltke and Jakob, 2002). The current level of physical activity was measured by bicycle ergometry using the Physical Work Capacity test (PWC-130 cycle test). The PWC-130 is a version of the PWC-150 test adapted for elderly individuals (Cambell et al., 2001). Furthermore, we obtained the "Big Five" personality factors using the NEO-FFI questionnaire (Costa and McCrae, 1992). The self-reported failures in daily attentional, memory and motor functions were assessed by the Cognitive Failures Questionnaire (CFQ; Broadbent et al., 1982). Additionally, the quality of life regarding physical and psychological health, social relationships and environment was assessed with the WHOQOL-BREF (World Health Organization, 1996). Anthropometric and sociodemographic data, like weight, height, family status, education, type of work, work life duration, medication, smoking history etc., were obtained using a self-constructed questionnaire.
The usage of foreign languages was assessed by the question: "I am speaking a foreign language" with the possible answers: very often/sometimes/rarely/never/I don't know any.

Neuropsychological testing
The neuropsychological tests measured attentional endurance (d2; Brickenkamp, 1994), speed of processing and vigilance (Digit-Symbol-Test), and short-term and working memory (Digit-Span-Test); the latter two are subtests of the Wechsler Adult Intelligence Scale (WAIS-III; Wechsler, 1997). Interference was measured using the classic Stroop color-word test (Stroop, 1935). The Verbal Learning and Memory Test (VLMT; Helmstaedter and Durwen, 1990) was used to measure immediate and delayed verbal memory. The Trail Making Test (TMT; Reitan, 1992) was administered to measure psychomotor speed and task switching (see Gajewski and Falkenstein, 2018b for a detailed description of the tests). Neuropsychological tests were administered in an extra session, one day before the EEG session.

Stimuli and tasks
As the main task, a computer version of the color-word interference test was used. All stimuli were presented centrally on a computer screen (size: 17″, refresh rate: 100 Hz, resolution: 640 × 480 pixels) at a viewing distance of 60 cm. A fixation point (5 × 5 mm) was presented before each stimulus and located in the centre of the monitor. Responses to each target stimulus were given by pressing one of four designated keys on a response box. The procedure of a single experimental trial is presented in Fig. 1.
The stimuli consisted of the words "red," "green," "yellow" and "blue" (5-7 mm wide × 10 mm high) presented on a black computer screen in one of the four colors. The color of the word displayed corresponded to its meaning in 50% of the trials (congruent), whereas in the remaining 50% of the trials word color and meaning were different (incongruent). A centrally presented diamond or a square (37 mm per side) served as cue stimuli. A diamond indicated the word reading task (low interference, block 1), in which the participant's task was to indicate the meaning of the color word irrespective of its ink color. A square indicated the color naming task (intermediate interference, block 2), in which the task was to respond to the ink color the word was displayed in. In block 3, the cues varied randomly between the trials (high interference). In the ERP analysis, the cues were used for analyzing the CNV during the preparation period. Apart from the different cues indicating the relevant task, the setup of all blocks was identical.
Each trial started with the presentation of a cue. One thousand milliseconds after cue onset, the color word (target) was presented. The stimuli, cue and word, disappeared after the required button press had been given. A response had to be given within 2500 ms after word-onset. 500 ms after the response, a feedback was displayed for 500 ms. In case of a correct response, a plus sign was displayed, after an incorrect response, a minus sign was shown. After a delay of 300 ms, the next cue was presented. Thus, the response-cue interval (RCI) was 1300 ms and included the response-feedback delay and the feedback.
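The trial timing described above can be checked with a few lines of arithmetic. The following is a minimal sketch; the constant and function names are hypothetical and only the durations are taken from the text.

```python
# Durations in milliseconds, taken from the trial description above.
# All names are illustrative and not part of the original experiment code.
CUE_TARGET_INTERVAL = 1000      # cue onset to target (color word) onset
RESPONSE_DEADLINE = 2500        # latest allowed response after word onset
RESPONSE_FEEDBACK_DELAY = 500   # response to feedback onset
FEEDBACK_DURATION = 500         # plus or minus sign on screen
FEEDBACK_CUE_DELAY = 300        # feedback offset to next cue onset


def response_cue_interval():
    """Response-cue interval (RCI): response to onset of the next cue."""
    return RESPONSE_FEEDBACK_DELAY + FEEDBACK_DURATION + FEEDBACK_CUE_DELAY


def trial_duration(rt_ms):
    """Time from cue onset to the next cue onset for a given reaction time."""
    if not 0 <= rt_ms <= RESPONSE_DEADLINE:
        raise ValueError("RT must lie within the response deadline")
    return CUE_TARGET_INTERVAL + rt_ms + response_cue_interval()
```

With these values the RCI sums to the 1300 ms stated in the text, and a trial with an RT of 700 ms would last 3000 ms from cue to cue.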
Responses were given by pressing one of four buttons corresponding to each of the four colors, which were mounted in a response box. The button for responding to the color red was placed at the top left, green at the bottom left, blue at the bottom right and yellow at the top right (Fig. 1). The buttons were operated with the index and middle fingers of both hands. The stimulus-response mapping of the tasks was overlapping, e.g. the response button for a particular word meaning, i.e. red, in the word-reading task was the same as for the corresponding ink color, i.e. of the red-colored word green, in the color-naming task. The color-button assignment was the same for all participants.
Before each block, participants performed a practice block encompassing 16 trials, followed by the two test blocks of 52 trials each. The 3rd block including both tasks consisted of 146 trials. The frequency of each required response was equal (25%). Before each block, participants were given a written instruction that explained the task. The instruction encouraged both quick and accurate responses.

ERP recordings
EEG was recorded continuously from 32 scalp electrodes arranged according to the extended 10-20 system and mounted in an elastic cap. The montage included 8 midline sites, 12 sites over each hemisphere and two mastoid electrodes (M1 and M2). The electrodes had the following positions: C3, C4, CP3, CP4, CPz, Cz, F3, F4, F7, F8, FC3, FC4, FCz, Fp1, Fp2, Fpz, Fz, O1, O2, Oz, P3, P4, P7, P8, PO3, PO4, POz, Pz, T7, T8. Horizontal and vertical EOG was recorded using a bipolar montage from electrodes at both eyes. An amplifier and data acquisition software (ActiveTwo System, BioSemi, Amsterdam, Netherlands) were used. Electrode impedance was kept below 10 kΩ. The amplifier bandpass was set to 0.01-140 Hz. All signals were digitized with a sample rate of 2048 Hz. EEG was re-referenced offline to linked mastoids. Offline, the EEG was down-sampled to 1000 Hz and cut into cue- and target-locked epochs using the software Vision Analyzer (Brain Products, Gilching). Epochs in which the amplitude exceeded ±100 μV were rejected. ERPs were filtered digitally offline with a 17 Hz low-pass and a 0.05 Hz high-pass filter. Eye movement artifacts were corrected using the algorithm of Gratton et al. (1983).
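The amplitude criterion for epoch rejection can be illustrated as follows. This is a minimal sketch and not the Vision Analyzer procedure: the data layout (an epoch as a list of channels, each a list of samples in μV) and the function name are assumptions made for illustration.

```python
def reject_epochs(epochs, threshold_uv=100.0):
    """Keep only epochs in which no sample exceeds +/- threshold_uv.

    epochs: list of epochs; each epoch is a list of channels and each
    channel is a list of voltage samples in microvolts (assumed layout).
    """
    kept = []
    for epoch in epochs:
        # An epoch is rejected as soon as any sample on any channel
        # exceeds the absolute amplitude threshold.
        exceeds = any(abs(sample) > threshold_uv
                      for channel in epoch
                      for sample in channel)
        if not exceeds:
            kept.append(epoch)
    return kept
```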

Data analysis
Neuropsychological tests were analyzed according to standard procedures described in the manuals. The Stroop interference index in the paper version of the test was computed by subtracting the time to perform the color-naming list from the time to perform the interference list. Mean group differences were analyzed using a series of one-way and mixed ANOVAs with Bonferroni-corrected post-hoc tests.
In the PC-based Stroop task, the first trial of each test block, trials with responses faster than 100 ms or slower than the mean reaction time plus two standard deviations (RT + 2SD) computed for each group and block separately, as well as error trials, were excluded from the analysis. For the individual standard deviation (ISD) analysis, no outliers were excluded in order to evaluate the whole range of RT variability undistorted by an artificial cut-off of the RT distribution. Mean RTs, ISDs, and error rates (ERRs) were subjected to an ANOVA design including the two within-subject factors Block (1 vs. 2 vs. 3) and Congruity (congruent vs. incongruent) and the between-subject factor Group (YA: young adults vs. MA: middle-aged adults vs. OL: old low vs. OM: old middle vs. OH: old high). For each measure, mean and standard error of the mean are provided (M ± 1 S.E.M.).
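The exclusion rules and the ISD measure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' analysis script: the trial format (dicts with "rt" and "correct" keys), the function names, and the use of the sample standard deviation are all assumptions. The text also does not state which trials entered the ISD beyond "no outliers were excluded", so the ISD function simply takes whatever RTs the caller passes in.

```python
from statistics import mean, stdev


def upper_cutoff(group_block_rts):
    """Mean RT + 2 SD for one group and one block (sample SD is an assumption)."""
    return mean(group_block_rts) + 2 * stdev(group_block_rts)


def trim_trials(trials, cutoff, lower_ms=100.0):
    """Apply the exclusion rules to one participant's trials in one block.

    trials: list of dicts with keys 'rt' (ms) and 'correct' (bool), in
    presentation order (hypothetical format).
    """
    kept = []
    for index, trial in enumerate(trials):
        if index == 0:                 # first trial of the block
            continue
        if not trial["correct"]:       # error trial
            continue
        if trial["rt"] < lower_ms or trial["rt"] > cutoff:
            continue                   # too fast or too slow
        kept.append(trial)
    return kept


def individual_sd(rts):
    """Individual standard deviation (ISD) of RTs, without outlier trimming."""
    return stdev(rts)
```

Keeping the cutoff computation separate from the trimming step mirrors the description that the cutoff is computed per group and block, while the trimming is applied to individual trials.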
The ERP analysis was restricted to the frontocentral midline electrodes as CNV and P2/N2 amplitudes are known to be maximal at the midline. The mean CNV and N2 amplitudes were larger at FCz than at Cz. Thus, we used FCz only to reduce the complexity of the data analysis. The terminal CNV was measured as the mean amplitude in the time range between 800 ms and 1000 ms after cue-onset relative to a 100 ms pre-cue baseline. The target-locked P2/N2 complex was analyzed as the difference in the peak amplitudes of the P2 and N2, allowing a baseline-free analysis. The target-locked P2 was measured as the maximum peak amplitude in the time range between 150 and 250 ms. The N2 was quantified as the most negative peak amplitude between 250 and 400 ms after target-onset at FCz. Time windows for the ERP analyses were selected based on visual inspection of the grand average waveforms and were consistent with previous studies using the same paradigm in different populations (Gajewski and Falkenstein, 2015a; Küper et al., 2017). The reliability of the automated peak detection was additionally checked in a random sample of participants and conditions.
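The two ERP measures described above can be sketched for a single averaged waveform at FCz. This is a minimal illustration, not the Vision Analyzer implementation: the function names are hypothetical, window borders are treated as inclusive (an assumption; a sample at exactly 250 ms falls into both peak windows), and the peak-to-peak value is taken as P2 minus N2.

```python
def window(times_ms, amps_uv, start_ms, end_ms):
    """Amplitudes whose time stamps fall within [start_ms, end_ms]."""
    return [a for t, a in zip(times_ms, amps_uv) if start_ms <= t <= end_ms]


def terminal_cnv(times_ms, amps_uv):
    """Mean amplitude 800-1000 ms after cue onset, relative to the
    mean of the 100 ms pre-cue baseline (cue-locked waveform)."""
    baseline = window(times_ms, amps_uv, -100, 0)
    cnv = window(times_ms, amps_uv, 800, 1000)
    return sum(cnv) / len(cnv) - sum(baseline) / len(baseline)


def p2_n2_peak_to_peak(times_ms, amps_uv):
    """P2 peak (maximum, 150-250 ms) minus N2 peak (minimum, 250-400 ms)
    in a target-locked waveform; no baseline is needed for a difference."""
    p2 = max(window(times_ms, amps_uv, 150, 250))
    n2 = min(window(times_ms, amps_uv, 250, 400))
    return p2 - n2
```

Because the P2/N2 measure is a difference between two peaks of the same waveform, any constant offset cancels out, which is why it can be described as baseline-free.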
The CNV was subjected to an ANOVA with the within-subject factor Block (3) and the between-subject factor Group (5). Information about congruity was not available during the preparation phase. Thus, ERPs from congruent and incongruent trials were collapsed for the CNV analysis. The post-target P2/N2 was subjected to an ANOVA including the within-subject factors Block (3) and Congruity (2), and the between-subject factor Group (5). Partial eta squared (ηp²) is reported as a measure of effect size. Significance level was set at p < .05. The relationship between behavioral and electrophysiological parameters was assessed by correlation analyses using the Pearson correlation coefficient (two-tailed). All post-hoc tests were Bonferroni corrected to avoid inflation of Type I error.
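For reference, the standard definition of the effect size reported here can be written as follows. The sums-of-squares symbols are notation introduced for this illustration and do not appear in the original text; the second form expresses the same quantity through the F statistic and its degrees of freedom.

```latex
\eta_p^2 = \frac{SS_{\mathrm{effect}}}{SS_{\mathrm{effect}} + SS_{\mathrm{error}}}
         = \frac{F \cdot df_{\mathrm{effect}}}{F \cdot df_{\mathrm{effect}} + df_{\mathrm{error}}}
```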

Classification of older low-, mid- and high-performing subjects
The classification of older high- and low-performing subjects was based on their behavioral performance in the PC-based Stroop task. To this aim, a combined measure of response time (RT) and ratio of correct responses, i.e. accuracy (accuracy = 1 - ERR), was used. The so-called inverse efficiency score (IES) was calculated for each condition and each participant by dividing the mean RT by accuracy × 100 (Townsend and Ashby, 1983). The IES reflects an overall performance index. RT and accuracy have been assumed to mirror different aspects of cognitive processing and the IES accounts for a possible trade-off between speed and accuracy, as it considers that a participant can either try to be fast, accurate, or moderately fast and accurate at the same time (e.g. Liesefeld and Janczyk, 2019). In case of a trade-off, speed and accuracy are negatively correlated. On the other hand, low performance is often expressed by both high RT and low accuracy, due to uncertainty, attentional lapses and general performance fluctuations. In this case, speed and accuracy are positively correlated.
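The inverse efficiency score can be illustrated with a short function. This is a minimal sketch with hypothetical names. It treats accuracy as a proportion between 0 and 1, which is an assumption: the text mentions a factor of 100, and a constant scaling of accuracy rescales all scores equally without changing the ranking of participants.

```python
def inverse_efficiency(mean_rt_ms, error_rate):
    """Inverse efficiency score: mean RT divided by accuracy.

    error_rate is the proportion of errors (0 to 1); accuracy = 1 - ERR.
    A higher IES indicates poorer performance.
    """
    accuracy = 1.0 - error_rate
    if accuracy <= 0:
        raise ValueError("accuracy must be greater than zero")
    return mean_rt_ms / accuracy
```

For example, a mean RT of 600 ms with no errors gives an IES of 600, whereas the same RT with 25% errors gives 800, so slower or less accurate responding both raise the score.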
The group of older participants was subdivided into three performance groups based on the IES scores across all conditions: IES were computed for each block (1, 2, 3), Congruity condition (congruent, incongruent) and each subject. The sum of all six IES was ranked and divided into 3 equally large subgroups (tertiles: high, middle, and low performers). The validity of this procedure was checked by correlating the IES with the drift rate parameter of a drift diffusion model reflecting efficiency of information processing (Ratcliff, 1978; Fig. 2, right). The analysis was conducted using fast-dm-30 (Voss et al., 2015; see also Lerche et al., 2017, for further methodical and theoretical issues). Similarly to the strategy used in computing the total IES, the computation of drift rate (v) included reaction times and accuracy data of all conditions and blocks. Thus, both RT and ACC dimensions result in a score defining general performance. The scores of both methods were highly correlated (r = -.846), indicating that the IES method to subdivide the sample into performance groups was valid. Fig. 2 left illustrates individual performance (RT and ERR) in all groups (young adults (YA), middle-aged adults (MA), and old high (OH), old middle (OM) and old low (OL) performers) in Block 3. Fig. 2 right shows the relationship between the total IES and the drift rate obtained by the diffusion model for each subject of the old group (Ratcliff, 1978).
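The tertile split on the summed IES can be sketched as follows. Function names and the data format are hypothetical. The text does not say how the remainder was handled when the sample size is not divisible by three (as with 152 participants), so the integer cut points used here are an assumption.

```python
def total_ies(condition_scores):
    """Sum of the six IES values (3 blocks x 2 congruity conditions)."""
    if len(condition_scores) != 6:
        raise ValueError("expected six IES values per participant")
    return sum(condition_scores)


def split_into_tertiles(total_ies_by_id):
    """Rank participants by total IES and assign performance groups.

    Lower IES means better performance, so the lowest third is labeled
    'OH' (old high), the middle third 'OM' and the highest third 'OL'.
    """
    ranked = sorted(total_ies_by_id, key=total_ies_by_id.get)
    n = len(ranked)
    first_cut, second_cut = n // 3, (2 * n) // 3   # assumed cut points
    groups = {}
    for rank, participant in enumerate(ranked):
        if rank < first_cut:
            groups[participant] = "OH"
        elif rank < second_cut:
            groups[participant] = "OM"
        else:
            groups[participant] = "OL"
    return groups
```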

Data and code availability statement
All data are available upon request.

IQ, education and lifestyle factors of the old low, middle and high performers
The sociodemographic data of the three performance groups are presented in Table 1.
The subsamples of elderly subjects with high, medium, and low performance did not differ significantly with respect to age, gender or MMSE score (all F's < 1).

Fig. 2. Left: Scatter plot illustrating individual performance (RT and ERR) in all groups of participants in block 3 with randomized task instruction (high interference). Right: Scatter plot illustrating the relationship between total inverse efficiency score (IES) and the drift rate in the diffusion model across all conditions in the three performance groups of old participants.

However, the performance groups differed with respect to IQ (OL: 111.7 vs. OM: 117.4 vs. OH: 119.6; F(2,150) = 6.2, p < .005). Post-hoc tests showed differences between OH vs. OL (p < .001) and OM vs. OL (p < .05), whereas no difference was observed between OH vs. OM (p > .05). Accordingly, group differences were observed in the level of education (F(2,150) = 4.5, p < .05). The mean education level for OH (3.8) was higher than for OL (3.2; p < .01), and marginally higher than for OM (3.4; p = .053), whereas OM did not differ from OL (p > .05). Additionally, the frequency of using a foreign language differed between groups (F(2,146) = 5.5, p < .005), with more frequent usage in OH than in OL (p < .005) and OM (p < .05), and no difference between OM and OL (p > .05).
No differences were found in the total duration of occupation (OH: 34, OM: 34, OL: 36 years, F < 1). This suggests that the three performance groups represent different levels of cognitive reserve, as reflected in differences in education level, use of foreign languages and IQ, and that these differences were not related to the duration of working life.
No differences between performance groups were found regarding the current level of physical activity using the PWC-130 test, or regular physical activity in the last two years measured by the Lüdenscheid Activity Questionnaire or with respect to the Body Mass Index (all F's < 1), suggesting that the level of physical activity was not a crucial factor contributing to the performance differences.
There were no significant differences in the personality dimensions measured by the NEO-FFI. However, the CFQ measuring attentional and memory lapses in daily life differed between the groups (F(2,150) = 4.7, p < .01), showing less frequent absent-mindedness and slips of action in OH than in OL (25 vs. 32; p < .005) and OM (30; p < .05).
Finally, the groups differed with respect to their quality of life in the dimension Physical Health (F(2,150) = 6.2, p < .005), with higher scores in OH (16.6) than in OL (15.2; p < .001). Participants in the OL group were also more likely to live alone than those in the OH group (p < .05).

Neuropsychological data
The descriptive results of the neuropsychological tests are shown in Table 2.
Almost all cognitive tests showed significant differences between the five age and performance groups (all p's < 0.005). The only exception was part 2 of the Stroop test (color naming; F(4,240) = 1.9, p = .10).
Regarding the digit span task forward, Bonferroni corrected post-hoc comparisons revealed substantial group differences (all p's < 0.05) except for YA vs. OH, YA vs. OM, OM vs. OH, OM vs. OL (all p's > 0.05). Similar results were observed in the digit span task backward: performance differed between all age groups (all p's < 0.01). However, the three old subgroups did not differ from each other (all p's > 0.05).
The number of correct responses in the digit-symbol test showed clear differences between young and middle-aged adults and old groups (all p's < 0.0001) and a difference between OH and OL (p < .001), whereas no difference was found between OH vs. OM and OM vs. OL (both p's > 0.05).
Also, in the d2 test, the number of correctly indexed letters clearly differed between young participants and all remaining groups (all p's < 0.001). Performance of the middle-aged group did not differ from the OH group (p > .05). The OL group showed lower performance with respect to all groups (all p's < .05) except for the OM group (p > .05).
In part 1 of the Stroop test requiring reading of black color words, the performance of young adults differed only from MA (p < .005) and OL (p < .001). No significant differences occurred between the three older performance groups (p > .05) except for OM vs. OL (p < .05). As mentioned above, no effects were found in part 2 of the Stroop test, requiring naming of colored fields. In contrast, substantial group differences were found for performance in the interference list (Stroop 3): young adults performed faster than all remaining groups (all p's < 0.001) except MA (p > .05). MA did not differ from OH (p > .05), but performed faster than OM and OL (p < .001). OL performance was lower than that of all other groups (all p's < 0.001) except OM (p > .05). The same pattern occurred after computing the Stroop interference effect by subtracting part 2 from part 3.
The sum of the immediate recall of word lists in 5 subsequent trials of the VLMT showed better performance in young and middle-aged adults compared to all groups (all p's < 0.001), whereas the three old groups did not differ from each other (p's > 0.05). Delayed recognition of words in the VLMT was superior in the young group compared to all other groups (all p's < 0.001), whereas no further group differences were found (all p's > 0.05).
Finally, the analysis of the TMT-A yielded faster performance in YA compared to all groups of elderly (all p's < 0.001) and no significant difference compared to MA (p = .06). OL were slower than MA and OH (both p's < 0.001) but did not differ from OM (p > .05). An even clearer pattern was observed in the TMT-B: YA were faster than all other groups (all p's < 0.001). MA did not differ from OH (p > .05) but OH differed from OM (p < .05). OL performed worse compared to all other groups (all p's < 0.001). Generally, the same pattern was found after subtracting TMT-A from TMT-B.
In sum, clear differences between the three performance groups were found in tasks requiring executive control, like Stroop and TMT-B, whereas (sub) tests measuring performance speed (Stroop 1 and 2, TMT-A), attentional functioning (d2, digit-symbol test), short-term and working memory (digit span), immediate and delayed verbal memory (VLMT) showed less consistent or no differences between the older performance groups.

Behavioral data
Generally, for the analysis of RTs, trials with incorrect responses (YA: 5.0%; MA: 7.0%; OH: 4.6%; OM: 7.4%; OL: 11.3%) and outliers with RTs shorter than 100 ms or longer than mean RT + 2 SD (YA: 5.9%; MA: 4.4%; OH: 0.7%; OM: 2.7%; OL: 8.6%) were discarded. Mean RTs, SDs and ERRs for congruent and incongruent trials in the word-reading, color-naming, and mixed block are presented in Figs. 3-5. Bonferroni-corrected post-hoc tests on the differences between incongruent and congruent trials showed no significant group differences in block 1 with low interference (all p's > 0.05). Substantially larger interference in OL compared to all other groups was observed in Block 2 with moderate task difficulty (all p's < 0.0001), whereas accuracy in the mixed block (Block 3) was higher in YA vs. MA (p < .01), YA vs. OM (p < .0001) and YA vs. OL (p < .0001), but no difference between YA and OH was found (p > .05). OH accuracy was higher compared to OM (p < .05) and all groups outperformed OL (all p's < 0.001).
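The RT screening rule stated above (discard error trials, RTs below 100 ms and RTs above mean RT + 2 SD) can be sketched as follows. This is only an illustration: the text does not specify whether the cutoff was computed per participant, per condition or on correct trials only, so the sketch assumes a single list of trials, a cutoff based on the correct trials, and the sample standard deviation. The function name is hypothetical.

```python
from statistics import mean, stdev

def trim_rts(rts_ms, correct, lower_ms=100.0, n_sd=2.0):
    """Return the RTs that survive screening.
    Assumption: mean and SD are taken over the correct trials of the
    list passed in, before any outlier is removed."""
    valid = [rt for rt, ok in zip(rts_ms, correct) if ok]
    upper_ms = mean(valid) + n_sd * stdev(valid)
    return [rt for rt in valid if lower_ms <= rt <= upper_ms]

# Hypothetical trial list: one anticipation (90 ms) and one slow outlier.
rts = [90, 500, 520, 480, 510, 490, 505, 495, 1500, 515]
kept = trim_rts(rts, [True] * len(rts))
```

Here the 90 ms anticipation and the 1500 ms response fall outside the window and are dropped, while the remaining eight trials enter the RT analysis.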

ERP data
Grand averaged cue- and target-locked ERP waveforms at FCz for the YA, MA, OH, OM and OL in block 1, block 2 and block 3 for congruent and incongruent trials are presented in Fig. 6. Topographical maps of the CNV and N2 are shown in Fig. 7 and mean amplitudes of the CNV and P2/N2 complex at FCz are plotted in Figs. 8 and 9.

Cue-locked. In the preparatory interval, the effects of Block and Group collapsed across congruent and incongruent trials were analyzed (there was no a priori information about congruity in the preparation phase). Regarding the terminal CNV, the mean amplitude was larger at FCz than at Cz (−2.5 μV vs. −2.3 μV). The ANOVA revealed an effect of Block (F(2,482) = 3.6, p < .05, ηp² = 0.015), suggesting the largest CNV in Block 3, and a main effect of Group (F(1,241) = 3.5, p < .01, ηp² = 0.055).
Bonferroni-corrected post-hoc tests yielded a larger (more negative) CNV in OH relative to OM and OL performers (both p's < 0.05). No differences were found between old high and middle-aged or young adults (all p's > 0.05). No interaction of Block x Group was found (F < 1).

As shown in Fig. 6, target stimuli evoked a prominent P2/N2 complex. The absolute amplitude of the N2 was slightly larger at FCz (−0.4 μV) than at Cz (−0.2 μV). The amplitude of the P2/N2 complex, computed as the peak difference N2 − P2, was −6.8 μV at FCz and −5.5 μV at Cz.

Correlation CNV vs. P2/N2
In order to analyze the relationship between preparatory efficiency and retrieval and response selection processes indicated by the P2/N2, correlation analyses were conducted for each group and condition. No significant correlations were found in YA, MA, OM and OL. However, the high performing elderly showed a significant correlation between the CNV and P2/N2 in the congruent and incongruent trials in block 1 (both r's = 0.28; p < .05) and an even stronger one in block 2 (r = 0.40; p < .005 and r = 0.51; p < .001), whereas no relationship was found in block 3.

Fig. 6. Grand averaged ERP to cue and targets for all groups in congruent (left) and incongruent trials (right) in block 1 (low interference), block 2 (moderate interference), block 3 (high interference). Time point −1.0 s reflects cue onset, time point 0 s reflects target onset.

Fig. 7. Topographical maps (current source densities) of the CNV in the time range −100 to 0 ms and the N2 at the maximum peak in the young, middle-aged and old groups.

P.D. Gajewski et al. NeuroImage 207 (2020) 116430

Validity check of the tests
To ensure the validity of the interference score, a correlational analysis for the interference indices of the paper-and-pencil and PC version of the Stroop test was conducted in the whole sample. The interference index of the paper-and-pencil version of the Stroop test, computed by subtracting the performance time of Stroop part 2 from Stroop part 3, was related to the differences between incongruent and congruent trials in RTs, ERRs and IES in Block 2 (RT: r = 0.38; p < .001; ERR: r = 0.34; p < .001; IES: r = 0.29; p < .001) and Block 3 (RT: r = 0.37; p < .001; ERR: r = 0.28; p < .001; IES: r = 0.18; p < .005).
The consistent correlations between both versions of the Stroop task indicate that both tasks measure the same construct.
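The validity check above relates two difference scores per participant. A minimal sketch of that computation is given below, assuming a Pearson product-moment correlation (the text reports r values without naming the coefficient); the helper names and the example values are hypothetical and purely illustrative.

```python
from math import sqrt

def difference_scores(harder, easier):
    """Per-participant interference index: harder minus easier condition
    (e.g. Stroop part 3 minus part 2, or incongruent minus congruent)."""
    return [h - e for h, e in zip(harder, easier)]

def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values for four participants (not data from the study).
paper_index = difference_scores([75, 90, 82, 110], [50, 55, 52, 60])  # seconds
pc_index = difference_scores([720, 800, 760, 900], [650, 690, 680, 740])  # ms
r = pearson_r(paper_index, pc_index)
```

The same two helpers apply to the error-rate and IES indices by swapping in the corresponding per-participant values.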

Discussion
The present study aimed at evaluating interference control across the lifespan using the Stroop task. Yet, the main aim was an analysis of mechanisms underlying inter-individual performance differences in older age that may constitute cognitive reserve. To this end, a sample of older participants was subdivided into three groups according to the subjects' general performance level. These groups differed additionally with respect to their level of education, frequency of using a foreign language and IQ, which are considered proxies for cognitive reserve (Stern et al., 2018). This suggests that the three old groups do indeed represent different levels of cognitive reserve independently of personality traits, physical fitness, quality of life or the total duration of occupation, which did not differ between the three groups. Additionally, high performers reported better physical health. High performers were living more frequently with a partner or family, and reported fewer attentional and memory lapses, less absent-mindedness and fewer slips of action than old low performers.
Three blocks of Stroop trials with increasing difficulty were applied during EEG recording. The results show performance decline in terms of slower responses, increased intra-individual variability of speed as well as reduced response accuracy with increasing age and task difficulty. Old high performers reached a performance level comparable to middle-aged or even young adults. In contrast, old low performers showed a disproportionate decline in executive control. The results were supported by a paper-and-pencil version of the Stroop test and other psychometric tests that required executive control, but no such differences were observed in simple choice-reaction and memory tests.

Target-locked P2/N2 amplitude and standard errors assessed by the difference between N2 and P2 peaks.

Overall, the behavioral data show that cognitive aging is related to slower information processing (e.g. Salthouse, 2000), increased intra-individual variability of speed, supporting the notion of impaired neural integrity (Hultsch et al., 2008), as well as to qualitative performance decline as indicated by impaired response accuracy (Starns and Ratcliff, 2010). These parameters were less affected in old high performers. The main question was related to the electrophysiological underpinnings of these behavioral effects. The terminal contingent negative variation (CNV) indicating task preparation revealed larger amplitudes in old high performers compared to the other groups of older subjects. Moreover, the target-locked P2/N2 complex associated with retrieval and implementation of S-R mappings was substantially larger in the old high than old low performers. Additionally, some conditions showed even larger P2/N2 amplitudes in old high performers than in middle-aged or young individuals.
It is important to note that the larger amplitudes of the CNV and P2/N2 in the high performing group of older participants occurred in all blocks and conditions, corroborating previous results using a dual-task paradigm (Thönes et al., 2018). This suggests that old high performers generally engaged more cognitive resources than the other groups of elderly to perform the task, supporting the notion of a task-invariant network constituting cognitive reserve (Stern et al., 2018).
This pattern of behavioral and ERP results is in line with our predictions. First, interference control assessed by speed, intra-individual variability of speed (consistency) and accuracy declines with increasing age and task difficulty. Second, older age is characterized by large inter-individual variability (diversity) in performance. Third, we hypothesized that old high performers should show enhanced processing efficiency as reflected in increased amplitudes of the pre-target CNV and/or the target-locked P2/N2 complex, especially with increasing task difficulty, where the executive demands are high. This hypothesis was derived from previous results from our and other labs showing enlarged ERPs associated with superior performance in older age both in cross-sectional and randomized controlled studies (De Sanctis et al., 2009; Gaál and Czigler, 2018;, 2015a, 2018a, 2018b; Getzmann et al., 2013; Küper et al., 2017; Lubitz et al., 2017; Olfers and Band, 2018; Thönes et al., 2018; Wild-Wall et al., 2007). The final hypothesis was related to the predominant mode of cognitive control. As we observed pronounced CNV and P2/N2 components, we conclude that the proactive (CNV) as well as the reactive control mode (P2/N2) were enhanced in old high performers. Moreover, the CNV and P2/N2 amplitudes were correlated in the old high performers. This suggests that the additional effort to prepare for the upcoming task in advance may intensify target-related processing. On the other hand, failure of advance preparation may postpone task processing beyond target onset. This is in line with the shift from the proactive to the reactive mode of processing proposed in the dual-mechanisms of cognitive control framework by Braver (2012). The finding that the CNV and P2/N2 amplitudes were correlated in high performers suggests that proactive and reactive modes of control are not independent and that reactive control may benefit from the mobilization of resources underlying proactive control (Karayanidis et al., 2011).
The enhancement of cognitive resources in advance as reflected in the CNV may pre-activate (or in more functional terms may lower the activation threshold) the relevant stimulus-response sets maintained in working memory as reflected in a pronounced P2 that in turn facilitates selection and implementation of the correct response as reflected in the N2.
The crucial question is what are the functional underpinnings of cognitive reserve in older age and what can the present study contribute to this understanding?
Previous studies indicated that large inter-individual variability in cognitive functioning is not only due to genetic dispositions but to a large extent due to environmental and lifestyle factors, and that cognitive reserve protects against negative effects of aging and cognitive decline (Ballesteros et al., 2015; Beydoun et al., 2014; Gajewski and Falkenstein, 2015b; Hertzog et al., 2009; Stern, 2009). Functional and structural models suggest that compensatory processing is the key for understanding inter-individual performance differences in the elderly. For example, Dennis and Cabeza (2008) and Park and Reuter-Lorenz (2009) propose that hemispheric asymmetry is reduced in older age (called HAROLD effect: hemispheric asymmetry reduction in older adults), leading to cognitive deficits. Successful compensation is based on greater recruitment of additional areas in both brain hemispheres, which leads to asymmetry reduction. A similar account has emphasized the posterior-anterior shift in aging (PASA; Davis et al., 2007; Dennis and Cabeza, 2008) and proposes that older individuals compensate for activation deficits in parietal brain areas by recruiting frontal areas to enhance cognitive resources for successful task performance. Also, some ERP studies showed increased compensatory activity in frontal brain areas in healthy elderly or in patients with cognitive deficits (Angel et al., 2010a, 2010b; Covey et al., 2017; Moussard et al., 2016). However, both models are still under debate as some studies reported contradictory activation patterns associated with preserved performance (Logan et al., 2002; Fernandez-Ruiz et al., 2018).
Moreover, age-invariant behavior observed in fMRI studies in tasks such as conceptual repetition priming might be the result of a more sustained neural processing of stimuli in older adults compared to young adults. This finding has been interpreted as a form of compensatory neural activity (Ballesteros et al., 2013;Bergerbest et al., 2009;Daselaar et al., 2005). This sustained neural processing seems to act as a compensatory mechanism used by older adults to achieve the same level of cognitive performance as younger adults.
Apart from the idea of compensatory brain activity, an alternative approach explains inter-individual differences in performance in older age by long-term structural, functional and morphological differences of brains constituting cognitive reserve. This means that the correlates of superior performance in the elderly are general differences in gray matter volume, synaptic density and brain metabolism and, to a lesser extent, the recruitment of additional brain areas. There are several lines of evidence supporting this notion. On the one hand, genetic polymorphisms affect brain metabolism and ERP activity in executive control due to inter-individual differences in the availability of neurotrophins in the brain that stimulate neuro- and synaptogenesis (e.g. Egan et al., 2003; Gajewski et al., 2011; Getzmann et al., 2013), which may affect Stroop distractibility and ERP activity of the reactive mode of control. Furthermore, pro-inflammatory activity (Gajewski et al., 2013) or even infectious diseases like latent Toxoplasma gondii (Gajewski et al., 2016) may affect brain activity and executive control in old age.
On the other hand, lifestyle factors, like habitual physical activity, bilingualism, dancing or cognitive enhancement may produce similar effects on executive functions as genetic factors. Higher performance in physically active individuals in executive control tasks is related to increased blood perfusion and neurogenesis induced by additional release of neurotrophins (Curlik and Shors, 2013; Gomez-Pinilla and Hillman, 2013). This is in line with fMRI studies that found enhanced activity in prefrontal and frontal brain areas in physically more active older individuals compared to sedentary ones using different executive control tasks (Colcombe and Kramer, 2003; Colcombe et al., 2004; Colcombe et al., 2006; Erickson et al., 2010, 2011; Kramer et al., 2004; Erickson, 2007; Hayes et al., 2013; Voelcker-Rehage and Niemann, 2013, for reviews) including the Stroop task (Weinstein et al., 2012). Accordingly, long-term physical activity was associated with reduced Stroop interference and larger ERP in seniors (Gajewski and Falkenstein, 2015a). Moreover, there are additional sources of inter-individual variability in executive control. For example, enhanced cognitive performance accompanied by enlarged target-related ERP in executive control tasks was observed after cognitive training compared to control groups (Küper et al., 2017) and even in relation to long-term flexible vs. repetitive work (Gajewski et al., 2010). Furthermore, Kramer et al. (2004) provide evidence for greater synaptic density and more complex brain networks in higher educated individuals. More recently, Pergher et al. (2018, 2019) have demonstrated an association between education, performance in a working memory task, P3 amplitude, and gray matter volume in older age. In line with this, our high and low performing participants substantially differed with respect to their level of education, usage of foreign language and IQ.
Similar effects were reported in word-stem priming and word-stem recall studies in high-performing older adults with a high educational level (e.g., Osorio et al., 2009, 2010). Thus, it could be assumed that cognitively stimulating aspects of life experience play a protective role against cognitive decline and/or support cognitive reserve in older age, whereas the total duration of occupation or personality traits are not the key factors for the observed performance differences. The same is true for the level of physical fitness, at least in our sample. In contrast, it seems that living with a partner or family, as well as good physical health, have an impact, but the precise contribution of these factors to cognitive reserve requires confirmation by future studies. Taken together, there are several factors constituting cognitive reserve in older age, which contributes to maintaining a high level of performance in executive tasks. In particular, there are some ways to improve cognitive control and these gains may be mediated or even amplified by a number of biological factors, which may work in complex interdependence (Ballesteros et al., 2015; Cespón et al., 2018; Gajewski and Falkenstein, 2015b, 2018b; Greenwood and Parasuraman, 2010; Hertzog et al., 2009; Stern, 2009).

Limitations and future directions
This study has some limitations that should be acknowledged. First, we were not able to systematically assess all factors that may contribute to the performance differences between the groups of older adults. In addition, because of the cross-sectional design of the study, the analyses are correlational and do not imply causation. Second, this study did not focus on the evaluation of different functional models underlying inter-individual differences in cognitive control like dynamic compensatory processing (HAROLD or PASA) or structural, metabolic or morphological changes of the brain. Instead, the aim was to analyze functional mechanisms of performance differences in older age and to focus on well-known ERP components associated with proactive and reactive control during interference processing. A combination of different imaging methods using the same experimental paradigm may offer the possibility to investigate specific neural mechanisms underlying cognitive reserve in more detail.
Moreover, long-term studies with follow-up measures would be necessary to track changes of biological and lifestyle factors across a longer time period like decades and to evaluate the corresponding changes of cognitive functions and ERP activity. Finally, a crucial aspect that should be addressed in future studies in the area of cognitive aging and prevention of mild cognitive impairment and dementia is to assess the relative contribution of the factors promoting healthy aging and to investigate the interaction between biological and lifestyle factors to understand the mechanisms of cognitive reserve in older age.

Conclusions
This ERP study involving a large sample of young, middle-aged and old individuals indicates that successful interference control in older age is related to efficient proactive (assessed by the CNV) and reactive (assessed by the P2/N2) mode of cognitive control, facilitating a performance level usually observed in younger adults. This suggests that the proposed concept of cognitive reserve in older age has a neural substrate in terms of enhanced brain activity involved in executive control tasks.

Declaration of competing interest
None of the authors had any financial or other conflict of interest regarding the methods and content of this paper.