P3b amplitude as a signature of cognitive decline in the older population: An EEG study enhanced by Functional Source Separation

With the greying population, it is increasingly necessary to establish robust and individualized markers of cognitive decline. This requires the combination of well‐established neural mechanisms, and the development of increasingly sensitive methodologies. The P300 event‐related potential (ERP) has been one of the most heavily investigated neural markers of attention and cognition, and studies have reliably shown that changes in the amplitude and latency of the P300 ERP index the process of aging. However, it is still not clear whether either the P3a or P3b sub‐components additionally index levels of cognitive impairment. Here, we used a traditional visual three‐stimulus oddball paradigm to investigate both the P3a and P3b ERP components in sixteen young and thirty‐four healthy elderly individuals with varying degrees of cognitive ability. EEG data extraction was enhanced through the use of a novel signal processing method called Functional Source Separation (FSS) that increases signal‐to‐noise ratio by using a weighted sum of all electrodes rather than relying on a single, or a small sub‐set, of EEG channels. Whilst clear differences in both the P3a and P3b ERPs were seen between young and elderly groups, only P3b amplitude differentiated older people with low memory performance relative to IQ from those with consistent memory and IQ. A machine learning analysis showed that P3b amplitude (derived from FSS analysis) could accurately categorise high and low performing elderly individuals (78% accuracy). A comparison of Bayes Factors found that differences in cognitive decline within the elderly group were 87 times more likely to be detected using FSS compared to the best performing single electrode (Cz). In conclusion, we propose that P3b amplitude could be a sensitive marker of early, age‐independent, episodic memory dysfunction within a healthy older population. 
In addition, we advocate for the use of more advanced signal processing methods, such as FSS, for detecting subtle neural changes in clinical populations.
Graphical abstract: Topographic and functional behaviour differences between P3a and P3b: a comparison between channel and source space. ERPs and topographic maps for the three groups (Young vs. HP vs. LP) on the Cz and Fz channels and FSP300. Top panel (functional source space): blue, magenta and red lines indicate FSP3a, and green, cyan and brown lines indicate FSP3b, for the Young, HP and LP groups, respectively. The rightmost column shows the superimposition of the P3a and P3b in the three groups. Bottom panel (channel space): grey lines show the butterfly representation of all EEG channels. Blue, magenta and red lines indicate the selected channel CzP3a, and green, cyan and brown lines indicate the selected channel FzP3b, for the Young, HP and LP groups, respectively. The black circles on the topographic maps represent the Cz and Fz channel positions.
Highlights
Both P3a and P3b peak latency increased, and peak amplitude decreased, with age.
Only P3b amplitude discriminated early episodic memory dysfunction in older individuals.
Differences in cognitive decline were 87 times more likely to be detected using FSS.
FSS P3b produced the highest classification accuracy (78%) of elderly cognitive decline.


Introduction
The P300 was one of the first event-related potentials (ERPs) to be reported (Sutton et al., 1965), and its discovery is largely credited with the application of EEG methods to cognitive neuroscience (Polich, 2012). The P300 ERP presents as a large positive waveform with a centro-parietal topography, peaking approximately 300 ms after stimulus onset. Although the term P300 (or P3) is often used in the literature, it is important to clarify that the P300 encompasses at least two distinct subcomponents: the P3a and the P3b (Squires et al., 1975). Whilst the P3b has a parietal topography, the P3a presents with a more anterior focus, peaking around central electrodes, which most likely indicates that the P3a and P3b ERP components are underpinned by distinct neural sources (Linden, 2005; Polich, 2003). The morphology of the P3a and P3b ERPs also differs. Whilst the peak of the P3a is relatively consistent across trials, the P3b peak is response-locked and therefore shifts from trial to trial depending on the speed of the response (Hillyard et al., 1971; Squires et al., 1973). As a result, when P3a and P3b trials are averaged across subjects, the grand average P3a often presents with a clear peak around 300 ms (due to the consistent single-trial peaks), whereas the P3b has a widespread morphology ranging between 300 and 600 ms. These differences in scalp topography and ERP morphology suggest that the P3a and P3b might subserve different cognitive processes (Linden, 2005; Polich, 2003, 2007). The P3a is often driven by rare non-target (distractor) stimuli, and as such it is believed that the P3a ERP represents stimulus-driven frontal attention mechanisms (Linden, 2005; Polich, 2007). 
The P3b is often elicited by target detection paradigms (most famously the oddball paradigm); however, there is still ongoing debate about whether the P3b encodes the allocation of attentional resources (Nieuwenhuis et al., 2011), context and memory updating (Donchin and Coles, 1988; Polich, 2007), stimulus uncertainty (Mars et al., 2008), or the accumulation of sensory evidence.
Along with being one of the most investigated ERPs in typical populations, the P300 has proven to be a useful marker of clinical pathology. Perturbations of the P300 have been shown in ADHD, autism, depression, obsessive-compulsive disorder, and Parkinson's disease, to name but a few (Polich and Criado, 2006). However, the largest number of P300 studies have focussed on how the P300 ERP changes with age and whether this might be a marker of the cognitive decline seen in Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD) (Rossini et al., 2007). Polich (1997) showed that increases in P300 latency (which we assume to be the P3b, based on the paradigm) and decreases in P300 amplitude correlate with age (20-80 years; n = 120, 10 male and 10 female per decade). Walhovd and Fjell (2001) expanded on this by using a combination of two- and three-stimulus oddball paradigms to distinguish the P3a and P3b ERPs. They similarly found that both P3a and P3b peak latency increase, and P3a and P3b peak amplitude decrease, with age. Studies by Polich and Corey-Bloom (2005) and van Deursen et al. (2009) have shown that the amplitude of P300 responses decreases further, whilst peak latency increases further, in both MCI and AD compared to matched elderly controls.
Compensatory neural mechanisms that occur with aging, along with increasing frontalization of brain responses, have been formalized in the Compensation-Related Utilization of Neural Circuits Hypothesis (CRUNCH; Reuter-Lorenz and Cappell, 2008) and the Posterior-Anterior Shift in Aging (PASA; Davis et al., 2008) models, respectively. Some studies have suggested that this increased frontal activity in elderly individuals is important for task performance (Davis et al., 2008; Goh and Park, 2009); however, studies by Fabiani et al. (1998) and West et al. (2010) have also shown that elderly individuals who show a larger frontal P300 response perform more poorly on neuropsychological tests of executive function. Although it is not clear whether the increased frontal activity that occurs with aging is a help or a hindrance, understanding age-related changes and cognitive decline clearly requires investigating frontal and parietal mechanisms in parallel rather than exploring single electrodes in isolation.
EEG source reconstruction suffers from the fact that multiple source configurations can produce the same measured EEG potentials. This issue is especially problematic when very few electrodes are used, which is often the case in clinical applications. For this reason, EEG studies investigating the P300 in aging, MCI, and AD have avoided source localization and relied upon a single electrode or a localised subset of electrodes. However, the generators of EEG activity cannot be reliably inferred from a priori selected single channels, or a limited group of channels, because of the electric/magnetic field propagation problem (Rusiniak et al., 2013; Siegel et al., 2012). Moreover, using information from only one electrode can be misleading, especially when the activated network is spread across the entire scalp, as in the case of the P300. As such, the traditional approach of using a limited number of electrodes to investigate age-related changes may lack the sensitivity to detect the neural changes underpinning cognitive decline, and may therefore only pick up the much clearer age-related changes in ERP activity. To overcome this issue, we propose using a novel signal processing method, Functional Source Separation (FSS; Porcaro and Tecchio, 2014; Porcaro et al., 2008, 2009a; Tecchio et al., 2007). FSS overcomes an important limitation of most previous studies by using the best combination of all available electrodes to detect the P300 generators. This approach has already been applied to extract EEG-specific features for primary motor, primary sensory (Porcaro et al., 2009a, 2009b), and primary visual (Porcaro et al., 2010) areas. We believe this approach will provide much richer information than traditional EEG approaches, especially given the large network of brain regions involved in producing the P3a and P3b responses. 
Here, for the first time, we have used FSS as a tool to investigate the P3a and P3b ERP responses in young and older individuals. We propose two complementary hypotheses: 1) differences in P300 ERP responses will correspond to individual differences in cognitive decline (based on neuropsychological measurements) in the older population; and 2) the FSS approach (utilizing a weighted combination of all electrodes) will be more sensitive to these differences than traditional approaches based on individual electrodes.

Participants
Sixteen young (18-28 years old; mean age, 22.4 ± 3.28 years) and thirty-four elderly individuals (65-78 years old; mean age, 70.22 ± 4.31 years) were recruited for this study. Data from two elderly participants were discarded due to technical problems and poor EEG data quality, and one young and one additional elderly participant were excluded for poor performance on the task (d-prime < 1). Table 1 shows demographic details for young and old participants. All participants were right-handed according to the Edinburgh Handedness Questionnaire (Oldfield, 1971). Participants gave written informed consent before the study, which was approved by the Trinity College Dublin School of Psychology Ethics Committee. On a day separate from the EEG testing (interval from EEG testing: elderly group, 163.43 ± 118 days; young group, 161 ± 81 days), participants also underwent a neuropsychological battery consisting of the Mini Mental State Examination (MMSE; Folstein et al., 1975), the National Adult Reading Test (NART, an estimate of intelligence; Nelson, 1982), the Stroop test, category fluency (animals), the Logical Memory subtest of the Wechsler Memory Scale III (WMS; Wechsler, 1997), and the Hospital Anxiety and Depression Scale (Zigmond and Snaith, 1983). Participants who scored more than 8 on either the anxiety or the depression subscale of the Hospital Anxiety and Depression Scale were excluded from the study. No participant was taking psychiatric or neurological medication at the time of testing.
We subdivided old participants into cognitively high performing (HP) and low performing (LP) individuals, as suggested by previous research (Dockree et al., 2015). To do so, logical memory delayed recall scores and NART-estimated IQ scores were converted into Z scores. LP individuals were identified as those whose logical memory Z score was one standard deviation below their NART-estimated IQ Z score. Baselining memory scores against NART IQ in this way has been shown to categorize individuals more sensitively than using raw values (Dockree et al., 2015).
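The HP/LP split can be illustrated with a short sketch. It rests on three assumptions that the text does not spell out. First, Z scores are computed within the elderly sample, using the mean and sample standard deviation. Second, the discrepancy is memory Z minus NART Z, which corresponds to the "Logical Memory-NART" score used later. Third, "one standard deviation below" is read as a discrepancy of -1 or lower. The function name is hypothetical.

```python
import numpy as np

def classify_memory_vs_iq(logical_memory, nart_iq, threshold=-1.0):
    """Hypothetical HP/LP labelling of older participants.

    Both scores are Z-transformed within the sample. A participant is labelled
    LP when memory Z minus NART Z is <= threshold (assumed cut-off of -1 SD).
    """
    lm = np.asarray(logical_memory, dtype=float)
    iq = np.asarray(nart_iq, dtype=float)
    z_lm = (lm - lm.mean()) / lm.std(ddof=1)
    z_iq = (iq - iq.mean()) / iq.std(ddof=1)
    discrepancy = z_lm - z_iq  # "Logical Memory - NART" score
    labels = np.where(discrepancy <= threshold, "LP", "HP")
    return discrepancy, labels
```

Because memory is expressed relative to each person's estimated premorbid IQ, a participant with high IQ but average memory can still be flagged as LP.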

EEG recording
EEG recordings were acquired with a 32-channel BrainAmp system (BrainProducts, Munich, Germany). Thirty-three EEG electrodes were placed on the scalp, including the reference electrode positioned at FCz and the ground electrode placed at AFz. One external electrode was applied to the subject's back to acquire the electrocardiogram (ECG). Electrode impedances were kept below 10 kΩ. Data were recorded on a laptop computer using Brain Recorder v1.04 software (BrainProducts, Munich, Germany) at a sampling rate of 5 kHz with a band-pass filter of 0.016-250 Hz.

Experimental procedure
Participants performed a three-stimulus visual oddball task previously described in O'Connell et al. (2012a). Every 2075 ms a stimulus appeared on the screen for 75 ms. Standard stimuli consisted of a 3.5-cm-diameter purple circle and appeared on 80% of trials. Target stimuli were a slightly larger purple circle (4-cm diameter) and appeared on 10% of trials. Distractor stimuli were a black and white checkerboard and appeared on 10% of trials. Participants were asked to make a speeded response to target stimuli using a response box placed in their right hand. The stimulus sequence was pseudorandomized such that between 3 and 5 standard stimuli were presented after any target or distractor stimulus. Thus, the minimum interval between P300-eliciting events was 8300 ms (four stimulus intervals of 2075 ms); however, because targets and distractors were randomly interspersed throughout the task, the interval between two targets or between two distractors could be as long as 64 s. The average interval was approximately 20 s, and approximately 70% of these intervals fell between 8 and 22 s. All stimuli were presented on a grey background, and participants were asked to maintain fixation on a white cross presented at the centre of the screen.
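The sequencing constraint can be made concrete with a hypothetical generator. This sketch assumes two things: 3, 4 or 5 standards are drawn uniformly after each rare stimulus, and targets and distractors are equiprobable. The authors' actual randomization procedure is not described beyond the constraints in the text. Under these assumptions, about 80% of stimuli are standards and consecutive rare events are always at least 8.3 s apart. The expected target-to-target interval is 2 × 5 × 2.075 s = 20.75 s, which is in line with the ~20 s average reported.

```python
import numpy as np

SOA_MS = 2075  # one stimulus every 2075 ms (from the task description)

def make_oddball_sequence(n_rare, seed=0):
    """Hypothetical pseudo-random three-stimulus oddball sequence.

    Each rare stimulus ('T' target or 'D' distractor, assumed equiprobable) is
    followed by 3-5 standards ('S'), drawn uniformly (an assumption).
    """
    rng = np.random.default_rng(seed)
    seq = []
    for _ in range(n_rare):
        seq.append(str(rng.choice(["T", "D"])))
        seq.extend(["S"] * int(rng.integers(3, 6)))  # upper bound exclusive: 3, 4 or 5
    return seq

def onset_gaps_s(seq, kinds):
    """Intervals (in seconds) between consecutive stimuli whose label is in `kinds`."""
    idx = np.array([i for i, s in enumerate(seq) if s in kinds])
    return np.diff(idx) * SOA_MS / 1000.0
```

For example, `onset_gaps_s(seq, {"T", "D"})` gives the intervals between any two P300-eliciting events, and `onset_gaps_s(seq, {"T"})` gives target-to-target intervals.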

EEG pre-processing
The data were re-referenced to the common average and down-sampled to 512 Hz. The data were then band-pass filtered (0.5-30 Hz) prior to offline analysis. Whilst it has been argued that a high-pass filter above 1 Hz is preferable for ICA, this setting significantly reduces the amplitude of the P300 ERP, for which the optimal high-pass filter is 0.1 Hz (Kappenman and Luck, 2010). For this reason, a high-pass cut-off of 0.5 Hz was chosen, as in previous studies (Murphy et al., 2011). The first analysis step was a semiautomatic independent component analysis (ICA)-based procedure to identify and remove cardiac and/or ocular artefacts without rejecting the contaminated epochs (Barbati et al., 2004; Porcaro et al., 2015).
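The referencing, resampling and filtering steps can be sketched with SciPy. The text specifies only the reference, the target sampling rate and the pass band. The fourth-order zero-phase Butterworth design, the polyphase resampler and the function name are assumptions made for illustration, and the ICA artefact-removal step is not shown.

```python
from math import gcd

import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt

def preprocess(eeg, fs_in=5000, fs_out=512, band=(0.5, 30.0), order=4):
    """Common-average reference, resample, then zero-phase band-pass (illustrative).

    eeg : (n_channels, n_samples) array sampled at fs_in Hz.
    """
    # Common average reference: subtract the across-channel mean at every sample.
    car = eeg - eeg.mean(axis=0, keepdims=True)
    # Rational resampling factor, e.g. 512/5000 = 64/625 (includes anti-alias filtering).
    g = gcd(fs_out, fs_in)
    resampled = resample_poly(car, up=fs_out // g, down=fs_in // g, axis=1)
    # Butterworth band-pass in second-order sections, run forwards and backwards
    # so that ERP peak latencies are not shifted by the filter's phase response.
    sos = butter(order, band, btype="bandpass", fs=fs_out, output="sos")
    return sosfiltfilt(sos, resampled, axis=1)
```

Zero-phase filtering is a deliberate choice here: a causal filter would delay the P300 and bias the latency comparisons between groups.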

Functional source separation
Functional source separation (FSS; Porcaro et al., 2009a, 2010, 2011; Tecchio et al., 2007) is a semi-blind source separation method (Porcaro and Tecchio, 2014) that exploits well-known distinctive features of electrophysiological signals. The aim of FSS is to enhance the separation of relevant signals by exploiting a priori knowledge without renouncing the advantages of using only the information contained in the original signal waveforms. Like ICA, FSS models the set of recorded EEG signals x as a linear combination of an equal number of sources s via a mixing matrix A ($x = As$). Unlike other constrained ICA models (Khan et al., 2012; Lu and Rajapakse, 2005; Wang and James, 2007), FSS identifies a single source at a time, building a contrast function for that source that exploits fingerprint information associated with the neuronal pool to be identified (Porcaro and Tecchio, 2014). In general, FSS starts from the original EEG data matrix x for each source and returns one functional source (FS) with the required functional property. This scheme makes it possible to extract the FS whose functional behaviour best agrees with the functional constraint (Porcaro and Tecchio, 2014). A modified cost function (with respect to standard ICA) is defined as $F = J + \lambda R$, where J is the statistical contrast normally used in ICA and R accounts for the a priori information known about the sources. The relative weight of these two terms can be adjusted via $\lambda$ (Appendix A). $\lambda$ was chosen both to minimize computational time and to maximize the functional constraint R. Moreover, the FSS contrast function F is optimized by means of simulated annealing (Kirkpatrick et al., 1983), which allows the prior information about the FS to be described by a non-differentiable function.
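To make the optimization idea concrete, here is a toy, single-source search that maximizes $F = J + \lambda R$ by simulated annealing over unit-norm unmixing vectors. This is not the authors' FSS implementation. The choice of J (absolute excess kurtosis, one common ICA contrast), the perturbation size, the geometric cooling schedule and the assumption that the data are already whitened are all illustrative. The names `kurtosis_contrast` and `fss_anneal` are hypothetical. Because annealing only needs function values, R may be non-differentiable, as the text notes.

```python
import numpy as np

def kurtosis_contrast(y):
    """J: absolute excess kurtosis of the standardized source (an assumed ICA contrast)."""
    z = (y - y.mean()) / y.std()
    return abs(np.mean(z**4) - 3.0)

def fss_anneal(X, R_fn, lam=1.0, n_iter=2000, T0=1.0, cooling=0.995, step=0.1, seed=0):
    """Toy FSS-style extraction of ONE source by simulated annealing.

    X    : (n_channels, n_samples) data, assumed already whitened.
    R_fn : maps a 1-D source time course to the functional constraint R.
    Maximizes F(w) = J(w^T X) + lam * R(w^T X) over unit-norm vectors w.
    """
    rng = np.random.default_rng(seed)
    n_ch = X.shape[0]

    def F(v):
        y = v @ X
        return kurtosis_contrast(y) + lam * R_fn(y)

    w = rng.standard_normal(n_ch)
    w /= np.linalg.norm(w)
    f_cur = F(w)
    w_best, f_best = w.copy(), f_cur
    T = T0
    for _ in range(n_iter):
        cand = w + step * rng.standard_normal(n_ch)  # random perturbation
        cand /= np.linalg.norm(cand)                 # stay on the unit sphere
        f_cand = F(cand)
        # Metropolis rule: always accept uphill moves, sometimes accept downhill ones.
        if f_cand >= f_cur or rng.random() < np.exp((f_cand - f_cur) / T):
            w, f_cur = cand, f_cand
            if f_cur > f_best:
                w_best, f_best = w.copy(), f_cur
        T *= cooling                                 # geometric cooling schedule
    return w_best, f_best
```

The extracted source time course is then `w_best @ X`. Accepting occasional downhill moves at high temperature helps the search escape poor local maxima of the combined cost.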
Our study aimed to investigate P300 activity in both young and old participants. We therefore first identified the functional source underlying the P300 processes, named FS P300, by maximizing the P300 response in the time window 320-480 ms for both young and old participants. The functional constraint was defined as
$$R = \sum_{t = t_1}^{t_2} \lvert EA(t) \rvert$$
where the evoked activity, EA, is computed by averaging signal epochs of the source FS P300 triggered on the visual stimulation ($t = 0$), $t_1$ is the lower bound (320 ms) and $t_2$ is the upper bound (480 ms) of the window chosen to maximize the temporal fingerprint of the source. The baseline was computed in the time interval from −500 to 0 ms.
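The functional constraint can be sketched as a plain function of a candidate source time course. This follows the summed-absolute-evoked-activity form given above. Several details are assumptions: each epoch is baseline-corrected before averaging, onsets are given as sample indices, and epochs that run past the recording edges are skipped. The function name `functional_constraint` is hypothetical.

```python
import numpy as np

def functional_constraint(source, onsets, fs, t1=0.320, t2=0.480, pre=0.5, post=1.0):
    """R = sum over t in [t1, t2] of |EA(t)| for a single source time course.

    source : 1-D source signal (e.g. w^T x); onsets : stimulus onsets in samples.
    EA is the average of baseline-corrected epochs from -pre to +post seconds.
    """
    n_pre, n_post = int(round(pre * fs)), int(round(post * fs))
    epochs = np.stack([source[o - n_pre:o + n_post] for o in onsets
                       if o - n_pre >= 0 and o + n_post <= len(source)])
    # Baseline correction over the -500..0 ms interval
    epochs = epochs - epochs[:, :n_pre].mean(axis=1, keepdims=True)
    ea = epochs.mean(axis=0)                  # evoked activity EA(t)
    times = np.arange(-n_pre, n_post) / fs    # epoch time axis in seconds
    window = (times >= t1) & (times <= t2)
    return float(np.abs(ea[window]).sum())
```

Taking the absolute value means the constraint rewards a large deflection in the window regardless of the sign ambiguity that source separation leaves on each extracted source.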

Functional sources behaviour
Once the sources (FS P300) were extracted, ERP analysis and topographic distributions were computed in both young and old participants to characterize and validate the extracted source. In particular, ERP analysis was performed on the P3a and P3b to investigate different behaviours in the two age groups (Young vs. Old) and in the 'High Performing' vs. 'Low Performing' (HP vs. LP) elderly groups. Target trials (P3b) and distractor trials (P3a) were epoched separately (−500 to 1000 ms) and baseline corrected over the interval from −500 to 0 ms. For each subject, single trials were averaged excluding error trials (i.e. a missing response to a target, or a response to a distractor). We performed pointwise statistical analysis on the averaged target (P3b) and distractor (P3a) waveforms, conducting two-sample permutation t-tests (10,000 permutations) at every point of the ERP waveform (0-1000 ms), and used the false discovery rate (FDR) to correct for multiple comparisons. In addition, peak amplitude and latency values were extracted and analysed using a combination of frequentist and Bayesian approaches. These analyses were conducted in JASP v0.8.2 (The JASP Team, 2017; https://jasp-stats.org).
Table 1. Participant demographics for young compared to elderly individuals. T values were generated using Student's two-sample t-test (no variable showed a significant difference in Levene's test of equality of variances). Significant differences between young and old individuals are highlighted in bold (p < 0.05). Effect sizes (Cohen's d) and Bayes factors are also reported to highlight the magnitude of the effect. Cohen's d values between 0.2 and 0.5 are considered small, 0.5-0.8 medium, and greater than 0.8 large. Bayes factors between 1 and 3 are considered anecdotal, 3-10 moderate, and greater than 10 strong.
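The pointwise permutation t-tests with FDR correction described in this section can be sketched as follows. Several details are assumptions rather than confirmed choices. The sketch uses Student's pooled-variance t at each time point and two-sided p-values with an add-one correction. Group labels are shuffled across subjects, with the same shuffle applied to every time point. Benjamini-Hochberg is used as the FDR procedure, since the text does not name the specific one. All function names are hypothetical.

```python
import numpy as np

def student_t(a, b):
    """Pooled-variance two-sample t statistic at every time point (columns)."""
    na, nb = a.shape[0], b.shape[0]
    sp2 = ((na - 1) * a.var(axis=0, ddof=1) + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(sp2 * (1 / na + 1 / nb))

def pointwise_permutation_ttest(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value per time point.

    a, b : (n_subjects, n_timepoints) subject-average ERPs for two groups.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([a, b])
    na = a.shape[0]
    t_obs = student_t(a, b)
    exceed = np.zeros_like(t_obs)
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])  # shuffle group membership
        t_perm = student_t(pooled[idx[:na]], pooled[idx[na:]])
        exceed += np.abs(t_perm) >= np.abs(t_obs)
    p = (exceed + 1) / (n_perm + 1)  # add-one correction avoids p = 0
    return t_obs, p

def fdr_bh(p):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q
```

Time points with `fdr_bh(p) < 0.05` would then be marked as significant group differences.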

Support vector machine (SVM) learning
We trained separate sets of classifiers to predict from EEG data whether an individual belonged to the Young vs. HP, Young vs. LP, or HP vs. LP groups. Each set of classifiers used peak amplitude and latency as the only features, but these features were derived from either the P3a or the P3b ERP, extracted from (i) FSS data or from the single electrodes (ii) Fz, (iii) Cz, or (iv) Pz. In total we implemented eight separate sets of classifiers (P3a vs. P3b × FSS vs. Fz vs. Cz vs. Pz) using linear support vector machines (SVM; default parameters of MATLAB R2016a). Accuracies were determined using a leave-one-subject-out cross-validation scheme. For example, for the LP versus HP classification, we trained the classifier on a training dataset of thirty participants and validated it on the one participant who had been left out. This procedure was repeated 31 times (i.e. leaving out each participant in turn), and the overall accuracy was calculated as the percentage of participants that were correctly classified during the validation step. Permutation testing was used to assess the significance of classifier performance under the null hypothesis of independence between features and labels. The p-value was calculated as the fraction of classification accuracies from the permuted datasets that were larger than the accuracy obtained from the original dataset, based on 1000 permutations (Ojala and Garriga, 2009). All p-values were then corrected for multiple comparisons using the false discovery rate, to account for the fact that we used multiple classifiers.
Table 1 summarizes demographic information, results from neuropsychological tests and oddball task performance for young and elderly individuals. In all cases, frequentist and Bayesian two-sample t-tests were performed to establish group differences. 
Young subjects had significantly more years of education and performed better on the MMSE, while old individuals performed significantly better on the NART. However, a complementary Bayesian analysis found that these effects were only in the anecdotal range (BF10 < 3). There were no significant differences in oddball performance (all p ≥ 0.1; BF10 < 1). Likewise, when the old group was split into HP and LP individuals, there were no significant differences in oddball performance (Table 2). The only significant differences between HP and LP individuals were in performance on the logical memory subscale of the WMS III and in the difference between the logical memory subscale and the NART.

Functional source behaviour validations
FSS successfully extracted P300 ERP responses (FS P300) in both old (HP and LP) and young participants. The outcomes of the FSS approach mirror a number of well-established properties of the P300 in aging. First, topographic maps show a posterior-to-anterior shift in the old group with respect to the young participants (Fig. 1, left column). Second, there was an apparent delay in the peak latency of the FS P300 ERPs in the old group (420 ± 44 ms; mean ± standard deviation) compared to the young group (380 ± 30 ms). Both findings are well established in previous electrophysiological studies of aging and the P300 (Balsters et al., 2013; O'Connell et al., 2012a).

Discrepancy quality check
To validate the quality of the FS P300 source extraction, we computed the P300 ERP on the raw data, on the data reconstructed from FS P300, and on the discrepancy (i.e. the raw-data ERP minus the ERP obtained by reconstructing the data from the extracted FS P300 source). As shown in Fig. 2, for both groups we were able to capture all the electrical activity at around 380 ms and 420 ms for the P3a and P3b peaks, respectively. The discrepancy topography also showed that the residual activity at the P300 peak was close to zero for both the P3a and P3b.
Table 2. Participant demographics for HP compared to LP individuals. T values were generated using Student's two-sample t-test (an asterisk indicates a significant difference in Levene's test of equality of variances, in which case Welch's t-test was used). Significant differences between HP and LP individuals are highlighted in bold (p < 0.05). Effect sizes (Cohen's d) and Bayes factors are also reported to highlight the magnitude of the effect. Cohen's d values between 0.2 and 0.5 are considered small, 0.5-0.8 medium, and greater than 0.8 large. Bayes factors between 1 and 3 are considered anecdotal, 3-10 moderate, and greater than 10 strong.

Group differences in P3a and P3b derived using FSS
Fig. 3 shows P3a and P3b ERPs and topographic maps for each group separately (Young, HP, LP). To assess whether there were any significant group differences, we began by running one-way ANOVAs on peak latency and amplitude in the P3a and P3b (see Fig. 4). The P3a showed a significant main effect of group in both latency (F(2,43) = 21.01, p < 0.001, partial eta squared = 0.494) and amplitude (F(2,43) = 5.785, p = 0.006, partial eta squared = 0.212). In both cases, group differences were driven by the young group compared to both HP and LP. Post-hoc t-tests confirmed that young individuals had shorter P3a peak latencies than the HP group (t(30) = 5.717, pFDR < 0.001) and the LP group (t(27) = 5.531, pFDR < 0.001). 
There were no significant differences between the HP and LP groups in P3a peak latency (t(29) = 0.084, pFDR = 0.934). Post-hoc t-tests also confirmed that young individuals had larger P3a peak amplitudes than the HP group (t(30) = 2.797, pFDR = 0.015) and the LP group (t(27) = 3.085, pFDR = 0.011). There were no significant differences between the HP and LP groups in P3a peak amplitude (t(29) = 0.431, pFDR = 0.669).
In summary, whilst latency and amplitude in both the P3a and P3b distinguished young and old individuals, only P3b peak amplitude distinguished between HP and LP individuals. The same pattern of results was clear when ERPs were compared on a time-point basis (see Fig. 5).

Individual differences in P3a and P3b derived using FSS
In an effort to move beyond group-level approaches, we applied machine learning (SVM) to establish whether P300 ERP responses (peak amplitude and latency) could accurately categorize Young, HP, and LP individuals (see Table 3). The highest overall accuracy in distinguishing between the three groups was obtained using FSS P3b data (average 81.25% accuracy across all comparisons using FSS P3b ERP information). In particular, this feature enabled us to classify HP versus LP participants with the highest accuracy (78.1% accuracy, 76.47% sensitivity, 80% specificity, pFDR < 0.001). It was also possible to distinguish between the groups in all comparisons using P3b ERP data from electrode Pz; however, this was not as accurate or specific as using FSS-derived ERPs. We note that when comparing HP and LP individuals, sensitivity (true positive rate) was the same for FSS ERPs and Pz ERPs (76.47%), whereas FSS ERPs showed higher specificity (true negative rate) than classification using Pz ERP data (80% compared to 66.67%).
Fig. 1. FSS-extracted signal between 300 and 600 ms in elderly and young participants. The topographic map and event-related potential (ERP) are shown for the extracted FS P300 source (young, top row; old, bottom row). In the ERP plot, the vertical dashed line indicates stimulus onset and the continuous lines indicate the maximum peaks for the P3a and P3b, respectively. The vertical continuous line in the topographic map indicates the maximum topographic peak, to emphasize the shift between the two groups.
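The leave-one-subject-out classification, with sensitivity, specificity and a permutation p-value, can be sketched with scikit-learn. This is not the MATLAB code used in the study. `SVC(kernel="linear", C=1.0)` and the added feature standardization are choices made here and are not claimed to reproduce MATLAB R2016a defaults exactly. Label 1 is treated as the positive class, which is an assumption. scikit-learn's `permutation_test_score` counts permuted accuracies greater than or equal to the observed one and adds one to the numerator and denominator, a slight variant of the fraction described in the methods. The function name `loso_svm` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_predict, permutation_test_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def loso_svm(X, y, n_permutations=1000, seed=0):
    """Leave-one-subject-out linear SVM with a label-permutation significance test.

    X : (n_subjects, n_features), e.g. [peak amplitude, peak latency]
    y : (n_subjects,) binary labels; 1 is treated as the positive class (assumption).
    """
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
    cv = LeaveOneOut()  # one subject per test fold
    y_pred = cross_val_predict(clf, X, y, cv=cv)
    accuracy = np.mean(y_pred == y)
    sensitivity = np.mean(y_pred[y == 1] == 1)  # true positive rate
    specificity = np.mean(y_pred[y == 0] == 0)  # true negative rate
    # p = (C + 1) / (n_permutations + 1), C = number of permuted accuracies >= observed
    _, _, p_value = permutation_test_score(
        clf, X, y, cv=cv, n_permutations=n_permutations,
        random_state=seed, scoring="accuracy")
    return accuracy, sensitivity, specificity, p_value
```

Placing the scaler inside the pipeline means it is refitted within each training fold, so the left-out participant never influences the normalization.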
To further compare FSS and channel-based methods, we used Bayesian independent samples t-tests. Bayesian methods have a number of advantages over traditional frequentist approaches. First, they allow one to make inferences about the null hypothesis as well as the alternative hypothesis. Second, it is possible to compare Bayes factors across analyses and, based on their magnitude, make statements about whether one result is stronger than another. Using the FSS approach, there was a significant group difference in P3b amplitude (HP > LP: t(29) = 4.379, p < 0.001, Cohen's d = 1.58, BF10 = 156.252). P3b peak amplitude extracted from electrodes Fz and Pz was not significantly different between HP and LP (Fz: t(29) = 1.537, p = 0.135, Cohen's d = 0.555, BF10 = 0.826; Pz: t(29) = 1.23, p = 0.228, Cohen's d = 0.444, BF10 = 0.604). P3b peak amplitude extracted from electrode Cz was significantly different between HP and LP groups (t(30) = 2.149, p = 0.04, Cohen's d = 0.776, BF10 = 1.848); however, the magnitudes of the Bayes factors for group differences using FSS and Cz were very different (156.252 for FSS compared to 1.848 for Cz). This suggests that the alternative hypothesis (HP ≠ LP) was 1.8 times more likely than the null hypothesis using data from electrode Cz, whereas the same alternative hypothesis was 156 times more likely than the null hypothesis using FSS. This means that the alternative hypothesis was 87 times more likely to be supported using FSS than using the best electrode (Cz).
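A Bayes factor of this kind can be computed from a t statistic and the two group sizes with the standard JZS (Jeffreys-Zellner-Siow) integral. Effect size has a Cauchy(0, r) prior, written as a mixture over g with an inverse-gamma(1/2, r²/2) mixing density. The sketch uses r = 0.707, which matches JASP's default prior width. It is not claimed to reproduce JASP's output to every decimal. The group sizes in any call, for example 17 and 14, are illustrative sizes consistent with 29 degrees of freedom, and the function name is hypothetical.

```python
import numpy as np
from scipy import integrate

def jzs_bf10(t, n1, n2, r=np.sqrt(2) / 2):
    """JZS Bayes factor BF10 for an independent-samples t statistic (sketch)."""
    n_eff = n1 * n2 / (n1 + n2)  # effective sample size
    nu = n1 + n2 - 2             # degrees of freedom

    def integrand(g):
        # Marginal likelihood of t given g ...
        lik = (1 + n_eff * g) ** -0.5 * (1 + t**2 / ((1 + n_eff * g) * nu)) ** (-(nu + 1) / 2)
        # ... times the inverse-gamma(1/2, r^2/2) density of g, computed in log space
        prior = r / np.sqrt(2 * np.pi) * np.exp(-1.5 * np.log(g) - r**2 / (2 * g))
        return lik * prior

    numerator, _ = integrate.quad(integrand, 0, np.inf)
    denominator = (1 + t**2 / nu) ** (-(nu + 1) / 2)  # likelihood of t under H0
    return numerator / denominator
```

Because a Bayes factor is a ratio of marginal likelihoods, Bayes factors computed from different analyses of the same hypothesis can be compared directly, which is the logic behind the FSS-versus-Cz comparison above.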
Finally, we used a regression analysis to investigate which parameters best explained individual variability in our metric of cognitive decline (Logical Memory-NART). Independent variables included age, MMSE, NART (Z transformed), FSS P3b amplitude, Fz P3b amplitude, Cz P3b amplitude, and Pz P3b amplitude. A significant regression equation was found (F(7,23) = 3.215, p = 0.016) with an R² of 0.495. Only FSS P3b amplitude significantly explained cognitive decline scores across individuals (β = 0.162, SE = 0.048, t = 3.367, p = 0.003). A Bayesian regression using the same variables found that the best model to explain Logical Memory-NART scores included only FSS P3b amplitude (BF10 = 205.485 compared to the null model). Bayesian model averaging across the candidate models showed that only the inclusion of FSS P3b amplitude had a strong influence on the model (BF inclusion = 59.741; all other variables had BF < 1). A separate Bayesian regression was run using the independent variables age, MMSE, NART (Z transformed), FSS P3a amplitude, FSS P3a latency, FSS P3b amplitude, and FSS P3b latency. This analysis was run to establish which of the neural signals extracted by FSS best captured the variance in Logical Memory-NART scores. Once again, the winning model was one that only included FSS P3b amplitude (BF10 = 205.485 compared to the null model). Similarly, Bayesian model averaging of the candidate models found that only inclusion of the FSS P3b amplitude variable influenced the model (BF inclusion = 74.176; all other variables had BF < 1).
Fig. 4. Group differences in P3a and P3b peak amplitude and latency extracted using FSS. Box plots illustrate the first quartile, median, third quartile and 95% confidence limits. Circles show individual data points. Black horizontal lines highlight significant differences between the groups (*p < 0.05 FDR corrected, **p < 0.005 FDR corrected).
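The frequentist part of the regression analysis, an ordinary least squares fit with an intercept and an omnibus F-test, can be sketched with NumPy and SciPy. With seven predictors and 31 older participants, it yields the F(7,23) degrees of freedom reported above. The function name is hypothetical. The Bayesian model comparison and model averaging performed in JASP are not reproduced here.

```python
import numpy as np
from scipy import stats

def ols_with_f_test(X, y):
    """OLS with intercept; returns coefficients, R^2, F, (df_model, df_resid), p.

    X : (n_subjects, n_predictors) predictor matrix; y : (n_subjects,) outcome.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    df_model = X.shape[1]
    df_resid = len(y) - df_model - 1
    F = (r2 / df_model) / ((1 - r2) / df_resid)
    return beta, r2, F, (df_model, df_resid), stats.f.sf(F, df_model, df_resid)
```

Note that several predictors here (P3b amplitude at FSS, Fz, Cz and Pz) are likely to be correlated, so individual coefficients should be interpreted with that collinearity in mind.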

Discussion
Despite decades of research investigating the P300 ERP and aging, it has been unclear whether changes in the P3a and P3b ERPs only reflect the process of aging or whether these could be used as markers of cognitive impairment. We posited that signal extraction techniques commonly used in EEG analyses (i.e. using single electrodes or a small subset of electrodes) could have reduced signal-to-noise by excluding potential neural generators. The FSS approach used in this study utilised a weighted combination of all available electrodes in an effort to improve signal-to-noise ratio and avoid missing potential neural generators that contribute to the P300 ERP. First, we demonstrated using FSS that whilst both the P3a and P3b change with age, only changes in P3b amplitude discriminated between individual differences in cognitive decline (episodic memory baselined by IQ). Crucially, the relationship between P3b amplitude and cognitive decline could not be explained by age.
Second, we confirmed that EEG signals extracted using FSS were 87 times more sensitive (based on Bayesian analyses) to these individual differences than EEG signals derived from the best single electrode (Cz). These analyses not only highlight the importance of using more advanced EEG signal processing methods such as FSS, but also highlight the utility of the P3b ERP as an easy-to-acquire marker of early, age-independent loss of memory performance.

The advantages of Functional Source Separation (FSS)
Although non-invasive electrophysiological techniques such as EEG provide the opportunity to directly measure the activity of large-scale neuronal populations, characterizing this activity remains challenging. In particular, electrical potentials generated by neuronal activity are detected not only close to their neuronal sources but also at distant sites, due to electric field propagation. Therefore, each channel positioned across the head derives its signal from more than one source (Porcaro and Tecchio, 2014; Rusiniak et al., 2013; Siegel et al., 2012). Since the P300 undeniably arises from a widely distributed network (Fjell et al., 2007; Linden, 2005; Polich, 2003, 2007), selecting a single channel, or averaging a group of channels based on the topographic representation, might be misleading if one aims to describe a distributed brain network. In this respect, we believe that methods capable of extracting the neural source under investigation (such as FSS) are well suited to avoiding channel selection and the potentially misleading results it can produce. The FSS approach has previously been applied successfully to extract EEG-specific features using different numbers of channels for primary motor (28 MEG sensors; EEG electrodes, Di Pino et al., 2012), primary sensory (28 MEG sensors; 23 EEG electrodes, Porcaro et al., 2009a, 2009b; 39 EEG electrodes, Porcaro et al., 2017) and primary visual (64 EEG electrodes, Porcaro et al., 2010, 2011) areas, but this is the first demonstration that FSS can be used to extract non-primary and widespread brain activations. Based on the spatial (source localization and topography) and functional behaviour of the sources extracted by FSS, the impact of the number of electrodes appears to be rather small. Moreover, functional behaviour and signal-to-noise ratio (SNR) seem to be improved when FSS is applied, independently of the number of channels used (Porcaro et al., 2009a, 2009b, 2017, 2018), making this a useful tool for clinical datasets collected with a low number of electrodes.
Since the P300 has been intensively investigated, it provides an ideal framework for testing the quality of the functional sources extracted by FSS (i.e. comparing the results obtained by FSS with those obtained in previous studies and well established in the literature). Here, we used specific temporal fingerprint information: we maximized the P300 response in the chosen window of 320-480 ms for both young and old participants to extract the functional sources (FSs) of interest. The features extracted by FSS matched well with those reported in the literature using pre-selected channels, such as a later peak latency of the P300 response and a more frontal topography in the old group than in the young group (Balsters et al., 2013; O'Connell et al., 2012a). Moreover, our discrepancy measure (i.e. the P300 ERP of the raw data minus the P300 ERP obtained by reconstructing the data from the source FS_P300) showed that the residual activity at the P300 peak was close to zero for both the P3a and P3b, as expected if all the electrical activity is captured by FS_P300. This check confirmed that we were able to extract all the electrical activity underlying the P300 response. Finally, results obtained with FSS appeared to be more sensitive for discriminating between HP and LP elderly than those obtained using the Cz channel, which has conventionally been chosen to study the P3a and P3b components of the P300 (Johnson, 1993; Linden, 2005; Magnano et al., 2006; Polich, 2007; Rossini et al., 2007).
Figure: Point-by-point analysis of P3a and P3b ERPs extracted using FSS (top panels) and the best performing single EEG channel (Cz; bottom panels). In all plots, Young (blue line), HP (green line) and LP (magenta line) are shown, with the shaded area of the same colour indicating the standard error. Horizontal black, cyan and grey thick lines indicate significant group differences between Y vs. HP, Y vs. LP and HP vs. LP, respectively (permutation t-test at p = 0.05); the horizontal pink line indicates pFDR < 0.05.
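The point-by-point group comparisons described in the figure caption (a permutation t-test at every sample, with FDR-corrected samples marked) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses Welch's t statistic, a two-sided test, a fixed number of random label permutations and the Benjamini-Hochberg procedure for the FDR step. The exact statistic, number of permutations and FDR variant used in the study are not given in this section, and all function names and the simulated data are hypothetical.

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    return (mean(a) - mean(b)) / se2 ** 0.5

def permutation_p(a, b, n_perm, rng):
    """Two-sided permutation p-value of Welch's t at a single time point."""
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of subjects
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Adding 1 counts the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)

def bh_fdr(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; True marks significant tests."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            cutoff = rank  # largest rank whose p-value is under its threshold
    significant = [False] * m
    for i in order[:cutoff]:
        significant[i] = True
    return significant

def pointwise_test(group_a, group_b, n_perm=1000, q=0.05, seed=0):
    """Compare two groups of per-subject ERPs (equal-length time courses)
    at every sample; return uncorrected p-values and FDR flags."""
    rng = random.Random(seed)
    pvals = []
    for t in range(len(group_a[0])):
        a = [subject[t] for subject in group_a]
        b = [subject[t] for subject in group_b]
        pvals.append(permutation_p(a, b, n_perm, rng))
    return pvals, bh_fdr(pvals, q)

# Simulated data (hypothetical): 10 subjects per group, 5 samples,
# with a group difference only at samples 2 and 3.
sim = random.Random(1)
group_1 = [[sim.gauss(0, 1) + (4 if t in (2, 3) else 0) for t in range(5)]
           for _ in range(10)]
group_2 = [[sim.gauss(0, 1) for _ in range(5)] for _ in range(10)]
pvals, flags = pointwise_test(group_1, group_2)
print(flags)
```

With an effect this large, the two simulated effect samples are expected to survive the FDR correction; the remaining samples carry no true difference.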
Independent Component Analysis (ICA) is the most common approach for extracting neural sources underpinning the P300 ERP response (Makeig et al., 2004). Recently, van Dinteren et al. (2017) used group ICA to investigate the effect of aging on the P300; however, a major drawback of ICA compared to FSS is that multiple independent components might describe the P300 ERP. Porcaro et al. (2018) compared the FSS and ICA approaches on data from primary motor and primary sensory areas. ICA tended to describe the functional behaviour across multiple components, whereas FSS extracted a single functional source, allowing for simpler analyses. In the case of van Dinteren et al. (2017), four components were reported to describe the P300 ERP, some of which showed differences with age. In addition to being difficult to interpret, the ICA-based approach to extracting neural sources raises a number of methodological issues. First, the selection of 'relevant' components is usually biased by the choices of the user (and depends on user experience). Second, this selection process can be very demanding and time-consuming for the user when working with high-density EEG data (i.e. 128 channels or more). Finally, combining multiple components requires the user to reconstruct the data by removing all non-P300 components and returning to channel space. This in turn requires channel selection for analysis, re-introducing the previously mentioned concerns about identifying sources. In contrast, FSS exploits a well-known a-priori 'functional' property (i.e. relevant "fingerprint" information regarding frequency range or time-course characteristics) to identify the source of interest. FSS therefore gives the user one source that maximises the pre-defined functional property of the data, thus addressing all the concerns raised above.
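Returning to channel space after ICA, as described above, amounts to back-projecting only the retained components through the mixing matrix: each channel's reconstructed signal is the sum, over the kept components k, of A[c][k] * S[k][t]. A minimal pure-Python sketch, assuming the mixing matrix A (channels x components) and the component time courses S (components x samples) come from an earlier ICA decomposition that is not run here; the function name, the numbers and the choice of which component counts as "P300-related" are all hypothetical.

```python
def back_project(mixing, sources, keep):
    """Reconstruct channel data from a subset of components.

    mixing:  channels x components (list of rows), e.g. the inverse of an
             ICA unmixing matrix.
    sources: components x samples (list of rows).
    keep:    indices of the components to retain.
    Returns channels x samples: sum over kept k of mixing[c][k] * sources[k][t].
    """
    n_samples = len(sources[0])
    return [
        [sum(row[k] * sources[k][t] for k in keep) for t in range(n_samples)]
        for row in mixing
    ]

A = [[1.0, 0.5],
     [0.25, 1.0]]
S = [[1.0, 2.0, 3.0],   # component 0 (hypothetically P300-related)
     [0.0, 1.0, 0.0]]   # component 1 (hypothetically unrelated)

print(back_project(A, S, keep=[0]))     # [[1.0, 2.0, 3.0], [0.25, 0.5, 0.75]]
print(back_project(A, S, keep=[0, 1]))  # [[1.0, 2.5, 3.0], [0.25, 1.5, 0.75]]
```

The output is again one time course per channel, so a channel or channel subset still has to be chosen before measuring the P300, which is the step the text argues FSS avoids.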
The potential drawback is that FSS cannot be used to extract unexpected or unknown brain activities; however, previous studies have found that this semi-blind approach often outperforms fully blind source separation approaches such as ICA (Porcaro et al., 2010, 2011).
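The idea of a single weighted sum of all channels tuned to an a-priori functional property, together with the discrepancy check described earlier in this section, can be illustrated with a simplified stand-in. This is not the FSS algorithm itself (which adds a functional constraint to an ICA-type contrast). Instead, it swaps in a closed-form spatial filter that maximises the squared mean source amplitude in a latency window relative to baseline noise, (w.m)^2 / (w^T C w), whose solution is w proportional to C^-1 m, where m holds each channel's mean amplitude in the window and C is the channel covariance in a baseline period. The window, baseline, ridge term, function names and toy data are all assumptions, and the mixing coefficients used for the reconstruction are fitted by least squares within the window only.

```python
import random

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def covariance(data, samples):
    """Sample covariance between channels over the given sample indices."""
    n_ch, n = len(data), len(samples)
    means = [sum(ch[t] for t in samples) / n for ch in data]
    return [[sum((data[i][t] - means[i]) * (data[j][t] - means[j])
                 for t in samples) / (n - 1)
             for j in range(n_ch)] for i in range(n_ch)]

def extract_source(erp, window, baseline, ridge=1e-9):
    """Hypothetical stand-in for FSS: w = C^-1 m maximises
    (w.m)^2 / (w^T C w). erp is channels x samples."""
    m = [sum(ch[t] for t in window) / len(window) for ch in erp]
    cov = covariance(erp, baseline)
    for i in range(len(cov)):
        cov[i][i] += ridge  # tiny ridge so the system stays solvable
    w = solve(cov, m)
    source = [sum(w[c] * erp[c][t] for c in range(len(erp)))
              for t in range(len(erp[0]))]
    return w, source

def discrepancy(erp, source, window):
    """Raw ERP minus the ERP reconstructed from the single source, per
    channel, at the window sample where |source| peaks. Each channel's
    mixing coefficient is a least-squares fit on the source in the window."""
    ss = sum(source[t] ** 2 for t in window)
    mixing = [sum(ch[t] * source[t] for t in window) / ss for ch in erp]
    peak = max(window, key=lambda t: abs(source[t]))
    return [erp[c][peak] - mixing[c] * source[peak] for c in range(len(erp))]

# Toy ERP (hypothetical): 3 channels, one topography times one waveform in
# the window, plus noise in the baseline period only.
rng = random.Random(0)
topography = [1.0, 0.6, 0.2]
n_samples, window, baseline = 40, range(24, 32), range(0, 16)
erp = [[topography[c] * (1.0 if t in window else 0.0)
        + (rng.gauss(0, 0.1) if t in baseline else 0.0)
        for t in range(n_samples)] for c in range(3)]
w, source = extract_source(erp, window, baseline)
print(discrepancy(erp, source, window))  # values near zero
```

Because the toy ERP is exactly one topography times one waveform inside the window, the residual at the peak is zero up to rounding; with real data, a residual near zero is what would indicate that the single source captures the windowed activity.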

P300, aging, and cognitive decline
Consistent with previous studies, we showed that both the peak latency and the peak amplitude of the P3a and P3b ERPs changed with age (Fjell et al., 2007; O'Connell et al., 2012a; Polich, 1997; Walhovd and Fjell, 2001). Specifically, the peak latency of both the P3a and P3b responses increased in the elderly group, whereas the peak amplitude of both the P3a and P3b decreased with age. However, only P3b amplitude distinguished between the two elderly subgroups (HP vs LP). This effect was also apparent when ERPs were compared at each sampling point and when machine learning was used to classify group membership. Fig. 5 highlights that the P3b response was not delayed in the LP group compared to the HP group; rather, the P3b ERP was absent in the LP group. This is consistent with Dockree et al. (2015), who also subdivided older individuals into HP and LP groups using the same criteria as this study. Dockree et al. (2015) found that both young and HP individuals showed greater EEG activity in centroparietal regions when recognising previously learnt words, whereas ERP responses in LP individuals did not differ between learnt and novel words. Similar to this study (see Fig. 3), ERP topographies presented in Dockree et al. (2015) show additional frontal peaks in HP and LP individuals compared to purely centroparietal activity in the Young group. Whilst some have suggested that the increased frontal activity in aging reflects compensatory activity (Davis and Friedrich, 2010; van Dinteren et al., 2014), others have shown that increased frontalisation of the P300 response is associated with poorer performance in neuropsychological tests (Fabiani et al., 1998; West et al., 2010). It has previously been suggested that the prefrontal cortex contributes to target detection during the initial trials, but that after a few trials a model of the task is formed and prefrontal control is no longer required (Fabiani and Friedman, 1995; Richardson et al., 2011; West et al., 2010).
In line with this, it has been suggested that increased frontalization of P3b responses in elderly individuals may reflect an inability to establish a strong mental representation of the task stimuli, resulting in a continued reliance on both frontal and parietal regions to maintain task performance. One limitation of these previous studies has been the inability to localize this general frontalization effect to discrete neural sources. To address this problem, O'Connell et al. (2012a) used simultaneous EEG-fMRI recordings to investigate the neural generators underpinning this phenomenon. Crucially, they demonstrated that this marked frontalization effect with age was driven by distinct neuroanatomical changes for the P3a and P3b responses. Whilst the age-related increase in frontal P3a responses was produced by increased activity in the left inferior frontal gyrus and middle cingulate cortex, the increase in frontal P3b responses was driven by increased activity in the right middle frontal gyrus and putamen. This once again highlights that the P3a and P3b responses are underpinned by unique patterns of brain activity and are likely to contribute to different cognitive processes. In addition, these findings suggest that the general process of frontalization that occurs during aging can be driven by distinct neural generators. Given that the LP group showed reduced amplitude for the P3b only, it is possible that their cognitive decline reflects aberrant cortico-striatal connectivity. This is in keeping with previous work by Ystad and colleagues investigating cortico-striatal connectivity in healthy elderly individuals. Whilst Ystad et al. (2011) found that elderly individuals who performed better on executive function tasks had stronger connections between the putamen and the dorso-medial parietal cortex (putatively the dorsal attention network), Ystad et al. (2010) showed a negative relationship between episodic memory and the number of cortico-striatal connections. This suggests that whilst increased connectivity between the putamen and parietal lobe is beneficial in aging, diffuse and more distributed connectivity between the putamen and multiple other regions is detrimental. This raises the possibility that increased fronto-striatal connectivity actually decreases task performance, perhaps by taking resources away from beneficial parieto-striatal connectivity. However, EEG is not suited to addressing cortico-subcortical hypotheses, and further research in this area would require fMRI or simultaneous EEG-fMRI.
With the greying population, there is a growing need for sensitive markers of cognitive decline in ageing. Here, we have defined cognitive decline in ageing using a combination of neuropsychological tests (age-adjusted scores from the Logical Memory subtest of the Wechsler Memory Scale III, baselined against NART IQ). The results of this study build on a growing literature highlighting neural differences between HP and LP elderly individuals as defined here (Dockree et al., 2015; Hogan et al., 2011, 2012; O'Hora et al., 2013). Furthermore, LP individuals as defined above have shown differential receptor expression and cytokine profiles indicative of inflammation that may be involved in the prodromal processes leading to the development of neurodegenerative disease (Downer et al., 2013). However, to our knowledge these findings have only been demonstrated at the group level, and it is not clear whether any of these markers (including the P300 ERP) index individual differences in cognitive decline. To address this issue, we used a machine learning approach to establish whether the P3b signals extracted using FSS could be a useful tool for categorizing elderly individuals as HP or LP. This would be particularly advantageous clinically given the speed with which P300 ERP data can be collected (~10 min), which is much faster than the time needed to administer the neuropsychological battery (~2 h). The results of our SVM analysis were promising, demonstrating that P3b signals extracted using FSS could categorize individuals as HP or LP with 78% accuracy (76% sensitivity; 80% specificity). However, it is important to treat cross-sectional results with caution (Fisher et al., 2018). For example, Nyberg et al. (2010) found that whilst cross-sectional evidence showed an increase in prefrontal activity with ageing, a longitudinal examination showed the opposite trend.
The results presented here provide an important contribution to the ageing literature; however, it is crucial to replicate these results in an independent sample and to conduct longitudinal studies to establish whether individual changes in the P300 ERP index individual changes in cognitive decline.
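The reported accuracy, sensitivity and specificity of the SVM classification follow from the confusion matrix of the cross-validated predictions. A minimal sketch, assuming LP is treated as the positive class (this section does not state which class was positive) and that the predicted labels come from a cross-validation scheme not shown here; the function name and the ten labels in the example are hypothetical. Because accuracy is the class-size-weighted average of sensitivity and specificity, the reported 78% necessarily lies between 76% and 80%, with its exact position set by the relative sizes of the HP and LP groups.

```python
def classification_metrics(y_true, y_pred, positive="LP"):
    """Accuracy, sensitivity (true positive rate for `positive`) and
    specificity (true negative rate) from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels: 4 LP and 6 HP participants, one error in each class.
y_true = ["LP"] * 4 + ["HP"] * 6
y_pred = ["LP", "LP", "LP", "HP"] + ["HP"] * 5 + ["LP"]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.8, 'sensitivity': 0.75, 'specificity': 0.8333333333333334}
```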
It will also be important to move towards more mechanistic explanations of perturbed neural activity (Hanks and Summerfield, 2017; Montague et al., 2012). Converging evidence from mathematical modelling and primate electrophysiology has suggested that the P300 ERP (specifically the P3b) has all the characteristics of a 'decision variable' signal (O'Connell et al., 2012b; Twomey et al., 2015). A decision variable signal represents the accumulation of sensory information over time until there is sufficient information (also referred to as evidence) to cross a boundary criterion. The absence of the P3b response in the LP group could suggest that elderly individuals at risk of cognitive decline have a reduced ability to accumulate sensory evidence over time. This is in keeping with the findings of Dockree et al. (2015), who found that LP individuals showed significantly slower evidence accumulation during an episodic memory task compared to HP individuals. Unfortunately, traditional EEG paradigms such as the three-stimulus oddball and the memory encoding paradigm used by Dockree et al. (2015) are not optimal for investigating evidence accumulation mechanisms. Further research is necessary to determine whether the absence of this fundamental neural mechanism (evidence accumulation) could become a crucial marker of cognitive decline and dementia.
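The accumulation-to-bound idea behind a decision variable signal can be illustrated with a toy accumulator: noisy momentary evidence with a constant drift is summed over time until the running total crosses a fixed bound. A minimal sketch, assuming a simple discrete-time drift-diffusion-style process; the function names and the drift, noise, bound and time-step values are hypothetical and not fitted to any data. A lower drift (weaker evidence accumulation) produces a slower build-up and later bound crossings, loosely analogous to the reduced accumulation proposed for the LP group.

```python
import random

def accumulate(drift, bound=1.0, noise=0.1, dt=0.01, max_steps=10_000, rng=None):
    """Sum noisy evidence until |decision variable| reaches `bound`.

    Returns (decision variable trajectory, number of steps taken)."""
    rng = rng or random.Random(0)
    dv, trace = 0.0, [0.0]
    for step in range(1, max_steps + 1):
        # Momentary evidence: deterministic drift plus Gaussian noise.
        dv += drift * dt + noise * rng.gauss(0.0, 1.0) * dt ** 0.5
        trace.append(dv)
        if abs(dv) >= bound:
            return trace, step
    return trace, max_steps

def mean_decision_steps(drift, n_trials=200, seed=0):
    """Average number of steps to reach the bound over simulated trials."""
    rng = random.Random(seed)
    return sum(accumulate(drift, rng=rng)[1] for _ in range(n_trials)) / n_trials

print(mean_decision_steps(2.0), mean_decision_steps(0.5))
```

With these settings the mean crossing time is close to bound / drift, i.e. roughly 50 steps for a drift of 2.0 and roughly 200 steps for a drift of 0.5.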

Conclusion
In conclusion, using a simple, well-established paradigm in combination with an advanced signal processing technique (FSS), we were able to reliably extract both subcomponents of the P300 ERP. Our analyses demonstrated that: 1) P3b amplitude could be a useful and easy-to-acquire metric of age-independent memory loss (and possibly prodromal disease) in elderly individuals, and 2) more advanced signal processing methods such as FSS are necessary to detect subtle variations in EEG signals that are likely obscured by much larger effects such as age-related differences.