Relation between centro-parietal positivity and diffusion model parameters in both perceptual and memory-based decision making

Several studies have suggested that the centro-parietal positivity (CPP), an EEG potential occurring approximately 500 ms post-stimulus, reflects the accumulation of evidence for making a decision. Yet, most previous studies of the CPP focused exclusively on perceptual decisions with very simple stimuli. In this study, we examined how the dynamics of the CPP depended on the type of decision being made, and whether its slope was related to parameters of an accumulator model of decision making. We show initial evidence that memory- and perceptual decisions about carefully-controlled face stimuli exhibit similar dynamics, but offset by a time difference in decision onset. Importantly, the individual-trial slopes of the CPP are related to the accumulator model's drift parameter. These findings help to further understand the role of the CPP across different kinds of decisions.


Introduction
Accumulator models of decision making have been very successful in accounting for behavioural data in various decision making paradigms. Well-known members of this class are the Drift Diffusion Model (Ratcliff, 1978), the Leaky Competing Accumulator Model (Usher and McClelland, 2001) and the Linear Ballistic Accumulator Model (Brown and Heathcote, 2008). The common idea of all of these models is that when we make a decision, we accumulate evidence for each option until the evidence for one of the options reaches a threshold level, after which the corresponding choice is executed.
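The shared idea behind these models can be illustrated with a minimal Python sketch that simulates one trial of a two-boundary diffusion process. This is not the fitting procedure used in this study. The function name, the symmetric bounds at plus and minus the threshold, the unit noise, and the 1-ms time step are all assumptions chosen for illustration.

```python
import random


def simulate_ddm_trial(drift, threshold, non_decision_time,
                       dt=0.001, noise_sd=1.0, max_time=10.0, rng=None):
    """Simulate one trial of a two-boundary diffusion process (illustrative).

    Evidence starts at 0 and is updated in small time steps until it
    crosses +threshold ("upper" response) or -threshold ("lower" response).
    Returns (choice, rt_in_seconds); (None, None) if no bound is reached
    within max_time seconds of decision time.
    """
    rng = rng or random.Random()
    evidence = 0.0
    elapsed = 0.0
    step_sd = noise_sd * dt ** 0.5  # noise scales with sqrt(dt)
    while elapsed < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
        if evidence >= threshold:
            return "upper", elapsed + non_decision_time
        if evidence <= -threshold:
            return "lower", elapsed + non_decision_time
    return None, None
```

In this sketch a higher drift leads to faster and more accurate "upper" responses, a higher threshold trades speed for accuracy, and the non-decision time simply shifts every response time by a constant.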
In recent years, researchers have started to investigate how the evidence accumulation process for decisions manifests in the brain. Initial suggestions for neural correlates of accumulation processes came from animal studies that looked at single-neuron activity in the lateral intraparietal area while monkeys were deciding on the direction of random dot motion clouds (Gold and Shadlen, 2000; Newsome et al., 1989). Slightly later, various studies using different neuroimaging modalities in humans followed, most of which also focused on random dot motion decisions (e.g., Donner et al., 2009; Heekeren et al., 2008; van Vugt et al., 2012). While the evidence accumulation process is difficult to observe in functional neuroimaging (Heekeren et al., 2008; Ploran et al., 2007), electroencephalography (EEG) has a better temporal resolution, on the millisecond scale. There have been several approaches to examining evidence accumulation in the EEG domain: focusing on event-related potentials at specific moments in time, or alternatively on the dynamics of brain oscillations or event-related potentials over time. Ratcliff et al. (2009) showed that the amplitude of an EEG component consisting of a linear combination of various posterior parietal channels, occurring around 400 ms after the stimulus, correlated with the amount of decisional evidence during a perceptual decision making task. They could use single-trial classifier estimates to distinguish trials with low evidence from trials with high evidence.
The problem with focusing only on a specific moment in time is that such a study says little about the dynamics of the decision process. For that reason, van Vugt et al. (2012) examined which brain oscillations show signal dynamics consistent with the accumulation of evidence over time, and demonstrated that parietal 4-9 Hz theta oscillations covaried with the accumulation of evidence during a random dot motion task. In other words, they found that the power of theta oscillations increased until the moment of the response in trials in which there was information to be accumulated, while theta power stayed flat in control trials in which there was no information to be accumulated, apart from the stimuli that were being presented. They also showed that individual differences in the slope of theta oscillations covaried with individual differences in the drift parameter of the model fits. Donner et al. (2009) related 12-36 Hz beta oscillations recorded from the motor cortex with magnetoencephalography to this same process. Specifically, they showed that beta oscillations increased over the course of the decision, and predicted the upcoming choice several seconds before the actual response. Another neural measure of the decision variable developing in real time was proposed by O'Connell et al. (2012). In an elegant target detection task without sudden stimulus onsets they isolated a centro-parietal positive ERP component (the CPP) that peaked at the response latency and whose rate of rise depended on the difficulty of the decisions, two key characteristics of a decision variable according to accumulator models (see also Loughnane et al., 2016). In a similar vein, Pisauro et al. (2017) examined the dynamics of the CPP in a value-based decision task, using simultaneous EEG and fMRI. Using each individual's average predicted accumulation dynamics, they uncovered a correlate of evidence accumulation in posterior parietal electrodes that bears considerable similarity to the CPP, in the sense that it arose from similar centro-parietal areas and peaked at the time of the response after a gradually increasing trajectory. This EEG signature was associated with BOLD activity in the posterior medial frontal cortex. This accumulation process was relatively task-general: they found a parietal electrode cluster to exhibit accumulation-like dynamics in a perceptual decision making paradigm that was similar to their finding in value-based decisions.
The work by Pisauro et al. (2017) suggests that the CPP is a relatively task-general correlate of evidence accumulation. This is in agreement with the original intent of the DDM, which was developed to explain recognition memory decisions (Ratcliff, 1978) but in recent times has most often been used to explain simple perceptual decisions (see Heekeren et al., 2008, and Mulder et al., 2014, for reviews). Nevertheless, slightly more complex decisions can be explained with the same kind of framework. An example is "memory decisions" (Donaldson et al., 2009; Donkin and Nosofsky, 2012; Mack and Preston, 2016): judgments about whether one has seen an item before, which require a comparison between evidence presented on the screen and internal memory evidence. Support for the idea that the CPP reflects a decision-general process of evidence accumulation comes from numerous studies that have demonstrated that the P300/P3/P3b is a signal very similar to the CPP (Twomey et al., 2015) in its topography (centro-parietal), as well as its dynamics (both are large-amplitude potentials that increase until the response and peak around 300-600 ms post-stimulus). The P300 is sensitive to the difficulty of both perceptual (Hillyard et al., 1973; Squires et al., 1975) and memory tasks (Polich, 2007). In addition, the P300 has been associated with numerous cognitive processes ranging from computing uncertainty (Duncan-Johnson and Donchin, 1977) to orienting (Nieuwenhuis et al., 2005; Nieuwenhuis et al., 2011). Yet, viewed through the lens of accumulator models of decision making, these difficulty and attentional modulations could in fact reflect modulations of the drift rate, that is, the speed of evidence accumulation. Indeed, the P300 (or CPP) has been found to have a slope roughly sensitive to the amount of evidence accumulated, and an amplitude changing across task conditions consistent with being a decision threshold (Kelly and O'Connell, 2013; O'Connell et al., 2012).
If the P300/CPP is indeed a domain-general decision mechanism, then this suggests that it should be observed in memory-based, value-based and perceptual decisions. Moreover, for the CPP to be a specific accumulation signature, its dynamics should covary with model estimates of DDM parameters.
To examine the task-generality of the CPP, we developed a task that combined a delayed-match-to-sample memory-based decision task and a same-different perceptual decision making task with face stimuli. In a delayed-match-to-sample task, participants see a first stimulus ("memory item"), followed after a delay by a second stimulus ("probe item"), for which they have to decide whether it is identical to ("match") or different from ("non-match") the stimulus just presented. The delayed-match-to-sample task has been used in many monkey studies of evidence accumulation (Deco et al., 2013; Romo et al., 2004). While in random dot motion the amount of evidence is manipulated with the proportion of coherently moving dots, in our delayed-match-to-sample task based on face stimuli, the amount of evidence is manipulated with the degree of similarity between the two face stimuli that are being compared. Such a manipulation is possible because face generation software allows for precise manipulation of information about faces (e.g., Paysan et al., 2009; Wilson et al., 2002). Previously, we have shown that such gradations in face similarity affect decisions about whether an item is a match or a non-match with a remembered stimulus (van Vugt et al., 2009; van Vugt et al., 2013). Specifically, when the probe appears, the participant will retrieve or refresh their memory of the stimulus items, compare the probe to these stimuli and use this to drive the evidence accumulation process (Nosofsky and Palmeri, 1997). The higher the match or similarity between the probe item and the memory items, the higher the speed with which the accumulation process drifts towards the thresholds. If the CPP is an evidence accumulation signal, it should be sensitive to the amount of match in each trial, for both perceptual and memory decisions, and its single-trial behaviour should covary with estimates of DDM model parameters.
To preview our results, the gradual CPP build-up predicted by accumulator models is visible for both perceptual and memory decisions and, importantly, its slope covaries with estimates of drift parameters of the DDM, in line with the idea that the CPP is a specific reflection of the evidence accumulation process.

Behaviour
To make the neural correlates of perceptual and memory-based decision making maximally comparable, we aimed for the perceptual and memory decision tasks to be of roughly equal difficulty. We did so by manipulating difficulty through the similarity of the face stimuli (see Methods and Materials). In addition, we made the perceptual task more difficult by having the faces look outward at an angle. Fig. 1 shows that the difficulty matching between the perceptual and memory tasks was moderately successful. There was a significant difference in accuracy between the two tasks with slightly higher accuracy for the perception task (t(22) = 3.4, p < 0.01), although the means of the two conditions (perception: M = 0.82; memory: M = 0.78) were not far apart. The reaction times (RTs) were less comparable. The average RT for the perceptual task was 1326 ms, while the average RT for the memory task was 824 ms, a significant difference (t(22) = 12.8, p < 0.001). We believe this behavioural difference arises from the fact that in the perceptual task, RT is counted from the moment the two faces come on the screen. Participants first need to process both faces visually, then mentally rotate them before being able to make a response, and they may need to make a number of eye movements to do this. In the memory task, on the other hand, RT is counted from the onset of the probe face, so that only one stimulus needs to be encoded before a decision can be made and no rotation and eye movements are needed.
This explanation is consistent with the outcome of drift diffusion model fits (using the DMA Toolbox, VandeKerckhove and Tuerlinckx, 2008). We compared the fits of models that did not vary any parameters across tasks (perception vs. memory), models that varied the non-decision time parameter across tasks, models that additionally varied the drift rate, and models that additionally varied the decision threshold. We found that for 70% of the participants (16 out of 23), the model that varied the non-decision time had the lowest Bayesian Information Criterion (BIC), suggesting that the differences in behaviour between the perceptual and memory tasks could best be explained by differences in non-decision time. The model that allowed non-decision time to vary between the two tasks fitted the data satisfactorily, as judged from predicted versus observed quantile plots (Supplementary Fig. S1).
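The logic of this model comparison can be sketched in Python, assuming the standard definition BIC = k * ln(n) - 2 * ln(L), where k is the number of free parameters, n the number of observations and L the maximised likelihood. The model names, log-likelihoods and parameter counts below are made-up placeholders rather than results from this study, and the toolbox may compute the criterion differently in detail.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate a better model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_model(fits, n_obs):
    """Pick the model with the lowest BIC.

    fits maps a model name to (maximised log-likelihood, number of free
    parameters). Returns (name of the best model, dict of all BIC scores).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits for one participant (placeholder numbers).
example_fits = {
    "no_parameters_vary": (-520.0, 3),
    "non_decision_time_varies": (-500.0, 4),
    "non_decision_time_and_drift_vary": (-499.5, 5),
}
winner, bic_scores = best_model(example_fits, n_obs=400)
```

With these placeholder values the extra drift parameter improves the likelihood only marginally, so the penalty term favours the model in which only non-decision time varies.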
As expected, fits of the DDM to each participant's behavioural data show that the main difference between the perceptual and memory decision tasks lies in the non-decision time, which represents time needed for visual stimulus processing and response preparation. Table 1 contains statistics on the estimates of three key model parameters over all 23 participants. As this table shows, the estimated drift rate and decision threshold are by design identical for the two tasks (since we chose models for which these two did not differ). The non-decision time is significantly different between the two tasks (t(22) = 40.8, p < 0.001). This non-decision time is nearly 300 ms longer in the perceptual task and may reflect the rotation and eye movement required in that task. This is consistent with previous work that has shown that mental rotation can be captured by non-decision time (Provost and Heathcote, 2015).
Next we checked whether our difficulty manipulation was successful. Difficulty was manipulated in a continuous fashion through the similarities of the stimuli. In the perception task, trials are more difficult when the two presented faces are more similar. In the memory task, the difficulty depends on the sum of the similarities between the probe and each list item (Kahana and Sekuler, 2002; Nosofsky, 1991). When this sum of similarities is large, participants are more likely to think the probe item was a member of the just-presented list. When it is small, participants are more likely to think the probe item was not a member of the just-presented list. Consequently, high summed similarities for non-matches make the trial more difficult because they make a "match" response more likely, while for matches, low summed similarities make the trial more difficult since they make a "non-match" response more likely. As Fig. 2 shows, the probability of correctly rejecting a non-match gradually decreases with increasing (summed) similarity, as expected.
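The summed-similarity measure can be illustrated with a short Python sketch. The text above does not specify how the similarity between two faces is computed, so the exponential decay with Euclidean distance in a face feature space, the parameter tau, and the function names are all assumptions made for illustration.

```python
import math


def similarity(face_a, face_b, tau=1.0):
    """Similarity that decays exponentially with distance (assumed form)."""
    return math.exp(-tau * math.dist(face_a, face_b))


def summed_similarity(probe, study_items, tau=1.0):
    """Sum of the similarities between the probe and every studied item."""
    return sum(similarity(probe, item, tau) for item in study_items)


# Hypothetical two-dimensional face coordinates.
study_list = [(0.0, 0.0), (1.0, 0.0)]
near_probe = (0.5, 0.1)   # non-match close to both list items: hard trial
far_probe = (4.0, 3.0)    # non-match far from both list items: easy trial
hard = summed_similarity(near_probe, study_list)
easy = summed_similarity(far_probe, study_list)
```

Under these assumptions a non-match probe that lies close to the studied faces yields a high summed similarity and is therefore more likely to be mistaken for a match.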
To examine whether this difficulty manipulation was picked up by the DDM, we obtained separate drift rate estimates for non-match trials in four different similarity bins. As (summed) similarity increases, the participant is more likely to endorse the item as a match, and hence task difficulty of non-match trials increases with summed similarity, predicting a lower drift rate for these trials. To test the hypothesis that drift rate decreases with (summed) similarity, we performed a linear regression analysis for each participant and compared the resulting regression coefficients (β) against zero using a t-test. For both the memory and the perceptual task, we observed a significant negative relation between (summed) similarity and drift rate (perceptual task: mean β = −1.69, p < 0.01; memory task: mean β = −1.52, p < 0.001). This indicates that, as expected, increased similarity is associated with greater decision difficulty.
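This two-step analysis (one regression slope per participant, followed by a one-sample t-test of the slopes against zero) can be sketched in Python. The bin values and drift rates below are made-up numbers for three hypothetical participants, and the function names are assumptions. Only the t statistic and degrees of freedom are computed here; obtaining a p-value would require the t distribution from a statistics package.

```python
import math
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for a test of the mean against mu."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - mu) / standard_error, n - 1


# Hypothetical data: mean similarity per bin and drift rate per bin.
similarity_bins = [0.2, 0.4, 0.6, 0.8]
drift_rates = [
    [1.9, 1.6, 1.2, 0.8],
    [2.1, 1.9, 1.4, 1.2],
    [1.5, 1.4, 1.0, 0.9],
]
betas = [ols_slope(similarity_bins, d) for d in drift_rates]
t_stat, df = one_sample_t(betas)
```

A negative t statistic here corresponds to drift rates that fall as similarity rises, which is the direction reported above.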

Relation between CPP and response time
To see whether the CPP component reflects the decision making process, we looked at its activity on several qualitative measures that were also used in the original study by O'Connell and colleagues. Specifically, accumulators should (1) have a slope that scales with response time, (2) differentiate correct from incorrect trials, (3) differentiate between easy and difficult trials and (4) predict the upcoming response. In addition, the slope of the CPP should be informative of a person's drift rate. First, examining how the CPP depends on response time, Fig. 3 shows that the peak of the CPP shifts with increasing response time, which is not an artefact of correctness because this analysis was done only on correct trials. From several hundred milliseconds after probe onset, there is a significant relation between the amplitude of the CPP and the eventual response time for each trial. This is consistent with the notion that the build-up of the CPP reflects the development of a decision variable. Moreover, when examining the CPP at the individual-trial level (Fig. 5), it is clear that it peaks right before the response as well, and therefore the observed relationship with response time is not an artefact of averaging.
Table 1. Mean (standard deviation) estimates of drift diffusion model parameters for the perceptual and memory tasks. Since the best-fitting model had the drift and decision threshold parameter fixed across the perceptual and memory conditions, those are the same for the two tasks.

CPP on correct vs. error trials
Next, we examined whether CPP amplitude differentiated between correct and incorrect trials. Since evidence for the correct response, on average, is stronger than for the incorrect response, the CPP should be higher for correct relative to incorrect responses. Fig. 4 shows that this was indeed the case. In the perceptual condition, there was no significant difference between CPP amplitude for correct and error trials; in the memory condition the difference occurred from about 500 ms after probe onset. In agreement with the idea that memory and perceptual decisions exhibit a similar evidence accumulation process, correct decisions have a larger amplitude than incorrect decisions in both conditions, although the difference is only significant in the memory task. Moreover, the difference between the correct and incorrect signals started to emerge after approximately 500 ms for the memory condition, and in the perceptual condition a numerical (but not statistical) difference started to emerge after approximately 800 ms, in agreement with their 300-ms difference in non-decision time (Table 1).

Dependence of CPP on difficulty
According to accumulation models, the build-up of a decision variable should track the strength of the available evidence, which is the inverse of trial difficulty. We tested whether the CPP showed a graded response to our difficulty manipulation, (summed) similarity. Since the effect of similarity on difficulty is reversed for match compared to non-match trials, and because there is only one similarity value for match trials in the perceptual task (i.e., a perfect similarity score of 1) we only looked at non-match trials here. As can be seen in Fig. 6, CPP amplitude is highest on the easiest trials and lowest on the most difficult ones, while showing an intermediate response on trials of intermediate difficulty. This is consistent with the CPP reflecting a decision process, and this decision process being similar for perceptual and memory decisions. As before, the effect was only significant for the memory decisions. The absence of significance in the perceptual decision making task could be due to increased variability in that EEG signal due to the longer average response times in that condition.
Fig. 2. Accuracy (probability of endorsing an item as a match in non-match trials) as a function of similarity between the two faces in the perception task (left) and as a function of summed similarity between probe and all study items in the memory task (right). Error bar indicates standard error of the mean.
Fig. 6. CPP activity for four different (summed) similarity bins for the perceptual (a) and memory (b) conditions. Lighter colors represent easier (low-similarity) trials; only non-match trials are included. Time point 0 corresponds to probe onset. The black bar indicates a significant correlation between CPP amplitude and similarity, with corrected p < 0.05.

Dependence of CPP on selected response
We then examined whether the CPP is a signed representation of the accumulated evidence, that is, whether it is higher for one of the responses than for the other, reflecting a decision threshold at different CPP amplitudes for the two responses. Fig. 7 shows that the CPP indeed distinguishes between the two responses that were actually chosen and therefore the CPP may reflect a signed signature of evidence accumulation. From 800 to 1400 ms for the perceptual condition and from 500 to 800 ms for the memory condition, the CPP exhibits higher amplitude for non-match than for match responses. In other words, the CPP does not only code for difficulty of the trials and correctness of the responses, but also the actual identity of those responses. This higher amplitude for non-match responses is consistent with older studies that showed evidence that participants adjust their speed-accuracy trade-off between match and non-match responses. Specifically, in most of those studies the decision threshold is lower for match than for non-match responses (Farell, 1985; Ratcliff and Hacker, 1981), similar to what we observed.

Relation between CPP and model parameters
Finally and crucially, we tested more directly whether the CPP was related to evidence accumulation by assessing whether its slope was associated with model parameters fitted on the behavioural data. We computed CPP slopes on single trials and used these to group the data into low-neural-drift and high-neural-drift trials. We observed that trials with a lower neural drift rate (i.e., CPP slope) also had a lower behavioural drift rate when this parameter was estimated separately for the low- and high-drift rate categories (Fig. 8). The difference in drift rate between the low- and high-slope groups was significant for both the memory task (t(22) = 2.78, p = 0.01) and the perceptual task (t(22) = 2.16, p = 0.04). In other words, the CPP does not merely exhibit dynamics that are consistent with an accumulator model, but its slope is also consistent with between-trial differences in accumulator model fits. There were no differences in the decision threshold between the two slope groups (t(22) < 0.63, p > 0.53), and a Bayes Factor analysis supported the equivalence of the thresholds in the two groups (BF01 = 0.262 for memory trials and BF01 = 0.23 for perception trials).
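The grouping step can be sketched in Python as follows. The text above does not state the time window over which single-trial slopes were computed or the exact criterion used to form the two groups, so the least-squares slope over the supplied samples and the split at the median are assumptions, as are the function names and the made-up amplitudes.

```python
import statistics


def trial_slope(times, amplitudes):
    """Least-squares slope of CPP amplitude against time for one trial."""
    mean_t = statistics.fmean(times)
    mean_a = statistics.fmean(amplitudes)
    num = sum((t - mean_t) * (a - mean_a) for t, a in zip(times, amplitudes))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def split_by_slope(slopes):
    """Return (low, high) lists of trial indices, split at the median slope."""
    cutoff = statistics.median(slopes)
    low = [i for i, s in enumerate(slopes) if s <= cutoff]
    high = [i for i, s in enumerate(slopes) if s > cutoff]
    return low, high


# Hypothetical data: time in seconds, amplitude in microvolts, three trials.
times = [0.0, 0.1, 0.2, 0.3]
trials = [
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 0.2, 0.4, 0.6],
    [0.0, 2.0, 4.0, 6.0],
]
slopes = [trial_slope(times, amplitudes) for amplitudes in trials]
low_trials, high_trials = split_by_slope(slopes)
```

The behavioural model would then be fitted separately to the response times and choices of the trials in each group, so that the resulting drift rate estimates can be compared between groups.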

Timeline of decision making
Together, this starts to build a timeline of decision making that complements the work done by Philiastides and Sajda (2007), who focused more on the beginning of the trial. They showed that around 170 ms after stimulus presentation, there was a peak in the EEG associated with visual perception (of faces), also reflected in BOLD activity in the fusiform face area and superior temporal gyrus. Then, around 220 ms there was a peak in the EEG associated with trial difficulty, also associated with BOLD activity in areas such as anterior cingulate cortex and dorsolateral prefrontal cortex. After 300 ms, the decision processes started, associated with activity in lateral occipital complex and ventrolateral prefrontal cortex. From our study we can add that around this time, the CPP starts to also be predictive of response time, followed by modulation by the amount of evidence and correctness, and finally by modulation by the identity of the response itself (Fig. 9).

Discussion
Our study shows that the centro-parietal positivity (CPP), which was previously suggested to be a neural correlate of a decision variable during a perceptual decision making task, exhibits dynamics somewhat consistent with being a decision variable during more complex decisions. In recognition memory for faces, and to some extent in perceptual decisions about faces, the CPP satisfies several of the criteria for being an accumulator that were previously put forward by O'Connell et al. (2012). Crucially, Fig. 3 showed that there is a significant relation between CPP amplitude and response latency from several hundred milliseconds into the trial, after initial stimulus processing, and the CPP could even predict response times on a single trial (Fig. 5). O'Connell et al. also observed that the CPP amplitude was larger for correct responses than for misses. In our paradigm, there are no misses, but the closest analogues of misses are error responses, which presumably reflect an accumulation process that moves to the wrong decision threshold. Indeed, we observed that the CPP exhibited a stronger response on correct than on incorrect trials for the memory decisions (Fig. 4), as well as a graded decrease in response amplitude from easy to difficult trials in the memory decisions (Fig. 6). The CPP also predicted which response (match or non-match) a participant was going to make later in the trial for both perceptual and memory decisions (Fig. 7). In general, the CPP looked similar for perceptual and memory decisions, although the process was delayed by approximately 300 ms for the perceptual task due to a longer stimulus processing time. The significant effects are summarized in Fig. 9. Interestingly, the CPP is predictive of response time from quite early in the trial. After that, it starts to be predictive of the amount of evidence available for decisions, as well as whether the person will get the trial correct, and finally, the CPP starts to predict the actual response given. Overall, the CPP effects occur later for the perceptual condition relative to the memory condition, most likely because these trials have a non-decision time that is approximately 300 ms longer, as we demonstrated with our drift diffusion model fits (Table 1). Importantly, further supporting a role for the CPP in evidence accumulation, we found that the CPP covaries with within-individual differences in model parameters (Fig. 8). Together, the fits and the timing diagrams suggest that the accumulation process for perceptual and memory decisions is quite similar to the extent that significant effects could be observed in the perceptual decision condition. However, the process is delayed by approximately 300 ms in perceptual decisions. The longer response time for the perceptual decisions may also be the reason that the CPP is not as predictive in the perceptual task as in the memory task, since evoked potentials become more noisy the farther they are away from the event to which they are time-locked. In summary, this suggests that there is initial evidence that the CPP is a domain-general EEG potential that is sensitive to the evidence accumulation process involved in making decisions, although several of the criteria for neural accumulators are only satisfied by the shorter-duration memory decisions and not by the longer-duration perceptual decisions.
Fig. 7. CPP activity for 'match' versus 'nonmatch' responses for the perceptual (a) and memory (b) conditions. Time point 0 corresponds to probe onset. Black bars indicate a significant difference in CPP amplitude between the two responses, with corrected p < 0.05.
Our study has several limitations. First and foremost, the tasks used in our comparison of perceptual and memory decisions turn out to be not ideal. While they both make use of the same stimuli, response times are significantly longer in the perceptual task, most likely due to the need to rotate the stimuli before making a decision. This means that effectively an extra task stage is added to the task which makes it more challenging to compare the dynamics of the memory and perceptual decision conditions. The longer response times for the perceptual decisions also mean that the relevant CPP signal was further away from the stimulus onset, and therefore more noisy. It is therefore not strange that in several of the statistical tests, the hypothesized difference was only significant for the memory decisions and not for the perceptual decisions. Future research should replicate the CPP paradigm using tasks in which the response times and accuracies are more closely aligned.
Second, given that analysing the face stimuli involves comparing different parts of the face, a potential worry is that our EEG may be contaminated by a substantial number of eye movements, especially for the perceptual decisions. However, these eye movements occur primarily before the signal of interest, the CPP, takes off. In other words, these eye movements may dominate in earlier perceptual stages, and primarily in frontal electrodes that do not contain the CPP (Dimigen et al., 2011). Moreover, the number of eye movements in our paradigm is probably not substantially different from that in visual search, for which P300 potentials, which are similar to the CPP, are routinely reported (Busch and Herrmann, 2003; Luck and Hillyard, 1994). Finally, most eye movements were removed from the EEG data using independent component analysis (ICA), such that in principle their effects should have been minimized.
O'Connell et al. (2012) first identified the CPP in a very simple signal detection task and later described its activity in the classic random dot-motion task (Kelly and O'Connell, 2013). These studies established that the CPP is independent of stimulus identity or response implementation. We have taken this a step further and shown that the CPP still reflects the build-up of a decision variable when participants are deciding about complex face stimuli rather than simple perceptual information. In addition, the CPP responds similarly to trials in which the information that is accumulated is retrieved from memory as when it is derived from sensory input, and its magnitude is related to DDM model parameters. This evidence for accumulation processes in recognition memory is in line with several other recent studies which have taken the perspective that information retrieved from memory can be accumulated the same way as other decision input (e.g., Gluth et al., 2013; Kira et al., 2015) and that accumulator models can be used to describe memory-based decisions as well (Mack and Preston, 2016; Rutishauser et al., 2015).
Fig. 8. Drift parameters of trials that were grouped by CPP slope. Different colours indicate different participants. There is a larger drift rate for trials with higher CPP slope for both the perception (left) and memory (right) conditions.
Fig. 9. Summary of the timing of the significant task effects on the CPP for the perceptual condition (darker top bars) and memory condition (brighter bottom bars). From top to bottom: response time effects (from Fig. 3), correctness effects (from Fig. 4), evidence effects (from Fig. 6), and response effects (from Fig. 7).
Our study differs from earlier studies of the CPP in decision making in that our CPP does not go up to a fixed amplitude, which some argue is a requisite property of an accumulator (O'Connell et al., 2012; Purcell et al., 2010). For example, Purcell et al. (2010) have shown that in the frontal eye field the time at which neural activity reaches a fixed threshold determines response times, and O'Connell et al. (2012) showed that the CPP reaches a fixed level in their task. However, there is abundant evidence for adaptations of thresholds in the context of speed-accuracy trade-offs (Luce, 1986), and other studies have shown different final firing levels depending on the level of speed-accuracy trade-off in neurons in motor cortex (Heitz and Schall, 2012). We too have previously demonstrated with both modelling and electrophysiology (van Vugt et al., 2014) that in task conditions differing in decision thresholds, the neural accumulators (in that case measured by the Lateralized Readiness Potentials) differ correspondingly. Indeed, recent work by Steinemann et al. (2018) suggested that the change in CPP amplitude with response time and with shifts in speed-accuracy trade-off could potentially reflect collapsing boundaries (Drugowitsch et al., 2012). One could say that the potential is pushed down by the urgency signal, which would be larger for longer response times, thereby creating a lower amplitude for longer response times compared to shorter response times, as we observed (Fig. 3). Although a model which incorporates a collapsing bound is thus an interesting candidate model for our data, the relatively small amount of data does not allow us to obtain reliable fits for these models. For this reason, we have done our model fits using the pure DDM, which only includes the main model parameters drift rate, non-decision time, and decision threshold (setting other parameters such as variability in drift rate to zero).
As mentioned above, a recent study (Twomey et al., 2015) has shown strong similarities between this signal and the classic P300 component found in event-related potentials (ERPs) during oddball tasks. P300-like components (e.g., P3b) are also found ubiquitously in the working memory/short-term memory literature. For example, Morgan et al. (2008) showed that in a face recognition task very similar to ours, the P300 amplitude decreased with study list length. Considering that evidence accumulation is expected to go slower on more difficult trials, this is consistent with the P300 reflecting a developing decision variable. Perhaps our extension of the CPP into the realm of recognition memory and face matching tasks will help to shed more light on the role of the P300 in these tasks as well.
Another slow potential that is similar to the CPP is the Contingent Negative Variation (CNV). It has been suggested that the CNV reflects an accumulation process, albeit an accumulation of time instead of an accumulation of decision evidence (Macar and Vidal, 2003;Ng et al., 2011; but see Kononowicz and van Rijn, 2014). The CNV is sensitive to a multitude of factors, including motivation, attention, and task difficulty (Tecce, 1972). In most studies the CNV is associated with an EEG signal at more frontal electrodes than CPz (but see Macar and Vidal, 2003), and it does not always go up until the moment of the response. Moreover, unlike the CPP, which is time-locked to the imperative stimulus, the CNV is typically defined as a potential that starts from the moment of a warning cue that precedes the imperative stimulus, continues while that stimulus is presented and then goes back to baseline at the moment of the motor response. It is therefore unlikely that the CPP and the CNV reflect the same decision signal.
We have previously reported that the amplitude of 4-9 Hz theta oscillations tracks a decision variable in perceptual decisions about random dot motion (van Vugt et al., 2012). In fact, it could very well be that the CPP studied here arises from the same origin as theta oscillations. This is a particularly attractive idea given that several previous studies have suggested links between the P300 and theta oscillations (Basar-Eroglu et al., 1992;Klimesch et al., 2000).
The large-scale CPP signal that is consistent with evidence accumulation suggests it should be easy to find a neural correlate of evidence accumulation in electrocorticography (ECoG) signals, which have a much higher spatial resolution. Interestingly, we also searched for neural accumulation signals in ECoG data from epilepsy patients performing a recognition memory task similar to the one presented here. In that case, we did not find overwhelming evidence for a neural accumulator signal (van Vugt et al., 2017;van Vugt et al., 2016). There are several potential reasons for failing to find a clear signal in these studies, even though the task was quite similar to the memory condition in the findings we present here. First of all, the participants in our ECoG study were all epilepsy patients with performance that is much more variable than that of healthy participants, thereby making it more difficult to find evidence for any effect. In addition, that study had relatively poor coverage over superior parietal and motor areas, which may have reduced chances of observing an electrocortical manifestation of the CPP/P300.
In sum, we have shown initial evidence that the CPP behaves as a developing decision variable in these more complex decision making tasks, similar to what was previously observed in other tasks. While the results were quite consistent for memory decisions, they often failed to reach significance in the slower perceptual decisions. We further showed that the slope of the CPP could distinguish between trials with low and high drift rates and correlated with model estimates. Together, this is in line with the idea that the CPP reflects the process of evidence accumulation, and it suggests that the brain uses a similar mechanism for both perceptual and memory decisions. Future research should replicate these findings in a situation where the two tasks being compared are more similar in their response times and accuracies.

Participants
Participants were recruited in the city of Groningen and participated in return for a monetary reward. From a total of 23 participants, 11 were female. Ages ranged from 17 to 36 with a mean of 23.9 and a standard deviation of 4.2. All participants had normal or corrected-to-normal vision, and all were right-handed. Informed consent was obtained before testing. The protocol was approved by the Ethical Committee of the Department of Psychology at the University of Groningen and was conducted in accordance with the Declaration of Helsinki. Participants were paid €10 per hour for their participation, plus a bonus based on the number of points they obtained in the task (typically around €7).

Stimuli
Our experiment consisted of alternating blocks of two tasks. In the perceptual decision making task participants saw two images of a face on the screen, each facing outward (see Fig. 10(b) for an example), and were asked to decide whether these two faces represented the same person. In the recognition memory task shown in Fig. 10(a), participants studied two faces (which made up the memory set) and maintained those faces over a variable delay period of several seconds. After this delay, they were shown a test face (the probe), for which they needed to indicate whether or not it matched one of the two studied faces.
The images of faces we presented were created using the Basel Face model (Paysan et al., 2009). Based on 3D-scans of 200 real faces, this Matlab toolbox makes it possible to generate synthetic faces, which differ from an average face by a desired distance. We used this model because it captures many different attributes of faces and yet can be reduced to a set of principal components and their standard deviations. These principal components can then be varied to create a set of face stimuli that systematically covers a region of face space. In this case, our faces were defined by varying the first three principal components (PCs) of this face space. For the memory task, we used the space spanned by three standard deviations above and below the average face (i.e., the set of vectors comprising all possible combinations of the values [−3, 0, 3]). The use of standard deviations ensured that the perceived magnitudes of change on each PC would be comparable. This gave us a set of 27 different faces, with varying degrees of similarity between them that can be quantified as vector distances (e.g., face [−3 −3 −3] is very similar to face [−3 −3 0] but maximally dissimilar to face [3 3 3]).
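The construction of the memory-task stimulus set can be illustrated with a short sketch. It only enumerates coordinates in principal-component space and does not call the Basel Face Model itself. The helper name `face_grid` is hypothetical, and the use of Euclidean distance as the "vector distance" is an assumption made for illustration.

```python
from itertools import product
from math import dist

def face_grid(levels=(-3, 0, 3), n_components=3):
    """All combinations of the given levels (in SD units) on the first PCs."""
    return list(product(levels, repeat=n_components))

faces = face_grid()
n_faces = len(faces)  # 3 levels on 3 components

# Euclidean distance in PC space as a simple dissimilarity measure (assumed).
d_near = dist((-3, -3, -3), (-3, -3, 0))
d_far = dist((-3, -3, -3), (3, 3, 3))
d_max = max(dist(a, b) for a in faces for b in faces)
```

Under this assumption, the pair [−3 −3 −3] and [3 3 3] attains the largest distance available in the grid, in line with the example given in the text.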
We aimed for the two tasks to be comparable in difficulty. A behavioural pilot experiment showed that if the same faces were used in both tasks, performance in the perceptual task was significantly better than in the memory task. Therefore, we created a set of faces for the perceptual task where each PC was only varied by two standard deviations (i.e., [−2, 0, 2]), which made them more difficult to discriminate. In addition, we made the task more challenging by showing the two faces in the perceptual task facing different directions, rather than two identical images facing ahead. This had the added advantage that participants were encouraged to process the faces holistically, rather than comparing the two images as abstract stimuli.

Procedure & design
Participants were awarded a point for every correct trial, which at the end of the experiment was converted into a monetary reward. Because instructions to minimize response time while maximizing accuracy are typically very confusing to participants, we followed the methods used by van Vugt et al. (2012) and Simen et al. (2009), in which participants were given the instruction to trade off their speed and accuracy such that they would maximize their points in every block of four minutes. In other words: if they made faster decisions, they could do more trials in four minutes and therefore potentially acquire more points. However, if they decided too quickly, they would make many mistakes, and therefore not get as many points. Participants were allowed to take a brief pause between blocks. To obtain enough trials of both tasks, two blocks of the (longer) memory task were followed by one block of the perceptual task, and this cycle was repeated four times. Fig. 10(b) shows an example trial of the perceptual face discrimination task. Each trial of this task started with a 500 ms presentation of a fixation cross. After this, there was a wait period during which the screen was blank. The duration of this period was jittered between 700 and 775 ms (drawn from a uniform distribution), to avoid accidental temporal correlations between task events and on-going oscillations in the EEG. After the wait period, two faces were shown, each facing outward. Participants could respond as soon as the images appeared on the screen. They were instructed to press an M using their right hand if they thought the two faces were the same, and a Z using their left hand if they thought the faces were different. After the decision, feedback on accuracy and reaction time was presented for 500 ms. Then, after a 400-475 ms wait, the next trial started automatically.
In the memory decision task, participants needed to remember two faces over a delay period and then compare them to a test face. An example trial is shown in Fig. 10(a). Again, each trial of this task started with a 500 ms fixation stimulus period followed by a 700-775 ms blank wait screen. After this, the memory set was presented for 2000-2075 ms. Both faces in the memory set were shown at the same time. This was followed by a 1000-1150 ms blank delay period, after which the probe was shown until the participant responded. The response, again, was indicated by an M for a match, and a Z for a nonmatch. Feedback was shown for 500 ms and then a blank screen was shown for 400-475 ms until the automatic onset of the next trial. The experiment was presented using the software E-Prime (Psychology Software Tools, Pittsburgh, PA). The task can be downloaded here: https://figshare.com/articles/Face_Sternberg_task_E-prime_/5817420.
All participants performed the same 12 blocks of trials, but the order of blocks was randomized for each participant with the constraint of two memory blocks being followed by one perceptual block. Each block was constructed in such a way that half of the trials (randomly chosen) were match trials (i.e., the two faces were the same in the perceptual task, or the probe item was part of the memory set in the memory task) and the other half were non-match trials. The study set in the memory task always consisted of two different faces, and match probes were identical to either the left or the right study item on an equal number of trials. Furthermore, to avoid confusion due to recency, the same face could not appear on two occasions less than three trials apart. Within these restrictions, items were randomly assigned to trials.

Behavioural analysis
We computed average response time and fraction of correct trials. No outliers were removed. Average accuracies and response times across participants were compared between the memory and perceptual decision making task by means of a paired t-test. The DDM was fit with the DMA toolbox (Vandekerckhove and Tuerlinckx, 2007, 2008). Every participant was fitted individually, and only drift, decision threshold and non-decision time were allowed to vary between task conditions (memory versus perceptual task). The starting point of evidence accumulation was kept fixed halfway between the two decision thresholds. This version of the DDM, the "pure DDM", has previously been shown to be more stable than DDMs in which all parameters vary with conditions. Response times more than 3 standard deviations from the mean were removed before fitting the model. The model was fitted by minimizing the discrepancy between modelled and empirical response time distribution quantiles (10, 30, 50, 70, 90), with a chi-squared minimization procedure. For 16 out of 23 participants, the pure DDM model in which only the non-decision time varied between the conditions fit the data significantly better according to a Bayesian Information Criterion (BIC) than models in which no parameters were allowed to vary between the two conditions (perception versus memory), as well as models in which only the drift rate, or only the threshold varied between the perception and memory conditions, or more complex models in which non-decision times together with drift and/or decision thresholds varied with conditions. For our analysis of a potential relation between CPP slope and DDM parameters (see below), we divided the trials into a group of low and high slopes. We then considered the low- and high-slope trials to be separate conditions in the DDM fit (thereby increasing the number of conditions to four: low-slope perceptual trials, high-slope perceptual trials, low-slope memory trials, high-slope memory trials). For these model fits, we again chose the pure DDM, which allowed us to examine what parameters (drift, threshold, non-decision time) differed between the low- and high-slope trials.
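To make the three parameters of the pure DDM concrete, the sketch below simulates single trials with a simple Euler scheme. It is an illustration only and not the fitting procedure of the DMA toolbox. The function names, the time step, the unit noise scaling and the parameter values are all assumptions chosen for the example.

```python
import random

def simulate_pure_ddm(drift, threshold, non_decision, n_trials=500,
                      dt=0.001, noise_sd=1.0, seed=0):
    """Simulate (response_time, hit_upper_bound) pairs from a pure DDM."""
    rng = random.Random(seed)
    step_sd = noise_sd * dt ** 0.5
    trials = []
    for _ in range(n_trials):
        # Unbiased start: halfway between the bounds at 0 and `threshold`.
        x, t = threshold / 2.0, 0.0
        while 0.0 < x < threshold:
            x += drift * dt + rng.gauss(0.0, step_sd)
            t += dt
        # Observed RT = decision time + non-decision time.
        trials.append((t + non_decision, x >= threshold))
    return trials

def accuracy(trials):
    """Fraction of trials that ended at the upper (correct) bound."""
    return sum(hit for _, hit in trials) / len(trials)

# Illustrative parameter values only (not estimates from this study).
easy = simulate_pure_ddm(drift=2.0, threshold=1.0, non_decision=0.3)
hard = simulate_pure_ddm(drift=0.5, threshold=1.0, non_decision=0.3)
```

In such a simulation a higher drift rate yields more accurate decisions, while the non-decision time only shifts the whole response time distribution, which is why these parameters can in principle be separated in a fit.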
Most statistics were done with the Statistics toolbox in Matlab, but Bayes Factors were computed in JASP (Wagenmakers et al., 2017). Bayes Factors allowed us to verify whether the absence of significant results reflected the true absence of a difference between conditions, or alternatively, whether the data were too noisy to say anything about this comparison.

EEG recordings
EEG activity was measured using 30 tin scalp electrodes (Electro-Cap International) placed according to the 10/20 system. In addition, vertical and horizontal EOG activity and both mastoids were measured. Impedances were kept below 10 kΩ for all electrodes. Activity was amplified and filtered using the Refa system (TMS International BV) and digitized at a sampling rate of 500 Hz using the software Portilab (TMS International BV).
After recording, data were re-referenced to the average of the mastoids and a bandstop filter was applied around 50 Hz to remove line noise. Artifacts were identified using three criteria. First, trials where an electrode's normalized amplitude exceeded 3.5 standard deviations from the group mean (calculated across all electrodes and trials for each participant) were excluded from analysis for that electrode. The same was done for trials where the signal's kurtosis exceeded 4.5 (Delorme and Makeig, 2004), or its variance was more than 1.75 standard deviations above the group mean. These thresholds were determined empirically by optimizing removal of visually identified artifacts in a subset of participants and then applied uniformly to all data. Finally, electrodes that contained artifacts on more than 50% of trials were excluded from further analysis entirely, and the same was done for trials in which more than 50% of electrodes contained artifacts. This led to the complete removal of approximately 5.4% of channels, and 8.0% of trials, across all participants. In addition, an average of 5.6 out of 30 channels were removed from individual trials due to incidental artifacts. Eye artifacts were removed by decomposing the data into independent components (as implemented in the Fieldtrip software), then removing components with a time course and topography suggestive of blinks or saccades, and finally recomposing the EEG using the remaining components.
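The final exclusion rule (dropping electrodes and trials with artifacts on more than 50% of their entries) can be sketched as follows. The function name `exclusion_masks` and the layout of the input are hypothetical, and the sketch assumes that both fractions are computed on the same original artifact matrix, which the text does not specify.

```python
def exclusion_masks(flags, max_fraction=0.5):
    """Return (bad_electrodes, bad_trials) as lists of indices.

    flags[t][e] is True when trial t was marked as artifactual at electrode e
    by any of the per-electrode criteria (amplitude, kurtosis, variance).
    """
    n_trials, n_elec = len(flags), len(flags[0])
    bad_electrodes = [
        e for e in range(n_elec)
        if sum(flags[t][e] for t in range(n_trials)) / n_trials > max_fraction
    ]
    bad_trials = [
        t for t in range(n_trials)
        if sum(flags[t]) / n_elec > max_fraction
    ]
    return bad_electrodes, bad_trials

# Toy example: 4 trials by 3 electrodes.
toy_flags = [
    [True, False, False],
    [True, True, True],
    [True, False, False],
    [False, False, False],
]
toy_bad_electrodes, toy_bad_trials = exclusion_masks(toy_flags)
```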

EEG analysis
Data analysis was performed using the FieldTrip toolbox for EEG analysis (Oostenveld et al., 2011). Prior to analysis, the EEG signal was baseline corrected using the 300 ms before the probe stimulus onset as a baseline. In line with O'Connell et al. (2012), the centro-parietal positivity was defined as the average signal at electrode CPz and two of its neighbours, CP1 and CP2.
To examine whether CPP amplitude could predict reaction time (RT) on individual trials, and at which points in the trial this was the case, we averaged the time courses of electrodes CPz, CP1 and CP2 and performed a linear regression between the amplitude of this signal and RT for each time sample of each participant's data. We then used, for each time sample, a one-sample t-test to compare the regression coefficients of all participants against zero. A false discovery rate (FDR) procedure (Benjamini and Hochberg, 1995) was applied to the resulting p-values to correct for multiple comparisons. This resulted in a range of time bins for which there was a significant correlation between CPP amplitude and subsequent reaction time. While our preferred method for the correction for multiple comparisons is the cluster-based approach described in the next paragraph, this cluster-based method works only in Fieldtrip for comparisons between groups, and not for a regression. For this reason we resorted to FDR for this analysis, while we used cluster-based multiple comparison corrections for the other analyses.
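The FDR step can be illustrated with a minimal implementation of the Benjamini-Hochberg step-up procedure, applied to one p-value per time sample. The function name and the example p-values are hypothetical, and the sketch does not reproduce the toolbox code used for the analysis.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which p-values are significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis up to and including that rank.
    significant = [False] * m
    for i in order[:k_max]:
        significant[i] = True
    return significant

# Hypothetical p-values for ten time samples.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_sig = benjamini_hochberg(example_p)
```

Because the procedure is step-up, a p-value that misses its own rank threshold is still declared significant when a larger p-value further down the sorted list meets its threshold.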
To test if and when CPP amplitude could distinguish between correct and error trials, we performed an independent samples t-test comparing the average of the correct and error trials at each time sample on the EEG signal of the three CPP electrodes. Correction for multiple comparisons was performed using a cluster-based method (Maris and Oostenveld, 2007). Specifically, the t-statistics were binarized using a significance threshold corresponding to a p-value of 0.05, after which they were clustered in space (the three neighbouring channels) and time. Neighbours were defined with the Fieldtrip template elec1010_neighb.mat, and the minimum number of channels in a cluster was set to one. Each cluster was characterized by the sum of the t-statistics comparing the average correct and error CPPs for each participant. These summed t-statistics were then compared to clusters of t-statistics obtained from a Monte Carlo simulation with 10,000 iterations for which in each iteration, the difference between correct and error CPPs for each participant was randomly flipped in sign. This resulted in a range of time samples at which CPP amplitude differed significantly depending on whether or not the upcoming response was correct.
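The logic of the sign-flipping cluster test can be sketched in a simplified form. The sketch below clusters over time only (a single averaged channel), tests only the largest cluster, and takes the critical t-value as an argument. It is therefore an illustration of the principle under these assumptions and not the Fieldtrip implementation. All function names and the synthetic data are hypothetical.

```python
import random

def t_values(diffs):
    """One-sample t against zero per time sample (rows are participants)."""
    n = len(diffs)
    out = []
    for column in zip(*diffs):
        m = sum(column) / n
        var = sum((v - m) ** 2 for v in column) / (n - 1)
        out.append(m / (var / n) ** 0.5)
    return out

def max_cluster_mass(t_vals, t_crit):
    """Largest |sum of t| over runs of adjacent same-sign supra-threshold samples."""
    best = mass = 0.0
    for t in t_vals:
        if abs(t) > t_crit and (mass == 0.0 or (t > 0) == (mass > 0)):
            mass += t
        else:
            mass = t if abs(t) > t_crit else 0.0
        best = max(best, abs(mass))
    return best

def cluster_permutation_p(diffs, t_crit, n_perm=1000, seed=0):
    """Monte Carlo p-value for the largest cluster via per-participant sign flips."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_values(diffs), t_crit)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * v for v in row])
        if max_cluster_mass(t_values(flipped), t_crit) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Synthetic correct-minus-error differences: 12 participants, 30 samples,
# with an effect in samples 10-19. 2.201 is the two-sided 5% critical t for df = 11.
data_rng = random.Random(1)
demo = [[data_rng.gauss(1.0 if 10 <= t < 20 else 0.0, 0.5) for t in range(30)]
        for _ in range(12)]
mass, p_value = cluster_permutation_p(demo, t_crit=2.201, n_perm=500)
```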
A similar method was used to examine whether CPP amplitude significantly differentiated between trials with different degrees of difficulty. Trials of both tasks were divided into four bins based on their (summed) similarity values (which determines task difficulty and therefore should change the speed of the accumulation process). For the perceptual task, the difficulty was simply the degree of similarity (between 0 and 1) between the two faces shown. For the memory task, we used as difficulty measure the sum of the similarities between the probe item and each of the two items in the memory set. Since there is only one value of similarity for match probes in the perceptual task (i.e., 1: the two faces are identical), we only used non-match trials of both tasks for this comparison. Significance was determined on the CPP signal averaged per category within each subject using the same Monte Carlo method described for correct and incorrect trials, the only difference being the use of a multivariate F-test instead of a t-test because in this case four categories were being compared (rather than two).
In supplementary Figs. S2-S4, we show response-locked CPPs. Those were created on the basis of the stimulus-locked CPPs, which were shifted in time such that they were all aligned to the moment of the response. We chose the duration of the trials on the basis of the average response time of the respective task conditions.
To determine whether the CPP amplitude was in fact predictive of accumulation activity, we examined the relation between CPP slope and model parameters. If the CPP is an accumulation process, then shallower slopes should be associated with lower DDM drift rates than steeper slopes. For this reason, we computed the slope of the CPP between the moment of response and 300 ms before that on each individual trial, and used that to separate the trials into low slopes and high slopes (analogous to the methods used in Ratcliff et al. (2009) but now with slopes instead of classifier amplitudes on a single time point). We then compared parameter estimates between the shallow-slope and steep-slope trials across all participants with a dependent-samples t-test to assess whether there was a difference in drift rate between shallow-slope and steep-slope trials, as there should be if the CPP reflects the evidence accumulation process.
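The single-trial slope computation and the grouping of trials can be sketched as follows, using the 500 Hz sampling rate and the 300 ms pre-response window described in the text. The function names are hypothetical. The sketch also assumes a least-squares line fit, a trial array that starts at stimulus onset, and a split at the median slope, none of which is specified in the text.

```python
from statistics import median

def ols_slope(y, dt):
    """Least-squares slope of y against time for samples spaced dt seconds apart."""
    n = len(y)
    t_mean = (n - 1) * dt / 2.0
    y_mean = sum(y) / n
    num = sum((i * dt - t_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i * dt - t_mean) ** 2 for i in range(n))
    return num / den

def pre_response_slope(trial, rt, sfreq=500.0, window=0.3):
    """Slope of the signal over the `window` seconds preceding the response.

    `trial` holds stimulus-locked samples beginning at stimulus onset (assumed),
    and `rt` is the response time in seconds.
    """
    stop = int(round(rt * sfreq))
    start = stop - int(round(window * sfreq))
    if start < 0 or stop > len(trial):
        raise ValueError("response window falls outside the trial")
    return ols_slope(trial[start:stop], 1.0 / sfreq)

def median_split(slopes):
    """Label each trial 'high' or 'low' relative to the median slope (assumed rule)."""
    cut = median(slopes)
    return ["high" if s > cut else "low" for s in slopes]

# Toy trial: a ramp rising by 2 units per second, sampled at 500 Hz for 1 s.
ramp = [2.0 * i / 500.0 for i in range(500)]
ramp_slope = pre_response_slope(ramp, rt=0.8)
labels = median_split([0.1, 0.5, 0.3, 0.9])
```

The resulting labels would then define the low-slope and high-slope conditions entered into the DDM fit.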
EEG data (in Fieldtrip format after preprocessing) and behavioural data (in Matlab structs) can be downloaded from https://unishare.nl/index.php/s/CNNUyD3r16ZC5hy.