Motivation improves working memory by two processes: prioritisation and retrieval thresholds

Motivation can improve performance when the potential rewards outweigh the cost of effort expended. In working memory (WM), people can prioritise rewarded items at the expense of unrewarded items, suggesting a fixed memory capacity. But can capacity itself change with motivation? Across four experiments (N = 30-34) we demonstrate motivational improvements in WM even when all items were rewarded. However, this was not due to better memory precision, but rather better selection of the probed item within memory. Motivational improvements operated independently of encoding, maintenance, or attention shifts between items in memory. Moreover, motivation slowed responses. This contrasted with the benefits of rewarding items unequally, which allowed prioritisation of one item over another. We conclude that motivation can improve memory recall, not via precision or capacity, but via speed-accuracy trade-offs when selecting the item to retrieve.


Introduction
Working memory (WM) is a short-term, flexible store for manipulating items in memory. There has been a wealth of research aimed at finding the limits of WM capacity, with most agreeing there is a fixed limit on the amount of information that can be stored, which can be spread between the items to be stored (Bays & Husain, 2008; Burak & Fiete, 2012; Miller, 1956; Schneegans & Bays, 2016; van den Berg, Shin, Chou, George, & Ma, 2012), although the exact limit is still unclear. Most such studies have measured WM without any associated motivation manipulations, calling into question the claims of finding the upper limit of WM capacity. A fixed limit seems at odds with our subjective experience of memory, where it feels like we can improve our performance when we are truly motivated to do so, or let items slip from memory when not motivated. Here we investigate whether people can actually improve WM when motivated, and what mechanisms might underlie this. Performing well carries effort costs, but incentives can offset these costs and motivate us to spend extra effort for the greater-value outcomes on offer (Manohar et al., 2015).
Studies employing such incentivised motivation in WM have found that people can improve their WM when incentivised (Gilbert & Fiez, 2004;Kawasaki & Yamaguchi, 2013;Sanada, Ikeda, Kimura, & Hasegawa, 2013), and ERPs have been used to suggest that incentives boosted capacity (Kawasaki & Yamaguchi, 2013). However, these studies either did not find or did not report any associated changes to reaction times, making it hard to tell whether they reflected true improvements in capacity, or simply a trade-off with speed. It is clear that if incentives are offered only for remembering certain items, then people prioritise those items, resulting in relatively faster and more accurate responses than for the unrewarded item (Hitch, Hu, Allen, & Baddeley, 2018b;Hu, Allen, Baddeley, & Hitch, 2016;Hu, Hitch, Baddeley, Zhang, & Allen, 2014;Klink, Jeurissen, Theeuwes, Denys, & Roelfsema, 2017). The cost of worse memory for the non-prioritised items suggests that there is indeed a fixed WM capacity or resource which is strategically reallocated between items depending on their value and likelihood of being probed (Hitch et al., 2018b;Klink et al., 2017). Such prioritisation can also make items more susceptible to retroactive interference from a 'suffix' item (Hu et al., 2016), another form of trade-off. This trade-off even operates between different modalities, where rewards for visual items improved visual WM capacity at the cost of auditory WM capacity on a concurrent task (Morey, Cowan, Morey, & Rouder, 2011).
Prioritisation has usually been measured using a pre-stimulus cue (Hitch, Hu, Allen, & Baddeley, 2018a;Hu et al., 2016), and comparing this against cues given after the stimuli (during maintenance or at retrieval) suggested prioritisation was only possible if the cues were present before (or possibly during) encoding (Klink et al., 2017). But encoding is not the only stage of WM, and it may be that maintenance and retrieval stages are open to modulation by reward. For instance, sustained activity and synaptic plasticity are both proposed to play a role during WM maintenance (Manohar, Zokaei, Fallon, Vogels, & Husain, 2019;Schneegans & Bays, 2018;Stokes, 2015) and are modifiable by dopamine (Shen, Flajolet, Greengard, & Surmeier, 2008;Vijayraghavan, Wang, Birnbaum, Williams, & Arnsten, 2007), which primarily signals reward and motivation (Berridge, 2007). WM retrieval, although relatively understudied, has been described by a decision process (Pearson, Raskevicius, Bays, Pertzov, & Husain, 2014), where memory evidence is accumulated until a decision threshold is reached. To optimise reward, such decision processes ought to be under motivational control, since simple adjustments to the threshold induce speed-accuracy trade-offs, making responses either fast and inaccurate, or slow and accurate (Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006). The decision stages of WM are indeed flexible (Pearson et al., 2014) and presumably amenable to motivational control, so it is surprising that no such links have been found yet.
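The threshold-mediated speed-accuracy trade-off described by Bogacz et al. (2006) can be illustrated with a minimal drift-diffusion simulation. This is not analysis code from the present study; all parameter values are arbitrary, chosen only to show that raising the decision threshold makes responses slower and more accurate.

```python
import numpy as np

def simulate_ddm(threshold, drift=0.2, noise=1.0, dt=0.01,
                 n_trials=5000, max_steps=5000, seed=0):
    """Simulate a simple drift-diffusion decision: evidence accumulates
    with positive drift until it hits +threshold (correct) or -threshold
    (error). Returns (accuracy, mean decision time) over finished trials."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)                 # accumulated evidence
    rt = np.zeros(n_trials)
    done = np.zeros(n_trials, dtype=bool)
    correct = np.zeros(n_trials, dtype=bool)
    for step in range(1, max_steps + 1):
        active = ~done
        if not active.any():
            break
        x[active] += drift * dt + noise * np.sqrt(dt) * rng.standard_normal(active.sum())
        hit = active & (np.abs(x) >= threshold)
        rt[hit] = step * dt
        correct[hit] = x[hit] >= threshold
        done |= hit
    return correct[done].mean(), rt[done].mean()

# A higher threshold slows decisions but raises accuracy.
acc_lo, rt_lo = simulate_ddm(threshold=0.5)
acc_hi, rt_hi = simulate_ddm(threshold=1.5)
```

The same qualitative trade-off holds for any sensible drift and noise settings, which is why threshold adjustment is a natural target for motivational control.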
One explanation is that the reward manipulations used in the prioritisation tasks usually take the form of differential relative rewards, where some items are worth more than others, allowing strategic prioritisation of a fixed resource in order to maximise expected rewards. However, this will not always be the case in real life; sometimes we need to remember everything well, in which case improving performance for all items in memory would be favoured, if it were possible. The resource rational theory of WM (van den Berg & Ma, 2018) suggests that rather than a fixed WM capacity, it is the costs of decreasing WM errors that limit performance, giving a mechanism for rewards to offset this cost and thus improve WM. A recent study from these authors failed to find such a reward-based improvement (van den Berg, Zou, & Ma, 2020), which may have been due to the between-subject design and focus on precision rather than item selection during analysis. WM motivational processes could involve better encoding for all items (e.g. increased WM capacity), better maintenance over time (e.g. via reduced drift and decay of memories), or better retrieval of items (e.g. slower and more accurate decision processes). We aimed to see whether people can improve their WM when all items are highly rewarded (and thus prioritisation is not possible), and if so, what mechanisms are invoked.
We first measured the reward benefits on encoding, maintenance and retrieval phases of WM. We next replicated our findings with a greater focus on the encoding stage. Thirdly, we investigated whether reward may be affecting preparation of items for retrieval, by cueing an item during the delay. We then directly compared the motivational effects of equal rewards, where prioritisation is not possible, to unequal rewards, which allow prioritisation. Finally, we present data from control tasks without any WM demands to investigate how much of the reward effects can be attributed to the motor aspect of responses rather than WM per se.

Experiment 1
In the first experiment, we tested whether people could improve their WM accuracy when higher rewards were available, and which stage of WM this affected: encoding; maintenance; or retrieval.

Procedure
Each experiment used a variation of an orientation WM task. The basic task showed a fixation target on a black background, followed by a set of stimuli (coloured arrows randomly selected without replacement from 8 colours, 96 pixels long, 80 pixels wide) shown spaced around an invisible circle. After a short delay, a coloured dot was shown to probe the memory of one item (randomly chosen). Participants were instructed to respond with the orientation of the item of that colour, and moved the mouse to respond; the dot became an arrow pointing to the cursor, and participants clicked to submit their answer. The cursor could not move further than 300 pixels from the screen centre to avoid it hitting the edges of the screen. Response time was unlimited. We recorded the trajectory of these mouse responses.
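The cursor-distance limit can be sketched as a simple radial clamp. This is an illustrative re-implementation (the original task was programmed in Matlab); the function name is ours.

```python
import math

def clamp_to_radius(x, y, cx, cy, max_r=300.0):
    """Clamp a cursor position (x, y) so it stays within max_r pixels of
    the screen centre (cx, cy); positions already inside are unchanged."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r <= max_r:
        return x, y
    scale = max_r / r          # shrink the offset vector onto the circle
    return cx + dx * scale, cy + dy * scale
```

Applied every frame, this keeps the response arrow's direction intact while preventing the cursor from reaching the screen edges.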
Participants were cued about the value of a correct response, with a 1p or 50p coin shown before (pre-cue) or after (post-cue) the stimuli. Low- and high-pitched tones (666 Hz or 1000 Hz, 500 ms) were played with the 1p and 50p coins, respectively. Accurate responses received the rewards, calculated using a median rule: responses more accurate than the median accuracy over the last 20 trials were rewarded; less accurate responses were unrewarded. This rule was applied to each reward level separately, to keep the average reward rate for both reward levels at 50%. Rewards received (+0p, +1p or +50p) were printed on the screen, and a bar along the bottom was incremented to show the total won. Wins were accompanied by sounds: a high-pitched 'ding' for low rewards, and a cash register sound for high rewards, each lasting 600 ms. The feedback screen lasted 700 ms.

Fig. 1. General methods. a) Timeline of a single trial from experiment 1 (full details of each experiment in text and later figures). Participants see an incentive cue, followed by the stimuli, and after a delay must reproduce the angle of the arrow matching the colour of the circular probe, and receive the reward depending on their accuracy. b) Histogram of all angular errors for experiment 1. c) Cartoon illustrating how the mixture modelling works, with a response distribution which is a mix of responses centred on the target, a non-target (misbinds) and uniform random guessing. d) Illustration of the Linear Ballistic Accumulator (LBA) model. The mixture model is used to generate the probability that each response is a target, misbind, or guess, with the former being classed as correct and the latter two incorrect. The model accumulates evidence towards a threshold for correct and incorrect responses, controlled by separate drift rates; the first one to pass the threshold gives the response, with a response time that includes non-decision time. Starting bias and the standard deviation of drift rates are fixed in the model. e) Sample mouse trajectory during the response (white line), with the crosses showing the position at 6 interpolated time-points. f) The angular error for the sample trajectory, with the same interpolated time-points shown. g) Kernel-smoothed densities of the angular errors across all trials in experiment 1 at the 6 interpolated time-points. The peak at zero grows over the response. h) Sample mixture model parameters using the angular errors at each interpolated time-point from panel g. Imprecision is plotted on the right-hand axis, and the probability of guessing and misbinding is on the left-hand axis. The parameters decrease over time, with imprecision decreasing the most, and the others levelling off earlier.
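The adaptive median reward rule can be sketched as follows. This is an illustrative Python sketch, not the task code; how the very first trials (with no history yet) were handled is an assumption here (we withhold reward).

```python
from collections import deque
import statistics

class MedianRewardRule:
    """Sketch of the adaptive rule: a response is rewarded if its absolute
    angular error beats the median error of the last `window` trials,
    tracked separately per reward level so both levels pay out on ~50%
    of trials. First-trial behaviour (empty history) is an assumption."""

    def __init__(self, window=20):
        self.window = window
        self.history = {}   # reward level -> deque of recent errors

    def is_rewarded(self, reward_level, abs_error):
        hist = self.history.setdefault(reward_level,
                                       deque(maxlen=self.window))
        rewarded = len(hist) > 0 and abs_error < statistics.median(hist)
        hist.append(abs_error)   # update the running window afterwards
        return rewarded

rule = MedianRewardRule()
r1 = rule.is_rewarded("50p", 10.0)   # no history yet for this level
r2 = rule.is_rewarded("50p", 20.0)   # worse than median of [10]
r3 = rule.is_rewarded("50p", 5.0)    # better than median of [10, 20]
```

Because each reward level keeps its own window, a participant cannot raise their payout rate by performing well only on high-reward trials; the criterion adapts within each level.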
Trials were played in 10 blocks of 48 trials, with unlimited time between blocks, and were pseudorandomised in order. There were 8 practice trials at the beginning; each trial type was practiced once (random order), and the increments to the total bar were not saved during the practice.
Participants were instructed to respond quickly and accurately, and were told they would win the coin if their response was close to the real orientation, and would be paid a bonus proportionate to their total winnings on all trials. Their total winnings were divided by the maximum amount possible, and scaled to between £0 and £4, plus a base rate of £8 (mean bonus = £2.80).
To investigate encoding, we presented the reward cue either before or after the stimuli; if motivation improves encoding, the trials where 50p was shown before the stimuli should have a larger benefit than those where it was shown afterwards. In order to match the perceptual salience of the reward cues, a beige dot (luminance matched to the two coins) along with a white-noise tone was presented at the time the cue was not shown (i.e. after the stimuli when reward was pre-cued, before the stimuli when reward was post-cued). To investigate maintenance, we gave either a short (2 s) or long (4 s) delay; if motivation improves maintenance there should be different benefits depending on delay. If neither encoding nor maintenance were found to benefit memory, this would suggest a retrieval benefit. Thus, there were three factors (high/low reward, pre/post cue, short/long delay), tested in a factorial design, giving eight conditions.

Analysis
All analyses were performed in Matlab R2018b (see Supplementary Materials for full details). We used repeated-measures ANOVA (rmanova function from the matlib repository; www.github.com/sgmanohar/matlib) to analyse mean absolute angular error, and reaction time (RT). RTs were log-transformed for statistical testing (figures show median raw RT). When presented with null effects, we used Bayesian ANOVA (JASP v0.14.1; JASP Team, 2020) to generate inclusion and exclusion Bayes Factors to determine the evidence for or against the null hypothesis. Inclusion Bayes Factors (BF incl ) give the ratio of likelihoods of all the models with a specific term included against all those models without that term, and represent the evidence for including that term in the model. BF incl values greater than 1 represent evidence for including the term, while values below 1 represent evidence for excluding it. Exclusion Bayes Factors (BF excl ) are the reciprocal of BF incl , and give the evidence for excluding that effect.
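The relationship between BF incl and BF excl can be illustrated with a small sketch. The function name and input format here are ours, not JASP's; the assumed formula compares the posterior odds of models containing a term against their prior odds.

```python
def inclusion_bf(models):
    """Sketch of an inclusion Bayes factor. `models` is a list of
    (marginal_likelihood, prior_prob, includes_term) tuples. BF_incl is
    the posterior odds of term-containing models divided by their prior
    odds; BF_excl is its reciprocal."""
    post_with = sum(ml * p for ml, p, inc in models if inc)
    post_without = sum(ml * p for ml, p, inc in models if not inc)
    prior_with = sum(p for _, p, inc in models if inc)
    prior_without = sum(p for _, p, inc in models if not inc)
    bf_incl = (post_with / post_without) / (prior_with / prior_without)
    return bf_incl, 1.0 / bf_incl   # (BF_incl, BF_excl)

# Four equally probable models, two of which include the term of interest.
bf_incl_val, bf_excl_val = inclusion_bf([
    (2.0, 0.25, True), (1.0, 0.25, True),
    (1.0, 0.25, False), (0.5, 0.25, False)])
```

With equal priors the prior odds cancel, so BF incl reduces to the ratio of summed marginal likelihoods with versus without the term.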
It is possible that people might be able to improve their accuracy simply by spending more time 'fine-tuning' their responses to make them as accurate as possible, which would result in longer RTs also. This would be in contrast to the low-reward trials where people would get close to their desired orientation and think "good enough". To investigate this possibility, we looked at how accuracy developed over the time-course of the response trajectory. We interpolated 100 time-points between the start and end of the mouse movement, and calculated the absolute angular error from the angle of the mouse cursor to the target location at each of these points (see Fig. 1e-g). As there can be differences in response onset times, the interpolation allows us to compare the trajectories in the same units (percentage of movement completed), although these could still be influenced by the delay in starting the movements. If rewards led to fine-tuning, high rewards should decrease error most strongly at the end of the trajectory, whereas any reward benefits present at the onset of movement would argue against this explanation, regardless of any differences in movement preparation time. We compared the errors between high and low reward conditions using cluster permutation testing (collapsed across other conditions), with family-wise error rate controlled at 0.05 (using permutationOLS from matlib).
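The interpolation step can be sketched as follows. This is an illustrative Python re-implementation, not the Matlab analysis code; the function name and input format are ours.

```python
import numpy as np

def interpolate_trajectory_error(t, angles, target_angle, n_points=100):
    """Resample a response trajectory's cursor angle at n_points evenly
    spaced fractions of movement completion, then return the absolute
    circular error to the target at each point."""
    frac = np.linspace(0.0, 1.0, n_points)
    t_norm = (t - t[0]) / (t[-1] - t[0])       # 0..1 movement completion
    ang = np.interp(frac, t_norm, np.unwrap(angles))
    # Wrap the signed error into (-pi, pi] before taking its magnitude.
    err = np.angle(np.exp(1j * (ang - target_angle)))
    return np.abs(err)

# Toy trajectory rotating steadily from 0 rad towards a target at 1 rad.
t = np.array([0.0, 1.0, 2.0])
angles = np.array([0.0, 0.5, 1.0])
errs = interpolate_trajectory_error(t, angles, target_angle=1.0)
```

Because every trajectory is mapped onto the same 0-100% scale, conditions with different movement durations can be compared point-by-point, which is what the cluster permutation tests operate on.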

Modelling.
To see what types of errors were affected by motivation, we fit mixture models to the angular errors. The misbinding model (Bays, Catalao, & Husain, 2009) decomposes errors into three sources (Fig. 1c): imprecision (the width of the Von Mises distribution centred on the target orientation); misbinding (the proportion of responses centred around non-target locations); and random guesses (the height of a uniform distribution spread over the entire circle of possible orientations). We fit this model to each participant and condition separately via maximum likelihood estimation, and report the mean parameters for the misbinding model. The model fitting always converged. The parameters in this model are not entirely independent, as the proportion of guesses, misbinds and target responses must sum to 1, and inaccurate responses for items can be due to guessing or imprecision. To see whether to include the misbinding component, we fit a model without the misbinding component (i.e. only imprecision and guessing; Zhang & Luck, 2008) to all trials (i.e. across all conditions) per person for all experiments, which fit worse (Δ mean BIC = 13.7; BIC was higher in 92/124 individuals), indicating evidence for including the misbinding component in our analysis.
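The misbinding model's likelihood can be sketched in Python. This is a simplified illustration of the Bays et al. (2009) model, not the authors' fitting code; the softmax parameterisation of the mixture weights and the function names are our choices.

```python
import numpy as np
from scipy.special import i0
from scipy.optimize import minimize

def vm_pdf(x, mu, kappa):
    """Von Mises density on the circle."""
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * i0(kappa))

def mixture_nll(params, resp, target, nontargets):
    """Negative log-likelihood of the three components: target von Mises,
    non-target von Mises (misbinding), and uniform (guessing).
    params = [log_kappa, a_target, a_misbind]; weights come from a
    softmax so they sum to 1."""
    log_kappa, a_t, a_m = params
    kappa = np.exp(np.clip(log_kappa, -5.0, 5.0))   # keep exp() finite
    w = np.exp([a_t, a_m, 0.0])
    p_t, p_m, p_g = w / w.sum()
    lik = (p_t * vm_pdf(resp, target, kappa)
           + p_m * np.mean([vm_pdf(resp, nt, kappa) for nt in nontargets], axis=0)
           + p_g / (2 * np.pi))
    return -np.sum(np.log(lik))

def fit_mixture(resp, target, nontargets):
    """Maximum-likelihood fit via Nelder-Mead (the simplex method behind
    Matlab's fminsearch)."""
    res = minimize(mixture_nll, x0=[1.0, 1.0, 0.0],
                   args=(resp, target, nontargets), method="Nelder-Mead")
    log_kappa, a_t, a_m = res.x
    w = np.exp([a_t, a_m, 0.0])
    return {"kappa": float(np.exp(log_kappa)),
            "p_target": w[0] / w.sum(),
            "p_misbind": w[1] / w.sum(),
            "p_guess": w[2] / w.sum()}

# Recover parameters from synthetic data: 80% of responses concentrated
# around the target, 20% uniform guesses.
rng = np.random.default_rng(1)
n = 1000
resp = np.where(rng.random(n) < 0.8,
                rng.vonmises(0.0, 5.0, n),
                rng.uniform(-np.pi, np.pi, n))
fit = fit_mixture(resp, target=np.zeros(n), nontargets=[np.full(n, np.pi)])
```

The softmax trick enforces the constraint noted above, that target, misbind and guess proportions must sum to 1, without bounded optimisation.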
To investigate how these sources of errors developed during the course of responding, we applied the mixture model to different points during the response movement (Fig. 1h); this model is identical to that used for the final response, but now applied at each timepoint of the interpolated response trajectory. We calculated the angular error between the mouse at these time-points and the target's orientation, and fit the mixture model to this. This gave us estimates of imprecision, misbinding and guessing during the response movement, and we compared the effect of high rewards with cluster-wise permutation tests. This allowed us to see whether, for example, reward only improved target selection towards the end of the responses, or whether it improved selection before the movements even started.
To characterise strategic changes in response time with reward, we applied a sequential sampling model that allows interpretation of accuracy and RT together. An ideal model for these responses would harness both error and reaction times simultaneously. However, at the time of writing, no packages are available to fit joint distributions of RT and circular response errors that allow for misbinding of responses, a technique known as circular drift diffusion modelling (Kvam, 2019; Smith, 2016). Hence, we first split the sources of error and then conducted accumulator modelling on the RTs with the Linear Ballistic Accumulator (LBA) model (Brown & Heathcote, 2008), which fits distributions of RTs for correct and incorrect responses (Fig. 1d). As continuous errors are not easily classed as correct or incorrect, we used the misbinding model to generate the posterior probability of each trial being a guess, misbind or target response, using the fitted parameters. We then took the most likely of those three (i.e. the highest probability) to classify the trial. Target responses were taken as correct responses, while guesses or misbinds were grouped together into incorrect responses. Thus, the decision we are modelling is the retrieval process for selecting an item. We fit the model to each participant and condition separately, using first a grid search and then fminsearch with maximum likelihood estimation. The free parameters were the drift rates for correct and incorrect responses, non-decision time, and the threshold; starting bias was fixed at zero and SD of drift noise was fixed at 1.
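The trial-classification step can be sketched as follows. This is illustrative; the function name and the parameter values in the example are arbitrary, not fitted values from the study.

```python
import numpy as np
from scipy.special import i0

def classify_trials(resp, target, nontargets, kappa, p_t, p_m, p_g):
    """Assign each trial to 'target', 'misbind' or 'guess' by taking the
    most probable mixture component given the fitted parameters; target
    trials then count as 'correct' for the LBA, the rest as 'incorrect'."""
    vm = lambda x, mu, k: np.exp(k * np.cos(x - mu)) / (2 * np.pi * i0(k))
    lik_t = p_t * vm(resp, target, kappa)
    lik_m = p_m * np.mean([vm(resp, nt, kappa) for nt in nontargets], axis=0)
    lik_g = np.full_like(resp, p_g / (2 * np.pi))
    post = np.vstack([lik_t, lik_m, lik_g])
    post /= post.sum(axis=0)        # posterior responsibility per component
    labels = np.array(["target", "misbind", "guess"])[post.argmax(axis=0)]
    return labels, labels == "target"

# Three toy trials: on the target, on the non-target, and halfway between.
labels, correct = classify_trials(
    resp=np.array([0.0, np.pi, np.pi / 2]),
    target=np.zeros(3),
    nontargets=[np.full(3, np.pi)],
    kappa=5.0, p_t=0.7, p_m=0.2, p_g=0.1)
```

The resulting binary correct/incorrect labels, together with the RTs, are what a two-accumulator LBA can then be fit to.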

Descriptive statistics
Participants had smaller absolute angular error when they were playing for 50p than 1p (p <.0001, see Fig. 2c and Table 2 for statistics), while they had larger errors when delays were long (vs short; p <.0001). There was no difference between trials with pre- or post-stimulus cues, although there was a cue-time * delay interaction (p =.0116), as there were greater errors when short delays followed a post-cue than a pre-cue. However, there were no interactions between reward and cue-time or delay (p >.05), which suggests that reward had the same benefit before or after encoding, and with different maintenance durations. Accordingly, there was evidence in favour of the null hypothesis (Bayesian ANOVA) for motivation not improving encoding (BF excl = 4.292, weak evidence), or maintenance, although the evidence regarding maintenance was inconclusive (reward*delay BF excl = 1.658).

Fig. 2. Reward improves WM accuracy via lower guessing and slower RT in Experiment 1. a) Trial structure for Experiment 1. A low/high reward was shown either before or after the four stimuli, and a beige placeholder dot was shown at the other time. After a short/long delay, one stimulus was probed by the colour of the dot, which became an arrow tracking the cursor when the participant began their response. After they clicked to give their response, a reward screen was shown, tracking their current and total rewards. b) Table showing the factors in the experiment: reward level, cue-time, and delay length, which were all tested in a factorial design giving eight different trial types. c) Higher rewards decreased angular error, independent of cue-time or delay length. d) Reward slowed RT, regardless of other factors. e) Scatterplot of mean reward effects on error and RT for each participant. The majority of points lie above and left of zero, indicating rewards increase RT and reduce error in most participants. The line shows a Spearman's correlation (with 95% CI), which was significant (r = − 0.4204, p =.0216). f) Reward decreased angular error from the start of the response movement (black bar = p <.05, cluster permutation tests). g) Imprecision of responses was not affected by reward. h) Reward decreased guessing, i) but did not significantly affect misbinding. All error bars show SEM.
RT was significantly slowed by higher rewards (p <.0001; Table 2, Fig. 2d), with no other significant effects or interactions (p >.05). This suggests people slowed their responses when motivated to perform better, which coincided with increased accuracy. Supporting this, participants who were slowed by reward also decreased their error more (r = -0.4204, p =.0216, Fig. 2e). But are people only more accurate because they are spending more time making their responses?
To answer this, we looked at accuracy during the response movement itself. We measured the absolute angular distance from the target orientation to the response orientation as a function of time during the movement. Permutation tests showed high rewards significantly lowered errors from the very start of the movement through until the end (Fig. 2f, cluster of p <.05). This means that people were not simply continuing their responses for longer to become more accurate when high rewards were available, but that they were already more accurate even when they started their response.

Mixture modelling
We fit the misbinding model to each condition separately to see whether reward was improving accuracy by decreasing imprecision, misbinding, or guessing. Rewards did not affect imprecision (p =.978, BF excl = 14.493; see Table 2 and Fig. 2g) but increased target selection (p <.0001) and decreased guessing (Fig. 2h; p <.0001), without affecting misbinding (Fig. 2i; p >.05, BF excl = 4.065). This pattern contrasted with the effect of delay, which increased imprecision (p =.0065), and decreased target selection (p <.0001) by increasing both random guessing (p <.0001) and misbinding (p =.0029). Importantly, there were no interactions between reward and delay or cue-time (p >.05, BF incl < 1; although cue-time and delay did interact for target selection (p =.0328), as delay had less effect if the cue was given after the stimuli). This all suggests that reward was decreasing error by increasing target selection, independent of cue-time or delay.

Fig. 3. Reward improves WM accuracy via less misbinding, and slows RT in Experiment 2. a) Low or high rewards were shown before or after stimuli (beige dot shown at the other time), with a fixed 2000 ms delay period. On 12.5% of trials, after the response phase, participants were asked which reward they were playing for, with incorrect answers negating any rewards won for that trial. b) Factors for this experiment were reward level and cue-time. c) High rewards decreased mean absolute angular error for both pre- and post-stimuli cues. d) Rewards slowed RT, and this effect was greater for pre-cues. e) Scatterplot of mean reward effects on error and RT for each participant. Most participants are left of zero, indicating rewards decreased error, but there was no significant correlation with reward effects on RT (r = 0.0264, p =.8821). f) Response trajectories show reward significantly decreases errors across the whole movement. g) Reward did not affect imprecision. h) Reward did not affect guessing. i) Reward significantly decreased misbinding. All error bars show SEM.

LBA modelling
Reward increased the threshold for the item-selection decision (p =.0415; see Supplementary Table S1 for full statistics), without affecting any other parameters (p >.05). In contrast, long delays decreased the drift rate for the target (p <.0001) and the threshold (p =.002). This suggests that when deciding which stimulus to select during retrieval, long delays can decrease the amount of information available, while rewards increase the decision threshold, making the decision more conservative by trading speed for accuracy.
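The interpretation that a higher LBA threshold trades speed for accuracy can be illustrated with a minimal simulation. The parameter values here are arbitrary, not the fitted values from the study; start-point variability follows the standard LBA assumptions (start k ~ U[0, A], drift drawn per trial from a normal distribution).

```python
import numpy as np

def simulate_lba(threshold, v_correct=1.0, v_error=0.7, drift_sd=0.3,
                 start_range=0.5, t0=0.3, n_trials=100_000, seed=0):
    """Simulate a two-accumulator LBA: each accumulator starts at
    k ~ U[0, start_range] and rises linearly at a drift drawn from
    N(v, drift_sd); the first to reach the threshold determines the
    response. Trials where neither drift is positive are discarded.
    Returns (accuracy, mean RT including non-decision time t0)."""
    rng = np.random.default_rng(seed)
    d_c = rng.normal(v_correct, drift_sd, n_trials)
    d_e = rng.normal(v_error, drift_sd, n_trials)
    k_c = rng.uniform(0, start_range, n_trials)
    k_e = rng.uniform(0, start_range, n_trials)
    t_c = np.where(d_c > 0, (threshold - k_c) / d_c, np.inf)
    t_e = np.where(d_e > 0, (threshold - k_e) / d_e, np.inf)
    finished = np.isfinite(np.minimum(t_c, t_e))
    correct = (t_c < t_e)[finished]
    rt = (np.minimum(t_c, t_e) + t0)[finished]
    return correct.mean(), rt.mean()

# Raising the threshold makes item selection slower but more accurate,
# because the uniform start points matter relatively less.
acc_lo_thr, rt_lo_thr = simulate_lba(threshold=1.0)
acc_hi_thr, rt_hi_thr = simulate_lba(threshold=2.0)
```

This is the pattern attributed to reward in the fitted model: a more conservative threshold, not a change in drift.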

Discussion
Experiment 1 showed that people are able to improve their WM accuracy when motivated by rewards. However, the benefit was not modulated by whether the incentives were seen before or after encoding, indicating that the motivation benefit cannot be explained solely by improved encoding. Additionally, we found no evidence that rewards enhance maintenance of items in memory, as a function of time. One interpretation of this is that motivation improves retrieval. In line with this, RTs were slowed by rewards, suggesting people may be responding more accurately by slowing their responses (i.e. a speed-accuracy trade-off). Mixture modelling revealed that motivation was not affecting the precision of the memories, but rather the target selection stage of the retrieval process, and RT modelling showed that this could be accounted for by increased decision thresholds. Critically, the reward benefits were seen from the beginning of the response movement, suggesting that people are not simply spending longer fine-tuning their motor responses.

Table 3. Statistical outputs from two-way repeated-measures ANOVA (reward*cue-time) on the behavioural and model parameters in Experiment 2. Significant effects are highlighted in red; * = p <.05, ** = p <.01, *** = p <.001.
The lack of encoding benefit of motivation was surprising, as previous studies on prioritisation have found that showing rewards before encoding gave greater benefits (Allen & Ueno, 2018;Klink et al., 2017;Wallis, Stokes, Arnold, & Nobre, 2015). We thus aimed to replicate the findings, but with greater emphasis on the encoding stage of the task.

Experiment 2

Methods
We altered the task to make it easier for people to use motivational cues at encoding ( Fig. 3a; see Supplementary Materials). We used a longer reward-cue duration, and gave six blocks of pre-cued trials, and six blocks of post-cued trials (pseudorandomised block order) to remove any switching costs of cue-time. A single maintenance delay was used. On 12.5% of trials, after memory recall, we asked participants which reward they were playing for, to check they remembered this information ("catch trials"). Incorrect answers meant participants did not win any rewards even if their orientation response was accurate enough. There were 12 blocks of 32 trials, given in a pseudorandomised order, to 34 participants.

Descriptive statistics
Participants were quicker and more accurate to answer the catch trials when the reward was high (p <.0001; see Supplementary Materials), but catch trial accuracy did not affect angular error (p >.1), and moreover catch trials only differed from non-catch trials after memory recall, so we collapsed them together for subsequent analysis.
Reward decreased mean absolute error ( Fig. 3c; Table 3; p =.0016), as did pre-cues (p =.0298), but there was no interaction (p =.9766; BF excl = 1.869), replicating our Experiment 1 finding. RT was slowed by rewards as before ( Fig. 3d; Table 3; p <.0001), but also by post-cues (p =.0260) and these interacted (p =.0414) as the slowing was greater if pre-cued. In this experiment there was no association between reward-induced changes in RT and error (r = 0.0264, p =.8821; Fig. 3e). The reward benefit on error was again seen right from the start of the response movement and persisted until the end ( Fig. 3f; p <.05), confirming that rewards do not improve accuracy only by slowing responses.

LBA modelling
No parameters were significantly affected by rewards, cue-times or the interaction of the two (p >.05; see Supplementary Table S1).

Discussion
We replicated the motivational benefit of reward on WM, and again found it did not differ when the reward cues were given pre-or post-stimuli, suggesting encoding was not being modified by motivation. The benefit was again confined to the target selection source of error, not imprecision, and occurred with slower RTs, and was present from the start of the response movement. However, motivation reduced misbinding rather than guessing, and the LBA modelling did not find any differences in threshold (or other parameters).
Experiment 2 confirms that people do not improve their encoding of multiple WM items when motivated to do so, but can still improve performance, perhaps at the target selection stage of retrieval. In the first two experiments, participants needed to select one item based on the probe colour, and then retrieve and report its orientation. Feature retrieval may require that the item is first moved into the focus of attention so it is active, and the associated features can then be recalled (Manohar, Zokaei, et al., 2019). It is unclear whether motivation is improving the selection process, the refocusing of attention to the selected item, or both processes.

Experiment 3
We wanted to find out what mechanisms could underlie the observed benefit of motivation on retrieval target selection, so Experiment 3 investigated whether motivation facilitates moving a selected item into the focus of attention (FoA). We asked participants to use one of the items in memory during the delay, independently of which item was probed at the end of the trial. This "incidental retrocue" brings one item into the FoA, improving subsequent recall for the cued item (Souza & Oberauer, 2016). If reward facilitates shifts of attention, it should benefit items that were initially outside the FoA more than those already in the FoA, leading to interactions of reward and retrocue.

Methods
We presented the reward cue, then two arrows ( Fig. 4a; see Supplementary Materials), on the left and right side of the screen. During the delay, we printed the colour of one arrow on the screen and asked participants which side of the screen this arrow had appeared on, which they answered with a left/right mouse button click. This incidental retrocue was uninformative about which item had to be recalled at the end of the trial: on half the trials the cued arrow was probed (congruent trials) and on the other half the uncued arrow was probed (incongruent trials). There were 10 blocks of 32 trials, completed by 30 participants.

Descriptive statistics
The retrocue response accuracy and RT were not affected by reward (p >.1), so we removed all trials where participants answered the retrocue incorrectly (1.29% of trials overall). The findings did not change materially if we left these in, or if we excluded the one participant who scored below 95% on the retrocue.
As expected, congruent retrocues decreased memory error (Fig. 4c; Table 4; p <.0001), as did high reward (p =.0216), but importantly there was no interaction (p =.9394; BF excl = 1.434), suggesting that rewards had the same benefit for items in or out of the focus of attention. Again, rewards improved accuracy from the very start of the response trajectories until the end of movements (Fig. 4f; p <.05), showing the reward benefit was present at movement initiation.

Fig. 4. Reward affected congruently and incongruently cued items equally in Experiment 3. a) Trial structure: a high or low reward is shown before the two stimuli, then the colour of one stimulus is written and the participant must decide if that coloured arrow was on the left or right of the screen (mouse click response). This is the incidental retrocue, which activates that coloured arrow in WM. After a fixed delay, either the cued arrow is probed (congruent trials) or the uncued arrow (incongruent trials), and participants respond. b) The experimental factors are reward level and congruency of the incidental retrocue. c) High rewards decreased mean absolute angular error equally for congruent and incongruent items. d) Congruent retrocues sped up RT, but reward slowed RT. e) Scatterplot of mean reward benefits on error and RT for each participant. The majority of points are above and left of zero, indicating rewards slowed RT and decreased error, and these effects were correlated across participants (r = − 0.6325, p =.0002). f) Response trajectories show smaller errors from the start of the movement. g) Rewards decreased imprecision similarly for congruent and incongruent retrocues. h) Rewards did not significantly affect guessing. i) Rewards decreased misbinding. All error bars show SEM.

LBA modelling
Reward did not affect any parameter, while congruent retrocues increased the target drift rate (p =.0013; see Supplementary Table S1).

Discussion
Reward decreased errors regardless of congruency of retrocue, suggesting the effect occurs even when attention has already been shifted to an item. We replicated the previous findings that reward slowed RT, and the reward benefit was present from the start of the movement. According to RT modelling, congruency affected drift rate, while reward did not affect any parameter.
Interestingly, the mixture modelling revealed that while reward decreased misbinding to increase target selection, it also decreased imprecision, in contrast to the previous experiments. Unlike the previous experiments, two items had to be remembered instead of four, which led to much lower misbinding. This could have shifted the effect of motivation onto imprecision, either because people were unable to improve misbinding due to a floor effect, or simply due to better estimation of the imprecision parameter during fitting, as very few errors were due to misbinding.
In summary, we did not find that motivation affected attended and unattended items differently. However, the evidence against the interaction of reward with congruency was inconclusive, so we cannot be certain of this. The results suggest, but do not confirm, that improved focusing of attention is not the mechanism by which reward improves WM retrieval.

Experiment 4
As we found no interactions of the motivational effect with encoding, maintenance, or focusing attention, which contrasts with previous findings of reward prioritisation, we aimed to replicate previous findings that rewarding one item leads to its prioritisation in WM, and to directly contrast this with the motivational strategy found in Experiments 1-3. Specifically, previous studies have found that when one item is rewarded more than others, it is prioritised, giving higher accuracy for that item at the cost of lower accuracy for the others; this effect is greater when the cues are shown before the stimuli, and offers protection from imprecision.

Fig. 5. Reward affects WM differently when prioritisation is possible (Unequal) or not (Equal). a) Trial structure: rewards were shown for each stimulus, and could appear before or after the stimuli. The rewards were equal or unequal, and each could be high or low. One stimulus was probed, and that stimulus' reward was given if correct. b) The factors were reward, cue-time and equality of rewards. c) Rewards decreased mean absolute angular error, especially for unequal rewards. d) Rewards slowed RT, as did pre-cues, and these interacted only for the unequal rewards; when cued after stimuli, unequal rewards did not differ in RT as even 1p items were slowed. e) Scatterplots showing mean reward effects on error and RT for the equal and unequal trials, per person. Equal rewards showed a negative correlation of effects, indicating that people who slowed down had greater error reduction (r = − 0.5012, p =.0053), while no such correlation was seen for unequal rewards (r = − 0.1826, p =.3326). f) Response trajectories show reward decreases error across the whole movement. g) Imprecision is decreased by unequal reward, but unaffected by equal reward. h) Guessing was decreased by unequal rewards, but not equal rewards. When unequal rewards are cued before the stimuli, and people are probed on the low-reward item, they have high guessing rates. i) Equal and unequal rewards decreased misbinding, with the greatest effect for post-cued unequal rewards. All error bars show SEM.
Additionally, while the previous experiments attempted to rule out an encoding strategy by the absence of a cue-time*reward interaction, here we attempt to rule it out by the presence of one. If equal rewards given before encoding cannot improve memory, this would contrast with the predicted cue-time*reward interaction for unequal rewards, indicated by a three-way reward*cue-time*equality interaction attributable mainly to the guessing parameter (thought to reflect encoding quality).

Methods
We showed participants two coloured arrows, each of which was worth a separate reward amount if remembered accurately when probed (Fig. 5a; see Supplementary Materials). The reward cues were given either pre- or post-stimuli, in the same position as the associated arrow, and a fixed delay was used. No sound cues were given, to avoid having to play two different tones at the same time. Thirty participants completed 10 blocks of 48 trials.

Descriptive statistics
We first used a three-way repeated measures ANOVA, followed by two two-way ANOVAs comparing equal-reward and unequal-reward trials separately when there was an interaction with equality.
High rewards for the probed item decreased mean angular error (Fig. 5c; Table 5; p <.0001), and this effect was greater when the rewards were unequal (reward*equality interaction: p <.0001), indicating that prioritisation conferred greater benefits than motivation alone.
The separate ANOVAs showed that when both stimuli had equal rewards, high reward decreased error (Table 6; p =.0051), while post-cues increased error (p <.0001) with no interaction (p =.2337). This replicates the findings from the previous experiments. When rewards were unequal, high rewards gave a much larger decrease in error (Table 6; p <.0001).
RT was again slowed by reward (Fig. 5d; Table 5; p <.0001), and there was a reward*equality interaction (p <.0001). When rewards were equal between stimuli, high rewards slowed RT as in previous experiments (Table 6; p <.0001), and post-cues sped RT (p =.0016), with no interaction. However, when rewards were unequal, responses to the high-reward stimulus were slower (Table 6; p =.0097) while cue-time did not have a significant effect (p =.0991), and these interacted (p =.0014). People were slow to respond when unequal rewards had been presented after the stimuli, for both low- and high-rewarded items.
In the equally rewarded trials, people who slowed down more for rewards also decreased errors more (r = − 0.5012, p =.0053; Fig. 5e green colours), while no such association was seen for unequally rewarded items (r = − 0.1826, p =.3326; Fig. 5e blue crosses).
Response trajectories were more accurate from the start of movement when probed on a high-reward target ( Fig. 5f; p <.05), indicating the reward benefit was present when responses started.

Mixture modelling
Reward decreased imprecision (Fig. 5g; Table 5; p <.0001), but this interacted with equality (p <.0001). Separate two-way ANOVAs showed this was driven by prioritisation only, with reward decreasing imprecision when the items were unequal (Table 6; p <.0001) but not when they were equal (p =.2031; BF excl = 2.314). Unequal rewards also had a greater benefit on precision when pre-cued (reward*cue-time: p =.0219).
Guessing was decreased by reward ( Fig. 5h; Table 6; p <.0001), which interacted with equality (p =.0049). Separate ANOVAs showed that high reward decreased guesses only when items were unequal (Table 6; p <.0001) but not when they were equal (p =.2595; BF excl = 4.049). In other words, participants guessed less only when asked about a prioritised item. This unequal reward benefit on guessing was greatest when the rewards were presented before the stimuli (reward*cue: p =.0026), which was driven by very high guessing for pre-cued low-reward stimuli, perhaps reflecting lack of encoding for this low value item. There was no reward*cue-time interaction for equally rewarded trials (p =.5950, BF excl = 14.184), giving strong evidence for motivation having the same effect before or after encoding, in line with experiments 1 and 2.
Misbinding was also decreased by reward (Fig. 5i; Table 5; p <.0001), and there was a reward*equality interaction (p =.0371). When rewards were equal, pre-cues (Table 6; p =.0247) and high rewards (p =.0061) both decreased misbinding. When rewards were unequal, these effects were much larger (p =.0017; p <.0001). As there were only two stimuli in this experiment, misbinding means reporting the item with the other reward value, so more misbinding when probed on the low-reward item means participants were reporting the high-reward item more, especially when unequal rewards were given. In addition, when unequal rewards were shown after encoding, misbinding was higher for both reward values, suggesting that whatever prioritisation process occurred during the maintenance period (e.g. refocusing of attention, removal of an item from WM) may have also increased misbinding of features between the items.
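The imprecision, guessing and misbinding estimates reported here come from the standard three-component mixture model of continuous recall: a von Mises response around the target, von Mises responses around non-target values (misbinding), and uniform guessing. A minimal maximum-likelihood sketch of such a model (hypothetical function names; not the fitting code used in this study):

```python
import numpy as np
from scipy.special import i0
from scipy.optimize import minimize

def vonmises_pdf(x, kappa):
    # Centred von Mises density on the circle (mean = 0)
    return np.exp(kappa * np.cos(x)) / (2 * np.pi * i0(kappa))

def mixture_nll(params, target_err, nontarget_err):
    """Negative log-likelihood of the three-component mixture.
    target_err: response minus target value, shape (n_trials,)
    nontarget_err: response minus each non-target, shape (n_trials, n_nontargets)
    params = (p_target, p_nontarget, kappa); guessing = 1 - p_target - p_nontarget."""
    p_t, p_nt, kappa = params
    p_guess = 1.0 - p_t - p_nt
    if min(p_t, p_nt, p_guess) < 0 or kappa <= 0:
        return np.inf  # reject invalid mixture weights / concentration
    lik = (p_t * vonmises_pdf(target_err, kappa)
           + p_nt * vonmises_pdf(nontarget_err, kappa).mean(axis=1)
           + p_guess / (2 * np.pi))
    return -np.sum(np.log(lik))

def fit_mixture(target_err, nontarget_err):
    # Simple Nelder-Mead fit from a neutral starting point
    res = minimize(mixture_nll, x0=np.array([0.6, 0.2, 4.0]),
                   args=(target_err, nontarget_err), method="Nelder-Mead")
    return res.x
```

Here `fit_mixture` recovers (p_target, p_nontarget, κ); the guessing rate is 1 − p_target − p_nontarget, and κ indexes (inverse) imprecision.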

LBA modelling
Target drift was increased by reward (p <.001; see Supplementary Table S1 for full statistics), with no interaction with equality (p >.05). Threshold was not affected by reward (p =.1980), although there was a reward*equality interaction (p =.0092), as equal rewards increased the threshold (p =.0062), while unequal high rewards did not (p =.3570).
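For context, the LBA assumes each response option races to a threshold b from a start point drawn uniformly from [0, A], with a drift rate drawn per trial from a normal distribution (mean v, SD s); drift-rate and threshold effects like those above are estimated by fitting the model's first-passage density (Brown & Heathcote, 2008). A minimal sketch of that density, with illustrative parameter values rather than fitted ones from this study:

```python
import numpy as np
from scipy.stats import norm

def lba_pdf(t, b, A, v, s):
    """First-passage density of a single LBA accumulator:
    start point ~ Uniform[0, A], drift ~ Normal(v, s), threshold b."""
    ts = t * s
    z1 = (b - A - t * v) / ts
    z2 = (b - t * v) / ts
    return (1.0 / A) * (-v * norm.cdf(z1) + s * norm.pdf(z1)
                        + v * norm.cdf(z2) - s * norm.pdf(z2))

def lba_cdf(t, b, A, v, s):
    """First-passage distribution function of the same accumulator."""
    ts = t * s
    z1 = (b - A - t * v) / ts
    z2 = (b - t * v) / ts
    return (1.0 + ((b - A - t * v) / A) * norm.cdf(z1)
            - ((b - t * v) / A) * norm.cdf(z2)
            + (ts / A) * norm.pdf(z1)
            - (ts / A) * norm.pdf(z2))
```

In a full fit, the defective density for one response multiplies its `lba_pdf` by the survival probabilities (1 − `lba_cdf`) of the competing accumulators; a non-decision time is usually subtracted from observed RTs first.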

Discussion
Reward decreased WM errors whether items were rewarded equally or unequally, but with different patterns of responses. Equally-rewarded items prevented prioritisation, giving a smaller reward benefit, slower RT, increased target selection and decreased misbinding, but no effects on imprecision. These mirror the effects from Experiments 1-3.

Fig. 6. Reward decreases error and increases RT even in the absence of WM demands in Experiments 5a & 5b. Experiment 5a: a-f. a) Trial structure: low/high reward was shown before the stimulus, and the stimulus remained on screen during response, removing the need for WM. b) Reward was the only factor. c) Reward decreased error (after excluding first two trials). d) Reward slowed RT. e) Reward benefits on RT and error were uncorrelated across participants (r = − 0.0771, p =.5189). f) Response trajectories show reward only decreased error towards the end of the response. Experiment 5b: g-l. g) Trial structure: reward was shown before the stimuli, which remained on screen during response to remove the need for WM. h) Reward was the only factor. i) Reward did not significantly decrease error. j) Reward did not significantly slow RT. k) Reward effects on RT and accuracy were uncorrelated (r = − 0.2821, p =.3074). l) Reward decreased errors at the start of responses, but there was no significant difference by the end of the movements.
In contrast, when unequally rewarded items allowed for prioritisation, the highly rewarded item had a large accuracy boost, driven by lower imprecision, guessing and misbinding. The low-reward item showed a very large error in these trials, attributable to frequent guessing when the cues were delivered pre-stimuli, and more misbinding when delivered post-stimuli. This suggests that the low-reward item is potentially not encoded properly when pre-cued, and that prioritising the high-reward item after encoding affects feature binding. Importantly, the three-way cue-time*equality*reward interaction on guessing does not support the idea of equal rewards improving encoding and reducing the attentional lapses that result in guesses, at least when compared to unequal rewards: the motivational encoding effect was only seen in the unequally rewarded trials, not the equally rewarded trials.
Unequal rewards also slowed RT, but when the rewards were presented after the stimuli, both low- and high-reward trials had slower RT, suggesting that the prioritisation process itself may have slowed responses. The RT modelling demonstrated that while rewards increased drift rates whether equal or unequal, they only increased decision thresholds when equal.
All of this indicates that the effects of motivation in WM reported in Experiments 1-3 have a distinct pattern to that reported previously when prioritisation is possible.

Experiment 5a
The majority of the motivational effects reported above coincided with a speed-accuracy trade-off, which could potentially result from people taking longer to fine-tune their final orientation response, without any memory benefit. We have already offered evidence against this, such as the modelling showing reward benefits target selection rather than precision, and the benefit being present from the beginning of the response. However, to test this more directly, we included two motor control experiments where the response was the same but there were no WM demands.

Methods
We used a version of the previous task without any memory demands; one arrow was used and it stayed on the screen during the probe and response phases ( Fig. 6a; see Supplementary Materials). There were 24 trials, 12 for each reward, and this task was run before several of the above experiments, giving 72 participants.
We did not perform mixture modelling as there were no distractors to misbind to, and LBA modelling was not possible because fewer than 1% of responses were classed as guesses, leaving too few trials classed as errors.

Results
There were large practice effects, with a steep drop in error and RT over the first two trials, so we excluded these trials. Reward decreased error (Table 7; p =.0180) and slowed RT (p <.0001). This reduction in error was only seen towards the end of the response movement trajectory (Fig. 6f; p <.05), in contrast to the WM experiments. People who slowed more were no more accurate (r = -0.0771, p =.5189; Fig. 6e).

Discussion
We found that reward slows RT in a task without WM demands, and also decreases errors, with the reward benefits only seen towards the end of the movements. We were unable to fit the misbinding model as there were no possible non-targets. There were also large practice effects, with the first two trials having very large errors. For this reason, we adapted this task and ran it in several more participants.

Table 7
Statistical outputs from one-way repeated measure ANOVA (reward) on the behavioural and model parameters in Experiments 5a and 5b. We excluded the first two trials in experiment 5a as there were large practice effects on those. Significant effects are highlighted in red, * = p <.05, ** = p <.01, *** = p <.001.

Experiment 5b

Methods
As the previous data showed large practice effects and were fairly noisy due to low trial numbers, we modified the task to have more trials and to match the memory tasks more closely, only without the memory component. Four stimuli were shown (Fig. 6g; see Supplementary Materials) and remained on the screen during probe and response; there were six practice trials before two blocks of 30 trials, completed by 15 participants.

Results
Reward did not significantly decrease error ( Fig. 6i; Table 7; p =.1181; BF excl = 1.244) or affect RT ( Fig. 6j; p =.1827; BF excl = 0.700). There was a reward benefit early on in the response trajectory, but this did not persist until the end of the trial (p >.05). People who slowed more were no more accurate (r = -0.2821, p =.3074; Fig. 6k).

Correlations
As a further test of whether reward benefits can be explained without changes to WM, we compared the effect of reward on error and RT in the motor and WM experiments. If the WM accuracy benefit of reward is due to more time spent fine-tuning the motor response (which is present in both motor and WM tasks), then there should be correlations between the reward effects on error and RT in the motor task and the WM task. We regressed the motor reward effects for the 84 participants who completed either version of the motor task, with the reward effects from whichever WM experiment they completed (Experiment 2, 3 or 4).
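Concretely, this analysis reduces to computing one reward-effect score per participant per task and regressing these against each other across participants. A minimal sketch with hypothetical function names and illustrative data (not the study's analysis code):

```python
import numpy as np
from scipy.stats import linregress

def reward_effect(high_trials, low_trials):
    """Per-participant reward effect: mean of high-reward trials minus mean of low."""
    return np.mean(high_trials) - np.mean(low_trials)

def across_task_regression(motor_effects, wm_effects):
    """Regress WM reward effects on motor reward effects across participants.
    Both inputs are 1-D arrays with one effect score per participant."""
    return linregress(motor_effects, wm_effects)
```

A significant slope here would indicate shared (motor) variance in the reward effect; the paper's result was an association for RT but not for angular error.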

Discussion
Experiment 5b found no significant effects of reward on error or RT, although Bayes Factors gave inconclusive evidence. As the sample size was only 15, these effects may have been significant with a full sample of 30 participants. The reward benefit was seen at the start of the response movement but did not persist until the end.
Finally, we found that RT in these motor tasks was related to the RT in WM tasks (Experiments 2-4), while angular error was not. This suggests that part of the slower reaction times in the WM experiments is due to more time spent responding (i.e., non-WM time), but that this cannot explain the WM accuracy improvements.
These motor control tasks suggest that the motivational benefits on WM accuracy found in Experiments 1-4 are due to WM processes, not motor response processes.

Across-experiment trajectory modelling
While the motor-WM correlations above provide some evidence that reward benefits on WM are not simply due to more time spent making responses, they are not definitive. We therefore applied the misbinding modelling to the mouse trajectories during responses.

Fig. 7. WM RT, but not error, is associated with motor task responses. We regressed WM reward effects (difference between 50p and 1p conditions across Experiments 2-4) against motor reward effects (50p - 1p for Experiments 5a & b) in the same people. a) Motor and WM errors were not associated (p =.0809), while b) RT was significantly associated between experiment types (p <.0001). Orange lines are significant regression lines with 95% confidence intervals shaded; grey shading shows the 95% confidence interval on the non-significant regression line, which includes the y = 0 line.

To compare the effects of reward across tasks that have different additional factors, we regressed each of these other factors out for each task separately (in Experiment 4 we only looked at the equal trials), leaving only residual effects of time and reward. We then ran regressions at each time point to get a p-value for the effect of reward at each time point, and ran time*reward regressions to see if the reward effect differed by time; these were done for each model parameter separately.
Time and reward significantly affected each parameter (Fig. 8; p <.05), but only guessing rate had a reward*time interaction (p =.0055), meaning that the effect of reward on imprecision, target responding and misbinding was the same across all time points.
When we ran a similar analysis for the unequal-reward trials from Experiment 4, it showed significant effects of reward, time and reward*time for all parameters, meaning that unequal reward effects differed across the response time. Imprecision benefits grew over time, while the other benefits shrank over time (likely because initial trajectories on low unequal-reward trials were biased towards the high-reward item). Importantly, there were reward*time*equality interactions for all parameters (p <.0001), indicating that these time*reward interactions differed when rewards were equal or unequal.
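The time*reward regressions described above amount to ordinary least squares on a design matrix containing time, reward and their product, testing the interaction coefficient. A minimal sketch under those assumptions (hypothetical names; not the analysis code used in the paper):

```python
import numpy as np
from scipy import stats

def time_reward_ols(time, reward, y):
    """OLS of a parameter estimate on time, reward, and their interaction.
    Returns (beta, pvals) for [intercept, time, reward, time*reward]."""
    X = np.column_stack([np.ones_like(time), time, reward, time * reward])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    tvals = beta / se
    pvals = 2 * stats.t.sf(np.abs(tvals), dof)        # two-tailed p-values
    return beta, pvals
```

A significant coefficient on the `time*reward` column corresponds to the reward effect changing over the course of the response.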

General discussion
Using trial-wide rewards, rather than item-specific rewards, we showed that people can improve their working memory accuracy, and this process does not depend on when the reward-cue is presented. Trial-wide rewards did not improve memory more when presented before rather than after encoding, whereas unequal rewards had greater benefits when shown before the memoranda, indicating that only unequal rewards were able to modulate encoding. We also found no evidence for rewards affecting the maintenance of items over delays. The slower RT suggests a speed-accuracy trade-off, which was associated with greater target selection (via less guessing and/or misbinding) without changes in imprecision of responses. This suggests that when strategic prioritisation of WM capacity is not possible because all items are equally rewarded, people instead slow down the item-selection stage of retrieval in order to improve accuracy and maximise reward.
One explanation for this is that time during retrieval is costly, and that rewards motivate people to pay that cost to maximise rewards. There are many forms this cost could take, for example in a recent WM model retrieval is proposed to occur via sampling spikes from neurons storing the items, and greater accuracy can be achieved at the cost of longer sampling times and more energy spent on generating spikes (Schneegans, Taylor, & Bays, 2020). Alternatively, time spent on one process is time not spent on another process, leading to opportunity costs for every cognitive function, which are balanced depending on the values associated with their outcomes (Kurzban, Duckworth, Kable, & Myers, 2013;Shenhav et al., 2017). Our task had a fixed number of trials rather than a time limit, so slowing responses to increase accuracy may have been an optimal strategy to maximise rewards received, which may not always be the case in experimental tasks. Our work also challenges the resource-rational theory of WM (van den Berg & Ma, 2018), which proposes a cost associated with using neural resources to encode items in WM, and predicts better encoding when rewards offset this cost.
LBA modelling of the retrieval decisions offered some support for this strategic slowing, with Experiments 1 and 4 finding increased thresholds when equal rewards were higher, which contrasted with no change to thresholds when unequal rewards were increased. However, Experiments 2 and 3 demonstrated no such changes to decision thresholds, which was surprising. Experiment 2's fits revealed no significant effects whatsoever, and had a much higher BIC than Experiment 1, which may suggest poorer fits to the orientation data, obscuring detection of any real effects. Experiment 3's fits found no reward effects on the other parameters either, although congruency did increase drift rate.
The reward benefits were seen from the start of the response movement, suggesting that they were not solely due to more time spent fine-tuning the response to get it closer to the remembered orientation, and this is backed up by the lack of precision improvements, and by the independence of accuracy on the WM and motor-control tasks. This could reflect people initiating their responses when they have already selected the item, or could reflect some pre-threshold decision activity (Freeman, Dale, & Farmer, 2011;Gallivan, Chapman, Wolpert, & Flanagan, 2018;Jason, Chapman, & Masson, 2014) which is more accurate when people are motivated.
The reward benefits did not affect the precision of the feature recall (orientation), but rather improved selection of the target, and this benefit was the same when items were already active in WM (via an incidental retrocue). This suggests that reward did not facilitate the "loading" of an item into active WM to allow feature recall, and may suggest that the memory storage aspect of this task was unaffected by motivation. Previous studies have also found no benefit of reward on WM precision (van den Berg et al., 2020; Wallis et al., 2015), and others have found rewards and punishments affect item selection during WM (Fallon, Dolfen, Parolo, Zokaei, & Husain, 2019; Wallis et al., 2015). Some of these tasks used trial-wide rewards, while others used item-specific rewards, and the motivational processes will depend heavily on this as well as on when the rewards were known. When stimuli have different rewards, people can strategically choose to focus all their effort on only the high-reward ones, as increasing the accuracy of that item is worth enough to offset the loss of the low reward. Such motivated prioritisation can improve feature precision, and has the greatest effect when the rewards are shown before encoding (Allen & Ueno, 2018; A. L. Atkinson et al., 2018; Klink et al., 2017). We also found that unequal rewards improved WM precision, and the mixture modelling and RT showed that this benefit depended on cue-time. When prioritisation occurred before the stimuli, people were quick to respond and more likely to guess when probed on the low-reward item, whereas if the prioritisation occurred after the stimuli, they were slower and more likely to misbind. Proactive prioritisation could include attentional resourcing: in this task, people knew where the items would appear, so could even avoid looking at the low-reward item, preventing encoding entirely and forcing them to guess when probed on it.
This pattern shows clearly that unequal incentives given before the stimuli could affect encoding. However, retroactive prioritisation means people must shift their resources to one item, which may have increased the misbinding of features between them. The fact that guessing was much lower for the low-reward item when post-cued suggests that it was not fully removed from memory, only de-prioritised.
An entirely different pattern was seen on the equally rewarded trials, as shown by the three-way interaction, with high rewards having the same effect on guessing whether presented before or after encoding (BF excl = 14.184), suggesting that encoding itself was not changed. This fits with a previous study which found that retrieval-related ERPs were strengthened when rewards were available, while attentional components were not affected (Sanada et al., 2013). It may be that the speed-accuracy trade-off is easier or less effortful to implement than modulating encoding or storage on a trial-by-trial basis, so was the strategy used by participants here.
The reward benefits we have found here are relative benefits, and could reflect people expending less effort on the low-reward trials rather than giving more effort on the high-reward trials. Without a baseline unrewarded condition, these possibilities are hard to distinguish. Even in "unrewarded" conditions there will still be effects of intrinsic motivation, such as an inherent desire to perform well or to please the experimenter. One study that compared groups of participants working for different levels of money found no better memory for those in the $10 bonus condition than those in the $0 bonus condition (van den Berg et al., 2020). This may have been because the between-subject design adds in extra factors, including intrinsic motivation and reward sensitivity, which may have decreased the power to detect reward effects. How can this discrepancy be explained? Plausibly, the motivational effects we observe may emerge only when different rewards are contrasted within an experiment, trial-by-trial. This is supported by work indicating that value is an inherently relative construct (De Martino, Kumaran, Holt, & Dolan, 2009;Stewart, Chater, & Brown, 2006), and may be subject to strong anchoring and contrastive effects (Kahneman & Tversky, 1979). Alternatively, according to a resource-rational theory (van den Berg & Ma, 2018), variation in motivation over time would enable rational prioritisation of a fatigable resource, which is not feasible when incentives are constant. To interrogate these deeper issues, future work could measure baseline WM performance before the notion of rewards is introduced to the participant, which may reveal whether people were getting better for high rewards, or worse for low rewards, in a within-subject design.
Whether motivation decreases guessing or misbinding was unclear across these experiments, as some found effects on one parameter and some on the other. One explanation is that the misbinding parameter does not measure only true 'neural' misbinding (i.e. feature swaps), but also picks up on educated guesses, where participants remember the orientations of non-target items but not the target and so respond close to those (Pratte, 2018). Thus, a process that leads to more guessing can also affect the 'misbinding' parameter in these models. Measuring confidence in participants' responses may be needed to distinguish these effects.
A growing literature investigates the role of reward in cognitive processing (Botvinick & Braver, 2015; Chiew & Braver, 2014; Fröber & Dreisbach, 2014; Frömer, Lin, Dean Wolf, Inzlicht, & Shenhav, 2021) and, separately, the nature of WM resources (Luck & Vogel, 2013; Oberauer & Lin, 2016), including how attention prioritises information in WM (A. Atkinson, Oberauer, Allen, & Souza, 2021; Hall-McMaster, Muhle-Karbe, Myers, & Stokes, 2019; Wallis et al., 2015). The current study brings these two traditions together, asking how reward might control these resources. Our results point to the need to distinguish the separate stages of the decision process. In particular, we uncover a somewhat surprising result: improvements can be driven by participants allowing more time for internal selection of the target in memory.

Conclusion
We found that people did not increase their WM capacity when motivated by equal rewards. Instead, they tuned their retrieval strategy to improve WM accuracy despite this limit, consistent with rewards motivating people to pay a time cost during the retrieval decision. This differs from the prioritisation strategy possible when rewards are unequal between memoranda, which preserves precision for one item's feature at the cost of much worse performance for other items, and may reflect different strategies depending on when the cue is given. These strategies are not mutually exclusive, as people can still slow down during retrieval for unequal rewards. This demonstrates that motivation can improve performance despite hard capacity limits, and illustrates the flexibility of the WM system for goal-directed enhancement.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
All data and code are publicly available, with the links in the Methods section.