No evidence that the retro-cue benefit requires reallocation of memory resources

Selective mechanisms allow us to prioritize items held in working memory. Does this reflect reallocation of working memory resources? We examined a critical prediction of this account: that reallocating more resources from one item to another should provide a greater benefit. We used a reward manipulation to create variable allocation of resources. Subsequently, a retro-cue instructed participants to drop a memory item. This retro-cue improved performance for the prioritized items relative to a neutral baseline. However, in contrast to the prevailing reallocation account, we found no difference between dropping a higher versus lower reward item. Importantly, removal of high versus low reward items led to better encoding of subsequently presented items, demonstrating that our reward manipulation was successful. While allocation of resources can influence the encoding and storage of new information into working memory, reallocation does not appear to be essential for selection effects in working memory.


Introduction
Working memory holds information that is no longer in view for brief periods of time. Despite its limited capacity, we can prioritize items held in working memory. To investigate this, studies have presented retro-cues after memory encoding to indicate that some items are relevant (and some are not) for the memory test (Griffin & Nobre, 2003; Landman, Spekreijse, & Lamme, 2003; for reviews, see Myers, Stokes, & Nobre, 2017). Enhanced memory for cued items (i.e., the retro-cue benefit) suggests a top-down selective mechanism in working memory. A common explanation for the retro-cue benefit is that memory resources are reallocated from un-cued items to cued items (e.g., Pertzov, Bays, Joseph, & Husain, 2013; Souza, Rerko, Lin, & Oberauer, 2014), which fits with the idea that working memory performance is primarily constrained by access to a limited capacity store. It has been proposed that reallocation of resources may protect cued items from time-based decay (Pertzov et al., 2013), inter-item interference (Pertzov et al., 2017), or perceptual interference (Barth & Schneider, 2018; Schneider, Barth, & Wascher, 2017; see also Makovski & Jiang, 2007; Souza, Rerko, & Oberauer, 2016).
However, the evidence for the resource-based account is far from decisive. One prediction from this account is that there should be a resource trade-off between cued and un-cued items (Myers, Chekroud, Stokes, & Nobre, 2018; Pertzov et al., 2013). Consistent with this view, several studies have shown that un-cued items have worse performance than a neutral baseline (Astle, Summerfield, Griffin, & Nobre, 2012; Gözenman, Tanoue, Metoyer, & Berryhill, 2014; Griffin & Nobre, 2003; Gunseli, van Moorselaar, Meeter, & Olivers, 2015; Pertzov et al., 2013; Williams, Pouget, Boucher, & Woodman, 2013; Williams & Woodman, 2012). However, others have pointed to a dissociation between benefits to cued items and costs to un-cued items. For instance, Gunseli et al. (2015) manipulated the reliability of retro-cues (80 vs. 50% valid) and found that while less reliable retro-cues still produced consistent retro-cue benefits, they led to largely reduced costs for un-cued items. A recent study also failed to observe invalid-cue costs despite robust retro-cue benefits, and further showed that memory for cued and un-cued items within a single trial is not correlated (Myers et al., 2018). Work using a sequential retro-cue paradigm also revealed that retro-cue benefits may arise without costs for un-cued items. In this paradigm, one or two consecutive retro-cues are presented during the retention interval. Importantly, when the first cue is followed by a second cue, initially un-cued items are not entirely lost and can be recovered with this second cue (Landman et al., 2003; Rerko & Oberauer, 2013; Van Moorselaar, Olivers, Theeuwes, Lamme, & Sligte, 2015), whereas the initially cued item still retains a benefit (Li & Saiki, 2014). These findings raise the possibility that retro-cues improve memory in a way that is not directly linked to the amount of memory resources allocated away from un-cued items.
In addition, studies investigating the effects of memory load on the retro-cue benefit have provided mixed evidence for resource reallocation. There is some evidence that the retro-cue benefit increases with memory set size (Kuo, Stokes, & Nobre, 2012; Souza, Rerko, Lin, et al., 2014), which is consistent with the idea that being able to reallocate more resources from un-cued items provides a larger retro-cue benefit. Nevertheless, others have failed to observe an interaction between memory load and retro-cues (Makovski, Sussman, & Jiang, 2008; Matsukura, Luck, & Vecera, 2007). This might suggest that resource reallocation is not necessary to explain the retro-cue benefit, and that other mechanisms may be responsible for this benefit. Others have proposed mechanisms of the retro-cue benefit that may not necessarily involve reallocation of resources: that retro-cues strengthen cued items (Myers et al., 2017; Rerko & Oberauer, 2013), that retro-cues protect cued items from subsequent interference (Makovski & Jiang, 2007), or that retro-cues allow more time for evidence accumulation. It is worth noting that these mechanisms are not mutually exclusive and can be combined to explain the retro-cue benefit. The current study aimed to provide a more direct test of the reallocation account by examining whether the retro-cue benefit depends on the amount of resources freed up from un-cued items. Although previous work has examined the consequences of invalid retro-cues, no previous study has tested how the memory strength of un-cued items modulates the benefit to cued items. Here we varied the reward magnitude of items during memory encoding to create an unequal distribution of resources. Specifically, we had participants associate each memory location with a low, medium, or high reward value.
Previous work has found that assigning different reward values to memory items provides a benefit for high-value versus low-value items (Allen & Ueno, 2018; Atkinson et al., 2018; Atkinson, Allen, Baddeley, Hitch, & Waterman, 2020; Klink, Jeurissen, Theeuwes, Denys, & Roelfsema, 2017; Klyszejko, Rahmati, & Curtis, 2014; Manga, Vakli, & Vidnyánszky, 2020; Yin, Havelka, & Allen, 2021), suggesting that more memory resources are allocated to high-value items. Further, during the memory delay, a retro-cue indicated that participants could remember only a subset of memory items and drop un-cued items from memory, as un-cued items would never be probed. If the retro-cue benefit arises from reallocation, dropping items with a higher reward value should yield a larger benefit in memory, since these items consume more resources.

Method
The data and materials are available at https://osf.io/9ns4x/

Participants
Fifty-six participants (34 female, mean age = 27.4 years, age range: 18-39) were recruited online via Prolific (www.prolific.co) for compensation of £5.2 per hour. Participants could receive an additional bonus payment (M = £2.05) based on their performance. The maximum bonus was £4, and the minimum bonus was £0. Three additional participants were tested but excluded because they had an average response error above our a priori cutoff of 2.5 standard deviations from the mean for all participants (73.72°). Including these participants would not have impacted our results in any meaningful way. The required sample size was determined using G*Power (Faul, Erdfelder, Lang, & Buchner, 2007). Anticipating a moderate effect size (ηp² = 0.06) of dropping a higher versus lower reward item, a power analysis showed that 53 participants would be sufficient to achieve a power of 0.95 at an alpha level of 0.05. All participants reported normal or corrected-to-normal vision. Each participant provided online consent before the experiment. The study was approved by the New York University Abu Dhabi Institutional Review Board.

Stimuli
All experiments were programmed in HTML Canvas and JavaScript. Since all studies were conducted online, the stimuli could differ in size depending on monitor size and viewing distance. We therefore report stimulus size in pixels.
The screen background remained black during the experiment. The memory stimuli were three colored circles (radius 25 pixels) equally spaced on an imaginary circle (radius 100 pixels) around the screen center. The colors for each trial were randomly chosen without replacement from 180 possible values drawn from a circle in CIELAB space centered at L* = 54 (luminance), a* = 18, b* = −8, with a radius of 59.
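The color set described above can be sketched as follows. This is a minimal illustration, not the authors' stimulus code: it assumes the 180 values are equally spaced in angle around the CIELAB circle and that the zero angle lies along the positive a* axis, neither of which is stated in the text. The function names are hypothetical, and the conversion from CIELAB to display RGB is omitted.

```python
import math
import random

# Parameters stated in the text: a circle in CIELAB at fixed lightness.
L_STAR = 54.0
A_CENTER, B_CENTER = 18.0, -8.0
RADIUS = 59.0
N_COLORS = 180  # assumption: equal spacing, i.e. one color every 2 degrees


def color_wheel(n_colors=N_COLORS):
    """Return n_colors (L*, a*, b*) triplets equally spaced on the circle."""
    wheel = []
    for i in range(n_colors):
        theta = 2.0 * math.pi * i / n_colors
        a = A_CENTER + RADIUS * math.cos(theta)
        b = B_CENTER + RADIUS * math.sin(theta)
        wheel.append((L_STAR, a, b))
    return wheel


def sample_memory_colors(n_items=3, n_colors=N_COLORS, rng=random):
    """Draw n_items distinct wheel indices (sampling without replacement)."""
    return rng.sample(range(n_colors), n_items)
```

Calling `sample_memory_colors()` returns three distinct indices into `color_wheel()`, one per memory item.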

Procedure
Trial procedure is illustrated in Fig. 1a. Each trial began with the presentation of a white fixation cross at the screen center for 200 ms. The memory stimuli were presented for 400 ms. The location of the memory stimuli indicated their reward value for the memory test. We chose to use location as the reward cue as pilot studies suggested that the task was too complex for participants when it involved both a per trial reward and cue manipulation.
Following a blank interval of 1000 ms, the retro-cue (length 50 pixels) was displayed for 500 ms. Neutral cues (1/3 of trials) were three white arrows, one pointing toward each memory stimulus, thus providing no information on which item would be tested. Valid cues (2/3 of trials) were two green arrows pointing toward two memory stimuli. Each cued item had equal probability of being tested, and participants were informed that they would never be tested on the un-cued item and could drop it from memory. Following a delay of 2500 ms, the memory test was presented. A probe stimulus appeared along with a color wheel centered around fixation. The color wheel was rotated randomly across trials to prevent participants from associating the spatial positions with colors. Participants reported the color of the probed item by selecting a value on the response wheel. Upon mouse movement, the probe stimulus was updated according to the angular position of the mouse cursor, and a white line appeared on the outer edge of the response wheel to indicate the currently selected value. Participants clicked the mouse to confirm their response. The memory response was not speeded.
In the feedback display, the probe stimulus was replaced by a circle showing the correct color of the target, and response error in degrees from the target was displayed. The feedback display also informed participants about the bonus points they earned for the current trial (e.g. "You have earned x out of x points") and their total accumulated points. Participants clicked on the mouse to proceed to the following trial, which began after an intertrial interval of 500 ms.
A practice block of 9 trials preceded the experiment to familiarize participants with the task. Participants then completed 270 trials (9 blocks) for the main experiment. Trials were evenly divided between the 3 probe reward conditions (low, medium, high). In addition, one third of trials were neutral cue trials, and the remaining two thirds were valid cue trials. On valid cue trials, the probed item was always cued, along with one of the two other items. This resulted in 30 neutral cue trials and 60 valid cue trials for each probe reward (low, medium, high) condition.
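The trial counts above can be checked with a short sketch that builds the 270-trial list. It is illustrative only: the names are hypothetical, the even split of valid trials between the two possible un-cued items is an assumption, and the division into 9 blocks is not modeled.

```python
import random

REWARDS = ("low", "medium", "high")


def build_trials(n_neutral=30, n_valid=60, seed=0):
    """Return a shuffled list of trial dicts for one session."""
    trials = []
    for probe in REWARDS:
        others = [r for r in REWARDS if r != probe]
        for _ in range(n_neutral):
            # Neutral cue: all three items stay relevant, nothing is dropped.
            trials.append({"probe": probe, "cue": "neutral", "uncued": None})
        for k in range(n_valid):
            # Valid cue: the probed item is always cued.
            # Assumption: the un-cued item alternates between the two
            # non-probed items, giving an even split.
            trials.append({"probe": probe, "cue": "valid", "uncued": others[k % 2]})
    random.Random(seed).shuffle(trials)
    return trials
```

With the default arguments this yields 3 × (30 + 60) = 270 trials, matching the design described above.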
Each memory display contained a low, medium, and high reward item. Thus, participants could receive bonus points (low, medium, or high) depending on the location of the tested item. These reward-location associations were randomized across participants and were kept constant throughout the entire experiment. A reward training task (Fig. 1b) preceded each block to ensure that participants learned these reward-location associations. During the reward training task, a placeholder array (white circle frames) appeared at memory locations.
Instructions on screen asked participants to click on the location corresponding to a specific reward value (e.g., "Click on the high reward location"). Feedback (correct or incorrect) was provided after participants indicated their response. If their response was incorrect, they would be prompted to click on another placeholder until they correctly indicated the location associated with this reward value. Once they correctly indicated the reward location, this procedure was repeated for the remaining reward locations. The instruction order was randomized for each block. Participants were awarded an additional 50 points if their response in the reward training task was correct on the first attempt.
To determine the bonus points for each trial, we first computed response error by taking the angular difference between the true color and the selected response. We then converted the absolute value of the response error into bonus points using this formula: ((180 − absolute response error) / 180) × Probe reward multiplier, rounded to the nearest integer. The values of the Probe reward multiplier were 10, 100, and 1000, respectively, for the low, medium, and high reward probes. They represent the maximum bonus points participants could earn for each reward level (e.g., when the response error was 0°) on a single trial. For example, if participants' response deviated from the true color by 18° for a high reward item, they could earn (180 − 18)/180 × 1000 = 900 points for that trial. In contrast, if participants had the same degrees of error for a low reward item, they could only earn (180 − 18)/180 × 10 = 9 points. The large differences in points between the different reward levels maximized the likelihood that participants would differentially allocate resources to the items. In addition, we used a tiered bonus system. Participants qualified for a monetary bonus if their total points exceeded 80,000 (minimum reward threshold). This threshold was approximately 80% of the maximum total points participants could have earned. Participants received a bonus of £0.5 for the first 80,000 points earned on the task, plus £0.5 for every 2000 points (~2% of maximum total points) above 80,000. Participants who earned more than 94,000 points would receive the maximum bonus (£4).
Critically, the present experiment was designed to isolate effects of reallocation while controlling for benefits associated with probe reward. First, each trial had the same reward structure (low, medium, and high) so that the division of resources was equal from trial to trial.
Second, our analysis of reallocation focused on items that have the same reward value (presumably given equivalent resources during encoding). For instance, when the high reward item is probed, we could examine differences between when the low item could be dropped (un-cued lower) and when the medium item could be dropped (un-cued higher). By using items of the same value as the basis of comparison, any performance difference between the two conditions can be attributed to resources reallocated from the dropped item and not differences in the tested items.
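The scoring rule and tiered bonus described in the Procedure can be sketched as follows. This is a minimal illustration rather than the authors' implementation. The function names are hypothetical. Two details are assumptions because the text does not specify them: ties are rounded half up, and only whole 2000-point steps above the threshold earn an extra £0.5.

```python
import math

REWARD_MULTIPLIER = {"low": 10, "medium": 100, "high": 1000}


def angular_error(target_deg, response_deg):
    """Signed color-wheel error in degrees, wrapped to [-180, 180)."""
    return (response_deg - target_deg + 180.0) % 360.0 - 180.0


def trial_points(target_deg, response_deg, probe_reward):
    """Bonus points for one trial: (180 - |error|) / 180 x multiplier."""
    err = abs(angular_error(target_deg, response_deg))
    raw = (180.0 - err) * REWARD_MULTIPLIER[probe_reward] / 180.0
    return int(math.floor(raw + 0.5))  # assumption: round half up


def monetary_bonus(total_points, threshold=80_000, step=2_000,
                   per_step=0.5, max_bonus=4.0):
    """Tiered bonus in GBP; zero unless total points exceed the threshold."""
    if total_points <= threshold:
        return 0.0
    # Assumption: partial steps above the threshold do not count.
    extra_steps = (total_points - threshold) // step
    return min(max_bonus, per_step + per_step * extra_steps)
```

For the worked example in the text, `trial_points(0, 18, "high")` gives 900 and `trial_points(0, 18, "low")` gives 9. With the default parameters, 94,000 points corresponds to £0.5 + 7 × £0.5 = £4, the stated maximum.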

Results
To examine whether the reward and cue manipulations were effective, we first analyzed mean absolute response error (Fig. 2) with a 3 (probe reward: low, medium, high) × 2 (cue: neutral, valid) ANOVA. Greenhouse-Geisser corrections were applied in case of sphericity violations. We found a main effect of reward, F(1.510, 83.043) = 15.00, p < .001, ηp² = 0.214. Bonferroni planned contrasts further showed that memory performance for the high reward item (24.28°) was better compared to the medium reward item (28.60°), t(55) = 4.23, p < .001, dz = 0.57, and the low reward item (32.76°), t(55) = 4.72, p < .001, dz = 0.63. Memory performance was also marginally better for the medium reward item than the low reward item, t(55) = 2.43, p = .055, dz = 0.33. We also found a significant retro-cue benefit: memory performance was better for valid trials relative to neutral trials. The retro-cue benefit did not differ across reward items. Importantly, these findings suggest that participants used both reward expectations and cues when performing the task.
Fig. 1. Trial procedure for Experiment 1. (a) On each trial, participants remembered items of different reward value (high, medium, or low) based on their locations. During the memory delay, they were cued to remember two items (and drop one) with a valid cue (shown here, 2/3 of trials) or cued to remember all items with a neutral cue (three white arrows, 1/3 of trials). During response, participants adjusted the color of the probe stimulus to match their memory using the color wheel. Immediately after the memory response, we provided performance feedback by displaying the response error as well as the true color of the target. Further, participants could receive bonus points proportional to memory accuracy and the reward value of the probed item. (b) Reward-location associations were randomly assigned to participants at the start of the experiment and remained constant throughout the experiment to minimize task difficulty. To familiarize participants with these reward locations, a reward training task was presented before each block. In this task, participants clicked on the high, medium, and low reward locations according to the instructions on screen.
To test the resource account, we examined whether the size of retro-cue benefits depends on the reward value of the un-cued (dropped) item. On valid cue trials, one item was cued and probed, another was cued but un-probed, and the remaining item was un-cued. Accordingly, we could further sort trials into two conditions depending on which reward item was un-cued. Given our assumption that more resources could be released and reallocated from an un-cued item with a higher reward value, we used the relative reward value of the un-cued item (lower or higher, compared to the cued but un-probed item) to refer to these two conditions. For example, when the high reward item was probed, the un-cued item could be either the low reward item or the medium reward item. Since the low item is lower in value than the medium item, we used "un-cued lower" to refer to when participants drop the low item and "un-cued higher" to refer to when participants drop the medium item.
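The sorting rule above can be written compactly. The following is a sketch with hypothetical names: given the reward level of the probed item and of the un-cued item in a three-item display, it labels the trial by comparing the un-cued item with the remaining (cued but un-probed) item.

```python
REWARD_RANK = {"low": 0, "medium": 1, "high": 2}


def uncued_condition(probed, uncued):
    """Label a valid-cue trial as 'lower' or 'higher'.

    The un-cued (dropped) item is compared with the cued but un-probed
    item, i.e. the one reward level that is neither probed nor un-cued.
    """
    if probed not in REWARD_RANK or uncued not in REWARD_RANK:
        raise ValueError("unknown reward level")
    if probed == uncued:
        raise ValueError("the probed item is always cued")
    (cued_unprobed,) = [r for r in REWARD_RANK if r not in (probed, uncued)]
    if REWARD_RANK[uncued] < REWARD_RANK[cued_unprobed]:
        return "lower"
    return "higher"
```

For a high reward probe, `uncued_condition("high", "low")` returns `"lower"` and `uncued_condition("high", "medium")` returns `"higher"`, matching the example in the text.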
Mean absolute response errors on valid cue trials were further submitted to a 3 (probe reward: low, medium, high) × 2 (un-cued item: lower, higher) ANOVA. Consistent with the analysis for all trial types, there was also a main effect of reward, F(1.581, 86.963) = 18.77, p < .001, ηp² = 0.254, showing that memory was better for items with higher reward values. However, there was no effect of un-cued item, F(1, 55) = 0.19, p = .662, ηp² = 0.003. Dropping a higher reward item (27.44°) did not benefit performance more than dropping a lower reward item (27.14°). This is inconsistent with the reallocation account. The interaction was not significant, F(1.996, 109.766) = 1.56, p = .216, ηp² = 0.028. To examine the effects of dropping an un-cued item more closely, we conducted a paired t-test for each probe reward level. We also performed Bayesian t-tests in R using the BayesFactor package (0.9.12-4.3) and the JZS default prior to quantify evidence for the null or alternative hypothesis (Rouder, Speckman, Sun, Morey, & Iverson, 2009). For the high reward item, we again found no difference in dropping the low (22.96°) versus the medium item (23.04°), t(55) = 0.08, p = .934, dz = 0.01, BF01 = 6.83. The only Bayes factor result that did not find at least moderate evidence for the null was for the low reward item, and this effect went in the opposite direction to that predicted by the resource account.
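For readers who want to see how the reported t and dz values relate, here is a minimal sketch of a paired comparison using only the Python standard library. The function name is hypothetical and the numbers in the usage note are toy values, not data from the study. It does not compute p-values or the JZS Bayes factor, which the authors obtained with the BayesFactor package in R.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic, degrees of freedom, and Cohen's dz.

    x and y hold one mean error per participant for the two conditions
    being compared (e.g. un-cued lower vs. un-cued higher).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)   # sample SD of the paired differences
    dz = mean_d / sd_d               # standardized mean difference
    t = dz * math.sqrt(n)            # equivalent to mean_d / (sd_d / sqrt(n))
    return t, n - 1, dz
```

As a usage example with made-up values, `paired_t([30, 28, 35, 22], [29, 30, 33, 25])` returns a t statistic on 3 degrees of freedom together with dz. The identity t = dz × √n also allows a quick consistency check on reported statistics: with n = 56, t(55) = 4.23 corresponds to dz ≈ 4.23 / √56 ≈ 0.57.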
Next, we examined the time taken to provide the memory response. Analyses of mean response times (RTs) revealed a significant effect of reward, suggesting that participants were more careful in selecting responses to items with higher reward values. However, there was no difference in RTs between the valid and neutral cue conditions. Further, we found that RTs did not differ depending on whether the higher or lower reward item could be dropped from memory (see the Supplemental Material for details). This confirms the results in response error that there is no benefit for dropping high reward items.

Discussion
We found that reward influences performance, which is consistent with work showing that items can be prioritized based on their importance during encoding (Dube, Emrich, & Al-Aidroos, 2017; Emrich, Lockhart, & Al-Aidroos, 2017; Klink et al., 2017; Klyszejko et al., 2014). We also observed retro-cue benefits when one of the items could be dropped from memory, replicating work showing a benefit of cueing multiple items (Matsukura & Vecera, 2015; but see Barth & Schneider, 2018; Makovski & Jiang, 2007). Finding both reward and cue effects allowed us to ask whether retro-cue benefits depended on the reward (and thus the allocated resources) of the dropped item. Critically, there was no evidence for a larger boost in memory when the dropped item was a higher reward item. There was no difference even for the extreme comparison of dropping the high versus low reward item, despite a 100-fold difference in potential points and a large difference in performance (8.57° average error difference). While reallocation is often the assumed explanation for cueing effects (e.g., Pertzov et al., 2013; Souza, Rerko, Lin, et al., 2014), our results failed to confirm the predictions of this account.

Experiment 2
Experiment 1 showed that the retro-cue benefit was independent of the memory strength of the dropped item. However, it is possible that we failed to observe a differential benefit because we used a low memory load. Previous work has observed larger retro-cue benefits or costs as memory load increases (Astle et al., 2012; Kuo et al., 2012; Souza, Rerko, Lin, et al., 2014). Thus, it is possible that participants would drop un-cued items from memory only when the number of items exceeds memory capacity. Additionally, the number of un-cued items might be critical in finding robust retro-cue benefits. Although in Experiment 1 we found that cueing two out of three items led to a small benefit, some other studies found a benefit in accuracy only when two out of four items were cued, but not when two out of three items were cued (Barth & Schneider, 2018; Heuer & Schubö, 2016).
Here we aimed to replicate the null findings under conditions that increase the likelihood of finding larger retro-cue benefits. We used a paradigm similar to that of Experiment 1, but increased the memory load to four items and retro-cued two items. We designed the task so that the memory array consisted of one low reward item and one medium reward item on one side, and one high reward item and one medium reward item on the other side. During the retention interval, the retro-cue indicated that only one side of the display (a medium item plus either a high or low item) was relevant.

Participants
Fifty-six new participants (20 female, mean age = 28.7 years, age range: 18-39) were recruited via Prolific. Participants also received a performance-based bonus (M = £2.44). One additional participant was tested but removed for high average response error (above 2.5 standard deviations from the mean for all participants).
Fig. 2. Results in Experiment 1. Mean absolute response error for Experiment 1 (n = 56) across probe reward (low, medium, high) and cue conditions (neutral vs valid [un-cued lower, un-cued higher]). In the valid cue condition, the probed item was always cued, and the un-cued (dropped) item could be either of the two un-probed items. Importantly, the un-cued item could be of lower or higher reward value (consuming fewer or more resources) compared to the other un-probed item. Error bars represent 95% within-subjects confidence intervals.

Procedure
The task (Fig. 3) was similar to Experiment 1, and we used the same timing parameters. In the memory display, four circles were presented at intercardinal positions (45°, 135°, 225°, 315°). The colors of all memory stimuli were randomly selected from the color space used in Experiment 1. For these memory locations, two locations on one diagonal were associated with medium reward items, and two locations on the other diagonal were associated with low and high reward items. We randomized these reward-location associations across participants and kept them constant during the entire experiment. A reward training task preceded each experimental block to ensure that participants correctly associated the reward values with stimulus locations. Participants were instructed to click on the memory locations corresponding to low, medium, and high reward values. Participants had to repeat the process until they clicked on the correct locations. Participants were given 50 bonus points if their response in the reward training task was correct on the first attempt.
On neutral trials, the retro-cue pointed toward both left and right sides of the display and provided no information about the to-be-tested item. On valid trials, the retro-cue pointed toward the left or the right, indicating that participants had to store items on the cued side and drop items on the other side. For each participant, one side of the display always contained low and medium items, and the other side of the display contained medium and high items. Therefore, in the valid cue condition, participants were cued to drop the side that contained a low reward item (un-cued low) or the side that contained a high reward item (un-cued high). Our main comparison was whether dropping the low or high reward item provides a differential benefit for the medium item.
As in Experiment 1, the number of bonus points participants earned for each trial scaled with performance and the reward level of the probed item. If participants' total points exceeded 55,000 (minimum reward threshold, around 75% of maximum total points), they received a bonus of £0.5 for the first 55,000 points earned on the task, and £0.5 for every 1500 points (~2% of maximum total points) above 55,000. Participants with more than 65,500 total points received the maximum bonus (£4). Prior to the study, participants completed 8 practice trials. The main experiment consisted of 240 trials (eight blocks). There were 80 neutral cue trials (20 low reward, 40 medium reward, 20 high reward) and 160 valid cue trials (40 low reward, 80 medium reward, 40 high reward). In the valid condition, trials were evenly divided between un-cued low and un-cued high conditions.

Results
Mean absolute response error (Fig. 4) was submitted to a 3 (probe reward: low, medium, high) × 2 (cue: neutral, valid) ANOVA. Greenhouse-Geisser corrections were applied in case of sphericity violations. We found a main effect of probe reward, F(1.326, 72.924) = 31.73, p < .001, ηp² = 0.366. Bonferroni planned contrasts further showed that memory performance for the high reward item (27.94°) was better compared to the medium reward item (35.62°), t(55) = 5.21, p < .001, dz = 0.70, and the low reward item (46.96°), t(55) = 6.24, p < .001, dz = 0.83. Memory performance was also better for the medium reward item than the low reward item, t(55) = 4.69, p < .001, dz = 0.63. We also found a significant retro-cue benefit: memory performance was better for valid trials (34.94°) relative to neutral trials.
Fig. 3. Trial procedure for Experiment 2. Reward locations were assigned to participants at the start of the experiment. Two locations on one diagonal consisted of medium reward items, and two locations on the other diagonal consisted of low and high reward items. The retro-cue indicated the relevant side of the memory display. The cue display contained either a valid cue (green arrow) indicating the relevant side for the memory test, or a neutral cue (white arrows pointing to both sides) indicating that all items were still relevant. Participants provided the memory response by selecting a value on the color wheel. Performance and reward feedback was provided upon the memory response.
Fig. 4. Results in Experiment 2. Mean absolute response error (n = 56) across probe reward (low, medium, high) and cue conditions (neutral vs valid). Error bars represent 95% within-subjects confidence intervals.
Further inspection of the data suggested that some participants may have opted for a strategy of not storing the low reward items at all (performance in this condition was very poor for some participants). This is problematic for two reasons. First, this raises the possibility that performance differences between high and low reward items may be explained by a strategy of remembering only the higher value items, rather than a differential allocation of resources. Second, if participants are not storing the low reward item, differences between dropping low and high reward items may not just be due to differences in resource reallocation, but may reflect that only dropping the high reward item would lead to a retro-cue benefit. Therefore, we removed participants who were frequently guessing (in either condition). Specifically, we used mixture modeling analyses (Zhang & Luck, 2008) to estimate guess rates for each participant and condition. We excluded participants whose guess rates in either the low or the high conditions were greater than 3 standard deviations above the mean for all participants (Fougnie & Alvarez, 2011). Based on this criterion, 11 participants were excluded for performing poorly on low reward trials (none for high reward trials). Importantly, removal was based on performance in the low or high reward trials, and therefore is orthogonal to our main comparison of interest for medium reward trials.
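To make the guess-rate estimate concrete, the sketch below fits a two-component mixture of the kind commonly attributed to Zhang and Luck (2008): a uniform distribution for guesses plus a von Mises distribution centered on the target. It is an assumption-laden illustration, not the authors' analysis code: the function names are hypothetical, the fit uses a coarse grid search rather than a proper optimizer, and the grids themselves are arbitrary choices.

```python
import math


def bessel_i0(kappa, tol=1e-12):
    """Modified Bessel function of order 0, via its power series."""
    term, total, k = 1.0, 1.0, 0
    while term > tol * total:
        k += 1
        term *= (kappa / 2.0) ** 2 / (k * k)
        total += term
    return total


def mixture_pdf(x_rad, guess_rate, kappa):
    """Density of a response error (radians) under uniform + von Mises."""
    uniform = 1.0 / (2.0 * math.pi)
    von_mises = math.exp(kappa * math.cos(x_rad)) / (2.0 * math.pi * bessel_i0(kappa))
    return guess_rate * uniform + (1.0 - guess_rate) * von_mises


def fit_mixture(errors_deg, g_grid=None, kappa_grid=None):
    """Grid-search maximum likelihood estimate of (guess rate, kappa)."""
    if g_grid is None:
        g_grid = [i / 20 for i in range(21)]               # 0.00, 0.05, ..., 1.00
    if kappa_grid is None:
        kappa_grid = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]  # arbitrary grid
    xs = [math.radians(e) for e in errors_deg]
    best = None
    for g in g_grid:
        for kappa in kappa_grid:
            log_lik = sum(math.log(mixture_pdf(x, g, kappa)) for x in xs)
            if best is None or log_lik > best[0]:
                best = (log_lik, g, kappa)
    return best[1], best[2]
```

Under this sketch, a participant whose errors cluster near zero yields a guess rate near 0, while errors spread evenly around the wheel yield a guess rate near 1. The per-participant estimates could then be compared against a group cutoff such as the one described above.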
We designed the task so that in the valid cue condition, participants could remember the relevant side containing a high item and drop the side containing a low reward item (un-cued low), or remember the side containing a low item and drop the side with a high reward item (un-cued high). Therefore, we could compare the effects of dropping the low versus high item based on performance for the medium reward item (Fig. 5). A paired t-test found no significant difference between the un-cued low (32.46°) and un-cued high (32.44°) conditions, t(44) = 0.01, p = .992, dz = 0.0015. A Bayesian t-test also found a Bayes factor (BF01) of 6.19, providing moderate evidence in support of the null hypothesis.
We should note that the data still favor the null hypothesis (although to a lesser extent) before removing low performing participants. Descriptive statistics before and after excluding these participants are listed in Table 1. A paired t-test found no significant difference between dropping the high (32.55°) versus the low side (35°), t(55) = 1.53, p = .132, dz = 0.20. A Bayesian t-test found a Bayes factor (BF01) of 2.29, providing anecdotal evidence in support of the null hypothesis. The presence of a numerical trend provides some (albeit weak) evidence for selective storage of high reward items. Taken as a whole, our data provide much stronger support for the null than the alternative hypothesis.
Analyses of mean RTs also found that participants were slower during memory responses for items with higher reward values. This is consistent with results in Experiment 1 and might suggest that they spent more time fine-tuning their responses for a more rewarding item. In addition, there was a retro-cue benefit in RTs: participants were faster when responding to a validly cued item versus a neutral item. However, this benefit did not depend on whether participants dropped the high or low reward item from memory (see the Supplemental Material).

Discussion
Here we cued one side of the display and increased the number of un-cued items in order to increase the size of the retro-cue benefit. The results replicated Experiment 1 in showing clear benefits of reward and retro-cue. Despite finding a larger retro-cue benefit (3.79°), we still failed to find a significant difference for dropping high versus low reward items, as in Experiment 1. Taken together, this challenges the influential idea that the retro-cue benefit comes primarily from reallocation of working memory resources.

Experiment 3
In our previous experiments, we failed to show an effect of freed-up resources on retro-cueing. Experiment 3 was designed to verify our key assumption that more memory resources are allocated to high versus low reward items. This assumption is supported by evidence that performance depends strongly on reward. However, one possibility is that differential performance reflects differential encoding of information but that, once stored, each item is given roughly equal access to resources. To rule out this explanation, we tested whether the reward value of the dropped item impacts encoding and storage of new information into working memory.
Specifically, Experiment 3 presented items across two sequential displays. Participants encoded all items from the 1st display (consisting of high and low reward items), but a retro-cue later instructed them to drop high or low reward items from memory. Critically, this retro-cue could appear after the 2nd display was encoded or before the 2nd display appeared. Previous work has shown that dropping items from memory improves performance for subsequently encoded items, presumably because more resources are available to store new information. Accordingly, observing greater benefits for encoding the 2nd display when high versus low reward items could be dropped from the 1st display would provide strong evidence that high reward items consume more resources. Further, as a control condition, on some trials we presented the retro-cue after the 2nd display to show that the same benefit does not extend to existing working memory representations.

Participants
One hundred new participants (51 female, mean age = 28.5 years, age range: 18-40) were recruited via Prolific. Participants also received a performance-based bonus (M = £2.07). No participants were removed under the exclusion criterion used in Experiment 1 (average response error more than 2.5 standard deviations above the mean of all participants).
We were interested in finding an interaction between cue display order and un-cued item (i.e., whether dropping high versus low reward items led to a benefit for encoding of the 2nd array, compared to when the 2nd array had already been encoded). A power analysis was conducted based on results from a preliminary experiment (n = 24), which found an effect size of ηp² = 0.057 and a repeated-measures correlation coefficient of 0.22 for the interaction effect of interest. According to G*Power (Faul et al., 2007), a sample size of at least 86 would be required to reach a power of 0.95, given a significance level of 0.05.
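Power software typically works with Cohen's f rather than partial eta squared. As a minimal sketch, the standard conversion f = sqrt(ηp² / (1 - ηp²)) can be applied to the pilot effect size. This does not reproduce the power computation itself, which also depends on the repeated-measures correlation and on the specific G*Power options used (not stated here). The function name is illustrative.

```python
import math


def eta_p2_to_cohens_f(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f (standard textbook conversion)."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


# Pilot estimate for the cue display order x un-cued item interaction.
pilot_f = eta_p2_to_cohens_f(0.057)
print(f"Cohen's f = {pilot_f:.3f}")
```

By Cohen's conventional benchmarks (0.10 small, 0.25 medium), the pilot interaction corresponds to roughly a medium effect.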

Procedure
On each trial (Fig. 6), two memory arrays were sequentially presented. To minimize confusion between the displays, stimuli from these two arrays were placed at different distances from fixation and were distinguished by their shapes (circles and squares of width 50 pixels). The colors of all memory stimuli were sampled from the color space used in Experiment 1. The 1st array contained four circles arranged in intercardinal positions (45°, 135°, 225°, 315°) on the outer ring (110 pixels from fixation). Two of these positions were associated with high reward items, and two were associated with low reward items. Reward position was assigned on a per-participant basis, with stimulus locations in diagonally opposite quadrants having the same reward value. As in previous experiments, the reward locations were constant across trials to reduce task complexity, and participants performed a reward training task on the 1st array prior to each block to ensure that they correctly associated the reward values with stimulus locations. Participants were asked to indicate the memory locations associated with low and high reward values. If they clicked on the correct locations on their first attempt, they would receive 50 bonus points. Otherwise, they had to repeat the task until they provided a correct response. The 2nd array used three square stimuli at equidistant positions on the inner ring (50 pixels from fixation; 0°, 120°, 240°). All items in the 2nd array had medium reward value.

Fig. 5. Results for the medium reward item condition after exclusion of participants who guessed frequently in either the low or high reward conditions (n = 45). A neutral cue indicated that all items were still relevant, whereas a valid cue indicated that participants could either drop the side that included a low reward item (un-cued low) or drop the side that included a high reward item (un-cued high). Error bars represent 95% within-subjects confidence intervals.
The cue display contained a bidirectional arrow (length 100 pixels) pointing toward two diagonal quadrants and a placeholder array (white circle frames) indicating the stimuli locations for the 1st array. On half of the trials, the two high reward items were cued, and on the other half, the two low reward items were cued. Participants were instructed that they should store only the cued items and drop the un-cued items from memory. In addition, we removed the neutral cue condition to maximize the number of trials per condition. We already observed evidence in previous experiments that participants were using cues, and it is reasonable to assume that retro-cue benefits would persist, and may even be stronger, under high memory load.
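The geometry of the two memory arrays described in this procedure can be sketched as coordinates relative to fixation. The angle convention (0° pointing right, angles increasing counterclockwise, y pointing up) is an assumption, since the text gives only the angles and radii; the function and variable names are illustrative.

```python
import math


def ring_positions(radius_px: float, angles_deg: list[float]) -> list[tuple[float, float]]:
    """Return (x, y) offsets from fixation for items placed on a ring.

    Assumes 0 degrees points right and angles increase counterclockwise.
    """
    return [
        (radius_px * math.cos(math.radians(a)), radius_px * math.sin(math.radians(a)))
        for a in angles_deg
    ]


# 1st array: four circles on the outer ring at intercardinal positions.
first_array = ring_positions(110, [45, 135, 225, 315])
# 2nd array: three squares on the inner ring at equidistant positions.
second_array = ring_positions(50, [0, 120, 240])

# Diagonally opposite locations in the 1st array share a reward value,
# so indices (0, 2) form one reward pair and (1, 3) the other.
reward_pairs = [(first_array[0], first_array[2]), (first_array[1], first_array[3])]

for label, positions in (("1st array", first_array), ("2nd array", second_array)):
    print(label, [(round(x, 1), round(y, 1)) for x, y in positions])
```

Placing the arrays on rings of different radii keeps items from the two displays at non-overlapping locations, in line with the stated aim of minimizing confusion between displays.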
There were two types of experiment blocks, which differed in whether the retro-cue (for the 1st array) was presented before or after the 2nd array. In the before 2nd array block, each trial began with the presentation of the fixation cross for 200 ms. The 1st array was presented for 400 ms, followed by an interval of 1000 ms. The retro-cue was then displayed for 500 ms. After a post-cue delay of 2500 ms, the 2nd memory array was presented for 400 ms, followed by a delay of 1500 ms preceding the memory test. In the after 2nd array block, trials also began with the presentation of the fixation cross (200 ms). The 1st array was presented for 400 ms, followed by an interval of 1000 ms. Afterwards, the 2nd array was presented for 400 ms, followed by a delay of 1500 ms. The retro-cue was presented for 500 ms. After a post-cue delay of 2500 ms, the memory test was presented.
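The two block types described above can be laid out as event timelines to see which intervals are matched. The sketch below uses only the durations given in the text; the event names and helper function are illustrative.

```python
# Durations in ms, in presentation order, for the two block types.
BEFORE_2ND_ARRAY = [
    ("fixation", 200), ("array_1", 400), ("delay_1", 1000),
    ("retro_cue", 500), ("post_cue_delay", 2500),
    ("array_2", 400), ("delay_2", 1500),
]
AFTER_2ND_ARRAY = [
    ("fixation", 200), ("array_1", 400), ("delay_1", 1000),
    ("array_2", 400), ("delay_2", 1500),
    ("retro_cue", 500), ("post_cue_delay", 2500),
]


def onset_times(timeline: list[tuple[str, int]]) -> dict[str, int]:
    """Map each event to its onset (ms from trial start); the memory test follows the last event."""
    onsets, elapsed = {}, 0
    for name, duration in timeline:
        onsets[name] = elapsed
        elapsed += duration
    onsets["memory_test"] = elapsed
    return onsets


for label, timeline in (("before", BEFORE_2ND_ARRAY), ("after", AFTER_2ND_ARRAY)):
    t = onset_times(timeline)
    array_1_to_test = t["memory_test"] - t["delay_1"]   # from 1st array offset
    array_2_to_test = t["memory_test"] - t["delay_2"]   # from 2nd array offset
    print(f"{label}: test at {t['memory_test']} ms, "
          f"1st array retained {array_1_to_test} ms, 2nd array retained {array_2_to_test} ms")
```

Under these durations, the memory test begins 6500 ms after trial onset in both block types, and the 1st array is retained for 5900 ms in both. The interval between the 2nd array and the test differs (1500 ms versus 4500 ms), as does the time at which the cue appears.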
Mean absolute response error as a function of probe reward condition (low, medium, or high) and cue condition (neutral, valid). In the medium reward condition, a valid cue indicated that participants could either drop the low reward item (un-cued low) or drop the high reward item (un-cued high).

Fig. 6.
Trial procedure for Experiment 3. Two memory arrays were presented sequentially. In the 1st array (four circles), items at diagonally opposite locations had equal reward value (high or low). In the 2nd array (three squares), all items had medium reward value. During the memory delay, a cue informed participants that they could drop two high reward items or two low reward items from the first array. The cue could be presented either before or after the 2nd array (blocked). There were two memory tests on each trial: an item from the 2nd array was probed, followed by a probe for a cued item in the 1st array. Afterwards, performance feedback and bonus points for the two memory tests were presented simultaneously.

On all trials, participants were given two sequential memory tests. Participants adjusted the probe stimulus to match the remembered color using the response wheel. Our critical comparison was whether memory for the 2nd array depends on the reward level of the dropped items. Therefore, to provide the most sensitive test of our hypotheses, we first probed memory for the 2nd array to minimize effects of output interference and time on that response. After responding to the memory test for the 2nd array, participants made a memory response to a probed item (always one of the cued items) from the 1st array. Note that we could not fairly compare the benefits of dropping high or low reward items based on memory performance for the 1st array, since these two conditions would involve testing items of different reward values. Performance feedback for both memory tests was provided after the second memory test. The inner and outer parts of the probe stimuli displayed the correct colors and the selected feature values, showing the disparity between the two. In addition, the feedback screen displayed the response error, the points earned for each memory test ("xx out of xx points"), trial points, and total points. Participants received a bonus payment if their total points exceeded 100,000 (minimum reward threshold, around 65% of maximum total points). They would receive a bonus of £0.5 for the first 100,000 points earned on the task, and they could earn £0.5 for every 3000 points (~2% of maximum total points) above 100,000. Participants would receive the maximum bonus (£4) if their total points were more than 121,000.
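The bonus rule described above can be written as a small function. This is a sketch under two assumptions that the text leaves open: reaching exactly 100,000 points counts as meeting the threshold, and partial 3000-point steps earn nothing. The function name and constants are illustrative.

```python
THRESHOLD_POINTS = 100_000   # minimum reward threshold
STEP_POINTS = 3_000          # each further step earns another increment
STEP_BONUS_GBP = 0.5         # bonus for the threshold and for each step
MAX_BONUS_GBP = 4.0          # cap stated in the text


def bonus_gbp(total_points: int) -> float:
    """Bonus payment in pounds for a given point total.

    Assumes the threshold is inclusive and only complete 3000-point steps count.
    """
    if total_points < THRESHOLD_POINTS:
        return 0.0
    complete_steps = (total_points - THRESHOLD_POINTS) // STEP_POINTS
    return min(STEP_BONUS_GBP * (1 + complete_steps), MAX_BONUS_GBP)


for points in (95_000, 100_000, 106_500, 121_000, 140_000):
    print(points, bonus_gbp(points))
```

Seven complete steps above the threshold (100,000 + 7 × 3000 = 121,000) bring the bonus to £4, which matches the stated point at which the maximum bonus is reached.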
Prior to the study, participants completed 8 practice trials: 4 trials each for the before 2nd array and after 2nd array conditions. The main experiment consisted of 256 trials divided into eight blocks of 32 trials. In half of the blocks, the retro-cue was presented before the 2nd array, and in the other half, the retro-cue was presented after the 2nd array. The order of blocks was randomized. There were 64 trials for each cue display order (before 2nd array, after 2nd array) × un-cued item (low reward, high reward) condition.

Results
In Experiment 3, we excluded the neutral cue condition to increase the number of trials per condition, and thus we could not directly compare how valid cues improved performance relative to neutral cues. While we were unable to measure retro-cue benefits, we believe such benefits would have existed for both the 1st and 2nd arrays. Validity effects in Experiment 1 and Experiment 2 were strong and in line with the findings of many studies (Matsukura & Vecera, 2015;Williams & Woodman, 2012). Further, our task imposed a high memory load such that it would be advantageous to use the cue to maximize performance and reward. Finally, evidence that reward value of the to-be-dropped items matters for encoding new displays suggests that participants were using the retro-cues.
For the 1st array (Fig. 7a), the interaction effect was not significant, F(1, 99) = 0.17, p = .684, ηp² = 0.002. This suggests that our reward manipulation worked, and that reward did not interact with cue display order.
For the 2nd array (Fig. 7b), both main effects were significant. Memory was better when the cue was presented before (40.23°) than after the 2nd array (62.64°), F(1, 99) = 270.08, p < .001, ηp² = 0.732. This is consistent with the idea that retro-cues free resources for encoding new items. Overall, there was a benefit for dropping the high (50.30°) compared to dropping the low reward items (52.57°), F(1, 99) = 10.39, p = .002, ηp² = 0.095. The critical question is whether this effect of reward depends on whether the cue appeared before or after the 2nd array. Indeed, the interaction effect was significant, F(1, 99) = 4.85, p = .030, ηp² = 0.047. Pairwise comparisons revealed a significant benefit of dropping the high reward items relative to the low reward items when the cue was presented before the 2nd array, t(99) = 4.54, p < .001, d_z = 0.45, but not when the cue was presented after the 2nd array, t(99) = 1.00, p = .321, d_z = 0.10. This aligns with the results of the previous experiments, in which there was no benefit of removing high versus low reward items when information was already encoded into memory. However, we demonstrated that encoding and storage of new information depend on the removal of high versus low reward items, suggesting differential allocation of resources based on reward.
In addition, we examined the time taken to provide the memory response to the 2nd array probe. There were no main effects or interactions (Fs < 0.14, ps > .707). RTs did not differ depending on the timing of the cue or the reward value of the un-cued item (see the Supplemental Material for details). This suggests that the effects observed in response error could not simply be due to a speed-accuracy tradeoff.
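The partial eta squared values reported for the 2nd array can be recovered from the F statistics and their degrees of freedom. The sketch below assumes the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error); the function name and dictionary labels are illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics reported for the 2nd array, each with df = (1, 99).
reported_f = {
    "cue display order": 270.08,
    "un-cued item": 10.39,
    "interaction": 4.85,
}

for effect, f_value in reported_f.items():
    print(f"{effect}: eta_p^2 = {partial_eta_squared(f_value, 1, 99):.3f}")
```

The recovered values (0.732, 0.095, and 0.047) match those reported, and they show how much larger the effect of cue timing is than the effect of the dropped items' reward value.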

Discussion
Our results show that dropping high versus low reward items provides a benefit for encoding new items, but (just as in Experiment 1 and Experiment 2) not for information already encoded and stored in working memory. This alleviates the concern that the null results of the previous experiments arose because memory items consumed equal resources regardless of reward.

Fig. 7. Results in Experiment 3. Mean absolute response error in Experiment 3 (n = 100) for the 1st array probe (a) and 2nd array probe (b). Error bars represent 95% within-subjects confidence intervals. In this task, participants encoded two sequential memory arrays. The 1st array contained both high and low reward items, and a retro-cue indicated that participants could store only high reward items and drop low reward items from the 1st array (un-cued low), or vice versa (un-cued high). Importantly, the retro-cue could appear before or after presentation of the 2nd array. Note that we did not include a neutral cue baseline. Thus, for the 1st array we compared effects of probe reward value (high or low reward tested), whereas for the 2nd array we compared effects of dropping high versus low reward items.
One limitation of our study is that the online experiments could not monitor for eye movements. Therefore, participants might have fixated closer to high reward items during encoding, and this could explain some of the benefit to high reward items. However, Experiment 3 suggests that the performance difference between high and low reward items was not merely due to differences in eye position. There was a differential benefit for dropping high versus low value items when encoding a subsequent display, which is better explained by a difference in resource allocation.
Our hypotheses relate to how the dropped items' reward value leads to performance differences, and not to how the timing of events changes overall performance. Accordingly, our task structure was designed to equate the retention interval but not the cue timings for the before and after 2nd array conditions. A further limitation of this study is therefore that the timing differences could, in theory, be causing the differences in performance. We think this is unlikely for several reasons. Earlier studies showed that varying the delay between the stimuli and the cue has little impact on performance (Nouri & Ester, 2020; Van Moorselaar et al., 2015; but see Rerko & Oberauer, 2013). Further, our study used a 2500 ms delay following the cue, which is longer than the time required to make full use of the cue (Pertzov et al., 2013), or to remove items from working memory (Lewis-Peacock, Kessler, & Oberauer, 2018).

General discussion
Working memory is surprisingly limited given its importance in everyday tasks. Fortunately, we can purposefully control its contents. One demonstration of this is the retro-cue paradigm, in which cues presented after encoding identify and enhance relevant items. How do retro-cues improve performance for cued items? While several ideas have been proposed, a common view is that memory resources, tied up in un-cued items, are released by the retro-cue and reallocated to cued items. While little direct evidence exists for this reallocation view, the language used to explain retro-cueing often explicitly or implicitly invokes ideas of resources (or something equivalent) being shuffled from one item to another. The present study aimed to test the reallocation account by using a reward manipulation to vary the allocation of resources among items. In Experiment 1, we found that dropping a high versus low reward item provided no greater benefit, inconsistent with the reallocation account. In Experiment 2, even when we increased the likelihood of finding such benefits, there was still little evidence that the retro-cue benefit was primarily driven by resource reallocation. Importantly, Experiment 3 found that dropping high reward items benefits encoding of new information into working memory. Thus, we do find evidence that high reward items consume more resources, and that removal of items frees up an amount of resources relative to reward. Taken together, we conclude that while allocation of resources is critical for encoding of information, reallocation of resources is not the dominant factor determining the size of retro-cue benefits.
It might seem surprising that encoding of new information should be more sensitive to extra resources released from a high reward item than already stored items. However, there are fundamental differences between the two tasks. During encoding, when perceptual information is still accessible, allocating more resources to encode and store an item can improve its quality (e.g. Bays, Gorgoraptis, Wee, Marshall, & Husain, 2011). In contrast, for already encoded items, whose information is no longer available to our perceptual system, reallocation may be unable to improve representational quality (Bays & Taylor, 2018). A recent study showed another way that memory improvements differ for already encoded versus to-be-encoded information: Providing free time may improve memory for subsequently encoded items, but not for already encoded items (Mızrak & Oberauer, 2021). They suggested that while encoding resources are limited and that each of the sequentially presented items takes up some amount of this resource, the encoding resource can recover over a longer delay between items. Future work may further explore the role of timing in memory prioritization.
The present results are consistent with accounts proposing that retro-cues strengthen or facilitate access to cued representations (Myers et al., 2017; Rerko & Oberauer, 2013), or that retro-cues protect items from subsequent visual interference (Makovski & Jiang, 2007). Our results suggest that mechanisms underlying this benefit may relate more to the prioritization of an item and may not be directly linked to the quality of the un-cued representation. However, this does not imply that un-cued items were not de-prioritized. While our study never tested un-cued items, many studies have observed both benefits for cued items and costs for un-cued items (Astle et al., 2012; Griffin & Nobre, 2003; Günseli et al., 2015; Pertzov et al., 2013) or found reduced neural signatures of un-cued items (Kuo et al., 2012; Lewis-Peacock, Drysdale, Oberauer, & Postle, 2012). While costs for un-cued items are often taken as evidence for a resource allocation account, they may not be a direct consequence of retro-cue benefits (Günseli et al., 2019; Myers et al., 2018), and may have arisen due to strategic removal of irrelevant items or due to effects of probe anticipation (Myers et al., 2017). Since the present study did not provide direct evidence for removal of un-cued items, we cannot entirely rule out the possibility that participants remove un-cued items only when the memory load is high. Therefore, it is important for future work to examine more thoroughly when and how removal of irrelevant information occurs. In addition, although our task always cued more than one item, there might be additional benefits for cueing one item compared to cueing a subset of items. Previous work has found that retro-cues can protect multiple items from interference, but perhaps one-item cues can further reduce interference from other items or facilitate response planning (Barth & Schneider, 2018; Schneider et al., 2017).
It is highly debated whether working memory limitations reflect discrete storage limits (e.g. slots or pointers, Zhang & Luck, 2008), a more flexible resource (Bays & Husain, 2008), or interference by concurrent representations (Oberauer & Lin, 2017). While it is natural to posit that high reward items consume more of a flexible memory resource, our results should not be taken as support for a flexible resources account. Alternatively, high reward items may tie up multiple slots or cause more interference. Critically, our findings are consistent with all these possibilities and suggest that high reward items consume more of that limited storage.
Besides challenging reallocation accounts, our results constrain alternative accounts by suggesting that retro-cues act independently of the distribution of memory strength among stored items. Importantly, our findings suggest that the retro-cue benefit is not tied directly to the fate of the un-cued item(s). Further, our results complicate proposals that retro-cue benefits reflect the properties of a privileged state within working memory (Rerko & Oberauer, 2013; Zokaei, Ning, Manohar, Feredoes, & Husain, 2014). For example, some researchers believe that while multiple items may be stored in working memory, only one item can be held in a special and privileged state (Cowan, 2011; Oberauer & Hein, 2012). It is logical to presume that high reward items are held in this privileged state (Allen & Ueno, 2018; Atkinson et al., 2018) or are encoded before other items (e.g. Ravizza, Uitvlugt, & Hazeltine, 2016; see also Klink et al., 2017). Our Experiment 1 and Experiment 2 observed retro-cue benefits even when the high reward item was probed. This might suggest that the focus of attention is not necessary to explain retro-cue benefits or that high reward items are not held in a privileged state. Future work is necessary to more fully elucidate the mechanisms behind top-down control of working memory. However, the present results place strong limitations on plausible mechanisms and challenge theoretical accounts based on reallocation of a limited-capacity storage medium.

Declaration of Competing Interest
None.