Introduction

Research on attention commonly distinguishes top-down from bottom-up attention. Bottom-up attention is studied by characterizing how salient information in the external world captures attention. Top-down attention is studied by manipulating how plans and goals affect what is attended, typically with explicit instructions. Although useful, this dichotomy omits another key conceptual distinction: That both of these forms of attention can be influenced by different types of memories (Aly & Turk-Browne, 2017; Awh et al., 2012; Chen & Hutchinson, 2019; Hutchinson & Turk-Browne, 2012). These memories may derive from distinct sources, perhaps recently acquired and maintained in working memory (WM) or stored in long-term memory (LTM) (Nobre & Stokes, 2019). It can be argued, in fact, that this prospective property of memory – its ability to guide attention – effectively constitutes its ecological purpose (Nobre & Stokes, 2019; van Ede & Nobre, 2023).

Despite recognition that both WM and LTM guide attention, it is largely unknown how they may interact when given the opportunity to guide attention together. In fact, there is relatively little research on how multiple active memories may guide attention, nor is there consensus regarding the conditions under which guidance from multiple items is exhibited (see, e.g., Frătescu et al., 2019, for a relevant debate). Even less well studied is how memories of different types (e.g., LTM and WM) may cooperate or compete with one another to guide attention when active in the same task. Some studies have examined attentional guidance from multiple co-active features from either WM or from LTM (Bahle et al., 2020; Chen & Du, 2017; Zhang et al., 2018; Zhou et al., 2020), but did not examine LTM-WM interactions. These findings indicate, however, that at least two WM or two LTM items can be co-active in memory-guided attention, and that they are capable of both cooperating and competing for attentional resources. The limited research on interactions between LTM and WM during attentional guidance has found that they can both guide attention in the same task, but this work tends to examine LTM-WM interactions in only one direction (e.g., how WM may interfere with LTM-guided search; Günseli et al., 2016) or lacks baselines that permit distinguishing cooperation from competition (e.g., Schwark et al., 2013).

The present study aims to fill these gaps by exploring how LTM and WM might be used to cooperatively guide attention together, but also how they may compete – slowing attentional guidance when the two representations conflict as to where attention should be directed. Addressing these questions allows us to propose a model of how memory-guided attention is orchestrated when multiple memories are active in a given task.

We were inspired by work showing that both WM and LTM can guide attention (Baddeley, 2003; Desimone & Duncan, 1995; Fan & Turk-Browne, 2016; Stokes et al., 2012; Summerfield et al., 2006). When information in WM and LTM is consistent – i.e., suggests the same attentional goal – we might therefore expect cooperation between these memories. This would lead to behavioral facilitation relative to when only one representation guides attention. Such cooperative facilitation between memory types would accord with the literature on “redundancy gains,” whereby a behavioral advantage is conferred when multiple pieces of information support the same decision, relative to when only one supports it (Bahle et al., 2020; Danek & Mordkoff, 2011; Fan & Turk-Browne, 2016; Miller & Low, 2001). That is, an environmental stimulus matching two active memory representations – regardless of whether they are sourced from WM or LTM – should more efficiently guide attention than a stimulus that matches a WM or LTM representation in isolation. To our knowledge, no evidence suggests that such guidance should be asymmetrical, in that WM would help LTM-guided search more or less than LTM would help WM-guided search; so, in the case of cooperation, we expect dual guidance from WM and LTM to entail symmetrical facilitation.

It is not always the case, however, that memories suggest the same attentional goal. In such a scenario, how might WM and LTM compete for attention? Research has demonstrated that attentional guidance from WM or LTM can occur automatically, even when these memories are irrelevant to the current task and employing them would lead to distraction from current attentional goals (Downing, 2000; Fan & Turk-Browne, 2016; Günseli et al., 2016; Nickel et al., 2020; Soto et al., 2005, 2008). This suggests that when WM and LTM are placed in competition with one another – that is, when a WM representation and an LTM representation suggest attentional goals that are inconsistent with one another, even if only one memory is currently task-relevant – behavioral deficits should emerge relative to when only one memory guides attention.

We test two alternative hypotheses about how WM and LTM may compete with one another. First, because both WM and LTM can lead to automatic attentional capture (Downing, 2000; Fan & Turk-Browne, 2016; Günseli et al., 2016; Nickel et al., 2020; Soto et al., 2005, 2008), one might hypothesize that competition would be symmetrical. That is, irrelevant WM representations may hinder LTM-guided search just as much as the other way around. This hypothesis would be consonant with findings showing common representational formats for WM and LTM (Cowan, 1993, 1998, 2008; Fuster, 1997; Ranganath & Blumenfeld, 2005; Vo et al., 2022), and/or the notion that activated representations in LTM must be “placed” into WM (Atkinson & Shiffrin, 1968). If this is the case, then there might be no functional difference between items “in WM” and (activated) items “in LTM” – both should be represented similarly. If so, they should compete with one another symmetrically, as noted above.

An alternative hypothesis, however, is that competition between WM and LTM may be asymmetrical. Research has suggested that a “flexible gate” governs the transmission of information from LTM to WM: Information from LTM is admitted only when it would be useful given current task demands, and blocked when it would hinder performance (Mızrak & Oberauer, 2021; Oberauer et al., 2017; see also Verschooren et al., 2021). If so, there should be a larger cost to performance when using LTM to guide attention in the face of distraction from WM, relative to when WM is used in the face of distraction from LTM. That is, WM-guided attention may be protected from interference by accessory LTM representations when they would hurt performance; but LTM-guided attention may be more prone to distraction from irrelevant WM representations.

Mechanistically, this asymmetry may be explained by the concentric activation model (Oberauer, 2002, 2009), which distinguishes between three functional states of WM: The activated part of LTM (aLTM), the region of direct access (RDA), and a single-item focus of attention (FoA). According to this model, the memory selected for the next cognitive operation should be provided the privileged status afforded to items in the FoA. WM information that is accessory to the primary task at hand – currently irrelevant but still active – should be represented in the RDA. It should therefore be distinguishable from accessory LTM information, theoretically represented in aLTM. If these unique functional states confer differential effects when guiding attention, we may observe competitive asymmetry: Items in the RDA may be more likely to capture attention than items in aLTM. Asymmetry in competition may also arise because of differences in susceptibility to interference. WM is more vulnerable to interference and requires active maintenance, without which it may be lost; LTM, on the other hand, if temporarily lost during WM-guided attentional processes, can still be reactivated at a later time. Given this, irrelevant WMs may be more distracting than irrelevant LTMs.

To test these hypotheses, we use a novel paradigm that allows us to dissociate cooperative and competitive interactions between WM and LTM during memory-guided search. Participants performed a task in which they searched for an item maintained in WM – or activated from LTM – while holding in mind an accessory item cued from LTM or WM. Three search display conditions were used: The accessory item either guided attention to the same spatial location as the prioritized memory, guided attention to a different spatial location, or was not present in the display. Following each trial, participants precisely reported features of the accessory item. This paradigm allowed us to determine how WM and LTM compete or cooperate to guide attention.

Experiment 1

Methods

Participants

Data were collected until the final sample comprised 115 participants who met the inclusion criteria (described below). The final sample size of 115 was determined a priori based on a review of relevant literature in which RT was a dependent variable and memory contents competed for attentional control (e.g., Beck et al., 2012 (807 observations/condition); Chen & Du, 2017 (across-experiment average: 1,448 observations/condition); Fan & Turk-Browne, 2016 (across-experiment average: 1,380 observations/condition); van Moorselaar et al., 2014 (across-experiment average: 1,020 observations/condition); Zhang et al., 2018 (1,152 observations/condition)). The total number of observations per condition (trials × participants) was calculated for these relevant studies (Baker et al., 2021), and the sample size for this study was determined by approximately matching the highest values from that calculation while accommodating specific constraints of our study (e.g., the need to balance the number of times a given color was prompted). Given these constraints, we obtained 1,725 observations/condition (90 trials × 115 participants / 6 conditions).

To meet that target sample size, 320 participants were recruited for an online study using Prolific (www.prolific.co; 251 participants) or the Columbia University Psychology Department Participant Pool (69 participants). Those recruited on Prolific were pre-screened for English fluency, nationality (only USA), and age (18–40 years). All participants provided informed consent to a protocol approved by the Columbia University Institutional Review Board, and received $6.50/h or course credit, respectively, as compensation. A total of 137 of these participants were unable to pass one of the Testing Phases; their participation was terminated early and they are not included in the final sample.

To ensure that we had adequate data across all six conditions for each participant, and that each participant was doing the task as instructed, we set an a priori threshold to include only those participants who responded correctly to both components of a given trial (as described in Procedure/Search Phase) in ≥50% of trials. Sixty-two participants from the remaining sample did not reach this criterion, and thus none of their data are included in the final sample. Following these rejections, the final sample comprised 115 participants, as noted above (Mage = 26.2 ± 6.6 years, Meducation = 14.9 ± 2.4 years). Sixty-one of these participants identified as women and six as non-binary; the rest identified as men. Of this final sample, 71.4% identified as White, 19.6% as Asian, 9.8% as Black or African American, 4.8% as American Indian/Alaskan Native, 0.9% as Native Hawaiian or Pacific Islander, and 2.7% identified as part of a different racial group; in addition, 11.6% of these participants identified their ethnicity as Hispanic or Latino.

Participants were not screened for color blindness. However, individuals were only included in the final sample if they were able to perform well on the Training and Testing Phases and responded correctly to both components of a Search Phase trial on ≥50% of trials. It is therefore unlikely that included participants were color blind; or, if they did have some degree of color blindness, it did not interfere with their ability to perform well on the task.

Stimuli

Color generation

Two sets of five colors were generated in Hue, Saturation and Luminance (HSL) colorspace (see Fig. 1a). Saturation and luminance were held constant – at 100 and 50, respectively – so that colors varied only along the hue dimension. Colors were not entirely equidistant from one another on the HSL colorwheel due to subjective similarity of the green hues. A constant “central green” hue (115°, between the upper and lower bounds of subjective similarity: 75° and 156°) was always present in the first of the two sets. The rest of the hue wheel was then equally divided (each hue was 31° apart) to generate the remaining nine colors present in the two sets. Colors alternated between sets, so that no set contained colors within 62° of each other. A variable buffer (between 0° and 31°) was added at the “starting point” of the hue generation process to allow the generated sets to differ from one another across blocks. The colors were generated in sets (as opposed to random selection from the color wheel) to be consistent with prior literature in the attentional capture field (Anderson & Halpern, 2017; Nickel et al., 2020) as well as to ease memorization in the Training Phase.
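
To make this scheme concrete, the following is a minimal R sketch of the generation procedure as we read it. The experiment itself ran in the browser on Gorilla, so this is an illustration rather than the original code; the function name and the choice to start the nine non-green hues at the buffer zone's upper edge (156°) are our assumptions.

```r
generate_hsl_sets <- function() {
  central_green <- 115                          # constant "central green" hue (deg)
  buffer_hi <- 156                              # upper bound of the green buffer zone
  offset <- runif(1, min = 0, max = 31)         # variable per-block starting buffer
  others <- (buffer_hi + offset + 31 * 0:8) %% 360  # nine remaining hues, 31 deg apart
  hues <- c(central_green, others)              # ten hues, in order around the wheel
  # Alternate consecutive hues between the two sets so that
  # no set contains colors within 62 deg of each other
  list(set1 = hues[c(1, 3, 5, 7, 9)],
       set2 = hues[c(2, 4, 6, 8, 10)])
}
set.seed(1)
sets <- generate_hsl_sets()  # sets$set1 always contains the central green (115 deg)
```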

Fig. 1

Example generation of color sets (a–d) and shape sets (c–d) for each experiment. For more details, see the Color Generation or Shape Generation section of each experiment’s methods. a In Experiment 1, ten colors were generated on a hue colorwheel in HSL colorspace. The gray slice of the colorwheel denotes the buffer zone around the “center green” hue. All other colors were separated from one another by 31° on the hue colorwheel. b In Experiment 2, ten colors were generated on the CIE L*a*b* colorwheel at equidistant intervals (36°). c In Experiment 3, six colors were generated on the CIE L*a*b* colorwheel at equidistant intervals (60°). Likewise, six shapes were generated along the VCS shapewheel at equidistant intervals (60°). d In Experiment 4, the color and shape generation matched that of Experiment 3. Bold lines indicate colors (or shapes) in one set; dashed lines indicate colors (or shapes) in the other set

Color assignment

After the color sets were generated, we randomly assigned them to memory conditions. The first set of colors was assigned to the WM color set and the second set to the LTM color set half of the time; the assignment was reversed the other half of the time to ensure that any given color was equally likely to appear in the WM or LTM color set. Two colors were cued on each trial: One LTM color (to be retrieved from LTM) and one WM color (to be presented on the screen). The two colors on any given trial were required to be >90° apart on the hue colorwheel so as to be visually distinct; this constraint meant that each color was paired with three colors in the other set, permitting the use of 15 color pairs. There were 30 trials per block, so each color pair was cued twice per block. These color pairs were assigned to conditions such that each color was cued six times per block, once in each of the six conditions (described below). Therefore, each color was prompted (i.e., searched for) three times and unprompted (i.e., not searched for, but reported after search) three times.
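
As a sketch of this pairing constraint in R (our reconstruction, reusing the sets generated above): enumerate all WM-LTM combinations and keep only those whose circular hue distance exceeds 90°.

```r
# Shortest angular distance between hues on a 360-degree wheel
circ_dist <- function(a, b) {
  d <- abs(a - b) %% 360
  pmin(d, 360 - d)
}
pairs <- expand.grid(wm = sets$set1, ltm = sets$set2)  # 25 candidate pairs
pairs <- subset(pairs, circ_dist(wm, ltm) > 90)        # keep visually distinct pairs
nrow(pairs)  # 15: each color pairs with three colors in the other set
```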

Colorwheel

The HSL colorwheel used in Experiment 1 was adapted from the project iro.js (github.com/jaames/iro.js) and modified to respond to user input and fit the needs of this experiment. First, to match the generated colors (see Stimuli/Color generation), saturation was held constant at 100 and luminance at 50. Thus, the colorwheel allowed selection only of hues along a 360° circle. Participants could hover their cursor over the colorwheel to change the color of a central circle placed inside the colorwheel, allowing them to preview a color fully. Clicking selected the color corresponding to the clicked location on the hue colorwheel. At each presentation of the colorwheel screen, the colorwheel was rotated by a random angle between 0° and 359°. This was done so that participants had to attend to, and encode, the color itself rather than a relative location on the screen.

Scenes

Ninety color scene images were manually selected from the "Massive Memory" Scene Categories database (Konkle et al., 2010). We avoided using multiple scene images from categories with substantial visual overlap (e.g., only one image was selected from the two categories “inside car” and “inside bus”), and avoided scene images that contained text. All images had a resolution of 256 × 256 pixels. Scene images were randomized into three groups – one for each block of the experiment – the order of which was randomly determined per participant. Additionally, the order of presentation of these images was randomized within each block.

Procedure

Upon recruitment, participants were directed via URL to the experiment, which was hosted on Gorilla (www.gorilla.sc). Instructions were first provided describing the structure of the experiment: There were three blocks, and each block comprised three successive phases: A Training Phase, a Testing Phase, and a Search Phase (see Fig. 2). Participants were informed that breaks were provided between blocks of the experiment but that breaks were not permitted within the blocks. (Each instruction page had a maximum allowable time for viewing, after which the instructions would advance automatically if the participant had not done so manually; this ensured that participants could not take long breaks within blocks.) Lastly, participants were asked to use a computer mouse and to turn the brightness on their screen to maximum for the duration of the experiment (to facilitate color discrimination), though adherence to this could not be properly assessed due to the online nature of the experiment.

Fig. 2

Experiment 1 procedure. Participants completed three blocks, each consisting of a Training Phase, a Testing Phase, and a Search Phase. a The Training Phase of each block consisted of 30 scene-color pairs to be encoded into LTM. Participants first viewed each scene and its associated color, then immediately reported the color associated with each scene on the hue colorwheel. Each scene-color pair was presented three times during this phase. b The Testing Phase required that participants recall and accurately report the color associated with each scene, when presented with the scene alone. c The Search Phase presented two cues on each trial: An image that had been encoded into LTM, for which participants had to remember the associated (LTM) color; and a "new" (WM) color in a filled circle. After being prompted to search for either the "remembered" (R) or "new" (N) color, participants clicked the prompted color in the search display. There were three conditions for this search display: Consistent trials contained both the prompted and unprompted colors in a single circle; Only trials contained the prompted color only (and not the unprompted color); and Inconsistent trials contained both the prompted and unprompted colors, but these two colors were placed inside separate circles on the display. After the search display, participants were asked to report the unprompted color on the hue colorwheel. The example search and unprompted color report displays in this figure are for a trial in which the LTM ("remembered") color is prompted. For illustration purposes, the example search displays are labeled to denote the location of the prompted (“P”) and unprompted (“U”) colors

At the beginning of each block, two sets of five colors were generated (see Methods/Stimuli/Color generation for more details). As noted above, one of these sets was assigned as the LTM set (to be learned in the Training Phase and later searched for in the Search Phase) and the other was assigned as the WM set (to be searched for in the Search Phase).

Training Phase

Each Training Phase (Fig. 2a) consisted of 30 unique scene-color pairs presented adjacent to one another on the screen for 3,000 ms. Each of the five LTM colors was paired with six different scenes. The colors were presented as filled circles. Whether the scene was presented on the left or the right side of the screen (and, concomitantly, whether the color was on the opposite side; this held for the Search Phase as well) was randomly chosen at the beginning of the first block and then alternated across subsequent blocks.

Participants were asked to memorize the association between each image and its color. They were informed that they would later be tested on the scene-color pairs (Testing Phase) by reporting the color associated with each scene. After the presentation of a scene-color pair for 3,000 ms, a colorwheel was presented for ≤8,000 ms. Participants were tasked with selecting the color that they had viewed on the previous screen, to facilitate encoding. Upon submission of the color and provision of feedback about their accuracy (see below), the participant could manually advance the screen before the 8,000 ms expired.

Upon submission of reports on the colorwheel, participants received visual feedback about how close their report was to the correct color: A dotted line, rendered in the color reported, was drawn from the center of the colorwheel to the reported color; and – to give participants an estimate of how close they were – a solid line, rendered in the correct color, was drawn from the center to the correct color. In addition to this visual feedback, textual feedback was provided above the colorwheel. During the Training Phase, a report 0° from the correct color triggered the textual feedback, “Perfect! Exactly 0 degrees off”; ≤4°, “Fantastic! Only X degrees off”; ≤10°, “Close! Just X degrees off”; ≤30°, "Incorrect. X degrees off"; >30°, "Paying attention? X degrees off.” (The “X” in each of these feedback statements was replaced with the number of degrees a participant’s report was from the correct color.) On the colorwheel used in the Search Phase, however – on the unprompted report screen (see Methods/Procedure/Search Phase) – only the dotted line described above was used. That is, on the Search Phase colorwheel, no indication was given as to the correct color when one was selected.
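
For clarity, the Training Phase feedback thresholds can be summarized as a simple lookup. This is an illustrative R helper built from the thresholds stated above; the function name is ours, not the authors'.

```r
training_feedback <- function(deg_off) {
  if (deg_off == 0)  return("Perfect! Exactly 0 degrees off")
  if (deg_off <= 4)  return(sprintf("Fantastic! Only %g degrees off", deg_off))
  if (deg_off <= 10) return(sprintf("Close! Just %g degrees off", deg_off))
  if (deg_off <= 30) return(sprintf("Incorrect. %g degrees off", deg_off))
  sprintf("Paying attention? %g degrees off", deg_off)
}
training_feedback(7)  # "Close! Just 7 degrees off"
```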

Each of the 30 scene-color pairs was repeated three times in the Training Phase. All scene-color pairs were presented once before a given scene-color pair was repeated. A brief (≤33 s) break was provided after one cycle of the 30 scene-color presentations, during which participants were informed that the pairs would now repeat, alongside an encouraging message.

Testing Phase

During the Testing Phase, participants were told to report the color associated with any given scene when presented with just the scene. They were informed that if they could not accurately report a color after eight presentations of a given scene, the experiment would terminate early.

The Testing Phase (Fig. 2b) consisted of a centrally presented scene (3,000 ms) that had been associated with a color in the Training Phase. Following this, the colorwheel was presented centrally and the participants reported the color previously associated with that scene (≤8,000 ms).

After the color report, participants saw the same visual feedback regarding their accuracy as they did in the Training Phase. The textual feedback, however, was altered such that a report ≤10° from the correct color triggered the feedback “Correct,” while a report of >10° away triggered “Incorrect.” In accordance with this textual feedback, participants’ reports were considered “correct” if they were ≤10° from the correct color on the colorwheel. If a scene was responded to incorrectly, it was added to the next cycle of the Testing Phase. The second cycle began after the first cycle through all 30 scenes, and subsequent cycles through the incorrectly responded-to scenes continued until participants were able to respond correctly to each scene.

As they had been instructed, if a participant could not respond correctly to a scene after eight test cycles, the experiment terminated early. Between each test cycle, participants were provided with a brief (≤33 s) break screen that reported the percentage of scenes they had responded to correctly so far and reminded them of the need to respond to all scenes correctly, alongside an encouraging message.

Search Phase

After successful completion of the Testing Phase, participants read instructions for the Search Phase, which consisted of a visual search task and a post-search color memory task. After passing a brief quiz to ensure they had understood the Search Phase procedure, participants began the Search Phase.

In the Search Phase (Fig. 2c), participants were first presented with a central fixation cross (for ≤10 s). To begin each search trial, they clicked the middle of the cross; this was done to center their mouse cursor relative to the browser window at the start of each trial. Participants received a reminder to “click the center of the cross to start the trial” after 7 s, and the trial began automatically if 3 more seconds elapsed without a response. The following screen contained a scene image that had previously been associated with a color in the Training Phase. Alongside this scene, a colored circle was presented, as in the Training Phase; this circle was, however, filled with a color from the WM color set. This screen was presented for 3,000 ms. At this point in the Search Phase, participants had been instructed to “simultaneously” keep in mind this “new” (WM) color and bring to mind the “remembered” (LTM) color previously associated with the scene image during the Training Phase. Participants were informed that both colors would be equally important.

A brief (300 ms) color mask was then flashed on the screen to flush sensory information. This mask was 400 × 400 pixels, and for every search trial each pixel was filled with a color randomly selected from HSL space (only selected along the hue dimension, otherwise with the same parameters as the color wheel; i.e., saturation was held constant at 100 and luminance at 50).

Participants were then prompted to search for one or the other of the two colors. A black letter “N” presented in the center of the screen prompted the participant to search for the “new” color (WM prompt); and an “R” prompted the participant to search for the “remembered” color (LTM prompt). This prompt was displayed on the screen for 700 ms. Each trial, therefore, had both a prompted color and an unprompted color.

The search display was then presented for ≤3,000 ms. The search display contained five circles at equal eccentricity from the center of the browser window and equidistant from one another. These circles were rotated around the center of the browser window in each search trial at a random angle between 0° and 359°. Each circle was divided in two, yielding ten differently colored half-circles. Each of the five circles was also rotated around its own central axis (the dividing line between the two colors) by a randomly selected angle across trials. These angles were selected to be equidistant from one another (e.g., for the five circles, each circle was rotated by 4°, 76°, 148°, 220°, or 292°). The purpose of this rotation was to potentially reduce participants’ ability to predict the location of stimuli on a given search display, in an attempt to more heavily tax visual search processes.
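
The display geometry can be sketched as follows (our reconstruction in R; the ring radius and all names are illustrative): circle centers are equally spaced on an invisible ring, the whole ring receives a random rotation, and each circle's color-dividing axis receives one of five equidistant rotations, shuffled across circles.

```r
make_search_display <- function(n_circles = 5, radius = 200) {
  ring_rot <- runif(1, 0, 360)                  # whole-display rotation (deg)
  theta <- (ring_rot + 360 / n_circles * 0:(n_circles - 1)) %% 360
  # Equidistant axis rotations (e.g., 4, 76, 148, 220, 292), shuffled across circles
  axis_rot <- (runif(1, 0, 360 / n_circles) +
                 360 / n_circles * 0:(n_circles - 1)) %% 360
  data.frame(x = radius * cos(theta * pi / 180),  # circle centers on the ring
             y = radius * sin(theta * pi / 180),
             axis = sample(axis_rot))
}
make_search_display()
```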

Participants were tasked with clicking on the half-circle that contained the prompted color as fast as possible. There were six conditions in total: As noted above, participants could be prompted to search for either the WM color or the LTM color. Within these two conditions, a search trial could be Consistent, Only, or Inconsistent. Consistent trials occurred when one of the five circles contained both the prompted and the unprompted color; thus, both the LTM color and the WM color guided attention to a spatially consistent location. Only trials occurred when the unprompted color was not in the display at all, and only the prompted color was in the display. Inconsistent trials occurred when both the prompted and the unprompted colors were in the display, but were in separate circles; thus, WM and LTM would guide attention to different (inconsistent) locations. The location of the prompted color was randomly determined on each trial. The location of the unprompted color was necessarily the half-circle adjacent to the prompted color on Consistent trials. On Inconsistent trials, the location of the unprompted color was randomly determined, following the constraint that it could not be the half-circle adjacent to the prompted color. The unprompted color was not displayed on Only trials.

The remaining half-circles were filled with distractor colors. For each search display, eight or nine “distractor” colors (depending on condition) were generated in the same colorspace. These distractor colors were generated at random on the hue colorwheel, but at a minimum distance of 30° from both the LTM and WM colors cued on that trial, and at a minimum distance of 20° from other distractors. No circle could contain two colors that were <45° apart. Additionally, no distractor could be drawn from within the “center green” buffer zone described above (under Stimuli/Color generation), as such potential distractors were deemed too similar to the “center green” color itself. Upon clicking on any half-circle, the experiment advanced to the next screen.
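
A rejection-sampling sketch of these distractor constraints, reusing circ_dist() from the pairing sketch above (parameter names are ours; the within-circle 45° constraint is omitted here for brevity):

```r
sample_distractors <- function(n, cued, min_cued = 30, min_between = 20,
                               green_buffer = c(75, 156)) {
  accepted <- numeric(0)
  while (length(accepted) < n) {
    h <- runif(1, 0, 360)                              # candidate hue
    ok <- all(circ_dist(h, cued) >= min_cued) &&       # far from cued colors
      all(circ_dist(h, accepted) >= min_between) &&    # far from other distractors
      !(h > green_buffer[1] && h < green_buffer[2])    # outside the green buffer
    if (ok) accepted <- c(accepted, h)
  }
  accepted
}
sample_distractors(8, cued = c(115, 250))  # eight distractor hues for one display
```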

After responding on the search display, participants were once again presented with the colorwheel. On this colorwheel, participants were tasked with reporting the unprompted color. That is, they were told to report the color that was cued at the beginning of the trial – either directly with a color (WM color) or indirectly with a scene (LTM color) – but was not searched for in the search display. We refer to this as the unprompted color report. Thus, if a given search trial was LTM-prompted, then participants would report the WM color associated with this trial and vice versa. Unlike in the Training and Testing Phases, participants were not provided with any feedback about their precision on this colorwheel. Participants completed one search trial for each scene-color combination for a total of 30 search trials per block (90 search trials total); thus, each scene was shown once. Half of the trials were LTM-prompted, the other half WM-prompted; one third of the trials were Consistent, one third Only, and one-third Inconsistent (divided equally between WM and LTM conditions).

At the conclusion of each block, participants were given a maximum 5-min break before advancing to the next block. At the start of each phase after the first block, participants were offered the opportunity to re-read or to skip the instructions for that phase. Upon successful completion of all three blocks, or unsuccessful completion of a Testing Phase, participants reported their demographic information and completed a brief debriefing questionnaire before being returned to their recruitment website and compensated.

Data

Exclusions

Only those trials in which participants responded correctly to both the search display and the unprompted color report were included. These trials are referred to here as “both-correct.” This criterion was implemented to ensure that the trials in the main analyses were those for which participants maintained representations of both the prompted and the unprompted color. Search trials were considered correct if participants clicked within the half-circle filled with the prompted color while the search display was on the screen. The unprompted color report was considered correct if participants’ report was ≤20° from the correct hue. This a priori threshold was selected to remain at least 10° away from the nearest possible distractor color, while allowing more error than the Testing Phase to accommodate less precision in reporting WM colors (participants had not practiced reporting colors from the WM color set until the Search Phase). Upon visualization of participants' responses, this threshold appeared to capture roughly 80% of the response distribution. An average of 67% of Search Phase trials were both-correct.
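
As a toy illustration of this exclusion rule (the column names are hypothetical, not the authors'):

```r
trials <- data.frame(
  search_correct = c(TRUE, TRUE, FALSE, TRUE),  # clicked the prompted half-circle?
  report_err_deg = c(5, 32, 10, 18)             # absolute error on unprompted report
)
trials$both_correct <- trials$search_correct & trials$report_err_deg <= 20
subset(trials, both_correct)   # only these trials enter the RT analyses
mean(trials$both_correct)      # proportion retained (67% on average in Exp. 1)
```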

Analysis

We modeled the primary dependent variable, log-transformed response time (RT) on the search task, with multilevel linear regression on participants’ trial-wise data. This analysis was conducted using R, version 4.0.5 (R Core Team, 2022), with the lmer function from the lme4 package (Bates et al., 2015).

To test whether cooperation and competition arise between WM and LTM representations in this paradigm, we fit a mixed-effects model on log RT that included prompt (LTM or WM) and search display condition (Consistent, Only, or Inconsistent) as fixed factors, along with their interaction, and included random effects for the intercept and prompt terms to allow those estimates to vary between participants. That is: lmer( log(RT) ~ prompt*searchCondition + (1 + prompt | Participant) ). Trial prompt was effect-coded (LTM = 1, WM = -1) to allow the other coefficients to be estimated at the grand mean of LTM- and WM-prompted trials. Search display condition was dummy-coded (Only = 0; Consistent = 1; Inconsistent = 2) so as to facilitate comparisons between Consistent and Only trials, as well as between Inconsistent and Only trials.
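
Written out in full, with the coding schemes above and simulated stand-in data so that the call runs (the real input is the both-correct trial data; with random data the random-slope fit may be singular, which is exactly the situation addressed in the next paragraph):

```r
library(lme4)
set.seed(1)
# Simulated placeholder data: structure only, not the real observations
dat <- expand.grid(Participant = factor(1:20), trial = 1:30)
dat$prompt  <- sample(c(1, -1), nrow(dat), replace = TRUE)   # LTM = 1, WM = -1
dat$searchC <- factor(sample(c("Only", "Consistent", "Inconsistent"),
                             nrow(dat), replace = TRUE),
                      levels = c("Only", "Consistent", "Inconsistent"))
dat$RT <- exp(rnorm(nrow(dat), mean = log(1000), sd = 0.2))  # RT in ms
# "Only" is the reference level, so the two dummy coefficients give the
# Consistent-vs-Only and Inconsistent-vs-Only contrasts reported in the Results
m <- lmer(log(RT) ~ prompt * searchC + (1 + prompt | Participant), data = dat)
summary(m)
```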

The maximal model (with all random effects specified) was singular, and thus we tested for over-parameterization with a principal component analysis (PCA) of the random-effects variance-covariance estimates using the rePCA function of the lme4 package. This allowed us to determine the random effects that the data were capable of supporting (Bates et al., 2015). Random-effects terms were removed, beginning with the term explaining the least variance, until the model converged. In removing a random slope, it is assumed that the variable is invariant across participants. The Akaike Information Criterion indicated that the removal of terms identified by the PCA improved goodness of fit. Predictor estimate significance (based on conditional F-tests with Kenward-Roger approximation for the degrees of freedom) was computed using the tab_model function from the sjPlot package (Lüdecke, 2018; Wickham, 2018).
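
In lme4 this check looks like the following, continuing the sketch above (rePCA() is the real lme4 function; the reduced model shown is illustrative):

```r
summary(rePCA(m))  # components with ~0% variance flag unsupported random effects
# Drop the least-supported random slope and compare fits
m_reduced <- update(m, . ~ . - (1 + prompt | Participant) + (1 | Participant))
AIC(m, m_reduced)  # lower AIC indicates the better-supported parameterization
```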

For visualization purposes, we show the RT difference between Consistent and Only trials, and between Inconsistent and Only trials; these difference scores were first calculated for each participant and then averaged across participants. We also exponentiated model-predicted log RT to yield estimated RT effects in ms; these differences are reported for the conditions of interest.

Lastly, because generalized linear mixed models (GLMMs) can sometimes outperform linear mixed models on transformed data (Lo & Andrews, 2015), we re-ran all analyses from Experiments 1–4 with a generalized linear mixed model. This approach used the glmer() function from lme4 in R (Bates et al., 2015; R Core Team, 2022). The RT data were fit with a gamma family distribution with a log link function. All of the GLMM analyses produced results similar to, and supporting the conclusions from, our main analysis on log RT. The gamma distribution, however, provided a numerically worse fit to our RT distributions than the normal distribution did for log RT; the log RT linear mixed model is therefore reported in the main text.
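
Using the same simulated-data sketch as above, this robustness check takes the following form (the random-effects structure matches the linear model; on arbitrary data the fit may emit convergence warnings):

```r
m_gamma <- glmer(RT ~ prompt * searchC + (1 + prompt | Participant),
                 data = dat, family = Gamma(link = "log"))  # raw RT, log link
summary(m_gamma)
```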

Results

Training and Testing Phases

Participants successfully learned the association between each scene and its color. In the Training Phase, mean (± SD) precision on the colorwheel (in degrees from the correct color; across all trials) generally improved over cycles: 6.72° ± 11.61° for cycle one; 5.90° ± 11.78° for cycle two; 5.59° ± 11.72° for cycle three. The mean (± SD) number of cycles per scene needed to pass the Testing Phase was 1.45 ± 0.97, and 75.39% of scene-color pairs only needed to be tested once. Mean (± SD) colorwheel precision was 3.57° ± 2.52° from the correct hue on the final test of each scene (i.e., when reported correctly from LTM).

Search Phase

Search accuracy

When searching for a color in LTM, participants successfully clicked the prompted color on the search display in 78.39% ± 11.19% of trials. When searching for a color in WM, participants successfully clicked the prompted color in 89.15% ± 8.79% of trials. No response was registered on the search display within the time allotted on 3.36% ± 2.53% of LTM-prompted trials; on WM-prompted trials this rate was 1.36% ± 1.48%.

Unprompted color report

Participants’ unprompted color reports after the search display were considered accurate (≤20° from the correct hue) on 82.26% ± 10.69% of trials when reporting a color from WM (i.e., on LTM-prompted trials). On trials in which participants were reporting the color in LTM (i.e., on WM-prompted trials), they were correct 88.65% ± 7.71% of the time. When reporting the color from WM, 0.10% ± 0.32% of trials did not receive a response in the time allotted; this never occurred when reporting the LTM color.

Search response times

RTs and accuracy in the Search Phase are summarized in Table 1. Our main analysis investigated RTs in the search task as a function of prompt (WM vs. LTM) and search condition (Consistent, Inconsistent, Only). We only analyzed trials in which participants both (1) clicked on the correct color in the search display and (2) reported the unprompted color accurately. The linear mixed model (see Experiment 1/Data/Analysis) revealed a main effect of prompt on log RTs such that participants were significantly slower to respond on the search display for LTM-prompted trials than for WM-prompted trials (β = 0.080, SE = 0.007, p < 0.001). See Fig. 3a for a plot of all fixed effect estimates, and Fig. 3b for a plot of raw RT by prompt.

Table 1 Summary of response times (RTs) and accuracy in the Search Phase
Fig. 3

Results from Experiment 1. a Coefficient estimates of the model for trial-wise log response time (RT). Error ribbons represent the 95% confidence interval; ***p < .001. b Long-term memory (LTM)-prompted trials (vs. working memory (WM)-prompted trials) were associated with a slower response on the search display. Error bars represent SEM of the within-participant (LTM-WM) difference. c Mean RT difference for Consistent vs. Only and Inconsistent vs. Only trials. Search RTs were facilitated on Consistent (vs. Only) trials; RT on Inconsistent trials was no different from Only trials. Error bars represent SEM of the within-participant difference between Consistent and Only and Inconsistent and Only trials

As hypothesized, RTs were significantly faster on Consistent trials relative to Only trials (β = -0.032, SE = 0.008, p < 0.001; model-estimated RT benefit on Consistent vs. Only trials = 38.31 ms). Contrary to expectations, the model did not reveal an effect of Inconsistent (vs. Only) search display on log RT (β = -0.004, SE = 0.008, p = 0.629). There were no significant interactions between prompt and search condition. To illustrate these effects, the RT data are plotted in milliseconds by prompt and condition in Fig. 3c.

Discussion

Results from Experiment 1 indicated that visual search was facilitated when representations in LTM and WM could cooperate to support attention to a single location. Specifically, participants’ RTs were faster when the LTM and WM colors were contained in the same circle in the search display, relative to when only one of the colors was present in the display. This attentional facilitation occurred regardless of whether the LTM or WM color was being searched for (i.e., was the prompted color), and, concomitantly, whether it was a WM or an LTM representation that was accessory to the search task (i.e., was the unprompted color). However, when the LTM and WM colors were spatially distant from each other (i.e., were in different circles), and therefore would have competed – rather than cooperated – for attention, we found no evidence of attentional capture by the unprompted color, regardless of which memory type was prompted.

Our conclusion from Experiment 1 is that WM and LTM can both affect visual search in the same trial, but that their interaction was limited to when these memories could cooperate; no interaction was found when they were in competition. This particular imbalance – cooperation but not competition – was not expected: We had hypothesized competition between memories, potentially with WM competing with LTM-guided search more than the other way around. Accordingly, a second experiment was developed to confirm these findings and test their generalizability. Three key points were addressed in this second experiment. First, to address concerns about high participant rejection and trial exclusion rates in Experiment 1, we reduced the demands placed on participants by shortening the length of each phase: The number of scene-color associations to be memorized (and later used in the Search Phase) in any given block was halved, and the number of blocks was doubled to compensate. In addition, a feedback screen was added to the end of each search trial to encourage correct responses in the Search Phase. Second, a more perceptually uniform colorspace (CIE L*a*b*) was used in Experiment 2 to diminish any potential variance in discernibility between generated colors, enable the use of fewer constraints in selecting colors, and align more closely with prior literature. Lastly, the search display was reorganized to contain fewer elements, and these elements were moved closer to the center of the display to concentrate their presence in the foveal region of the visual field.

Experiment 2

Methods

Participants

Data were collected until the final sample comprised 115 participants who met the inclusion criteria. This sample size was selected to match that of the first experiment. To meet that target sample size, 148 participants were recruited for an online study using Prolific (www.prolific.co). As before, these participants were pre-screened for English fluency, nationality, and age; provided informed consent to a protocol approved by the Columbia University Institutional Review Board; and received $6.50/h as compensation. Fourteen of these participants were unable to pass one of the Testing Phases and are not included in the final sample. Nineteen participants from the remaining sample did not correctly respond to ≥50% of trials in the Search Phase (i.e., did not respond correctly to both the search display and unprompted color report), so none of their data are included. Our procedural changes were therefore successful at reducing the participant rejection rates seen in Experiment 1. Following these rejections, the final sample comprised 115 participants, as noted above (Mage = 24.9 ± 5.1 years, Meducation = 15.3 ± 2.5 years). Ninety-three of these participants identified as women and four as non-binary; the rest identified as men. Of this final sample, 83.3% identified as White, 14.0% as Asian, 6.1% as Black or African American, 1.8% as American Indian/Alaskan Native, 0.9% as Native Hawaiian or Pacific Islander, and 3.5% identified as part of a different racial group; in addition, 12.3% of these participants identified their ethnicity as Hispanic or Latino.

Stimuli

The methods for color assignment and scene stimuli were identical to those in Experiment 1.

Colorwheel

The CIE L*a*b* colorwheel used in Experiment 2 was custom-built using JavaScript, but developed to function equivalently to the adapted HSL colorwheel from Experiment 1. The colorwheel was centered at L* = 70, a* = 12, b* = 13, with a radius of 60 (similar to prior online studies, e.g., Shin & Ma, 2016). Visual and textual feedback to responses on the colorwheel was identical to that in Experiment 1.

Color generation

As in Experiment 1, two sets of five colors were generated, one to be assigned to LTM and one to WM. This time, however, they were generated in CIE L*a*b* colorspace along the colorwheel described above (see Fig. 1b). CIE L*a*b* colorspace is roughly perceptually uniform and mimics the trichromatic vision of humans (i.e., is defined by luminance, green–red, and blue–yellow dimensions), meaning that two colors selected at a given distance apart are about as visually similar as two other colors selected at the same distance. Luminance was held constant so that stimuli sampled from the circumference of the colorwheel varied incrementally only in hue. Converting to this colorspace relaxed the color generation constraints that were necessary in the HSL colorspace of Experiment 1 (i.e., the use of the "center green" and its buffer zone; see Experiment 1/Methods/Stimuli/Color generation). The generated colors were therefore equidistant from one another on the CIE L*a*b* colorwheel, each color being 36° apart. Similar to Experiment 1, generated colors alternated between sets (i.e., no set contained colors within 72° of each other); a variable buffer (between 0° and 36°) was added at the “starting point” of the hue generation process to allow the generated sets to differ across blocks; and assignment of sets to memory condition (WM or LTM) was randomized and balanced.

Procedure

As in Experiment 1, participants were directed via URL to Gorilla (www.gorilla.sc) and given instructions describing the structure of the experiment. These instructions differed minimally from those provided in Experiment 1 because, structurally, Experiment 2 was similar to Experiment 1 (see Fig. 4).

Fig. 4

Experiment 2 procedure is identical to Experiment 1 (see Fig. 2) except for the following: a 15 scene-color pairs were encoded in each block, rather than 30; there were six blocks rather than three; and the CIE L*a*b* colorspace was used. b A re-study section (not shown) helped participants re-encode scene-color pairs that had been imprecisely responded to. c Three circles were present in the search display instead of five, and the circles were closer to fixation. Lastly, participants were provided with a Search Feedback screen, informing them of whether their responses on the search task and unprompted color report were correct / incorrect / not-responded-to. As before, the example search and unprompted color report displays in this figure are for a trial in which the long-term memory (LTM) ("remembered") color is prompted, and, for illustration purposes, the example search displays are labeled to denote the location of the prompted (“P”) and unprompted (“U”) colors

There were four key procedural differences between this and the first experiment. The first change was that blocks were split in half relative to Experiment 1, such that there were six blocks of 15 trials rather than three blocks of 30 trials. This was done primarily to facilitate encoding of scene-color associations and reduce the failure rate in the Testing Phase, but also to potentially reduce errors from failures of sustained attention in the Search Phase. The two sets of colors were generated (see Methods/Stimuli/Color generation) at the beginning of every other block. Following the procedures in Experiment 1 (see Experiment 1/Methods/Stimuli/Color assignment), 30 trials were generated from those colors, and each 30-trial set was then split randomly and equally between two blocks. In this way, the six blocks in Experiment 2 corresponded as closely as possible to the three in Experiment 1.

The second key change was that a re-study section was added between each cycle of the Testing Phase. This re-study section was equivalent to a mini-Training Phase for incorrectly responded-to scene-color pairs, and was implemented to enhance the encoding of scene-color pairs that participants were unable to recall on a given test cycle. An additional brief break of ≤33 s – like the break previously provided between each test cycle – was provided between re-study and test sections.

The third key change was to the search display (see Fig. 4c). The search display in Experiment 2 comprised three circles, yielding six differently colored half-circles. Concomitant with these modifications, fewer distractor colors were generated on each search trial. As such, distractors could be selected at a larger minimum colorspace distance from other elements on each trial’s search display: Four or five distractor colors (depending on condition) were generated ≥50° from the prompted/unprompted colors, and ≥35° from one another. As in Experiment 1, no circle could contain two colors <45° apart; the circles were at equal eccentricity from the center of the browser window, equidistant from one another, rotated around their central axes, and rotated as a group around the center of the browser window on each search trial. The last change to the search display in Experiment 2 was that the radius of the (invisible) background circle on which the circle stimuli were placed was reduced by 55.56% to bring the color stimuli closer to the center of the browser window. This was done to ensure that particularities of the search display organization in Experiment 1 were not responsible for the imbalance in attentional guidance effects; and – because identification, discrimination, and popout of visual details, like color, are superior in the fovea relative to the periphery (Aagten-Murphy & Bays, 2019; Gutwin et al., 2017) – we reasoned that moving the prompted and unprompted colors closer to the center of the window might increase attentional capture effects.

The fourth and final change from Experiment 1 was that a feedback screen was added at the end of each search trial to inform participants of their performance (see the bottom of Fig. 4c for an illustration). Feedback for both the search task and the unprompted color report was “Correct,” “Incorrect,” or “No Response” (reports were considered incorrect if >20° away on the unprompted color report). This was done to encourage participants to respond correctly on both the search task and the unprompted color report, and thus to reduce the number of participant rejections and trial exclusions based on trial correctness.

Data

Trial exclusion criteria were identical to Experiment 1. Only those trials in which participants responded correctly to both the search display and to the unprompted color report were included. Across participants, an average of 77% of Search Phase trials were both-correct.

Analyses were identical to those conducted in Experiment 1.

Results

Training and Testing Phases

In the Training Phase, mean (± SD) precision on the colorwheel (in degrees from the correct color; across all trials) generally improved over cycles: 8.53° ± 13.05° for cycle one; 7.27° ± 9.82° for cycle two; 7.17° ± 13.55° for cycle three. The mean (± SD) number of cycles per scene needed to pass the Testing Phase was 1.63 ± 0.96, and 60.18% of scene-color pairs only needed to be tested once. Mean (± SD) colorwheel precision was 4.20° ± 2.67° from the correct hue on the final test of each scene (i.e., when reported correctly from LTM). Using the CIE L*a*b* colorspace – as opposed to the HSL space used in Experiment 1 – therefore did not generally improve participants’ precision on the colorwheel during the Training and Testing Phases. Nevertheless, participants successfully learned the association between each scene and its color.

Search Phase

Search accuracy

When searching for a color in LTM, participants successfully clicked the prompted color on the search display in 87.00% ± 8.76% of trials. When searching for a color in WM, participants successfully clicked the prompted color in 94.53% ± 5.63% of trials. No response was registered on the search display within the time allotted on 2.13% ± 2.51% of LTM-prompted trials; on WM-prompted trials this rate was 0.77% ± 1.06%.

Unprompted color report

Participants’ unprompted color reports after the search display were considered accurate (≤20° from the correct hue) on 78.80% ± 11.56% of trials when reporting a color from WM (i.e., on LTM-prompted trials). On trials in which participants were reporting the color in LTM (i.e., on WM-prompted trials), they were correct 85.16% ± 9.23% of the time. When reporting the color from WM, 0.24% ± 0.60% of trials did not receive a response in the time allotted; this never occurred when reporting the LTM color.

Search response times

RTs and accuracy in the Search Phase are summarized in Table 1. The primary analysis of RTs in the search task as a function of prompt (WM vs. LTM) and search condition (Consistent, Inconsistent, Only) revealed similar results as in Experiment 1. As before, we only analyzed “both-correct” trials in which participants both (1) clicked on the correct color in the search display and (2) reported the unprompted color accurately. The linear mixed model revealed a main effect of prompt on log RTs such that participants were significantly slower to respond on the search display for LTM-prompted trials than for WM-prompted trials (β = 0.082, SE = 0.006, p < 0.001). See Fig. 5a for a plot of all fixed effect estimates, and Fig. 5b for a plot of raw RT by prompt.

Fig. 5

Results from Experiment 2 were similar to those from Experiment 1. a Coefficient estimates of the model for trial-wise log response time (RT). Error ribbons represent the 95% confidence interval; ***p < .001. b Long-term memory (LTM)-prompted trials (vs. working memory (WM)-prompted trials) were associated with a slower response on the search display. Error bars represent SEM of the within-participant (LTM-WM) difference. c Mean RT difference for Consistent vs. Only and Inconsistent vs. Only trials. Search RTs were facilitated on Consistent (vs. Only) trials; RT on Inconsistent trials was no different from Only trials. Error bars represent SEM of the within-participant difference between Consistent and Only and Inconsistent and Only trials

As in Experiment 1, RTs were significantly faster on Consistent trials relative to Only trials (β = -0.027, SE = 0.008, p < 0.001; model-estimated RT benefit on Consistent vs. Only trials = 29.17 ms), and, again, there was no significant effect of Inconsistent (vs. Only) search display (β = -0.008, SE = 0.008, p = 0.335). Replicating Experiment 1, no significant interactions were revealed between prompt and search condition. To illustrate these effects, the RT data are plotted in milliseconds by prompt and condition in Fig. 5c.

Finally, we conducted an additional analysis to test whether combining data across Experiments 1 and 2 – and thereby gaining statistical power – would yield additional effects. To that end, we combined the data from Experiments 1 and 2 and analyzed log RT in a similar multilevel linear mixed model (the only change being the addition of experiment version as an interacting fixed factor). We observed an effect of experiment version such that RTs were faster in Experiment 2 than in Experiment 1 (β = -0.116, SE = 0.020, p < 0.001). Significant effects of prompt (WM vs. LTM) and Consistent (vs. Only) search condition persisted in this combined model, but no other effects emerged: There was no significant effect of Inconsistent (vs. Only) trial type, nor any significant interactions.

Discussion

Results from Experiment 2 replicated the findings from Experiment 1. Visual search continued to be speeded when the unprompted and prompted colors were in the same object, relative to when no unprompted color was present. As before, this attentional facilitation occurred regardless of whether the items being searched for were in LTM or WM, and, accordingly, whether the unprompted color was in WM or LTM, respectively. Likewise replicating Experiment 1, when the LTM and WM colors were organized on the search display such that they could have competed for attention, no evidence of attentional capture by the unprompted color was found, regardless of which memory type was prompted. The changes to block length, and the inclusion of a re-study section and feedback screen, in Experiment 2 did, however, successfully reduce the high participant rejection and trial exclusion rates seen in Experiment 1.

Thus, the finding of cooperation, but not competition, between WM and LTM representations replicated in Experiment 2. This suggests that the particularities of the stimulus display or colorspace used in Experiment 1 were not responsible for these effects.

Participants were tasked with reporting the unprompted color at the end of each trial; this was done to encourage maintenance of the unprompted memory item over the course of the trial, including during the search task. It is possible, however, that individuals were able to temporarily suppress or reduce the accessibility of the unprompted color during the search task and “reactivate” it for the unprompted report. If participants occasionally used the latter strategy, that may have reduced our likelihood of observing competition. We therefore considered the possibility that unprompted memories may be more likely to capture attention when they are relatively actively represented and accessible. If this were the case, competition (an Inconsistent (vs. Only) effect) may only have emerged in Experiments 1 and 2 under circumstances in which the unprompted memory was most readily accessible, i.e., when the unprompted report was executed relatively quickly. To test this, we first combined data across these experiments to maximize power. Unprompted report RT was then z-scored within each relevant sub-condition (participant, prompt (WM or LTM), and search condition (Only, Inconsistent)) before being categorized by tertile (i.e., “fast,” “medium,” or “slow” unprompted report RT). The Inconsistent versus Only search RT difference was then computed separately for each participant and tertile. Finally, we used a one-sample t-test to assess whether competition (the Inconsistent (vs. Only) effect) significantly deviated from zero when participants’ unprompted report RT was in the fastest tertile; this effect was not statistically significant (t(229) = 1.80, p = 0.41). This indicates that the unprompted item did not capture attention in Experiments 1 and 2, even when participants’ unprompted report responses were quickest.
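
For transparency, here is a reconstruction of this tertile analysis in R (dplyr/tidyr syntax; the data frame combined and its column names, e.g., report_RT and search_RT, are our placeholders, not the authors' variable names):

```r
library(dplyr)
library(tidyr)
tert <- combined %>%                           # both-correct trials, Exps. 1 and 2
  filter(searchC %in% c("Only", "Inconsistent")) %>%
  group_by(Participant, prompt, searchC) %>%   # z-score within each sub-condition
  mutate(z = as.numeric(scale(report_RT)),
         tertile = ntile(z, 3)) %>%            # 1 = fastest unprompted reports
  group_by(Participant, tertile, searchC) %>%
  summarise(mRT = mean(search_RT), .groups = "drop") %>%
  pivot_wider(names_from = searchC, values_from = mRT) %>%
  mutate(competition = Inconsistent - Only)    # competition effect per tertile
t.test(filter(tert, tertile == 1)$competition) # one-sample t-test against zero
```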

Why did we fail to find competition between WM and LTM during attentional guidance? One possibility – inspired by research on redundancy gains and coactivation models (Miller, 1982; Mordkoff & Yantis, 1993) – is the presence of an object-wise winner-takes-all cognitive operation (Koch & Ullman, 1985). That is, perceptual evidence may accumulate from the location of both the prompted and the unprompted colors in the search display, but this evidence may only be summed (or otherwise combined) when it accumulates from a single object (here, a circle; Danek & Mordkoff, 2011; van Ede & Nobre, 2022; see also Treisman & Gelade, 1980). This would yield faster RTs on Consistent vs. Only trials because evidence is combined across the prompted and unprompted colors in the same circle, resulting in a decision criterion being reached sooner. But this process would yield no difference in RTs between Inconsistent and Only trials, because the evidence accumulated for the unprompted color, hypothesized to be weaker than that for the prompted color, arrives from a different object and hence may lose the winner-takes-all competition.

To explore whether this potential winner-takes-all mechanism is a general phenomenon, we conducted a third experiment in which the WM and LTM representations came from different stimulus dimensions: A color and a shape rather than two colors. In addition to allowing us to test the generalizability of this proposed mechanism, using color and shape enabled us to explore more unified objects, in which the two dimensions resided in the same physical space (i.e., “square” and “red” would become a red square, as opposed to two colors in two distinct halves of a circle).

To this end, we leveraged recent work by Li et al. (2020), who created a validated circular shape (VCS) space (see Experiment 3/Methods/Stimuli/Colorwheel and shapewheel). Using the VCS space allowed us to precisely manipulate the perceived visual similarity of shape, analogously to how we used CIE L*a*b* colors and their associated colorwheel. This functionally enables the use of a two-dimensional perceptually uniform space in Experiment 3.

Experiment 3

Methods

Participants

Data were collected until the final sample comprised 192 participants who met the inclusion criteria. To meet that target sample size, 253 participants were recruited for an online study using Prolific (www.prolific.co). As before, these participants were pre-screened for English fluency, nationality, and age; provided informed consent to a protocol approved by the Columbia University Institutional Review Board; and received $6.50/h as compensation. Using the same rejection criteria as in Experiments 1 and 2, we excluded 26 participants who were unable to pass one of the Testing Phases and 35 participants from the remaining sample who did not correctly respond to ≥50% of the search trials’ search displays and unprompted reports; none of their data are included. Following these rejections, the final sample comprised 192 participants, as noted above (Mage = 26.3 ± 5.7 years, Meducation = 15.2 ± 2.1 years).

This sample size was selected to match Experiments 1 and 2 with regard to the number of observations per condition, across participants (power being a joint function of sample size and number of trials; Baker et al., 2021). There were double the number of conditions in Experiment 3 relative to Experiments 1 and 2, to accommodate the addition of a new stimulus dimension. Because an additional 18 trials were added per participant in Experiment 3 (for counterbalancing purposes), we did not quite need to double the sample size: (115 participants × 90 trials)/6 conditions ≈ 1,725 observations per condition in Experiments 1 and 2, versus (192 participants × 108 trials)/12 conditions = 1,728 observations per condition in Experiment 3.
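The arithmetic can be verified directly (a trivial check in R):

  # Observations per condition, across participants:
  (115 * 90) / 6    # Experiments 1 and 2: 1,725
  (192 * 108) / 12  # Experiment 3:        1,728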

One hundred and thirty-seven of the participants in the final sample identified as women, six as non-binary or otherwise gender non-conforming, one did not report a gender, and the remaining identified as men. Of this final sample, 82.0% identified as White, 11.6% as Asian, 7.4% as Black or African American, 2.1% as American Indian/Alaskan Native, 0.5% as Native Hawaiian or Pacific Islander, and 3.2% identified as part of a different racial group; in addition, 12.3% of these participants identified their ethnicity as Hispanic or Latino.

Stimuli

Colorwheel and shapewheel

The same CIE L*a*b* colorwheel from the prior experiment was used in Experiment 3 (see Experiment 2/Methods/Stimuli/Colorwheel).

To enable the use of a shape dimension, we used the Validated Circular Shape (VCS) space (Li et al., 2020), which is comparable to the previously used CIE L*a*b* colorspace: Angular distance along a circle functions as a proxy for perceived visual similarity, but for shape instead of color. The VCS shapewheel was custom-built using JavaScript to function analogously to the CIE L*a*b* colorwheel. First, the 360 VCS shape images were batch-processed using a custom preset with the Image Trace tool in Adobe Illustrator to convert them to scalable vector graphics (SVG) files, an XML-based vector format optimized for display in the browser; thus, the shapes could be rendered at any size without loss of quality, like the colors and circles in Experiments 1 and 2. Each shape’s underlying XML was then inserted into HTML <svg> containers.

The shapewheel itself was a black circle with no indicators as to where individual shapes were located. Like the colorwheel, the shapewheel contained 360 segmentations and was rotated randomly on each trial (0–359°) around its central axis so that location on the shapewheel could not function as a proxy for shape. Similar to the colorwheel, participants could hover their cursor over the shapewheel to view the shape stimulus, which was presented in black in the center of the circle. That is, participants moved their cursor around the shapewheel to find where the shape they wished to report was located on any given trial.

Upon clicking a point on the shapewheel, the corresponding shape was selected and five visual feedback elements were placed on or in the shapewheel: (1) The shape displayed centrally was replaced with the correct shape for that trial; (2) the correct shape was also displayed in gray inside of the black outline of the shapewheel circle, at the position on the shapewheel that the correct shape was located; (3) a solid line, like in the colorwheel, was drawn in black from the center of the circle to the correct position; (4) a dotted line, as for the colorwheel, was drawn from the center of the circle to the corresponding selected shape on the shapewheel; and (5) that selected shape was displayed in gray inside of the black outline of the shapewheel circle at the location clicked, so that participants could see the exact shape they had selected. This last piece of visual feedback (5) was only displayed, however, when a participant’s guess was ≥10° away from the correct shape, to avoid overlap of the gray shape elements, which would have rendered them uninformative (see Fig. 6 for an example of these visual feedback elements).

Fig. 6
figure 6

Experiment 3 procedure is identical to Experiment 2 (see Fig. 4) except for the following. a Scene-stimulus pairs were encoded as before, but each scene could be paired with either a color (left) or a shape (right). There were now 18 scene-stimulus pairs per block (across six blocks), and participants were given ≤10 s on the report screen. b All 18 scene-stimulus pairs were tested per block, and participants were given ≤10 s on the report screen. c Search cues comprised one long-term memory (LTM) stimulus and one working memory (WM) stimulus, as before, but each scene that had previously been paired with a color was now paired with a shape WM cue (left); and each scene that had previously been paired with a shape was now paired with a color WM cue (right). Five shapes were presented on the search display, each filled with a solid color and outlined with black. Similar to previous experiments, there were three conditions for this search display: Consistent trials contained both the prompted and unprompted stimuli in a single, unitized object (i.e., the prompted color in the unprompted shape, or vice versa); Only trials contained the prompted stimulus only (and not the unprompted stimulus); and Inconsistent trials contained both the prompted and unprompted stimuli, but these two stimulus features were not unitized; instead, they were separated and in different locations on the display. After the search display, participants were asked to report the unprompted shape on the shapewheel (left) or the unprompted color on the colorwheel (right). As before, the example search and unprompted report displays in this figure illustrate a trial in which the LTM ("remembered") color is prompted. For illustration purposes, the example search displays are again labeled to denote the location of the prompted (“P”) and unprompted (“U”) stimuli. This example trial is also a trial in which a color (retrieved from LTM) was the prompted stimulus; the use of “OR” in this figure represents what each screen would look like if this example trial had a prompted shape (retrieved from LTM), for each screen that would have differed, except search displays. Note that the example scene image is shown “paired” with a color and shown “paired” with a shape strictly for illustration purposes; no scene was actually paired with more than one stimulus feature

Lastly, as with the colorwheels, textual feedback was provided regarding precision on the shapewheel. This textual feedback was identical to that from the colorwheels in Experiments 1 and 2 (see Experiment 1/Methods/Procedure).

All of these feedback elements were seen on the Training and Testing Phase report shapewheels. On the shapewheel used in the Search Phase, however – on the unprompted report screen – only visual feedback element 4 was used. Thus, on the Search Phase shapewheel no indication was given as to the correct shape when a shape was selected; feedback about shapewheel precision (accurate or inaccurate) was provided only on the search feedback screen directly after the unprompted report, to match Experiment 2.

Color and shape generation

As in Experiment 2, two sets of colors were generated on the same CIE L*a*b* colorwheel, one to be assigned for LTM and one to be assigned for WM. This time, however, only six colors were generated total, at equidistant intervals (60°) from one another (see Fig. 1c, top). As before: Colors alternated being assigned to one set or the other (i.e., no set contained colors within 120° of each other); a variable buffer (between 0° and 60°) was added at the “starting point” of the hue generation process to allow the generated sets to differ from one another across blocks; and assignment of color sets to memory condition (WM or LTM) was randomized and balanced.

In addition, two sets of shapes were generated on the VCS shapewheel, one to be assigned for LTM and one to be assigned for WM, in the same way that colors were generated for this Experiment (see above; Fig. 1c, bottom). That is: Six shapes equidistant from one another (60° apart) alternated in assignment to one set or the other (i.e., no set contained shapes within 120° of each other); the same (0–60°) buffer was added to the “starting point” of the shape generation process; and assignment of shape sets to memory condition (WM or LTM) was randomized and balanced.
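The generation scheme for either dimension can be sketched as follows (a minimal R sketch; the function name and return format are illustrative):

  # Generate six angles 60 degrees apart, offset by a random 0-60 degree buffer,
  # and assign alternating angles to two sets, so that no two stimuli within a
  # set are closer than 120 degrees. Set-to-memory-type assignment is randomized.
  generate_sets <- function() {
    buffer <- runif(1, min = 0, max = 60)
    angles <- (buffer + seq(0, 300, by = 60)) %% 360
    setA <- angles[c(1, 3, 5)]  # alternating assignment
    setB <- angles[c(2, 4, 6)]
    if (runif(1) < 0.5) list(LTM = setA, WM = setB) else list(LTM = setB, WM = setA)
  }

  generate_sets()  # run once for colors and once for shapes per stimulus set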

Color and shape assignment

Two stimuli were still cued on each search trial, one color and one shape: One LTM color or shape (that would be retrieved from LTM) and one WM shape or color (that would be presented on the screen). Beyond having one LTM item and one WM item, there were no other constraints placed on which colors/shapes could be combined on a given search trial, permitting the use of 18 possible color/shape pairs.

To match Experiment 2 as closely as possible, blocks continued to be split in half such that new shape and color stimuli were generated (see Methods/Stimuli/Color and shape generation) every other block (i.e., each 36-trial set was split in half between two blocks). Shape/color pairs were then assigned to satisfy four balancing conditions: (1) Each half-block had an equal number of LTM- and WM-prompted trials, (2) there was an equal number of color- and shape-prompted trials per half-block, and (3) there was an equal number of Consistent, Inconsistent, and Only trials in each half-block. Given these first three constraints, it was algorithmically impossible to also ensure that each shape and color was displayed exactly an equal number of times (three) per half-block. Consequently, (4) each shape and each color was required to be present in each half-block at least twice (i.e., a given color or shape was sometimes displayed only two times in a given half-block and sometimes displayed four times). Therefore, across every whole block (of 36 trials), each color/shape pair was cued twice, and each color and each shape was cued six times.
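These balancing conditions are straightforward to audit on a trial table (a minimal dplyr sketch; the data frame trials and its column names are hypothetical):

  library(dplyr)

  # One row per trial; counts should be equal within each half-block for (1)-(3),
  # and the filters for (4) should return no rows.
  trials %>% count(halfBlock, promptType)               # (1) LTM vs. WM prompts
  trials %>% count(halfBlock, promptDimension)          # (2) color vs. shape prompts
  trials %>% count(halfBlock, searchCondition)          # (3) Consistent/Inconsistent/Only
  trials %>% count(halfBlock, color) %>% filter(n < 2)  # (4) every color at least twice
  trials %>% count(halfBlock, shape) %>% filter(n < 2)  # (4) every shape at least twice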

Scenes

The same scenes were used in Experiment 3 as in Experiments 1 and 2. An additional 18 scenes were selected from the same scene stimulus set – in the same way as before (i.e., avoiding substantial visual overlap and text) – to account for the increase in number of trials, and randomized as before (see Experiment 1/Methods/Stimuli/Scenes).

Procedure

As before, participants were first directed via URL to Gorilla (www.gorilla.sc) and given instructions describing the structure of the experiment. These instructions differed minimally from those provided in Experiments 1 and 2 because, structurally, the procedure of Experiment 3 was designed to be as close as possible to that used in Experiment 2 (see Fig. 6). The sole change, albeit a substantial one, was the inclusion of a second dimension for all the memory and search stimuli (i.e., shape, in addition to color). This change had an effect on each phase of the experiment, but the structure of these phases remained unchanged from Experiment 2.

In each Training Phase, there were 18 scene-stimulus pairs, half of which were scene-color pairs and half of which were scene-shape pairs. The time allotted for responding on the colorwheel or shapewheel on the response screen was increased by two seconds relative to Experiment 2 (to ≤10 s; see Fig. 6a). This was done to allow individuals the additional time needed to select a shape on the shapewheel, because there was no visual cue on the shapewheel as to the location of a given shape until participants used their cursor to explore where the shapes were. No other changes were made to the Training Phase. The Testing Phase was also similar to Experiment 2. Participants were tested on the 18 scene-stimulus pairs from the prior Training Phase, and given the same amount of time (≤10 s) on the report screen as in the Training Phase (see Fig. 6b).

The Search Phase in Experiment 3 was modified to accommodate the added stimulus dimension as well (see Fig. 6c). As described above (Methods/Stimuli/Color and shape assignment), each of the 18 scenes that had been associated with a shape in the Training and Testing Phases was now paired with a WM color cue, and vice versa, on the search cues screen. As in Experiments 1 and 2, each scene was shown once. Upon being prompted with the same “R” (“remembered”; LTM) or “N” (“new”; WM) stimuli as in Experiments 1 and 2, participants were presented with a search display that was modified to integrate the additional shape dimension: The search display now comprised five shapes, each outlined with black and filled with a solid color. On Consistent search displays the prompted color or shape was unitized with the unprompted shape or color, such that the prompted and unprompted stimulus features were bound to a single object. On Only search displays the prompted stimulus feature (either color or shape) was present on the display, but the unprompted stimulus feature was absent entirely. On Inconsistent search displays the prompted and unprompted stimulus features were both present, but in separate locations on the screen; they were not unitized into one object. Instead, both the prompted and unprompted stimulus features were unitized with a distractor feature. These distractor features were generated at random, but at a minimum distance of 60° from both the LTM and WM stimulus features cued on that trial, and at a minimum distance of 45° from other distractors. After clicking the prompted stimulus feature, participants were instructed to report the unprompted stimulus feature on either the colorwheel or the shapewheel before being provided feedback about their accuracy, as in Experiment 2.
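These distractor-generation constraints lend themselves to simple rejection sampling (a minimal R sketch; the function names are illustrative, and we assume angular positions on the 360° wheels):

  # Circular (wrap-around) distance between angles, in degrees.
  circ_dist <- function(a, b) {
    d <- abs(a - b) %% 360
    pmin(d, 360 - d)
  }

  # Sample distractor features at least 60 degrees from both cued features and
  # at least 45 degrees from every previously accepted distractor.
  sample_distractors <- function(cued, n) {
    chosen <- numeric(0)
    while (length(chosen) < n) {
      candidate <- runif(1, min = 0, max = 360)
      ok_cued <- all(circ_dist(candidate, cued) >= 60)
      ok_dist <- length(chosen) == 0 || all(circ_dist(candidate, chosen) >= 45)
      if (ok_cued && ok_dist) chosen <- c(chosen, candidate)
    }
    chosen
  }

  sample_distractors(cued = c(120, 300), n = 4)  # e.g., four distractor features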

Data

Trial exclusion criteria were identical to Experiments 1 and 2. Only those trials in which participants responded correctly to both the search display and the unprompted (color or shape) report were included. Across participants, an average of 76% of Search Phase trials were both-correct.

Analyses were similar to those conducted in Experiments 1 and 2. We used the same analysis tools, and, as before, modeled the primary dependent variable, log-transformed RT on the search task, with multilevel linear regression on participants’ trial-wise data. We fit an analogous mixed-effects model, but included one more interaction term – stimulus dimension (shape or color, effect-coded as 1 and -1 respectively) – as a fixed factor; we also included this as a random effect term to allow the estimates to vary between participants. That is: lmer( log(RT) ~ prompt * searchCondition * stimulusDimension + (1 + prompt + stimulusDimension | Participant) ).
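A runnable version of this specification, with the effect coding made explicit, might look as follows (an R/lme4 sketch; the data frame d3 and the raw column names are hypothetical, and the coding of prompt and search condition is assumed to match the earlier experiments):

  library(lme4)

  # Effect-code stimulus dimension as in the text (shape = 1, color = -1).
  d3$stimulusDimension <- ifelse(d3$promptedDimension == "shape", 1, -1)

  m3 <- lmer(
    log(RT) ~ prompt * searchCondition * stimulusDimension +
      (1 + prompt + stimulusDimension | Participant),
    data = d3
  )
  summary(m3)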

Results

Training and Testing Phases

Participants successfully learned the association between each scene and its stimulus, and precision when reporting shapes on the shapewheel was fairly comparable to reporting colors on the CIE L*a*b* colorwheel. In the Training Phase, mean (± SD) precision on the color/shapewheel (in degrees from the correct color/shape) generally improved over cycles: 8.16° ± 11.31° for colors and 9.62° ± 15.30° for shapes in cycle one; 6.93° ± 10.06° for colors and 7.24° ± 11.25° for shapes in cycle two; 6.40° ± 8.29° for colors and 6.78° ± 10.76° for shapes in cycle three.

The mean (± SD) number of cycles per scene needed to pass the Testing Phase was 1.45 ± 0.81 (for color-associated scenes, 1.49 ± 0.85; for shape-associated scenes, 1.42 ± 0.78), and 68.56% of scene-stimulus pairs only needed to be tested once (for color-associated scenes, 67.21%; for shape-associated scenes, 69.91%). Mean (± SD) precision on the color/shapewheel was 3.95° ± 2.62° from the correct stimulus on the final test of each scene (i.e., when reported correctly from LTM): 4.06° ± 2.66° for colors and 3.83° ± 2.58° for shapes.

Search Phase

Search accuracy

When searching for a stimulus feature in LTM, participants successfully clicked the prompted feature on the search display in 85.19% ± 8.58% of trials, and when searching for a feature in WM, the rate was 94.72% ± 4.47% of trials. When searching for a color, participants successfully clicked the prompted feature on the search display in 91.48% ± 5.82% of trials, and when searching for a shape, the rate was 88.43% ± 6.89% of trials. No response was registered on the search display within the time allotted on 3.79% ± 2.81% of LTM-prompted trials; on WM-prompted trials this rate was 1.62% ± 1.63%. On trials in which the prompted stimulus was a color, this rate was 2.32% ± 1.92%; and on trials with prompted shapes, this rate was 3.09% ± 2.39%.

Unprompted (color or shape) report

Participants’ unprompted reports after the search display were considered accurate (≤20° from the correct color or shape) on 76.64% ± 11.71% of trials when reporting from WM (i.e., on LTM-prompted trials). On trials in which participants were reporting from LTM (i.e., on WM-prompted trials), they were correct 86.97% ± 8.72% of the time. When reporting a shape (i.e., on trials with a prompted color), 80.84% ± 10.98% of trials were correct. When reporting a color (i.e., on trials with a prompted shape), this rate was 82.76% ± 9.71%. When reporting the stimulus from WM, 0.18% ± 0.54% of trials did not receive a response in the time allotted; this never occurred when reporting from LTM. When reporting a color, this rate was 0.24% ± 0.59%; and when reporting a shape, it was 0.15% ± 0.46%.

Search response times

RTs and accuracy in the Search Phase are summarized in Table 1. As before, we only analyzed trials in which participants both (1) clicked on the correct stimulus feature in the search display and (2) reported the unprompted stimulus feature accurately. The primary analysis of RTs in the search task — as a function of prompt (WM vs. LTM), search condition (Consistent, Inconsistent, Only), and stimulus dimension (Shape, Color) – revealed, as before, a main effect of prompt on log RTs such that participants were significantly slower to respond on the search display for LTM-prompted trials than for WM-prompted trials (β = 0.048, SE = 0.005, p < 0.001). See Fig. 7a for a plot of all fixed-effect estimates, and Fig. 7b for a plot of raw RT by prompt. In addition, a main effect of stimulus dimension was found, indicating that participants were faster to respond on trials in which the prompted stimulus was a color rather than a shape (β = -0.079, SE = 0.005, p < 0.001).

Fig. 7
figure 7

Results from Experiment 3. a Coefficient estimates of the model for trial-wise log response time (RT). Error ribbons represent the 95% confidence interval; ***p < .001, *p < .05. b Long-term memory (LTM)-prompted trials (vs. working memory (WM)-prompted trials) were associated with a slower response on the search display. Error bars represent SEM of the within-participant (LTM-WM) difference. c Mean RT difference by prompt for Consistent (vs. Only) and Inconsistent (vs. Only) trials. Search RTs were facilitated on Consistent (vs. Only) trials, and delayed on Inconsistent (vs. Only) trials. Error bars represent SEM of the within-participant difference between Consistent and Only and Inconsistent and Only trials. d Mean RT difference by dimension of the prompted stimulus for Consistent (vs. Only) and Inconsistent (vs. Only) trials. The model output – represented in (A) – confirms the interaction visualized here between stimulus dimension and search condition: The effects of a trial being Consistent (vs. Only) or Inconsistent (vs. Only) were larger for trials in which participants searched for a shape (and thus the unprompted stimulus was a color). Error bars represent SEM of the within-participant difference between Consistent and Only and Inconsistent and Only trials

As before, RTs were significantly faster on Consistent trials relative to Only trials (β = -0.036, SE = 0.006, p < 0.001; model-estimated RT benefit on Consistent vs. Only trials = 42.98 ms). This benefit on Consistent trials, however, interacted with stimulus dimension: When the prompted stimulus feature was a shape (vs. a color), there was a larger effect of a trial being Consistent (relative to Only; β = 0.013, SE = 0.006, p = 0.023). That is, the effects of having a Consistent (vs. Only) search display were greater when color was the accessory dimension (and shape the target) than when shape was the accessory dimension (and color the target).

Unlike in Experiments 1 and 2, however, there was a significant effect of Inconsistent (vs. Only) search display (β = 0.013, SE = 0.006, p = 0.031; model-estimated RT cost = 15.80 ms). This Inconsistent predictor interacted with stimulus dimension such that the effects of having an Inconsistent (vs. Only) search display were significantly larger when color (rather than shape) was the accessory dimension (β = -0.013, SE = 0.006, p = 0.028). Indeed, competition (on Inconsistent vs. Only trials) was reliable only when color was the accessory item (β = -0.026, SE = 0.008, p = 0.001) and not when shape was the accessory item (β = -0.001, SE = 0.009, p = 0.952). Thus, both cooperation (on Consistent vs. Only trials) and competition (on Inconsistent vs. Only trials) were greater when color was the accessory dimension, rather than shape.

There was one final interaction between stimulus dimension and prompt. The RT benefit conferred by searching for a WM (vs. LTM) stimulus feature was larger when that stimulus feature was in the shape (vs. color) dimension (β = -0.010, SE = 0.006, p = 0.004). As in the first two experiments, no interactions were revealed between prompt and search condition, nor were there any three-way interactions. To illustrate these effects, the RT data are plotted in milliseconds by prompt and condition in Fig. 7c, and by stimulus dimension and condition in Fig. 7d.

Discussion

Our conclusions from Experiment 3 are threefold. First, WM and LTM representations continued to cooperate in visual search, producing behavioral facilitation when the activated features were in the same object. This occurred both when the unprompted feature was a shape and when it was a color. Second, in contrast to Experiments 1 and 2, we found that, under certain task circumstances, WM and LTM can compete, with the unprompted memory capturing attention even when it hinders performance; this effect only occurred, however, when the unprompted feature was a color (and the prompted feature a shape), and not vice versa.

That we observed competitive effects only when a color was the unprompted feature is curious, as no competitive effect emerged in Experiments 1 and 2 despite the fact that, in those experiments, color was the sole stimulus dimension, and therefore was always the dimension of the unprompted feature. We will explore possible explanations for this in the General discussion.

The third conclusion from Experiment 3 is that the source of a memory during memory-guided attention (whether WM or LTM) did not interact with these cooperative or competitive effects. This finding contradicts a potential asymmetry in memory-guided attention, mentioned in the Introduction, in which accessory (unprompted) WMs may compete with LTM-guided search more than the other way around.

Why did we not see differential cooperation or competition between WM and LTM? One possibility is that, during this task, unprompted LTM stimuli were retrieved and subsequently represented in the direct access region of WM – just as an unprompted WM representation may be – rather than being represented in a distinct store, like activated LTM. That is, perhaps we see no difference between LTM and WM because both representations are in the same cognitive store and thus operate on attention via the same cognitive mechanism. If it is indeed the case that the prompted memory is always in the focus of attention and the unprompted memory is always in the direct access region of WM, then cooperation and competition between two WM representations should not be different from cooperation and competition between two LTM representations. A fourth experiment was conducted to test this idea. Experiment 4 was designed to be as similar to Experiment 3 as possible, but instead of pairing each LTM (shape or color) with a WM (color or shape), we instead paired LTM stimuli together and WM stimuli together (i.e., each LTM color was paired with a LTM shape, and each WM color was paired with a WM shape). If unprompted LTM features are indeed held in the direct access region, LTM trials and WM trials in Experiment 4 should appear identical.

Experiment 4

Methods

Participants

Data were collected until the final sample comprised 192 participants who met the inclusion criteria, to match the sample size of Experiment 3. To meet that target sample size, 274 participants were recruited for an online study using Prolific. As before, participants were pre-screened for English fluency, nationality, and age; provided informed consent to a protocol approved by the Columbia University Institutional Review Board; and received $6.50/h as compensation. Thirty-two of these participants were unable to pass one of the Testing Phases and 50 participants from the remaining sample did not get ≥50% of search trials both-correct, so none of their data are included.

Following these rejections, the final sample comprised 192 participants, as noted above (Mage = 27.7 ± 6.2 years, Meducation = 15.3 ± 2.4 years). One hundred and twenty of these participants identified as women, 16 as non-binary or otherwise gender non-conforming, two did not report a gender, and the remaining identified as men. Of this final sample, 89.8% identified as White, 7.0% as Asian, 4.3% as Black or African American, 2.1% as American Indian/Alaskan Native, 0% as Native Hawaiian or Pacific Islander, and 2.1% identified as part of a different racial group; in addition, 14.4% of these participants identified their ethnicity as Hispanic or Latino.

Stimuli

All stimuli and stimulus creation procedures were as described in Experiment 3 (see Experiment 3/Methods/Stimuli; for color and shape generation, see Fig. 1d).

Procedure

As before, participants were directed via URL to Gorilla (www.gorilla.sc) and given instructions describing the structure of the experiment (see Fig. 8). The Training Phase (Fig. 8a) and Testing Phase (Fig. 8b) remained unchanged from Experiment 3: Participants learned and then completed tests on associations between scene images and stimuli (colors or shapes) until they could report each stimulus correctly.

Fig. 8
figure 8

Experiment 4 procedure. Training Phase (a) and Testing Phase (b) procedures were identical to those in Experiment 3 (see Fig. 6). The Search Phase (c) was identical to Experiment 3, except for the following changes. Search cues now comprised either two scene images, one of which had been previously associated with a color and the other a shape (long-term memory (LTM) trials); or one color and one shape presented directly on the screen (working memory (WM) trials). The prompt now indicated whether a given trial’s target stimulus feature was associated with the left or the right side of the search cues screen. In this example trial, the left prompt would indicate a search for green, regardless of whether it was an LTM or WM trial (but note that LTM colors and shapes were different from WM colors and shapes in each block of the actual experiment; the colors and shapes are the same here for visualization purposes). The search and unprompted report screens were identical to those of Experiment 3, as was the search feedback screen. The example search displays in this figure illustrate an LTM trial in which the left scene is prompted. For illustration purposes, the example search displays are labeled to denote the location of the prompted (“P”) and unprompted (“U”) features. This example trial is also a trial in which a color (retrieved from LTM) was the prompted stimulus; the use of “OR” in this figure represents what the unprompted report screen would look like if this example trial had a prompted shape (i.e., if the right prompt was presented)

The Search Phase in Experiment 4 was modified from the previous experiment to accommodate the pairing of LTM stimuli to other LTM stimuli, and the pairing of WM stimuli to other WM stimuli (see Fig. 8c). As in Experiments 1–3, participants first clicked a fixation cross to begin each trial before being shown two search cues. In Experiment 4, these search cues were either two LTM cue scenes, one of which had been associated with a color, the other with a shape (LTM trials); or two WM cues, one color and one shape, presented directly on the screen (WM trials). The “R” and “N” stimuli from Experiments 1–3 that previously indicated whether the prompted item on any given trial was the “remembered” stimulus (LTM-prompted trials) or the “new” stimulus (WM-prompted trials) could no longer be used, as each trial in Experiment 4 had either two “remembered” or two “new” stimuli. Thus, participants were now prompted to search for the stimulus associated with either the left or the right side of the search cues screen. This prompt took the form of a square black outline filled in with black on the left side or the right side, respectively. LTM trials were now those in which participants searched for the stimulus feature associated with the scene presented on the left or the right side of the screen; and WM trials were those in which participants searched for the stimulus feature presented directly on the left or the right side of the screen. Upon being prompted with the left or right prompt, participants were presented with a search display identical to the one used in Experiment 3. As before, after clicking the prompted stimulus feature, participants were instructed to report the unprompted stimulus feature on either the colorwheel or the shapewheel (i.e., if prompted to search for a color, participants reported the unprompted shape, and vice versa), before being provided feedback about their accuracy.

Data

Trial exclusion criteria were identical to Experiments 1–3: Only those trials in which participants responded correctly to both the search display and the unprompted (color or shape) report were included. Across participants, an average of 76% of Search Phase trials were both-correct.

We initially modeled the data with the addition, relative to Experiment 3, of one variable. Unlike the previous experiments, the search prompt was directional (i.e., prompted the participant to search for features associated with the left or right side of the search cues screen; Fig. 8c). We therefore added a “target-side agreement” variable (effect-coded: 1 = same and -1 = different) to control for differences in RT based on whether the prompted feature in the search display appeared on the same versus different side as the side indicated by the prompt. We had expected that participants might respond faster when there was agreement between the location of the search target (i.e., whether on the left or right side) and the side of the screen prompted. Participants indeed responded faster when the search target was on the same versus different side of the screen as that indicated by the prompt (β = -0.048, SE = 0.008, p < 0.001). However, the direction and significance of main effects and interactions involving memory type, stimulus dimension, and search conditions were unaffected by inclusion of the target-side agreement variable; additionally, target-side agreement did not have significant interactions with any other variable included in the model.
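One plausible specification of this control analysis is the following (an R/lme4 sketch; the data frame d4 and the columns targetSide and promptSide are hypothetical):

  library(lme4)

  # Effect-code target-side agreement: 1 = prompted feature on the prompted
  # side of the search display, -1 = on the opposite side.
  d4$targetSideAgreement <- ifelse(d4$targetSide == d4$promptSide, 1, -1)

  m4_control <- lmer(
    log(RT) ~ memoryType * searchCondition * stimulusDimension * targetSideAgreement +
      (1 + memoryType + stimulusDimension | Participant),
    data = d4
  )
  summary(m4_control)  # the agreement main effect corresponds to the reported beta = -0.048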

We also conducted an analysis in which we added, in addition to the target-side agreement variable, a lure-side agreement variable. This lure-side agreement variable coded for whether the accessory item appeared on the same side as that indicated by the prompt or not (effect-coded: 1 = accessory feature on the same side as that indicated by the prompt; -1 = accessory feature on the opposite side as that indicated by the prompt). This analysis was only conducted for Inconsistent trials because it would not be applicable to the others: For Consistent trials, target-side agreement and lure-side agreement were necessarily the same, and there was no lure on Only trials. The model indicated no significant main effect of lure-side agreement, nor did this variable interact with any other predictors, including target-side agreement.

Because neither target-side agreement nor lure-side agreement variables interacted with any variables of interest, and inclusion of these variables did not affect our results, we report results from a subsequent model that omits these variables; this provides consistency with the models in prior experiments. Analyses reported are therefore identical to those conducted in Experiment 3.

Results

Training and Testing Phases

Participants successfully learned the association between each scene and its stimulus. In the Training Phase, mean (± SD) precision on the color/shapewheel (in degrees from the correct color/shape) generally improved over cycles: 8.74° ± 13.57° for colors and 9.36° ± 15.79° for shapes in cycle one; 7.05° ± 9.75° for colors and 7.13° ± 11.71° for shapes in cycle two; 6.57° ± 8.88° for colors and 6.88° ± 12.42° for shapes in cycle three.

The mean (± SD) number of cycles per scene needed to pass the Testing Phase was 1.44 ± 0.78 (for color-associated scenes, 1.48 ± 0.80; for shape-associated scenes, 1.41 ± 0.76), and 68.40% of scene-stimulus pairs only needed to be tested once (for color-associated scenes, 66.39%; for shape-associated scenes, 70.41%). Mean (± SD) precision on the color/shapewheel was 3.98° ± 2.63° from the correct stimulus on the final test of each scene (i.e., when reported correctly from LTM): 4.12° ± 2.66° for colors and 3.84° ± 2.60° for shapes.

Search Phase

Search accuracy

When searching for a stimulus feature in LTM, participants successfully clicked the prompted feature on the search display in 83.23% ± 11.21% of trials, and when searching for a feature in WM, the rate was 95.72% ± 5.33% of trials. When searching for a color, participants successfully clicked the prompted feature on the search display in 90.94% ± 7.50% of trials, and when searching for a shape, the rate was 88.00% ± 8.27% of trials. No response was registered on the search display within the time allotted on 3.85% ± 3.31% of LTM trials; on WM trials this rate was 1.02% ± 1.15%. On trials in which the prompted stimulus was a color, this rate was 1.99% ± 1.88%; and on trials with prompted shapes, this rate was 2.88% ± 2.43%.

Unprompted (color or shape) report

Participants’ unprompted reports after the search display were considered accurate (≤20° from the correct color or shape) on 83.08% ± 11.32% of trials when reporting from LTM. On trials in which participants were reporting from WM, they were correct 82.94% ± 9.36% of the time. When reporting a shape, 82.25% ± 10.47% of trials were correct. When reporting a color, this rate was 83.77% ± 9.80%. When reporting the stimulus from LTM, 0.19% ± 0.68% of trials did not receive a response in the time allotted; this never occurred when reporting from WM. When reporting a color, this rate was 0.21% ± 0.56%; and when reporting a shape, it was 0.12% ± 0.41%.

Search response times

Response times and accuracy in the Search Phase are summarized in Table 1. As before, we only analyzed trials in which participants both (1) clicked on the correct stimulus feature in the search display and (2) reported the unprompted stimulus feature accurately. The mixed model analyzing log RT — as a function of memory type (WM vs. LTM), search condition (Consistent, Inconsistent, Only), and stimulus dimension (Shape, Color) — revealed, as in all previous Experiments, a main effect of memory type on log RTs such that participants were significantly slower to respond on the search display for LTM trials than for WM trials (β = 0.073, SE = 0.005, p < 0.001). See Fig. 9a for a plot of all fixed-effect estimates, and Fig. 9b for a plot of raw RT by memory type. In addition, a main effect of stimulus dimension was found, as in Experiment 3, indicating that participants were faster to respond on trials in which the prompted stimulus was a color rather than a shape (β = 0.089, SE = 0.005, p < 0.001).

Fig. 9
figure 9

Results from Experiment 4, in which participants were cued with either two long-term memory (LTM) representations or two working memory (WM) representations. a Coefficient estimates of the model for trial-wise log response time (RT). Error ribbons represent the 95% confidence interval; ***p < .001, **p < .01, *p < .05. b LTM trials (vs. WM trials) were associated with a slower response on the search display. Error bars represent SEM of the within-participant (LTM-WM) difference. c Mean RT difference by memory type for Consistent (vs. Only) and Inconsistent (vs. Only) trials. Search RTs were facilitated on Consistent (vs. Only) search displays, and this facilitation did not interact with memory type. RTs were delayed on Inconsistent trials (vs. Only), but less so when cued with two LTM cues than two WM cues. Error bars represent SEM of the within-participant difference between Consistent and Only and Inconsistent and Only trials. d Mean RT difference by dimension of the prompted stimulus for Consistent (vs. Only) and Inconsistent (vs. Only) trials. The model output – represented in (A) – confirms the interaction visualized here between stimulus dimension and search condition: The effects of a trial being Consistent (vs. Only) or Inconsistent (vs. Only) were larger for trials in which participants searched for a shape (and thus the unprompted stimulus was a color). Error bars represent SEM of the within-participant difference between Consistent and Only and Inconsistent and Only trials

As before, RTs were significantly faster on Consistent trials relative to Only trials (β = -0.016, SE = 0.006, p = 0.004; model-estimated RT benefit on Consistent vs. Only trials = 18.26 ms). There was also a significant effect of Inconsistent (vs. Only) search display (β = 0.013, SE = 0.006, p = 0.026; model-estimated RT cost = 14.69 ms), indicating that participants were slower to respond on Inconsistent versus Only trials.

As before, when the prompted stimulus feature was in the shape dimension (rather than the color dimension), there was a larger effect of a trial being Consistent (relative to Only; β = 0.014, SE = 0.006, p = 0.015); and a larger effect of a trial being Inconsistent (relative to Only; β = -0.029, SE = 0.006, p < 0.001). That is, as in Experiment 3, when color was the accessory dimension (i.e., the trial had a prompted shape and an unprompted color, rather than vice versa), the effects of having a Consistent or Inconsistent search display were greater.

Two differences emerged – relative to Experiment 3 – from the log RT model. First, memory type no longer interacted with stimulus dimension, meaning that the RT benefit conferred by searching for a WM (vs. LTM) stimulus feature did not differ when that feature was a color versus a shape (β < 0.001, SE = 0.004, p = 0.991).

Critically, and unlike in Experiment 3, memory type and search condition interacted when the search condition was Inconsistent (relative to Only; β = 0.012, SE = 0.006, p = 0.031) such that the slowing of RT on Inconsistent trials was greater on WM (vs. LTM) trials. There was no interaction between memory type and Consistent trials (relative to Only; β = 0.006, SE = 0.006, p = 0.289). To illustrate these effects, the RT data are plotted in milliseconds by memory type and condition in Fig. 9c, and by stimulus dimension and condition in Fig. 9d.

Comparison of search performance across Experiments 3 and 4

A follow-up linear mixed model on the log RT data was performed to compare Experiments 3 and 4. This comparison probes whether cooperation and/or competition during memory-guided search differ when the prompted and unprompted features are from different (Experiment 3) or the same (Experiment 4) memory source. If LTM and WM representations on a given trial are functionally the same, then no interaction should emerge that involves search condition and experiment version.

This trial-wise model was specified in the same way as the previous RT model, with the addition of an interacting fixed factor indicating experiment version (effect-coded: Experiment 3 = 1; Experiment 4 = -1). The consequent full experiment comparison model was: lmer( log(RT) ~ memoryType * searchCondition * stimulusDimension * experiment + (1 + memoryType * stimulusDimension | Participant) ). This random-effects structure was specified on the same basis as before (see Experiment 1/Methods/Data/Analysis): By trimming terms from the maximal structure, in order of least explained variance (assessed via PCA), until the model converged.
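A sketch of this comparison model, including the PCA-based inspection of the random-effects structure, might look as follows (R/lme4; the pooled data frame d34 and its columns are hypothetical; rePCA() is lme4's helper for assessing random-effects dimensionality):

  library(lme4)

  # Effect-code experiment version (Experiment 3 = 1, Experiment 4 = -1).
  d34$experiment <- ifelse(d34$version == "Experiment 3", 1, -1)

  m_compare <- lmer(
    log(RT) ~ memoryType * searchCondition * stimulusDimension * experiment +
      (1 + memoryType * stimulusDimension | Participant),
    data = d34
  )

  # Inspect the variance explained by each random-effect component; components
  # explaining the least variance are candidates for removal if the maximal
  # model fails to converge.
  summary(rePCA(m_compare))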

Effects that were significant in each experiment individually were also significant in this combined model. We therefore observed the expected effects of memory type (LTM vs. WM), Consistent (vs. Only) search display, Inconsistent (vs. Only) search display, and stimulus dimension, as well as the finding that both search display condition effects (Consistent vs. Only and Inconsistent vs. Only) were larger when the unprompted feature was in the color dimension (see Fig. 10a).

Fig. 10
figure 10

Comparison of results from Experiments 3 and 4. In Experiment 3 the unprompted (accessory) feature was of the opposite memory type (i.e., if searching for a feature from long-term memory (LTM), the accessory feature was from working memory (WM)); in Experiment 4 the unprompted feature was of the same memory type (i.e., if searching for a feature from LTM, the accessory feature was also from LTM). a Coefficient estimates of the model for trial-wise log response time (RT) in the experiment comparison linear mixed model. Error ribbons represent the 95% confidence interval; ***p < .001, **p < .01, *p < .05. b Mean RT difference by prompted memory type for Consistent (vs. Only) and Inconsistent (vs. Only) trials across Experiments 3 and 4. The RT benefit on Consistent (vs. Only) trials was larger in Experiment 3 than Experiment 4 (Consistent (vs. Only) by experiment interaction), and the RT cost on Inconsistent (vs. Only) trials interacted with prompted memory type in Experiment 4 but not Experiment 3 (memory type by Inconsistent (vs. Only) by experiment interaction). Error bars represent SEM of the within-participant difference between Consistent and Only and Inconsistent and Only trials for each memory type and experiment

A main effect of experiment version revealed that participants were faster to respond to the search display in Experiment 4 than in Experiment 3 (β = 0.027, SE = 0.007, p < 0.001). There was also a significant interaction between memory type and experiment such that the difference between WM trials and LTM trials was larger in Experiment 4 versus Experiment 3 (β = -0.012, SE = 0.003, p < 0.001). As anticipated, a significant three-way interaction between memory type, Inconsistent (vs. Only) search display, and experiment version (β = 0.009, SE = 0.004, p = 0.038) also emerged (Fig. 10b, right), reflecting the finding reported in the previous section (Experiment 4/Results/Search Phase/Search response times): Two LTM representations competed less than two WM representations in Experiment 4, while no asymmetry between LTM-prompted and WM-prompted trials was present in Experiment 3.

Lastly, this model revealed an interaction between experiment version and Consistent (vs. Only) search display such that the RT benefit on Consistent trials (relative to Only trials) was larger in Experiment 3 than in Experiment 4 (β = -0.010, SE = 0.004, p = 0.017). That is, when the prompted and unprompted memory representations were from different memory types (one LTM and one WM), participants showed more of an RT benefit on Consistent (vs. Only) trials, relative to when the two memories were of the same type (i.e., both WM or both LTM). This effect is visualized on the left side of Fig. 10b. The interactions observed between search performance and experiment are incompatible with the notion that all prompted memories are in the focus of attention, and all unprompted memories in the direct access region of WM, regardless of their source (WM or LTM). We discuss the implications of these results in the General discussion.

Discussion

In Experiment 3, we found no evidence that LTM cooperates or competes with WM more than the other way around. One potential explanation of these results is that LTM, when activated to guide attention, is placed into the same store in which WM information is held to guide attention. That is, it is possible that no difference was observed as a function of memory type in Experiment 3 because the prompted memory may have been in the focus of attention regardless of its source (WM or LTM), and the unprompted memory may have been in the direct access region of WM, regardless of its source (Oberauer, 2002, 2009).

If it were indeed the case that cued LTMs become stored in the same way as cued WMs, then the same pattern of results as in Experiment 3 should hold if both memories on a given trial were accessed from WM, or both memories accessed from LTM. That is, regardless of the memory source, the prompted memory should be in the focus of attention and the unprompted memory in the direct access region of WM. To test this idea, we designed Experiment 4, which cued participants with either two LTM representations or two WM representations rather than one of each. Experiment 4 indicated that, similar to how LTM and WM representations cooperated and competed with one another to guide attention in Experiment 3, two representations of the same memory type also cooperated and competed. However, two key differences emerged.

The first key difference is that when two LTM representations could guide attention in the search task, less competition emerged than when attention could be guided by two WM representations. This led to a memory type (WM vs. LTM) by Inconsistent (vs. Only) by experiment (3 vs. 4) interaction, because there was no detectable difference in competition between WM-prompted (LTM accessory) and LTM-prompted (WM accessory) trials in Experiment 3.

The second key difference is that two memories of the same type (i.e., either both LTM or both WM) cooperated less than two memories of different types (i.e., one from LTM and the other from WM). That is, cooperation was augmented when the memories came from different sources.

These findings are incompatible with the idea that LTM and WM, when activated to guide attention, are represented in a shared store, or in an otherwise identical format. We discuss the implications of our results in the General discussion, below.

General discussion

Summary

Behavior is guided by memories at multiple timescales (Nobre & Stokes, 2019). The present study tested how working memory (WM) and long-term memory (LTM) guide visual attention when active in the same task. We found that WM and LTM representations readily cooperate during attentional guidance, and that they are also capable of competing under certain stimulus conditions. In particular, competition only occurred when individuals searched for a shape while attempting to avoid distraction from an irrelevant color. Across three studies, we found no evidence that WM cooperated or competed with LTM more than the other way around. A modified version of the task, however, in which two memories from the same source were cued (both WM or both LTM) revealed key differences between WM- and LTM-guided search: Two WM representations competed more than two LTM representations, and memories from different sources (WM and LTM) cooperated more than memories from the same source. Taken together, these results suggest that WM and LTM interact to guide visual search in the same task; that they compete and cooperate with each other in similar ways; but that they are not represented in identical states during attentional guidance.

Proposed mechanisms

Cooperation between WM and LTM was observed in all experiments, but competition arose only when participants searched for a shape target and held an accessory color memory in mind. Why did we observe cooperation but not competition in Experiments 1 and 2, for which both memories were colors? The pattern of results we observed may be explained by an object-wise winner-takes-all mechanism that is active during memory-guided attention. We propose the following: First, visual features that match any activated memory representation (whether in WM or LTM, whether prioritized or not) contribute to evidence accumulation. Second, features that match prioritized (vs. accessory) memory representations lead to faster or stronger evidence accumulation (as reported by Zhang et al., 2018). Third, some features (e.g., color, relative to shape) lead to faster or stronger evidence accumulation – perhaps because colors are typically more salient than shapes (Theeuwes, 1991). For example, color can elicit stronger attentional guidance effects than directional cues (Fan et al., 2021, Experiment 2) and irrelevant colors are more distracting than irrelevant shapes (Nickel et al., 2020; Theeuwes, 1991). Finally, evidence is summed over an object, and the object with the most accumulated evidence guides attention.
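To make these assumptions concrete, the proposed race can be illustrated with a toy simulation (a minimal R sketch; all drift rates, the threshold, the noise level, and the re-orienting cost are illustrative assumptions rather than fitted parameters):

  # Each object accumulates noisy evidence at a rate equal to the sum of its
  # features' drifts; the first object to reach threshold guides attention.
  simulate_search <- function(rates, threshold = 100, noise_sd = 1) {
    evidence <- numeric(length(rates))
    t <- 0
    while (max(evidence) < threshold) {
      t <- t + 1
      evidence <- evidence + rates + rnorm(length(rates), mean = 0, sd = noise_sd)
    }
    list(winner = which.max(evidence), rt = t)
  }

  # If a non-target object wins the race, attention is captured and must be
  # re-deployed; model this as a fixed re-orienting cost.
  search_rt <- function(rates, reorient_cost = 30) {
    res <- simulate_search(rates)
    res$rt + if (res$winner != 1) reorient_cost else 0
  }

  # Illustrative feature drifts: prioritized match = 1.0, accessory match = 0.5
  # (or 0.9 when the accessory feature is a salient color), distractor = 0.1.
  consistent  <- c(1.0 + 0.5, 0.2, 0.2)        # both matches in the target object
  only        <- c(1.0 + 0.1, 0.2, 0.2)        # accessory feature absent
  incon_shape <- c(1.0 + 0.1, 0.5 + 0.1, 0.2)  # weak accessory lure: rarely captures
  incon_color <- c(1.0 + 0.1, 0.9 + 0.1, 0.2)  # salient accessory lure: often captures

  mean(replicate(2000, search_rt(consistent)))   # fastest (cooperation)
  mean(replicate(2000, search_rt(only)))
  mean(replicate(2000, search_rt(incon_shape)))  # approximately equal to Only
  mean(replicate(2000, search_rt(incon_color)))  # slowest (competition)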

This model predicts that competition is stifled when the prioritized and accessory representations are both colors (as in Experiments 1 and 2), or both in the same stimulus dimension more generally. This is because, within the color dimension, features that match the prioritized (vs. accessory) memory representation lead to faster evidence accumulation; without a second stimulus dimension, the object with the prioritized feature always “wins” the race to the decision threshold. This does not preclude cooperation between the prioritized and accessory memories: Because evidence is summed at the object level, features that match an accessory item can quicken evidence accumulation for that object.

When memories are in different stimulus dimensions, however, imbalance in the relative saliency of the visual features can lead to competition. Evidence from particularly salient features (e.g., color; Folk, 2015; Nickel et al., 2020) can rapidly accumulate. When the salient feature is a distractor, this evidence can accumulate faster than evidence for the less-salient but prioritized feature (e.g., a shape). This would produce more competition (relative to a non-salient distractor) when the distracting feature is in a different object than the prioritized memory, and more cooperation when it is in the same object. This model can therefore account for our observed effect of more competition, and more cooperation, when color is an accessory memory (during search for a shape) compared to the other way around.

The above model also needs to account for our finding that memories from different sources (WM and LTM) cooperate more than memories from the same source (two WMs or two LTMs). This may occur because of differential shielding between memories to prevent interference (similar to how WM can be shielded from perceptual distractors; see Hakim et al., 2020; Lorenc et al., 2021). Interference between items maintained in WM increases proportionately with the degree of overlap in the neural representations those items activate (Cohen et al., 2014; see also Yang et al., 2018; Oberauer & Lin, 2017). When two memory representations derive from the same (vs. different) source, they may engage more overlapping neural regions, and thus require more aggressive shielding to minimize the proportional increase in potential interference. Memories that are more efficiently shielded from one another may have less opportunity to cooperate to guide attention – thus accounting for why memories of the same source cooperate less than memories from different sources.

This notion of active-memory shielding may also help explain why we observed more competition between two WMs relative to two LTMs (Experiment 4). One possibility is that active LTMs may be more effectively shielded from one another than active WM representations. This would be consonant with the importance of pattern separation mechanisms for LTM – by which similar items come to be represented distinctly to reduce interference (Favila et al., 2016; Schurgin, 2018; Yassa & Stark, 2011; Zotow et al., 2020) – as well as research on efficient behavioral and neural suppression of potentially distracting LTMs (Anderson et al., 2004; Anderson & Green, 2001). Stronger shielding between activated LTM representations should lead to less competition between LTMs than between WMs, as evidenced in Experiment 4, and less cooperation between LTMs than between WMs (particularly when color is the accessory item, which, as noted above, is when cooperation and competition are most pronounced). While the latter effect – less cooperation for two LTMs versus two WMs – is not statistically significant in our data, the numerical trend is in that direction. Further work testing our proposed model will be important for determining if cooperation and competition within WM is reliably stronger than these dynamics within LTM.

There is, nevertheless, an alternative model that could explain cooperation between memories, in the form of performance enhancements on Consistent (vs. Only) trials. It is possible that, in Experiments 1 and 2, attention was only guided by the prompted color; however, once attention arrived at the appropriate object (half-circle), evidence accumulation was further enhanced (or a decision threshold dropped) once the accessory color was recognized to be in that same object (i.e., adjacent half-circle). Under this scenario, the accessory color does not guide attention but it does contribute to the visual search response either by enhancing further evidence accumulation once attention is already at the target object or by reducing the threshold for a response. With our procedures, we cannot tell the difference between these three possibilities in Experiments 1 and 2 (i.e., attentional guidance by the accessory color; enhanced evidence accumulation by the accessory color once attention is already guided to the target object; or a reduced decision threshold once the accessory color is detected to be adjacent to the target). Note, however, that whatever mechanism is at play does not affect the conclusion that working memory and long-term memory can cooperate to improve attentional behavior and visual search; it is merely the stage at which they cooperate that can be debated.

Importantly, we think that considering Experiments 1 and 2 together with Experiments 3 and 4 favors the hypothesis that multiple memories guide attention (as opposed to influencing evidence accumulation or decision thresholds once attention is already guided). In particular, it is not clear how competition between memories in guiding attention could arise if the effects of an accessory memory occur only after attention has been guided to the target item. That is, distraction or slowing on Inconsistent (vs. Only) trials is difficult to explain in terms of an altered decision threshold: Once attention is guided to the target feature, Only and Inconsistent trials are identical in that both have a distractor feature unitized with the target feature (i.e., a distractor shape or distractor color unitized with the target color or shape). Any slowing on Inconsistent (vs. Only) trials should therefore be due to attentional capture by the accessory item elsewhere in the display, rather than to a different decision or evidence accumulation process occurring after attention has already been guided to the target shape or color.

We believe it is parsimonious to explain competition and cooperation effects with the same mechanism (guidance by both activated memories, as we describe in our Proposed mechanisms above) and therefore prefer this interpretation. Alternatively, one can propose different mechanisms for cooperation and competition: That competition is due to attentional guidance by multiple memories, but cooperation is due to changes in decision thresholds or evidence accumulation once attention is already guided. These competing mechanisms can be tested in future work.

Finally, other factors not mentioned above may affect the object-wise winner-takes-all mechanism we propose. For example, the learned value of an item (Anderson et al., 2011) or its meaning (Henderson & Hayes, 2018) may increase its rate of evidence accumulation. Future studies testing these factors would be useful for extending this proposal of memory-guided attention.
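As a concrete illustration of this object-wise winner-takes-all idea, each object in a display can be modeled as a noisy accumulator whose rate grows with every active memory it matches, plus factors such as learned value. The additive rate weights below are hypothetical placeholders, not claims about true parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)

def race(rates, threshold=1.0, noise=0.5, dt=0.01):
    """Object-wise race: each object accumulates noisy evidence at
    its own rate; the first to reach threshold captures attention."""
    rates = np.asarray(rates, dtype=float)
    evidence = np.zeros(rates.size)
    t = 0.0
    while evidence.max() < threshold:
        evidence += rates * dt + noise * np.sqrt(dt) * rng.normal(size=rates.size)
        evidence = np.clip(evidence, 0.0, None)   # no negative evidence
        t += dt
    return int(evidence.argmax()), t

# Hypothetical additive rate weights: baseline 0.2 per object, +0.5
# for matching the prioritized memory, +0.3 for matching the
# accessory memory; learned value or meaning could add further terms.
consistent_target = 0.2 + 0.5 + 0.3   # both features in one object
rates = [consistent_target, 0.2, 0.2]
winner, t = race(rates)
print(f"object {winner} captures attention at t = {t:.2f} (a.u.)")
```

On Consistent trials, both memory-driven increments fall on one object, so it tends to win the race quickly; on Inconsistent trials, the increments are split across objects, so the accessory-matching object sometimes wins and slows search.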

Competition between working memory (WM) and long-term memory (LTM) in guiding attention

A main goal of the current investigations was to test whether WM and LTM compete symmetrically during attentional guidance, or whether WM competes with LTM more than the other way around. Across Experiments 1–3, we failed to observe asymmetry between WM and LTM when guiding attention: There was no evidence that WM cooperated or competed with LTM more than vice versa. This lack of asymmetry cannot be explained by WM and LTM being functionally identical because, across all experiments, participants were reliably slower to search for a feature from LTM than for one from WM. Although WM and LTM were not functionally identical, one possibility is that they came to be stored in a similar state or format, with LTM simply being weaker in, or slower to reach, that state or format.

We therefore considered the possibility that the lack of asymmetry might be due to LTM being “placed in” WM to guide attention. If this is the case, then the prioritized memory may be represented in the focus of attention (FoA), regardless of its source (WM or LTM), and the accessory memory may be represented in the region of direct access (RDA) of working memory, regardless of its source (WM or LTM) (Oberauer, 2002, 2009). If that were true, two LTMs should yield the same behavioral effects in attentional guidance as two WMs or one WM and one LTM.

Our results, however, do not accord with this strong interpretation of the concentric activation model. First, we found an interaction between memory type (WM vs. LTM) and Inconsistent (relative to Only) trials in Experiment 4: Two WM representations competed more than two LTM representations. This pattern of results was significantly different from that in Experiment 3, in which RT slowing on Inconsistent (vs. Only) trials was not different between LTM-prompted trials (WM accessory) and WM-prompted trials (LTM accessory). This result is contrary to the notion that the prioritized memory is always in the FoA and the accessory item always in the RDA regardless of their source.

Furthermore, we found an interaction between Experiment (3 vs. 4) and Consistent (vs. Only) trials, such that memories from different sources (WM and LTM) cooperate more than memories from the same source (both WM or both LTM). This is, again, incompatible with the idea that memories are represented similarly regardless of source (LTM or WM) and distinguished only based on their level of priority. Instead, this suggests that there are integral differences between these active memories’ representational states.

Nevertheless, there may be a complex interaction between memory source (LTM or WM) and priority state (FoA, RDA, or otherwise), which could motivate an amendment to the concentric activation model (Oberauer, 2002, 2009) that incorporates memory source. For example, if two LTMs are cued, the accessory LTM may remain in activated LTM (rather than the RDA) while the prioritized LTM is in the FoA. For the remaining cases we investigated (two WMs, or one WM and one LTM), the prioritized memory may be in the FoA and the accessory item in the RDA. In other words, if two LTMs are activated, the second (accessory) LTM may be more weakly represented than an accessory WM or a single accessory LTM. Such a scenario could explain why two WM representations compete more than two LTM representations: Two WM items are represented more similarly and more strongly than two LTM items. Further studies exploring these possibilities may help elucidate how active LTMs and WMs are stored similarly versus differently, and what that implies for how they interact during memory-guided attention.
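One way to make this amendment concrete, purely as an illustrative sketch, is to tabulate the storage state each memory may occupy as a function of the cued pair; the state assignments follow the text above, while the guidance-strength values are hypothetical placeholders:

```python
# Illustrative parameterization of the amended model; the state
# assignments follow the proposal in the text, but the strength
# values are hypothetical placeholders, not measured quantities.
STATE_BY_PAIR = {
    ("WM", "WM"):   {"prioritized": "FoA", "accessory": "RDA"},
    ("WM", "LTM"):  {"prioritized": "FoA", "accessory": "RDA"},
    ("LTM", "WM"):  {"prioritized": "FoA", "accessory": "RDA"},
    ("LTM", "LTM"): {"prioritized": "FoA", "accessory": "activated LTM"},
}

GUIDANCE_STRENGTH = {"FoA": 1.0, "RDA": 0.6, "activated LTM": 0.3}

for (prompted, accessory), states in STATE_BY_PAIR.items():
    strength = GUIDANCE_STRENGTH[states["accessory"]]
    print(f"{prompted}+{accessory}: accessory guides at strength {strength}")
```

Under this parameterization, only the two-LTM case has a weakened accessory representation, which would predict the reduced LTM-LTM competition we observed.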

One particularly interesting observation in our studies is that accessory LTMs consistently affected WM-guided search, despite WM-guided search being significantly faster than LTM-guided search. That is, LTM influenced visual search during trials for which search was much faster, on average, than LTM could yield on its own. This is consistent with a two-stage model of long-term memory retrieval (Ciaramelli et al., 2009; Moscovitch, 2008) in which an initial, rapid retrieval process can guide behavior automatically, while a second, slower stage is necessary for conscious and deliberate access of those memories.

Relation to prior work

An ongoing debate in the memory-guided attention literature is whether only a single item, as opposed to multiple items, can guide attention in a given task. Our results align with the multi-item template hypothesis (Beck et al., 2012; Beck & Hollingworth, 2017): Across four experiments, we found that an accessory memory reliably sped visual search when its feature was contained in the same object as that of the prioritized memory. We observed that accessory memory features can also, under certain stimulus conditions, reliably slow visual search when they are in a different object than the prioritized memory features. These effects occurred despite accessory items conferring no advantage overall in the search task (most of the time, these features were either not present or could only hurt performance). Furthermore, we extend prior work in this domain – which typically focuses on multiple representations from WM – by showing that representations sourced from LTM, and memories from different sources, can guide attention in the same task as well.

This literature on multi-item memory-guided attention has tended to focus on competitive (distracting) effects during memory-guided attention (e.g., Bahle et al., 2018; Chen & Du, 2017; Fan et al., 2019; Frătescu et al., 2019; Hollingworth & Beck, 2016; van Moorselaar et al., 2014), though some studies also examine cooperative effects (e.g., Bahle et al., 2020). While a small selection of studies has examined cooperation and competition in the same paradigm (e.g., Fan et al., 2021, Experiment 1; Soto et al., 2005), no studies to our knowledge have reported the reliable imbalance between cooperative and competitive interactions exhibited in Experiments 1 and 2: We found that there is, under certain conditions, a bias towards cooperation (or, potentially, a lack of competition altogether). Because data on cooperative and competitive interactions in the same paradigm are so limited, however, we cannot determine precisely under which conditions this imbalance arises. Further research will be needed to determine how and when biases towards cooperation (rather than competition) arise in memory-guided attention. Such research could determine, for example, whether this imbalance is a feature of WM-LTM interactions during attentional guidance, or whether it extends to guidance from two WMs or two LTMs as well.

Our work also adds to the literature on memory-guided attention by examining bidirectional cooperative and competitive interactions between WM and LTM. Some studies have looked at how WM may compete with LTM-guided search but did not explore the reverse relationship (Günseli et al., 2016). Other studies have examined how selection history (a type of LTM) and contents in WM can jointly bias search (Schwark et al., 2013), or how accessory LTM features can capture attention during WM tasks (Fan & Turk-Browne, 2016, Experiment 2), but these studies lacked baselines to determine if the effects were due to cooperation, competition, or both. They were therefore unable to determine whether there are asymmetries in how WM and LTM cooperate or compete with each other in their respective paradigms (nor whether imbalances exist between competition and cooperation). We addressed these limitations in our study, finding evidence for both cooperation and competition between WM and LTM during memory-guided search. Furthermore, by comparing WM-LTM interactions to WM-WM and LTM-LTM interactions, we were able to find some differences in cooperative and competitive dynamics as a function of memory source. Thus, our work adds to prior literature on memory-guided attention by exploring cooperative and competitive interactions within the same task, and systematically comparing these effects based on memory source.

Limitations and future directions

Prior work proposing a “flexible gate” that mediates information transfer from LTM to WM (Oberauer, 2002, 2009) has focused on when LTM may interfere (or not) with content in WM. In this research, there was no demand to use this information to guide attention: The goal was simply to report information in memory. Thus, it is possible that we did not observe the predicted asymmetry (such that WM is relatively shielded from interfering LTM) because this gate’s role is primarily for protecting WM from proactive interference, and functions differently or not at all in the case of memory-guided attention. Further research is needed to compare how this gate functions for memory reporting vs. memory-guided attention in the same task.

It may also be the case that asymmetry between LTM and WM in guiding attention – whether due to the flexible gate model described above or otherwise – could arise in the accuracy of visual search rather than its speed. Accuracy was near ceiling for WM trials in the current studies, limiting our ability to probe such effects. This was by design, given our a priori predictions for RTs and training procedures meant to maximize accuracy. Future studies could bring performance off ceiling, e.g., with limited presentation durations or more aggressive visual masks, to determine whether asymmetries in cooperation and competition arise in the accuracy of memory-guided attention.

An additional limitation is that the lack of asymmetry in cooperation or competition between WM and LTM in Experiments 1–3 may be due to limitations of the employed methods, or to insufficient power to detect the most subtle differences between conditions. However, we think it unlikely that our studies were poorly powered, for several reasons. First, we detected many two-way and three-way interactions, and identified an asymmetry in competition between memory types in Experiment 4 (in which two WM representations competed more than two LTM representations). Thus, Experiment 3 – which is identical to Experiment 4 except for which memory types were paired together on a given trial – is unlikely to have been too poorly powered to detect asymmetry in competition.

Second, we conducted an additional analysis combining Experiments 1 and 2 in one mixed-effects model that was identical to the analyses of the individual experiments, but with the addition of an “Experiment” variable. This allowed us to determine whether the lack of competition in Experiments 1 and 2 was due to low power; however, we still did not detect statistically significant competition effects (Inconsistent vs. Only trials) in this larger, better-powered analysis.
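For readers who wish to see the structure of this pooled analysis, the following sketch shows one way to specify such a model (using the statsmodels mixed-effects API; the file name and column names are placeholders, and the actual analysis code may differ):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumes a long-format table with one row per trial and columns
# log_rt, search_condition (Only / Consistent / Inconsistent),
# experiment (1 or 2), and participant. Names are placeholders.
df = pd.read_csv("experiments_1_2_long.csv")   # hypothetical file

pooled = smf.mixedlm(
    "log_rt ~ C(search_condition) * C(experiment)",
    data=df,
    groups=df["participant"],   # random intercept per participant
).fit()
print(pooled.summary())
```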

Finally, we also re-ran all log RT models with a higher threshold for participant inclusion: 60% of trials “both-correct” rather than 50%. Although this reduces the number of trials in the analyses (because fewer participants are included), it should also reduce noise by excluding the participants who contributed the fewest usable trials. The direction and significance of all effects reported for the log RT models remained the same under this stricter participant-inclusion threshold, with one exception: The prompt (WM or LTM) by stimulus dimension (color or shape) interaction observed in Experiment 3 was no longer significant. (This effect is not central to the paper, as it does not involve the critical “search condition” variable.)
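A minimal sketch of this stricter inclusion rule, with illustrative column names rather than our actual variable names, is:

```python
import pandas as pd

def apply_inclusion(trials: pd.DataFrame, threshold: float = 0.60) -> pd.DataFrame:
    """Keep only participants whose proportion of 'both-correct'
    trials meets the threshold; column names are illustrative."""
    prop_correct = trials.groupby("participant")["both_correct"].mean()
    keep = prop_correct[prop_correct >= threshold].index
    return trials[trials["participant"].isin(keep)]

# strict = apply_inclusion(trials, threshold=0.60)  # vs. default 0.50
# ...then refit the same log RT models on `strict`.
```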

While we have made substantial efforts to ensure that the present experiments have sufficient power to detect even relatively small differences in RTs – and were able to detect many subtle RT differences – studies with larger sample sizes, more sensitive methods, and/or more trials per participant may be able to detect smaller effects in cooperation or competition between memory types, should those effects exist.

In our studies, LTM-prompted trials were consistently slower than WM-prompted trials. We incorporated memory source as a covariate in all analyses; thus, these RT differences were controlled for when examining other effects. Nevertheless, future studies may consider giving participants additional training on the LTM associations until their RTs for LTM-prompted search match those for WM-prompted search. This would allow exploration of how LTM and WM compete or cooperate when both are brought to mind equally quickly and easily. (Doing so, however, may further imbalance other measures of performance between LTM and WM trials – e.g., more LTM training may further increase the accuracy benefit for LTM vs. WM in the post-search report of the unprompted feature.)

In the same vein, there were also differences in accuracy and RT by prompted stimulus dimension (color or shape). We incorporated stimulus dimension as a covariate in all analyses; thus, RT differences between shape- and color-prompted trials were controlled for when examining other effects. These differences likely occurred because the novel shapes in the Validated Circular Shape (VCS) space (Li et al., 2020) were relatively more difficult to remember and detect than familiar colors. Nevertheless, it was important for us to use the VCS space to keep both dimensions (color and shape) continuous, relatively perceptually uniform, and roughly comparable in task demands for the post-search report of the unprompted feature. Even so, future studies may consider employing familiar shape stimuli (e.g., squares, circles, triangles) to investigate whether asymmetries in competition between color and shape are eliminated under these circumstances (note, however, that Nickel et al., 2020, observed greater attentional capture by colors than by shapes even though they used familiar shapes).

Although we find evidence that WM and LTM both guide attention in the same task, an open question is whether they do so simultaneously or in alternation, albeit on a rapid timescale. For example, active memories may be represented in oscillatory dynamics of nested theta-gamma subcycles (Lisman & Idiart, 1995; Sauseng et al., 2009; Wolinski et al., 2018), such that different memories are represented at different timepoints of a given cycle. Indeed, recent evidence suggests that two concurrently held items in WM are prioritized in alternation, not simultaneously, via an ongoing theta oscillation (Pomper & Ansorge, 2021). Our study was not designed to tease apart truly simultaneous versus rapid sequential representation of memories on a given trial. Probing the temporal dynamics of when WM and LTM are available to guide attention, and whether they are simultaneously or successively available, will require future research employing EEG, MEG, or behavioral studies carefully crafted to detect oscillations in behavior (e.g., Dehaene, 1993; Kerrén et al., 2022; Pomper & Ansorge, 2021; ter Wal et al., 2021; VanRullen, 2016).
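To illustrate the logic of such behavioral oscillation-detection designs, the toy simulation below generates RTs across densely sampled cue-target delays with a hypothetical 6 Hz modulation and recovers it spectrally; every parameter is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate RT as a function of densely sampled cue-target delays,
# with a hypothetical 6 Hz (theta) modulation reflecting alternating
# prioritization of two memories; all parameters are illustrative.
soas = np.arange(0.1, 1.1, 0.01)                  # delays in seconds
rt = 0.60 + 0.02 * np.sin(2 * np.pi * 6 * soas)   # 6 Hz rhythm in RT
rt += 0.01 * rng.normal(size=soas.size)           # trial noise

# Recover the behavioral rhythm with a Fourier transform.
spectrum = np.abs(np.fft.rfft(rt - rt.mean()))
freqs = np.fft.rfftfreq(soas.size, d=0.01)
print("peak frequency:", freqs[spectrum.argmax()], "Hz")   # ~6 Hz
```

If WM and LTM alternate in guiding attention, a design of this kind should reveal antiphase rhythms in WM- versus LTM-guided search performance; truly simultaneous guidance would predict no such spectral peak.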

Conclusion

At any given moment, multiple memories may inform our behavior, whether acquired recently (WM) or some time ago (LTM). We have shown that WM and LTM can work jointly to guide attention in the same task. WM and LTM reliably cooperate to guide attention and compete in more limited circumstances. We found no evidence that WM helps (or hinders) LTM-guided search more than the other way around. By directly comparing cooperation and competition between two WMs and two LTMs, we further showed that they are not functionally identical during memory-guided attention. Taken together, these results suggest that WM and LTM, despite functional differences, both cooperate and compete to guide attention.