Introduction

Visual working memory (VWM) is a critical cognitive function that allows for the maintenance and manipulation of information that is no longer present. A fundamental question concerns the nature of the representations held in VWM – in particular, whether such representations are cognitive abstractions or sensory analogs. Classical findings of persistent neural activity in high-order, non-sensory areas during a memory delay (e.g., Goldman-Rakic, 1995) suggest that working memory representations might be fairly abstract. Recently, this view has been countered by the sensory recruitment hypothesis, which suggests that representations are sensory in nature and are maintained by the neural systems responsible for sensory processing (see review by D'Esposito & Postle, 2015). Compelling evidence for this hypothesis comes from studies demonstrating decodable neural patterns in early visual areas during a memory delay (e.g., Harrison & Tong, 2009; Serences, Ester, Vogel, & Awh, 2009). The sensory recruitment hypothesis, however, has been challenged by findings that non-sensory regions in the prefrontal and parietal cortices also contain decodable neural patterns for memorized items (Bettencourt & Xu, 2016; Christophel, Klink, Spitzer, Roelfsema, & Haynes, 2017; Xu, 2017). Thus, more evidence is needed to make a strong case for the sensory recruitment hypothesis.

To assess if memory representations are sensory in nature, we leveraged a perceptual phenomenon, surround suppression, that occurs when an item is attended. Perceptual studies have established that attending to a visual stimulus both enhances processing in the attended location and suppresses the processing of adjacent locations (Cutzu & Tsotsos, 2003; Mounts, 2000a, 2000b). This attentional surround suppression effect is theorized to reduce interference from nearby distractors when the task requires isolation of a target. Importantly for the current study, the neural mechanisms of attentional surround suppression have been attributed exclusively to processing in early visual areas (Boehler, Tsotsos, Schoenfeld, Heinze, & Hopf, 2011; Boehler, Tsotsos, Schoenfeld, Heinze, & Hopf, 2009; Hopf et al., 2006; Muller & Kleinschmidt, 2004). For example, both fMRI (Muller & Kleinschmidt, 2004) and MEG (Boehler et al., 2011; Boehler et al., 2009) studies implicate the primary visual cortex as the origin of surround suppression. The fine spatial resolution afforded by the retinotopic organization of early visual neurons makes it possible to resolve competition between nearby targets and distractors (Kastner, De Weerd, Desimone, & Ungerleider, 1998; Luck, Chelazzi, Hillyard, & Desimone, 1997). We reasoned that if WM representations are sensory in nature, then attending to items within VWM, i.e., prioritization via internal attention (Griffin & Nobre, 2003), should also induce a surround suppression effect.

Here, we presented a stimulus array and prioritized a single item through external attention via a precue (Experiment 1) or internal attention via a retrocue (Experiment 2). Memory for the cued items was probed on the majority of trials (valid) and for the uncued items on the remaining trials (invalid). Importantly, we systematically varied the cue-probe offset to assess the spatial profile of the prioritization effect. If VWM representations are sensory in nature, we would expect a surround suppression effect in memory performance to occur regardless of whether attention is directed externally or internally.

Experiment 1 – External attention

Given the extensive evidence of surround suppression for perceptual performance (Boehler et al., 2011; Boehler et al., 2009; Cutzu & Tsotsos, 2003; Hopf et al., 2006; Muller & Kleinschmidt, 2004; Mounts, 2000a, 2000b), we first sought to verify that a corresponding effect in memory performance occurs when attention is directed externally. A precue was used to prioritize a single item in a stimulus array, and memory for both the cued and uncued items was probed.

Methods

Participants

All participants (N = 12) had normal or corrected-to-normal visual acuity and reported normal color vision, which was verified with the Dvorine Pseudo-Isochromatic Plates (Dvorine, 1963). The Institutional Review Board at Michigan State University approved the experimental protocols. All participants gave informed consent and were compensated at the rate of US$10 per hour. The sample size was selected based on the effect size from a pilot study we conducted that used a similar design. We used the comparison between the intermediate location and the furthest location to index the surround suppression effect, which yielded a Cohen’s d of 0.86. Using this effect size, a minimum of ten participants is required to detect such an effect with a power of 0.8 at an alpha level of 0.05.

Apparatus

The study was conducted using Matlab (MathWorks, Natick, MA, USA) with the MGL toolbox (http://gru.stanford.edu/mgl). Visual stimuli were presented on a 21-in. CRT monitor (1,024 × 768 pixels, 90-Hz refresh rate) at a viewing distance of 90 cm. We linearized the luminance (gamma correction) and converted color coordinates in CIE L*a*b* color space to monitor RGB values (Westland & Ripamonti, 2004) through a standard calibration procedure using an I1-Pro spectrophotometer (Xrite, Grand Rapids, MI, USA).

Stimuli

The memory array consisted of six evenly distributed colored disks (radius = 0.45 degree of visual angle, dva) on an imaginary circle (radius = 2.8 dva) centered on the screen. The six locations had fixed polar angles: 0°, 60°, 120°, 180°, 240°, 300°, starting from the right horizontal meridian (Fig. 1). The disk array’s colors were randomly sampled from a color wheel in CIE L*a*b* color space (L* = 75, a* = 23, b* = 25), with a minimum difference of 40° between hues to reduce swap errors and chunking. The distance between two adjacent disks (center to center) was 2.8 dva, which was similar to previous studies that reported a surround suppression effect (Boehler et al., 2011; Boehler et al., 2009; Cutzu & Tsotsos, 2003; Hopf et al., 2006; Muller & Kleinschmidt, 2004; Mounts, 2000a, 2000b). The spatial precue was presented as a small gray disk (radius = 0.15 dva) to draw attention to a memory disk’s location. Participants’ memory performance was measured with a probe appearing at one of the six locations, the initial color of which was randomly selected from the color wheel.
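The array geometry described above can be checked with a short sketch. It is only an illustration in Python (the study itself used Matlab with MGL), and the function names, the rejection-sampling scheme for hues, and the random seed are assumptions made here; the text does not say how the 40° minimum hue separation was enforced.

```python
import math
import random

N_ITEMS = 6             # disks in the memory array
RING_RADIUS_DVA = 2.8   # radius of the imaginary circle, in dva


def disk_centers(n_items=N_ITEMS, radius=RING_RADIUS_DVA):
    """Return (x, y) disk centers in dva; item 0 lies on the right horizontal meridian."""
    centers = []
    for i in range(n_items):
        theta = math.radians(i * 360.0 / n_items)
        centers.append((radius * math.cos(theta), radius * math.sin(theta)))
    return centers


def adjacent_distance(n_items=N_ITEMS, radius=RING_RADIUS_DVA):
    """Center-to-center chord between neighbouring items: 2 * r * sin(pi / n)."""
    return 2.0 * radius * math.sin(math.pi / n_items)


def circular_diff(a, b):
    """Smallest angular difference between two hue angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def sample_hues(n_items=N_ITEMS, min_sep=40.0, rng=None):
    """Hypothetical rejection sampler: redraw until all hue pairs differ by >= min_sep."""
    rng = rng or random.Random()
    while True:
        hues = [rng.uniform(0.0, 360.0) for _ in range(n_items)]
        if all(circular_diff(hues[i], hues[j]) >= min_sep
               for i in range(n_items) for j in range(i + 1, n_items)):
            return hues


print(round(adjacent_distance(), 3))  # 2.8
```

With six items on a ring, the chord between neighbours equals the ring radius (2 × 2.8 × sin 30° = 2.8 dva), which matches the center-to-center distance stated in the text.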

Fig. 1

Trial sequence for Experiment 1 (a) and Experiment 2 (b). The bottom inset illustrates the offset between the probe and the cued item, where ‘0’ indicates the cued location, and ‘1, 2, 3’ indicate increasing offset to the cued location in the clockwise direction and ‘-1, -2, -3’ indicate offsets in the counterclockwise direction. The cued location (0) varied across trials, randomly sampling all possible locations.

Task and procedure

We measured the modulation of external attention with a delayed estimation task (Fig. 1a). A neutral cue (i.e., baseline) trial began with the onset of a fixation cross (500 ms) at the screen center. After a 120-ms delay (first delay), a memory array of six colored disks was presented for 300 ms, which was followed by a 600-ms inter-stimulus interval (ISI) to provide sufficient time for iconic memory to decay (Irwin & Thomas, 2008; Sperling, 1960). After another 490-ms delay (second delay), a probe was presented at one of the previous disks’ positions. Participants scrolled through possible colors using the left or right arrow keys until the probe color best matched the color of the disk previously shown at the probe location. We used the scrolling procedure instead of clicking on a color wheel in order to reduce non-target errors (van den Berg, Shin, Chou, George, & Ma, 2012). Participants were given unlimited time before pressing the space bar to submit their answer.

To manipulate external attention, a spatial precue was presented during the first delay for 90 ms followed by a 30-ms ISI before the memory array. The location of the precue was randomly assigned to be one of the memory array’s locations and was predictive of future recall. The probe’s location matched the cued location on half of the trials (i.e., “valid condition”). On the other half of trials, the probe’s location was selected uniformly at random from the other five possible locations with different cue-probe offsets (see inset of Fig. 1). Participants were informed about the predictability of the cue at the beginning of the experiment and were requested to maintain central fixation during the cue and memory array period. Participants completed 12 cueing blocks of 54 trials (648 cued trials in total, 324 valid trials, 324 invalid trials, approximately 65 trials per invalid condition), and 12 neutral blocks of 18 trials (216 neutral trials in total) in three separate 1-h sessions.

Eye-movement recording and analysis

To assess the fixation quality, we recorded participants’ eye position with an Eyelink 1000 system (SR Research, Ontario, Canada). Eye-tracking data were analyzed offline with custom software. Eye-position data during the memory array period were analyzed in order to ensure that fixation was maintained after one location was cued. We conducted two analyses to assess if there were changes in fixation patterns as a result of the experimental manipulations. In the first analysis, we calculated the average fixation deviation by computing the Euclidean distance from the fixation position to the screen center and compared the deviation between the attention (cued) condition and neutral (uncued) condition, via a t-test. In the second analysis, we evaluated whether there were systematic differences in fixation position (in polar coordinates) for cued trials as the cued location varied (six locations in all), via a one-way repeated-measures ANOVA.
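The two fixation measures described above can be sketched as follows. This is a minimal illustration rather than the authors' custom software; the function names, the gaze-sample format (a list of (x, y) positions in dva), and the screen center at the origin are assumptions.

```python
import math


def fixation_deviation(samples, center=(0.0, 0.0)):
    """Mean Euclidean distance of gaze samples from the screen center."""
    cx, cy = center
    return sum(math.hypot(x - cx, y - cy) for x, y in samples) / len(samples)


def to_polar(x, y, center=(0.0, 0.0)):
    """Convert a gaze position to (radial distance, polar angle in degrees, 0 to 360)."""
    dx, dy = x - center[0], y - center[1]
    radius = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return radius, angle


# Hypothetical gaze samples from one trial, in dva relative to the screen center.
trial = [(0.3, 0.4), (0.0, 0.0)]
print(fixation_deviation(trial))
```

Per-participant averages of these quantities would then enter the t-test (deviation, cued vs. neutral) and the repeated-measures ANOVA (radial distance and polar angle across the six cued locations).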

No significant differences of eye position were observed in these analyses. Fixation deviation did not differ between the precue and the neutral condition (t(11) = 1.18, p = 0.26). There was also no difference in fixation position across precue locations (radial distance: F(5,55) = 0.88, p = 0.5, polar angle: F(5,55) = 1.86, p = 0.12). Thus, eye positions were equivalent across conditions and could not account for our observed memory effects.

Analysis

Memory error was calculated as the absolute angular difference between the presented color and participant’s response. Attentional condition was sorted according to the cue-probe offsets (valid condition: offset 0, invalid condition: offsets -3, -2, -1, 1, 2, 3). For the invalid conditions, positive values indicate the item was located clockwise from the cue and negative values indicate a counter-clockwise direction from the cue (see the inset in Fig. 1). We subtracted the memory error under cueing conditions from the error under the neutral condition (i.e., baseline) as a measurement of the cueing effect (i.e., neutral − cue).
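As a minimal sketch of these definitions (not the authors' code), the following Python functions compute the absolute angular error, the signed cue-probe offset, and the cueing effect. The names are hypothetical, and the offset function assumes that location indices 0 to 5 increase with polar angle, i.e., counterclockwise, so that a clockwise step lowers the index.

```python
N_LOCATIONS = 6


def memory_error(presented_deg, response_deg):
    """Absolute angular difference on the color wheel, in degrees (0 to 180)."""
    d = abs(presented_deg - response_deg) % 360.0
    return min(d, 360.0 - d)


def signed_offset(cue_idx, probe_idx, n=N_LOCATIONS):
    """Cue-probe offset: positive = probe clockwise of the cue, negative = counterclockwise.

    Assumes indices increase counterclockwise. The location opposite the cue is
    returned as +3; on a six-item ring, +3 and -3 are the same physical location.
    """
    steps_clockwise = (cue_idx - probe_idx) % n
    return steps_clockwise if steps_clockwise <= n // 2 else steps_clockwise - n


def cueing_effect(neutral_error, cue_error):
    """Neutral minus cue: positive = enhancement, negative = suppression."""
    return neutral_error - cue_error


print(memory_error(350.0, 10.0))  # 20.0
print(signed_offset(0, 5))        # 1
```

Because errors are subtracted from the neutral baseline, a smaller error under cueing yields a positive cueing effect, which is the sign convention used in the figures.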

To examine the profile of attentional modulation, we tested whether a non-monotonic or a monotonic model better fitted the cueing effect in a Bayesian framework. The monotonic model was implemented as a Gaussian function, which has three free parameters:

$$ P=\frac{A}{w}{e}^{-\frac{x^2}{2{w}^2}}+b, $$

where P is memory error, x is the cue-probe offset, and w, A, and b are the free parameters controlling the shape of the function. The non-monotonic model was implemented as a negative second derivative of a Gaussian function, which has a Mexican hat shape:

$$ P=\frac{2A}{\sqrt{3w}{\pi}^{\frac{1}{4}}}{e}^{-\frac{x^2}{2{w}^2}}\left(1-\frac{x^2}{w^2}\right)+b, $$

where P is memory error, x is the cue-probe offset, and w, A, and b are the three free parameters controlling the shape of the function. We calculated the evidence supporting each model using the Bayesian Information Criterion (BIC). Assuming a normal error distribution, the formula for BIC is:

$$ BIC=n\ln \left(\frac{RSS}{n}\right)+k\ln (n), $$

where RSS is the residual sum of squares (Raftery, 1995), n is the number of observations, and k is the number of free parameters. Model selection is based on the Bayes factor, which is computed by comparing the evidence supporting the non-monotonic model (i.e., Mexican hat function) over evidence supporting the monotonic model (i.e., Gaussian function) based on BIC approximation:

$$ B{F}_{MG}={e}^{\left(\frac{\left( BI{C}_G- BI{C}_M\right)}{2}\right)}, $$

where $BIC_G$ is for the Gaussian function and $BIC_M$ is for the Mexican hat function. $BF_{MG}$ quantifies the odds favoring the non-monotonic Mexican hat function over the monotonic Gaussian function.
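The model comparison above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code: the fitting routine is not described in the text, so the sketch only evaluates the two model functions and the BIC-based Bayes factor for given residual sums of squares. All function names and the example RSS values are hypothetical.

```python
import math


def gaussian(x, A, w, b):
    """Monotonic model: (A / w) * exp(-x^2 / (2 w^2)) + b."""
    return (A / w) * math.exp(-x ** 2 / (2 * w ** 2)) + b


def mexican_hat(x, A, w, b):
    """Non-monotonic model: negative second derivative of a Gaussian (requires w > 0)."""
    norm = 2 * A / (math.sqrt(3 * w) * math.pi ** 0.25)
    return norm * math.exp(-x ** 2 / (2 * w ** 2)) * (1 - x ** 2 / w ** 2) + b


def rss(model, params, xs, ys):
    """Residual sum of squares of a model evaluated at the offsets xs."""
    return sum((y - model(x, *params)) ** 2 for x, y in zip(xs, ys))


def bic(rss_value, n, k):
    """BIC under a normal error assumption: n * ln(RSS / n) + k * ln(n)."""
    return n * math.log(rss_value / n) + k * math.log(n)


def bayes_factor_mg(bic_g, bic_m):
    """Odds favoring the Mexican hat (M) over the Gaussian (G) model."""
    return math.exp((bic_g - bic_m) / 2)


# Illustrative numbers only: 7 offsets, 3 free parameters per model.
n_obs, k_params = 7, 3
bf = bayes_factor_mg(bic(4.0, n_obs, k_params), bic(1.0, n_obs, k_params))
print(bf)  # approximately 128
```

Because both models have the same number of free parameters and are fitted to the same n observations, the k ln(n) terms cancel and the Bayes factor reduces to $(RSS_G / RSS_M)^{n/2}$, so the comparison depends only on the ratio of residual errors.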

For the frequentist analyses, we first conducted a two-way repeated-measures ANOVA (direction of offset: clockwise vs. counterclockwise; distance: 1, 2, 3) to assess whether the cueing effect was symmetric for the clockwise and counterclockwise offsets. Then, we averaged the cueing effects across the positive and negative cue-probe offsets and conducted t-tests against 0 to assess the cueing effect, where positive values indicated enhancement and negative values indicated suppression. To determine whether a surround suppression effect was reliable, multiple t-tests were also conducted between cue locations (i.e., cue-probe distance 2 vs. 0 and 2 vs. 3). For both sets of multiple t-tests, we corrected p-values based on Bonferroni correction.
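The averaging and correction steps above can be illustrated with a short sketch. The function names are hypothetical, and the conversion from t to Cohen's d is an assumption: the text does not state how d was computed, although the values reported in the Results are numerically consistent with d = |t| / √N for N = 12.

```python
import math


def average_across_direction(effects_by_offset):
    """Average the cueing effect over each +/- offset pair for one participant."""
    return {d: (effects_by_offset[d] + effects_by_offset[-d]) / 2 for d in (1, 2, 3)}


def bonferroni(p_values):
    """Bonferroni-corrected p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def cohens_d_from_t(t, n):
    """Assumed one-sample / paired effect size: |t| / sqrt(n)."""
    return abs(t) / math.sqrt(n)


# Hypothetical cueing effects (degrees) for one participant, keyed by offset.
participant = {-3: -1.0, -2: -4.0, -1: -2.0, 1: -3.0, 2: -5.0, 3: -2.0}
print(average_across_direction(participant))
print(round(cohens_d_from_t(-12.04, 12), 2))  # 3.48
```

The number of tests entering each Bonferroni family is left as an input here, since the text specifies only that the correction was applied to both sets of t-tests.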

Results

Validity effect relative to neutral baseline

The overall validity effect of external attention is shown in Fig. 2a. Compared to the neutral condition, external attention reduced memory error at the cued location (Fig. 2a), t(11) = -12.04, p < 0.001, Cohen’s d = 3.48. Moreover, memory performance in the invalid condition was impaired compared to the neutral condition, t(11) = 6.77, p < 0.001, d = 1.96.

Fig. 2

Overall validity effect and model fitting results for external attention (left column, Exp. 1, N = 12) and internal attention (right column, Exp. 2, N = 12). Panels (a) and (c) show the validity effect in memory error (* p < 0.05). Lower values (smaller bars) represent better performance. Error bars represent standard error of the mean. Panels (b) and (d) show the cueing effect as a function of cue-probe offset (filled symbols) and model-fitting results using both a monotonic Gaussian function (gray line) and a non-monotonic Mexican hat function (black line). Positive values indicate enhancement and negative values indicate suppression in the cueing effect. Note that a smaller scale on the y-axis was employed for internal attention (panel d) for illustration purposes.

Model comparison

To evaluate the overall profile of attentional modulation (Fig. 2b), we fitted a non-monotonic model (i.e., Mexican hat function) and a monotonic model (i.e., Gaussian function) to the group-averaged cueing effect in memory errors as a function of cue-probe offset. The non-monotonic model ($R^2 = 1$) was favored by a Bayes factor ($BF_{MG}$) of $1.7 \times 10^{8}$ over the monotonic model ($R^2 = 0.92$), which constitutes very strong evidence for the non-monotonic model (Raftery, 1995). We also compared the two models at the individual level and found that the non-monotonic model was favored over the monotonic model in nine out of 12 participants.

Combined cueing effect

The cueing effect appeared symmetric between the clockwise (+) and counterclockwise (-) directions. We found neither a significant main effect of the direction of offset (F(1, 11) = 0.018, p = 0.89) nor an interaction between direction of offset and distance (F(2, 22) = 0.033, p = 0.97) in a two-way repeated-measures ANOVA. Therefore, we averaged the cueing effect for each pair of +/- offsets at the participant level and conducted statistical significance tests to further verify the non-monotonic surround suppression effect. We found that all uncued locations produced costs (e.g., Fig. 3a). Relative to the neutral condition, participants’ performance was worse when the cue-probe offset was one item, t(11) = 3.39, p = 0.024, d = 0.98, two items, t(11) = 8.64, p < 0.001, d = 2.49, and three items, t(11) = 3.53, p = 0.019, d = 1.02. Planned pairwise comparisons also confirmed that performance at a cue-probe offset of ±2 was significantly worse than at the cued location, t(11) = 10.87, p < 0.001, d = 3.14, and at the furthest location (i.e., cue-probe offset ±3), t(11) = 4.33, p = 0.0024, d = 1.25. Thus, the suppression was strongest at the intermediate cue-probe offset with a rebound at the farthest location, indicating a surround suppression effect. These results converge with those obtained by model-fitting above.

Fig. 3

Combined cueing effect for external attention (left panel, Exp. 1, N = 12) and internal attention (right panel, Exp. 2, N = 12). A positive value means enhancement and a negative value means suppression. Error bars represent standard error of the mean. * p < 0.05.

Consistent with previous findings of surround suppression in perceptual discrimination tasks (Cutzu & Tsotsos, 2003; Mounts, 2000a, 2000b), these results demonstrate that the surround suppression effect also manifests in memory performance as measured in our protocol. These results, thus, set the stage for us to examine the spatial profile of internal attention next.

Experiment 2 – Internal attention

In this experiment, we manipulated internal attention with a retrocue during memory maintenance (Griffin & Nobre, 2003) to examine if a surround suppression effect occurs. If memory maintenance recruits sensory mechanisms, we would expect the occurrence of a surround suppression profile from internal attention similar to the effect observed from external attention in Experiment 1.

Methods

Participants

Twelve additional observers participated in Experiment 2 and gave informed consent. All had normal or corrected-to-normal visual acuity and reported normal color vision, which was verified with the Dvorine Pseudo-Isochromatic Plates (Dvorine, 1963). The Institutional Review Board at Michigan State University approved the experimental protocols. We compensated all participants at the rate of US$10 per hour. The sample size was selected based on another pilot study we conducted that used a similar design. The pilot study yielded a Cohen’s d of 0.96 for the surround suppression effect, when comparing the intermediate location to the furthest location. Such an effect size required a minimum of nine participants for a power of 0.80 in detecting an effect at an alpha level of 0.05.

Stimuli, task, and procedure

This experiment was identical to Experiment 1 except that a retrocue (90 ms) was presented 400 ms before the probe’s onset during the second delay (Fig. 1b), such that the delay period for memory retention was equated between experiments.

Eye-movement recording and analysis

We recorded participants’ eye position and analyzed eye-tracking data in a similar manner to Experiment 1. We compared fixation deviation during the memory array period between the retrocue and neutral condition and found no significant difference (t(11) = 0.38, p = 0.71). We also examined eye position as a function of retrocue location during the retrocue period and the following fixation period before the probe onset. This is the time period where eye movement to the cued location could potentially occur. However, we did not find any difference in fixation position among different retrocue locations (radial distance: F(5,55) = 1.58, p = 0.18, polar angle: F(5,55) = 1.17, p = 0.34). Thus, eye positions were equivalent across experimental conditions.

Results

Validity effect relative to neutral baseline

The retrocue facilitated memory at the cued location (Fig. 2c), t(11) = -4.84, p = 0.001, d = 1.4. Similar to external attention, this benefit came at the expense of performance in the invalid condition (Fig. 2c), t(11) = 2.72, p = 0.039, d = 0.79.

Model comparison

Again, we fitted a non-monotonic Mexican hat model ($R^2 = 0.96$) and a monotonic Gaussian model ($R^2 = 0.84$) to the average cueing effect in memory error (Fig. 2d). The effect of internal attention yielded a non-monotonic pattern with strong support according to the Bayes factor ($BF_{MG} = 107.14$). We also compared the two models at the individual level and found that the non-monotonic model was favored over the monotonic model in 10 out of 12 participants.

Combined cueing effect

The modulation profile of internal attention also appeared symmetric between the clockwise (+) and counterclockwise (-) directions. The two-way (direction of offset and distance) repeated-measures ANOVA showed neither a significant main effect of direction (F(1, 11) = 0.6, p = 0.45) nor an interaction between direction and distance (F(2, 22) = 0.37, p = 0.69). We then averaged the cueing effect across the +/- offsets at the participant level and conducted statistical significance tests (Fig. 3b). We found that participants performed worse than the neutral baseline only when the cue-probe offset was 2, t(11) = 3.26, p = 0.03, d = 0.94. We did not observe a cost to memory at the closest cue-probe offset (i.e., 1), t(11) = 1.33, p = 0.28, d = 0.38, or the furthest cue-probe offset (i.e., 3), t(11) = 1.1, p = 0.3, d = 0.32. Importantly, the planned pairwise comparisons showed that memory error at the cue-probe offset 2 was significantly worse than at the cued location, t(11) = 5.57, p < 0.001, d = 1.61, and at the farthest location (i.e., cue-probe offset 3), t(11) = 3, p = 0.02, d = 0.87. These results, thus, demonstrated a surround suppression profile and converged with the model fitting results.

We note that the absolute magnitude of performance modulation is much smaller for internal attention than external attention, both for the enhancement effect (for the valid item) and the suppression effect (for the invalid items). This is likely due to the fact that precues can modulate stimulus encoding, which is known to produce large perceptual effects such as inattentional blindness (Simons, 2000). However, retrocues can only operate on previously encoded representations via reallocation of mental resources during maintenance. Thus it seems reasonable that the retrocue would have a weaker impact on memory performance than the precue. Interestingly, the modulation ratio (average enhancement / average suppression) is similar between the two (precue: 1.18 vs. retrocue: 1.15), suggesting that the relative degree of performance modulation is similar between the two cue types.

General discussion

We found that prioritization as a result of external and internal attention elicited a spatial surround suppression effect in memory performance. Previous studies have provided strong evidence that this non-monotonic attentional modulation occurs in early visual areas (Boehler et al., 2011; Boehler et al., 2009; Hopf et al., 2006; Muller & Kleinschmidt, 2004). The retinotopic organization of early visual areas provides a natural architecture to suppress interference from nearby stimuli and prioritize the attended stimulus. Our findings for internal attention suggest that working memory can draw on a similar sensory mechanism to reduce interference when prioritizing a memory representation. The observation that similar spatial profiles were produced as a result of internal and external attention supports the sensory recruitment hypothesis of VWM representations (D'Esposito & Postle, 2015).

Using a similar logic to ours, two recent studies examined whether perceptual effects due to retinotopic sensory mechanisms (crowding in Harrison & Bays, 2018 and contrast normalization in Bloem, Watanabe, Kibbe, & Ling, 2018) are observed in WM performance. Unlike our results, these studies did not find spatial effects in WM and are, thus, argued to be evidence against sensory recruitment. The discrepancy between our results and these findings might be due to the sequential presentation of items in the latter experiments. For example, Harrison and Bays (2018) sequentially presented the target and flanker orientation, with different colors to distinguish between them. In Bloem et al.’s (2018) study, the center and the surround gratings were presented separately with independent changes in their contrast. However, such sequential presentation rendered spatial location irrelevant to the task. As such, memory representations in these studies might be formed without any spatial context and rely less on spatial sensory mechanisms. In contrast, we presented the memoranda simultaneously, making it necessary to retain the spatial configuration of the stimuli in order to utilize the retrocue. It seems likely that a spatial effect in memory can be more easily detected when items are presented simultaneously and spatial information is task relevant.

In a recent study, Souza, Thalmann, and Oberauer (2018) also manipulated external and internal attention using a similar paradigm to ours, and measured the spatial profile of prioritization in VWM. They found a monotonic gradient of modulation for both external and internal attention as a function of cue-probe offset. On the one hand, this result is consistent with our suggestion that simultaneous presentation of the memory array is necessary for a spatial effect to emerge. On the other hand, the exact pattern of the spatial profile (the monotonic gradient) in their study is inconsistent with our finding of a non-monotonic Mexican-hat profile. Two possibilities may explain such a discrepancy. Firstly, the spatial sampling interval in Souza et al. (2018) may not have been optimal for detecting the surround suppression effect. While we chose our sampling interval (2.8° minimum center-center distance) based on previous studies (Boehler et al., 2011; Boehler et al., 2009; Cutzu & Tsotsos, 2003; Hopf et al., 2006; Mounts, 2000a, 2000b; Muller & Kleinschmidt, 2004), Souza et al. sampled on a larger scale (3.8° center-center distance), such that the suppressive surround might have been missed. Secondly, our delayed estimation task is likely more sensitive to subtle differences in memory strength than the change detection task employed by Souza et al., which only assessed large color changes.

While previous studies have generally found that a valid retrocue enhances working memory, there have been discrepant findings regarding whether an invalid retrocue incurs a cost (see review by Souza & Oberauer, 2016). Our results may help explain these discrepancies, as we found that internal attention’s surround suppression exerted a significant cost only at the intermediate location (i.e., offset ±2), with a numerical trend for suppression at the nearer and farther locations (i.e., offset ±1 and ±3). Thus, the cost of an invalid retrocue may depend on the spatial separation between cued and uncued items, a factor not considered in previous studies. Additionally, participants might need more time (i.e., the delay period after retrocue) to fully suppress all unattended locations. Thus, a longer delay period might lead to an overall stronger surround suppression effect such that location 1 might also become significantly suppressed.

While our results support sensory recruitment, they do not argue against the contribution of non-sensory systems to VWM. It has been proposed that WM is flexible and can adjust its dependence on the sensory system according to task demand (Lorenc, Sreenivasan, Nee, Vandenbroucke, & D'Esposito, 2018). Our results show that when the task requires high-fidelity, visuospatial representations, sensory mechanisms are likely recruited.