Visual working memory (VWM) refers to the ability to encode, maintain, and retrieve visually presented stimuli over a brief amount of time. A key feature of VWM is that it is capacity-limited, and an emphasis of recent cognitive neuroscience research has been to elucidate the neural bases of this capacity limitation. Many of the foundational studies in this area have focused on the maintenance phase of VWM (Cowan, 2001; Luck & Vogel, 1997; Todd & Marois, 2004; Vogel & Machizawa, 2004; for reviews, see Luck & Vogel, 2013; Postle, 2006; Stokes, 2015). However, this focus has tended to obscure the facts that maintenance depends on successful encoding of the to-be-remembered stimuli and that capacity limitations are partially attributable to events during encoding (Cohen, Sreenivasan, & D’Esposito, 2012; Emrich, Riggall, LaRocque, & Postle, 2013; Ester, Serences, & Awh, 2009; Gurariy, Killebrew, Berryhill, & Caplovitz, 2016; Harrison & Tong, 2009; Peterson et al., 2014). Moreover, experiments focusing on encoding-related processes commonly test retrieval by using recognition, yet it is well accepted that VWM performance can be influenced by the VWM task demands (Adam, Mance, Fukuda, & Vogel, 2015; Berryhill, Chein, & Olson, 2011; Duncan, Schramm, Thompson, & Dumontheil, 2012; Klein, Addis, & Kahana, 2005). An unintended consequence of this focus on maintenance is that we may know less than we realize about the neural mechanisms underlying encoding and the role they play in the overall limitations of VWM capacity.

Here we paired a full-report recall VWM paradigm with high-density electroencephalography (hdEEG) and used frequency domain analyses to probe the neural correlates of VWM encoding. Specifically, we addressed two questions regarding the role of encoding-related processes in the successful retrieval of information in VWM. First, we examined how resources are assigned to individual items within a VWM array as a function of whether or not the item was remembered. Second, we examined how task-specific retrieval demands influence how items are encoded into VWM. One point central to the questions raised here was that, given an array of to-be-remembered items, people often remember certain stimuli, whereas others are subsequently forgotten.

Two general models have been proposed to describe how neural resources may be allocated across items within a VWM array (Mance, Adam, Fukuda, & Vogel, 2014; Peterson et al., 2014). One model, which we will call the “all-or-nothing” model, proposes that on any given trial, neural resources are allocated in an all-or-nothing fashion. In other words, on correct trials participants successfully allocate neural resources to all items in a VWM array and are then able to recall all or most of the items. Importantly, this model predicts a uniform distribution of neural signatures across all items within a given trial, with greater amplitudes expected on trials in which more items are remembered. Alternatively, one could imagine that encoding-related resources might be distributed unevenly across an array, with only a subset of items receiving sufficient resources to ensure successful encoding, thereby increasing the likelihood of them being remembered. We will refer to this model as the “subset-of-items” model. The primary prediction born of this model is that for each trial, items that are successfully recalled will have larger frequency tag amplitudes than do items that are not successfully recalled, and there will be few trials in which all items are remembered.

To test these models, we applied a full-report VWM recall paradigm (Adam et al., 2015; Cabeza et al., 1997; Klein et al., 2005). Unlike the more traditional recognition paradigms, which measure memory performance for a single item in a display, a full-report recall paradigm allows us to measure VWM for each item in the display. We can test the predictions of the two models by examining the distribution of neural responses as a function of whether or not each item in the display was subsequently recalled. To address our second question—whether task-specific retrieval demands can influence the manner in which items are encoded into VWM—we applied two variants of the full-report recall paradigm. In the simultaneous paradigm, participants freely recalled items in any order they wished. In the sequential paradigm, participants were forced to recall items in a randomly determined order. We could thus determine whether the distribution of neural responses during encoding was influenced by these distinct retrieval demands.

Here we leveraged classic frequency domain EEG techniques: frequency tagging of the visual evoked potentials in response to rapid serial stimulus presentations (for a recent comprehensive review, see Norcia, Appelbaum, Ales, Cottereau, & Rossion, 2015) and induced-power analyses (Canolty & Knight, 2010; Klimesch, 1999; for a review, see Ward, 2003). Frequency tags can be described as the neural response to visual stimulation presented at specific frequencies. For instance, a stimulus continually flickering at 5 Hz produces a corresponding 5-Hz steady-state oscillation detectable in the EEG signal. This signal can be detected in frequency space, revealing a “frequency tag” at 5 Hz. This technique is useful for identifying the consequences of attentional selection on visual stimuli. For example, when focused attention is applied to a flickering stimulus, the corresponding frequency tag increases in amplitude as compared to an unattended stimulus (Hillyard & Anllo-Vento, 1998; Morgan, Hansen, & Hillyard, 1996; Müller, Teder-Sälejärvi, & Hillyard, 1998; Muthu, Suzuki, Joon Kim, Grabowecky, & Paller, 2007; Zhang, Jamison, Engel, He, & He, 2011). If multiple stimuli are flickering at different rates, the frequency tags for each stimulus are evident in the EEG (Appelbaum, Wade, Petter, Vildavski & Norcia, 2006, 2008; Zhang et al., 2011). This offers the tremendous advantage of permitting the measurement of individual items rather than measuring an aggregate response to all stimuli. We previously had leveraged this technique to investigate the neural correlates of VWM encoding while measuring behavioral performance within recognition paradigms. We found that the frequency tag amplitudes for later-remembered items were significantly greater than those for later-forgotten items (Peterson et al., 2014). We also found that on correct trials, the frequency tag amplitudes were significantly greater when the array contained two rather than four items (Gurariy et al., 2016). Together, these observations demonstrated the effectiveness of the frequency-tagging technique for identifying the neural correlates of VWM performance during encoding.

An analysis that is complementary to frequency tagging is to assess changes in induced power. Changes in induced oscillatory power within specific frequency bands (e.g., alpha, theta) can be interpreted as the results of task-specific neural processes and/or as a mechanism for cortico-cortical and cortico-subcortical communication (Canolty & Knight, 2010). Induced oscillations have been examined in WM paradigms in order to examine WM capacity, retrieval, task difficulty, attention, and executive control (for reviews, see Klimesch, 1999; Ward, 2003). However, different studies have implicated different frequency bands in specific WM task demands. For example, some studies have reported increased theta power as a function of increased WM load (Deiber et al., 2007; Jensen & Tesche, 2002) and as a function of WM task demands or characteristics (Caplan, Madsen, Raghavachari, & Kahana, 2001; Kahana, Seelig, & Madsen, 2001; Kahana, Sekuler, Caplan, Kirschen, & Madsen, 1999). Other studies have reported increased theta power as well as decreased alpha power as WM demands increase (Gevins, Smith, McEvoy, & Yu, 1997) and when task difficulty and attentional engagement increase (Klimesch, 1999; Klimesch, Doppelmayr, Schimke, & Ripper, 1997). In contrast, others have reported increased alpha band power with increased WM load (Adam et al., 2015; de Vries, van Driel, & Olivers, 2016; Erickson, Albrecht, Robinson, Luck, & Gold, 2017; Fukuda, Kang, & Woodman, 2016; Fukuda, Mance, & Vogel, 2015; Fukuda & Woodman, 2017; Jensen, Gelfand, Kounios, & Lisman, 2002; Klimesch, Doppelmayr, Schwaiger, Auinger, & Winkler, 1999; Krause, Lang, Laine, Kuusisto, & Pörn, 1996). Here we included measurements of induced oscillatory power in order to identify the effects of WM performance on induced oscillatory frequency bands.

As a preview of what will be described in detail in the following sections, participants viewed arrays of four items and remembered item–location conjunctions. Unique frequencies were assigned to each stimulus, allowing the frequency tag amplitude to be calculated for each item, either correct or incorrect, at each hdEEG electrode. The results indicated that during encoding, higher-amplitude frequency tags are observed for correctly recalled items than for those later recalled incorrectly, but there were few trials in which most of the items were retrieved. These results were consistent with the “subset-of-items” model, and were true for both the sequential and simultaneous full-report paradigms. Moreover, during encoding there were distinct patterns of induced power as a function of how many items were remembered on a given trial. Further highlighting the importance of encoding on VWM performance, these patterns depended on the retrieval task.

Method

Participants

Twenty-six adults participated in this study (13 males, 13 females; 19–35 years of age). Six datasets (i.e., six participants) were excluded because of excessive artifacts in the hdEEG data. All participants had normal or corrected-to-normal vision and no history of neurological disorders. The Institutional Review Board at the University of Nevada, Reno, approved all experimental protocols. Each participant gave written, informed consent.

Stimulus apparatus and display

The stimuli were presented on a Mitsubishi Diamond Pro270 (20 in., 1,024 × 768, 120 Hz) monitor run by a Mac Mini (2.6 GHz, 4GB 1600 MHz DDR3 RAM) computer with an Intel High Definition (HD) 4000 graphics processor (768 MB of DDR3 SDRAM). Stimuli were presented with the Psychophysics Toolbox (6.1; Brainard, 1997; Pelli, 1997) for Matlab (R2012b, The Mathworks Inc., Natick, MA).

Electrophysiological apparatus and recordings

The hdEEG data were collected using a Geodesic system 300 with a HydroCell Geodesic Sensor Net 130, 256-electrode cap at a sampling rate of 1000 Hz. The hdEEG was amplified using an EGI Net Amps Bio Amplifier with a gain factor of 20,000 per channel and digitized using a Texas Instruments analog-to-digital (AD) converter (ADS 1251, 24 bit, 20 kHz). The data were then recorded using the Net Station 5.0 software package (Electrical Geodesics, Inc., Eugene, OR, USA). Electrical impedances for the majority of the electrodes were kept below 50 Ω. Synchrony between the stimulus presentation and the EEG recording, as well as the flicker rate of the stimuli, was verified using a photodiode attached to the monitor. The photodiode output was amplified using a transimpedance amplifier and digitized with the EEG data via one channel of the AD converter.

Task design and procedure

Participants performed a VWM full-report paradigm with a set size of four items. Each trial started with a gray fixation screen (15.63 cd/m²) presented for 600 ms, plus or minus a random jitter between 0 and 100 ms, that contained a central fixation point (approximately 0.3 × 0.3 deg of visual angle) along with four black squares of approximately 1.2 × 1.2 deg of visual angle. The black squares (0.1538 cd/m²) were placed at the center of each quadrant, with the addition of random jitter (± 1.5 deg). They were presented first in order to denote the locations of the four stimuli and to prevent any contamination from visual-onset-evoked potentials in the hdEEG recordings. During encoding (1,000 ms), the squares were replaced with the VWM stimuli. These stimuli were pseudorandomly chosen from a set of ten polygons.

Each polygon flickered at one of four possible frequencies (3, 5, 12, or 20 Hz) pseudorandomly assigned to a location. One flicker cycle corresponded to the object reversing contrast from black to white. Thus, a 5-Hz flicker frequency has a 200-ms cycle, during which the shape is black for 100 ms and white (76.49 cd/m²) for 100 ms (i.e., 12 frame refreshes each). These frequencies were chosen for two reasons. First, the encoding duration (1,000 ms) permitted exact periods of stimulation based on the 120-Hz frame refresh rate. Second, the harmonics of each fundamental frequency are independent of one another up to the third harmonic, which allowed the fundamental frequency of each stimulus to be recorded, as well as the second and third harmonics, without contamination from the other stimuli. Following the presentation of the stimuli a blank screen was presented, and observers were instructed to remember the shape and location of each stimulus in VWM (1,000 ms). During retrieval, all ten possible polygons were placed in the center row of the screen, as is shown in Fig. 1A. Retrieval demands included two types of recall testing: simultaneous and sequential. During simultaneous recall, participants used the computer mouse to drag polygons into one of the four possible quadrants in an order of their choice. In the sequential recall, participants were forced to place each stimulus in a randomly designated quadrant during testing. Responses were untimed for both the simultaneous and sequential recall types. Each recall type (simultaneous and sequential) was randomly presented in three blocks with 25 trials each, for a total of 75 trials per retrieval task, or 150 total trials.
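The two constraints on the flicker frequencies described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code; the function and variable names are hypothetical, and only the refresh rate, encoding duration, and frequencies are taken from the text.

```python
from fractions import Fraction

REFRESH_HZ = 120          # monitor refresh rate reported in the text
ENCODING_MS = 1000        # encoding duration reported in the text
TAGS_HZ = [3, 5, 12, 20]  # flicker frequencies reported in the text


def frames_per_half_cycle(freq_hz, refresh_hz=REFRESH_HZ):
    """Frames spent in each contrast phase (black or white) of one cycle."""
    return Fraction(refresh_hz, 2 * freq_hz)


def harmonics(freq_hz, n=3):
    """Fundamental plus harmonics up to the n-th (n=3 gives f, 2f, 3f)."""
    return [freq_hz * k for k in range(1, n + 1)]


# Each phase should last a whole number of frames at 120 Hz.
half_cycle_frames = {f: frames_per_half_cycle(f) for f in TAGS_HZ}

# Each frequency should complete a whole number of cycles in the encoding period.
cycles_in_encoding = {f: Fraction(f * ENCODING_MS, 1000) for f in TAGS_HZ}

# No fundamental or harmonic (up to the third) should be shared between stimuli.
all_harmonics = [h for f in TAGS_HZ for h in harmonics(f)]
harmonics_are_unique = len(all_harmonics) == len(set(all_harmonics))
```

Under these assumptions, every entry of `half_cycle_frames` is an integer (for example, 12 frames per phase at 5 Hz, matching the 100-ms figure in the text), and `harmonics_are_unique` is true for this frequency set.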

Fig. 1

VWM task design and steady-state visually evoked potential frequency-tagging technique. (A) An example trial sequence for the full-report paradigm that we tested. During fixation, a black square was presented in each quadrant (for 600 ms), followed by an encoding period (1,000 ms). During encoding, each stimulus, placed one per quadrant, flickered at a unique frequency (3, 5, 12, or 20 Hz). Next, a maintenance period (1,000 ms) was followed by retrieval. There were two retrieval task types: (i) simultaneous and (ii) sequential. During simultaneous recall, participants were instructed to choose and place the four encoded polygons in the locations where participants remembered them being. During sequential recall, participants were randomly designated one location at a time in which to place each polygon. After the participant had placed the shape, the bounding black square turned green to indicate that the choice had been registered, and a new black square appeared at another location. In both recall tasks, all of the selected shapes remained visible within the bounding box until the conclusion of the trial. Responses were unspeeded. (B) During encoding, hdEEG data were collected (artificial waveform data are presented here). (C) Individual trials were sorted by accuracy: The numbers of correct and incorrect trials for each frequency were matched depending on which type had the lower number. A fast Fourier transform (FFT) was applied to the raw electrical signal, averaged across electrodes, and individual component frequency amplitudes were extracted from the FFT data, corresponding to the flicker frequencies of each stimulus for both correct and incorrect trials. As in panel B, the data here are hypothesized and not actual results

Before they began the experiment, participants were instructed not to blink, make jaw movements, or move their eyes away from central fixation and were shown how these movements would affect the EEG trace. Each participant was asked to blink during retrieval. They were also instructed that they would have two recall tasks during the experiment, and they completed five practice trials of each recall task to familiarize them with that task.

Behavioral data analysis

Behavioral accuracy was calculated separately for simultaneous and sequential recall for each participant. Participants correctly recalled between zero and four items. Accuracy was calculated as the percentage of correct responses across all trials for each recall type and averaged across participants. Accuracy was also calculated for the order of items retrieved. For example, if a participant correctly placed the first item but incorrectly placed the second item, accuracy was considered 100% for the first item and 0% for the second item. The percentages of trials on which a participant correctly recalled one, two, three, or four items were also tabulated.
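The three behavioral summaries described above (overall accuracy, accuracy by response order, and the percentage of trials with a given number of items correct) can be illustrated as follows. This is a minimal sketch with made-up data; the `summarize` function and the trial encoding (one list of booleans per trial, in response order) are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def summarize(trials):
    """Summarize full-report accuracy for one participant and recall type.

    trials: list of trials; each trial is a list of booleans giving whether
    the 1st, 2nd, ... response was correct.
    """
    n_trials = len(trials)
    n_items = len(trials[0])
    # Percentage of all responses that were correct.
    overall = 100.0 * sum(sum(t) for t in trials) / (n_trials * n_items)
    # Percentage correct at each response position.
    by_order = [100.0 * sum(t[i] for t in trials) / n_trials for i in range(n_items)]
    # Percentage of trials with exactly k items correct (k = 0..n_items).
    counts = Counter(sum(t) for t in trials)
    pct_trials = {k: 100.0 * counts.get(k, 0) / n_trials for k in range(n_items + 1)}
    return overall, by_order, pct_trials


# Four made-up trials with four responses each.
example_trials = [
    [True, True, False, False],
    [True, False, False, False],
    [True, True, True, True],
    [False, True, False, False],
]
overall, by_order, pct_trials = summarize(example_trials)
```

In the study these values would be computed per participant and recall type and then averaged across participants.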

Electrophysiological data analysis

Preprocessing

The hdEEG data were processed and analyzed using a combination of functions from the Net Station 5.0 software package (Electrical Geodesics, Inc., Eugene, OR, USA), the Brainstorm software package (Tadel, Baillet, Mosher, Pantazis, & Leahy, 2011: http://neuroimage.usc.edu/brainstorm), and custom-made Matlab scripts. The data from both correct and incorrect trials were analyzed. First, in Net Station the data were high-pass filtered (>0.5 Hz) in order to remove slow-drift noise. The filtered data were then segmented into 1,000-ms epochs that were time-locked to stimulus onset and lasted until stimulus offset, which was the duration of the encoding period (150 total epochs per participant, 75 epochs of each recall type). To ensure the accuracy of the data segmentation, event triggers generated in Matlab were sent from the presentation computer to the acquisition computer at the start of every encoding period. The offset periods between trigger generation in Matlab and trigger registration in NetStation were measured using a photodiode. This offset was accounted for during data segmentation. Custom-made Matlab scripts sorted the individual epochs by recall type (simultaneous, sequential) and by the number of items correctly recalled.

Artifact detection and subject exclusion

After segmentation, the data epochs were cleaned of artifacts (e.g., eye movements, eye blinks, and bad channel detection) using NetStation’s built-in tools. An epoch was marked as bad and removed if it contained more than 20% bad channels or contained an eye movement or eye blink. Eye blinks and eye movements were detected using NetStation’s built-in artifact detection tool using minimum and maximum detection thresholds of 55 and 140 μV, respectively. When possible, if an epoch had one or more bad channels, these channels were replaced by interpolating the data from surrounding channels. Additionally, if there was an epoch in which more than 20% of the channels were bad, that epoch was removed. Bad channels were detected using NetStation’s built-in artifact detection tool using a minimum/maximum detection threshold of 200 μV. Participants who had more than 25% bad epochs were removed from the study. A total of six participants were excluded from the analysis because they exceeded the allowable number of bad epochs.

After excluding the participants who were rejected, an average of 7.1 trials (SD = 8.8) were excluded per participant on the basis of artifact detection or excessive bad channels.

Permutation analysis

Since this was a full-report design, each trial consisted of four items that could be either correctly or incorrectly recalled. This is different from a recognition paradigm in which each trial probes one item. Therefore, for this analysis the trials were sorted on the basis of the accuracy of the individual items at their specific frequencies. For example, when examining the 3-Hz frequency tag amplitudes, we used trials in which the participant correctly recalled the 3-Hz item as compared to trials in which they incorrectly recalled the 3-Hz item, regardless of accuracy for the other items. To account for the unequal numbers of correct and incorrect epochs (e.g., a greater number of 3-Hz correct than of 3-Hz incorrect), we implemented a permutation analysis in which, on each of 100,000 iterations, equal numbers of correct and incorrect epochs would be pseudorandomly selected. For example, a participant might have recalled the 3-Hz item correctly 30 times (30 correct trials) and erred 20 times (20 incorrect trials). Given this case, our permutation analysis would pseudorandomly select 20 of the correct epochs to match with the 20 incorrect epochs. Importantly, we ensured that the numbers of correct and incorrect trials were matched to the type that included the smaller number of trials for each frequency (i.e., 20 trials were selected from each group whether a participant had 30 correct and 20 incorrect trials or 20 correct and 30 incorrect trials, because the minimum in both cases is 20). Due to the complexity of this permutation analysis, we validated our process by first simulating EEG datasets in which we embedded known signals at each of our target frequencies, and demonstrated that these known signals could be accurately recovered. Specifically, using Matlab we generated simulated EEG signals for which we controlled the amplitudes of the embedded frequency-tag-specific oscillations as well as a small amount of added noise. We then subjected these data to our permutation analysis to ensure that the amplitudes recovered from the analysis matched the known embedded amplitudes.
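The trial-matching step described above can be sketched as a subsampling routine. This is an illustrative sketch only: the function name, the use of integer epoch identifiers, the fixed seed, and the small number of iterations are all assumptions, and the original analysis was implemented in Matlab with 100,000 iterations.

```python
import random


def matched_subsample(correct_epochs, incorrect_epochs, rng):
    """Draw equal numbers of correct and incorrect epochs without replacement.

    The count is set by whichever group is smaller, as described in the text.
    """
    n = min(len(correct_epochs), len(incorrect_epochs))
    return rng.sample(correct_epochs, n), rng.sample(incorrect_epochs, n)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical epoch identifiers for one frequency (e.g., the 3-Hz item).
correct_ids = list(range(30))          # 30 epochs recalled correctly
incorrect_ids = list(range(100, 120))  # 20 epochs recalled incorrectly

# A handful of iterations stands in for the 100,000 used in the study.
draws = [matched_subsample(correct_ids, incorrect_ids, rng) for _ in range(5)]
```

On every iteration both groups contain 20 epochs: all of the incorrect epochs and a different pseudorandom subset of 20 of the 30 correct epochs.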

Frequency-tagging analysis

For each of the 100,000 iterations of the permutation analysis, we computed the time domain average of the trial-number-matched epochs (correct, incorrect) for each of the target frequencies. A fast Fourier transform (FFT) was then applied to each average in order to extract the frequency tag amplitude corresponding to the frequency of the presented item (3, 5, 12, or 20 Hz) as a function of whether it was correctly or incorrectly recalled.
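The order of operations here (average in the time domain first, then apply the FFT) can be illustrated on simulated data in the spirit of the validation described earlier. This is a minimal sketch using NumPy rather than the authors' Matlab code; the function name, the number of simulated epochs, the noise level, and the amplitude scaling convention are assumptions.

```python
import numpy as np

FS_HZ = 1000      # sampling rate reported in the text
N_SAMPLES = 1000  # one 1,000-ms encoding epoch at 1,000 Hz


def tag_amplitude(epochs, freq_hz, fs_hz=FS_HZ):
    """Average epochs in the time domain, then read the FFT amplitude at freq_hz.

    epochs: array of shape (n_epochs, n_samples) for one electrode.
    """
    evoked = epochs.mean(axis=0)
    spectrum = np.fft.rfft(evoked)
    freqs = np.fft.rfftfreq(evoked.size, d=1.0 / fs_hz)
    bin_index = int(np.argmin(np.abs(freqs - freq_hz)))
    # Scale so that a sinusoid of amplitude A yields A at its own frequency bin.
    return 2.0 * np.abs(spectrum[bin_index]) / evoked.size


# Simulated check: embed a phase-locked 5-Hz sinusoid of known amplitude in noise.
rng = np.random.default_rng(0)
t = np.arange(N_SAMPLES) / FS_HZ
known_amplitude = 2.0
noise = rng.normal(0.0, 1.0, (40, N_SAMPLES))  # 40 hypothetical epochs
epochs = known_amplitude * np.sin(2 * np.pi * 5 * t) + noise

recovered = tag_amplitude(epochs, 5)    # should be close to known_amplitude
off_target = tag_amplitude(epochs, 12)  # no signal embedded here, so near zero
```

Because a 1,000-ms epoch gives a frequency resolution of 1 Hz, each of the tag frequencies (3, 5, 12, and 20 Hz) falls exactly on an FFT bin, so no interpolation between bins is needed in this sketch.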

To directly compare the amounts of neural resources allocated to the different items in the memory array when the participant correctly or incorrectly recalled each item, and to control for confounding attentional effects caused by using a range of frequencies, index values were created. These indices were calculated using the difference between the amplitudes for a correct as compared to an incorrect item and scaled by a measure of the sum of the two amplitudes for each individual frequency:

$$ {\mathrm{Index}}_F=\frac{\left({F}_{\mathrm{Correct}}-{F}_{\mathrm{Incorrect}}\right)}{\left({F}_{\mathrm{Correct}}+{F}_{\mathrm{Incorrect}}\right)} $$

where F represents the amplitude of the frequency being calculated. The index values could range between −1 and 1. A positive index value would indicate that the amplitude was greater for the correctly recalled item, whereas a negative index value would represent a greater amplitude for an incorrectly recalled item. An index value of zero indicated there was no difference between the amplitudes of correct and incorrect items.
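As a worked illustration of the index, the sketch below evaluates it for a few made-up amplitude pairs. The function name and the example values are hypothetical; only the formula is taken from the text.

```python
def tag_index(amp_correct, amp_incorrect):
    """Normalized difference between correct and incorrect tag amplitudes."""
    total = amp_correct + amp_incorrect
    if total == 0:
        raise ValueError("Index is undefined when both amplitudes are zero.")
    return (amp_correct - amp_incorrect) / total


larger_when_correct = tag_index(3.0, 1.0)    # (3 - 1) / (3 + 1) = 0.5
no_difference = tag_index(1.0, 1.0)          # (1 - 1) / (1 + 1) = 0.0
larger_when_incorrect = tag_index(0.0, 2.0)  # (0 - 2) / (0 + 2) = -1.0
```

Because FFT amplitudes are non-negative, the denominator is at least as large as the absolute value of the numerator, which is why the index is bounded by −1 and 1.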

To control for outliers, the process of matching the numbers of correct and incorrect items, averaging across correct and incorrect items, applying the FFT, and calculating the index was repeated for each of the 100,000 iterations of the permutation analysis. These index values were averaged across iterations and then frequencies, for each electrode and participant. See Fig. 1 for a visual representation of the data analysis using hypothesized EEG data and frequency tags.

A group-level analysis of these indices was performed separately at each electrode using repeated measures, one-sample, two-tailed t tests against a test value of 0. The significance of these tests was evaluated using an alpha of .05. Additionally, to correct for multiple comparisons, we applied a false discovery rate correction (q = .10).
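The text does not name the false discovery rate procedure that was used. Assuming the widely used Benjamini-Hochberg step-up procedure, the correction across electrodes could be sketched as follows; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.10):
    """Mark which tests survive Benjamini-Hochberg FDR control at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    # Every test ranked at or below k is declared significant.
    significant = [False] * m
    for idx in order[:largest_k]:
        significant[idx] = True
    return significant


# Made-up p values standing in for one-sample t tests at five electrodes.
example_p = [0.001, 0.045, 0.05, 0.07, 0.9]
survivors = benjamini_hochberg(example_p, q=0.10)
```

In this example the second p value (.045) exceeds its own rank threshold (.04) but is still retained, because a larger p value further down the ranking (.07, threshold .08) passes; this is the step-up property of the procedure.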

Induced-power analysis

This orthogonal analysis permitted us to examine the task-specific effects of induced power. First, we segmented the data on the basis of the number of correct responses, regardless of flicker frequency. Specifically, we contrasted “good” trials, in which participants correctly recalled three or four of the items, with “bad” trials, in which participants correctly recalled one or two of the items. Trials in which participants correctly recalled zero items were excluded, since they made up a relatively small number of trials (6.25 trials per condition averaged across participants, with a standard deviation of 5.84) and due to the possibility that participants might not have actually performed the task during these trials. We performed the analysis in this way, instead of breaking the data into bins for one, two, three, and four correct, for two reasons. First, this procedure had been used in previous studies that had looked at neural activity as a function of “good” or “bad” trials (Mance et al., 2014). Second, a bin-by-bin analysis would have required trials to be segmented into groups based on the number of items correctly recalled (one, two, three, or four). However, the numbers of trials in which participants correctly recalled one or four items were significantly smaller than those in which participants recalled two or three items. This difference would lead to data loss during the permutation procedure. As a result, the number of trials that would have gone into the permutation analysis would be much smaller than in the evoked analysis. Therefore, we parsed trials into “good” and “bad” groups. After segmenting the trials, we performed an FFT and extracted amplitudes across the range of frequency power bands (delta, theta, alpha, and beta) for each individual trial. We then averaged these amplitudes across trials. This contrasted with the permutation analysis explained above, in which we averaged across trials before taking the FFT; this procedural difference allowed for the extraction of induced power that would have been lost due to variations in phase (Bastiaansen & Hagoort, 2003). This is an important difference to note between the two analyses: By using frequency tagging, we were explicitly entraining neural activity to that of an externally flickering stimulus, which allowed us to examine the neural signature specific to each stimulus. In contrast, the induced-power analysis examined changes in neural oscillations that were independent of the external stimulus presentation (Bastiaansen & Hagoort, 2003). Next we performed a permutation analysis, repeating this process 25,000 times to control for uneven numbers of trials. We performed the analysis only 25,000 times because of the increased computational load of performing the FFT on each individual trial, as opposed to on the average of trials. We again calculated an index for each frequency within each power spectrum between the good and bad trials. These index values were then averaged across the frequency bands: delta (1–4 Hz), theta (5–7 Hz), alpha (8–15 Hz), and beta (16–31 Hz).
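The reason the order of averaging and the FFT matters can be shown with simulated data. The sketch below, which is illustrative and not the authors' code, generates a 10-Hz oscillation whose phase varies randomly from trial to trial; the function names, the number of trials, and the choice of 10 Hz are assumptions.

```python
import numpy as np

FS_HZ = 1000      # sampling rate reported in the text
N_SAMPLES = 1000  # one 1,000-ms epoch


def amplitude_spectrum(x):
    """Single-sided FFT amplitude along the last axis (non-DC bins)."""
    return 2.0 * np.abs(np.fft.rfft(x, axis=-1)) / x.shape[-1]


def evoked_amplitude(epochs):
    """Average across trials first, then take the FFT (phase-locked activity)."""
    return amplitude_spectrum(epochs.mean(axis=0))


def induced_amplitude(epochs):
    """Take the FFT of each trial first, then average the amplitudes."""
    return amplitude_spectrum(epochs).mean(axis=0)


# 200 hypothetical trials of a unit-amplitude 10-Hz oscillation with random phase.
rng = np.random.default_rng(1)
t = np.arange(N_SAMPLES) / FS_HZ
phases = rng.uniform(0.0, 2 * np.pi, size=(200, 1))
epochs = np.sin(2 * np.pi * 10 * t + phases)

freqs = np.fft.rfftfreq(N_SAMPLES, d=1.0 / FS_HZ)
bin_10hz = int(np.argmin(np.abs(freqs - 10)))
evoked_10hz = evoked_amplitude(epochs)[bin_10hz]    # phases cancel, so near zero
induced_10hz = induced_amplitude(epochs)[bin_10hz]  # preserved, close to 1
```

Averaging before the FFT cancels activity that is not phase-locked to stimulus onset, whereas averaging single-trial amplitudes retains it, which is the distinction the text draws between the frequency-tagging and induced-power analyses.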

A group-level analysis of these indices was performed separately at each electrode using repeated measures, one-sample t tests against a test value of 0. The significance of these tests was evaluated using an alpha of .05. Additionally, to correct for multiple comparisons, we applied a false discovery rate correction (q = .10).

Results

Behavioral accuracy

The majority of VWM studies have employed either recognition probes or partial report. A strength of the present study was its use of a whole-report paradigm. This allowed us to examine the behavioral data in greater detail, and we took advantage of this by comparing more than the aggregate accuracy differences between the two recall conditions. In full-report paradigms the chance performance rate is not a single, easy-to-calculate number, the way it is in recognition paradigms. For the first item, chance performance is 10% (one choice out of ten potential options), but chance takes a slightly different value for each subsequent item, because the total number of options decreases by 1 with each response. We therefore use this 10% level only as an estimate in Fig. 2A. To test whether accuracy differed between the simultaneous and sequential tasks, we conducted a paired-samples t test and found that performance was significantly higher on the simultaneous trials than on the sequential trials: t(19) = 5.058, p = .00007 [simultaneous M = 56.74% (SD = 2.70), sequential M = 47.49% (3.02)].

Fig. 2

Behavioral VWM performance. (A) Plotted is behavioral performance (% correct) in the working memory task for the simultaneous and sequential recall types. Accuracy was calculated as the percentage of correctly retrieved items, divided by the total number of items presented. The dashed black line represents an approximation of chance performance. *p < .01. (B) Plotted here is behavioral performance (% correct) for response order in the simultaneous and sequential recall types. Accuracy is calculated as the percentage of correctly recalled nth items as a proportion of the total number of items presented as the nth response. (C) Plotted is the percentage of trials on which x items were recalled correctly (x = 0 to 4). All error bars represent standard errors of the means

Accuracy declined with each subsequent response in each task [simultaneous: 1st: 87.65% (1.76), 2nd: 72.35% (2.82), 3rd: 44.21% (4.70), 4th: 22.74% (3.05); sequential: 1st: 62.95% (3.21), 2nd: 45.82% (0.96), 3rd: 41.96% (3.50), 4th: 40.42% (3.56)] (Fig. 2B). A repeated measures analysis of variance (ANOVA) was calculated to determine differences between task types (sequential, simultaneous) and response orders. The ANOVA reported significant main effects of both task [F(1, 19) = 25.59, p < .0001] and response order [F(3, 57) = 133.69, p < .0001], as well as a significant interaction [F(3, 57) = 58.14, p < .0001]. Performance was better for early responses, and the slope of performance was steeper in the simultaneous than in the sequential task; see Fig. 2.

Another strength of the full-report paradigm is that it allowed us to examine the distribution of correct and incorrect items across trials. For example, the distribution of correct and incorrect items across trials might be consistent with predictions made by the “all-or-nothing” model, in which the distribution would be much more bimodal, with the majority of trials containing zero or four items correct. Alternatively, the distribution might be more consistent with predictions made by the “subset-of-items” model, in which the majority of trials would contain two correct and two incorrect. To evaluate these possibilities, we calculated the percentages of trials in which participants correctly recalled zero, one, two, three, and four items for each of the two recall types (Fig. 2C). A repeated measures ANOVA was calculated to determine the effect of task (sequential, simultaneous) on the number of items recalled correctly. There were significant main effects of both task [F(1, 19) = 13.15, p < .0001] and number of items correctly recalled [F(3, 57) = 26.67, p < .0001], as well as a two-way interaction [F(3, 57) = 4.34, p = .008]. Again, performance was superior in the simultaneous task. On the majority of trials the participants remembered two items, and this was a higher proportion for the simultaneous-task trials. There were many fewer trials on which participants remembered either all four or zero items.

Remembered versus forgotten items

The electrophysiological data for remembered and forgotten items were analyzed separately for the sequential and simultaneous conditions (Fig. 3). One-sample, two-tailed t tests were performed using the index value at each electrode as compared to a test value of 0, with an alpha value of .05. A total of 182 electrodes reached significance using the data from the sequential condition, and a total of 115 electrodes reached significance using the data from the simultaneous condition. In both cases, larger amplitudes were evident for correctly than for incorrectly recalled items (Correct > Incorrect). This was the case regardless of electrode location or recall task. Across VWM tasks we observed some qualitative differences. The topographic maps show that the sequential condition had higher index values than the simultaneous condition. Yet, in the simultaneous condition some electrodes represented greater differences between correct and incorrect items. This is consistent with the behavioral data, in which, during the simultaneous condition, participants performed best when recalling the first item, but performance fell with each consecutive response, whereas during the sequential condition, participants performed more consistently across responses.

Fig. 3

Index values and test statistics for all electrodes for both the simultaneous and sequential full-report recall conditions. Topographic maps (left column) represent index values for the simultaneous and sequential conditions. Positive index values (warm colors) represent higher amplitudes when an object was correctly recalled, and negative index values (cool colors) represent higher amplitudes when an object was incorrectly recalled. The scale represents index values ranging from −.2 (cool) to .2 (warm). Significant electrodes are represented by a bold black outline around the electrode. One-sample t tests were performed on each electrode index, as compared to a test value of 0 at an alpha value of .05. Plotted in the right column are test statistics for each electrode, rank-ordered by p values from most to least significant. Importantly, the values plotted here are ranked in order of significance and not in order according to their identity or location on the scalp. The horizontal lines represent a significance threshold of t = 2.05, p = .05.

Induced power at encoding is associated with memory load

To show the induced power related to VWM processes at encoding, we have plotted index values for the four frequency bands (delta, theta, alpha, and beta) on topographic heat maps for both retrieval tasks (sequential to the left and simultaneous to the right; Fig. 4). These topographic maps show index values comparing trials in which one or two items were correctly recalled with trials in which three or four items were correctly recalled. Additionally, we have plotted bar graphs of the test statistics, rank-ordered by significance (p value). The horizontal lines represent a threshold of significance of α = .05; the FDR-corrected p and q values are described below. For the sequential condition in the alpha band, 140 electrodes reached significance (FDR q value = .09); in the beta band, 112 electrodes reached significance (FDR q = .11), and 58 electrodes reached significance (FDR q = .22) in the theta band. In contrast, only two electrodes in the delta band reached significance (FDR q > .99). For the sequential condition there is a widespread pattern of greater activity across the alpha, theta, and beta bands for trials with more correctly remembered items. The pattern of corresponding results in the simultaneous condition is notably different from that in the sequential condition. In the simultaneous condition, no electrodes reached significance for the alpha band, and only 9, 23, and 11 electrodes reached significance for the delta, theta, and beta bands, respectively (FDR q > .99, q = .56, q > .99, respectively).
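The text reports FDR q values without naming the correction procedure. A minimal sketch of one common choice, the Benjamini-Hochberg adjustment, is given below as an assumption rather than as the method used here. The function name and the p values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Made-up p values for four hypothetical electrodes.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)
```

With this kind of adjustment, a band can contain many electrodes that pass the uncorrected α = .05 threshold while its smallest q value still exceeds .05, which is the situation the reported q values describe.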

Fig. 4

Power increases across the frequency bands on trials with more correct responses. Index values are plotted on topographic maps, and test statistics are plotted on bar graphs for all electrodes and the four spectral bands (delta, theta, alpha, beta), for both the simultaneous and sequential full-report recall conditions: (A) Data for the sequential condition. (B) Data for the simultaneous condition. Each row represents data from one of the four spectral frequency bands: delta, theta, alpha, and beta. Plotted on the topographic heat maps are index values comparing the amplitudes on trials in which participants remembered few (1–2) items correctly with those on trials in which participants remembered most (3–4) items correctly. Cool colors represent More Correct > Fewer Correct (3–4 > 1–2), and warm colors represent Fewer Correct > More Correct (1–2 > 3–4). The index values range from −.04 to .04. Electrodes that reached significance surpassed t = 2.09 (p = .05) and are shown in bold on the heat maps. The bar graphs plot the test statistics for each electrode ranked in order of significance (p value). The horizontal lines represent α = .05.

Discussion

Our overall goal was to better understand VWM capacity limitations arising during encoding. We paired two full-report VWM tasks with a frequency-tagging hdEEG paradigm. This provided behavioral and neural responses specific to each stimulus item and improved upon our previous work using VWM recognition (Peterson et al., 2014). Consistent with these previous data, the frequency tag amplitudes at encoding were significantly higher for correctly retrieved items than for items later forgotten. In these data, this meant that for a given stimulus array the frequency tag amplitudes varied as a function of later VWM accuracy. We also replicated and extended our earlier findings in low-density EEG data (Peterson et al., 2014) by showing that this frequency tag amplitude difference (Correct > Incorrect) is evident across the whole brain rather than being driven by a limited number of electrode sites. In a subsequent analysis examining induced power, we were able to demonstrate differences as a function of the full-report retrieval task. For the sequential task, there were significant differences in the alpha, theta, and beta frequency bands that were unmatched in the simultaneous task. These induced power differences predominantly reflected higher amplitudes on trials with more correct responses than on trials with fewer correct responses (More Correct > Fewer Correct). In short, these data add to the evidence supporting a key role of encoding in determining VWM capacity limitations.

Our previous work had shown that frequency tags were significantly higher for probed and unprobed items on correct trials compared to incorrect trials. These results left open the question of whether the frequency tags of correct trials reflected greater resource availability or whether participants simply held the correct answer for each of the unprobed items as well as for the probed item. In other words, because we did not collect any accuracy information for the unprobed items, we could not draw conclusions about the allocation of resources to those items. The present data resolve this question by showing that the amplitude of the frequency tag is higher for correct items and significantly lower for incorrect items, consistent with our original data. The present data also show that encoding is not “all-or-none”; rather, encoding resources can be unevenly distributed across individual items, and those items that receive more are more likely to be subsequently remembered. A recent paper employed a full-report VWM design and measured neural activity using EEG (Mance et al., 2014). These authors were interested in VWM capacity limitations and, in particular, whether VWM capacity limitations could be attributed largely to attentional lapses, in which neural resources are allocated to either the entire stimulus set or none of the stimuli (Rouder et al., 2008), or to failures of attentional control, in which neural resources are only allocated to a subset of stimuli. Consistent with our results, their data indicate that VWM capacity limitations were primarily due to failures of attentional control rather than lapses. We also note that we presented stimuli for a long encoding duration (1,000 ms) to sufficiently power frequency tag analyses. This encoding duration is considerably longer than that in most VWM studies (100–250 ms; for a review, see Luck & Vogel, 2013). In other words, participants had more encoding time than in standard VWM tasks, and they still generally could not remember all four stimuli.

Furthermore, Mance et al. (2014) included spectral power analyses of their EEG data, examining induced power in four frequency bands: delta, theta, alpha, and beta. They observed that high frontal theta and reduced posterior alpha were associated with successful WM performance. Additional studies have shown similar patterns of increased theta power along frontal and midline areas and/or decreased alpha power in more posterior regions as a result of increased attentional allocation, WM task demands, and WM load (Caplan et al., 2001; Deiber et al., 2007; Gevins et al., 1997; Jensen & Tesche, 2002; Kahana et al., 2001; Kahana et al., 1999; Klimesch et al., 1997; for a review, see Hsieh & Ranganath, 2014). However, several earlier studies show the opposite pattern, reporting increased alpha power as WM load increased (Jensen et al., 2002; Klimesch et al., 1999; Krause et al., 1996). The induced power analysis presented here is consistent with this second set of findings. As we noted above, for the sequential full-report task, alpha band power (as well as theta and beta) was significantly stronger on trials in which participants correctly recalled three or four items than on trials in which they recalled one or two items. This increase in power is widespread. One important note is the lack of studies explicitly examining induced power during VWM encoding. Of the studies listed above, only three explicitly examine the encoding phase (Jensen & Tesche, 2002; Klimesch et al., 1999; Krause et al., 1996), and the induced power analyses presented in these studies are consistent with our results: increased alpha band power during encoding as a function of the number of items correctly recalled. More recently, studies have begun using alpha power suppression as a measure of WM capacity (Adam et al., 2015; de Vries et al., 2016; Erickson et al., 2017; Fukuda et al., 2016; Fukuda et al., 2015; Fukuda & Woodman, 2017). Overall, these studies report a decrease in alpha power during the VWM maintenance phase that corresponds to the number of items currently held in VWM. However, even within these studies, during the encoding stage, we see a clear increase in alpha power as a function of VWM load. One consistent result across these studies is that different patterns of activity emerge during encoding, maintenance, and retrieval across the alpha and theta bands as a function of task demands and VWM load.

The results presented here also demonstrate notable differences in the power analyses as a function of VWM retrieval task demands, as has been noted previously (Mance et al., 2014; for related patient work, see Berryhill et al., 2011). These results demonstrate that task demands at retrieval can have a measurable impact on how an item is encoded into WM. This is notable because it shows not only that encoding-related processes can affect how or whether an item enters WM, but also that expectations of how that item will be used affect how it is encoded. Importantly, the present study uses a different paradigm from the cited studies, as it compares two different types of recall: simultaneous and sequential. Therefore, a discussion of why we might see differences in both the behavioral and neural data between the two recall tasks is warranted. One possible explanation is that participants may be using different strategies for each recall task. For example, perhaps participants hold in VWM the two left items and elect to ignore the two right items. This strategy would be more effective in the simultaneous condition because participants can immediately report the two items held in memory. However, in the sequential condition, in which response order is chosen randomly for the participant, this strategy would be less effective. This is because on approximately half the trials the participant must report the two left items last, leaving the representations of the two items subject to interference or decay over time.

For the participant to maintain high accuracy, it is advantageous for them to encode as many items as possible during the sequential condition. Thus, the differences in task demands, and in the strategies they encourage, seem to be the cause of the differences seen in both the neural and behavioral data. In the behavioral data, we see patterns of results consistent with the use of two different encoding strategies. In the simultaneous condition, accuracy falls off in a linear fashion, decreasing with the reporting of each additional item. Additionally, note that the accuracy for the first two items is fairly high but falls off significantly for the last two items. This is consistent with a strategy in which the participant remembers as many items as they can accurately recall. For example, a participant might only remember two or three items regardless of their position. In contrast, in the sequential condition, accuracy is relatively high for the first item recalled, but lower for the last three items. Importantly, the accuracy for the last three items is almost identical. This is consistent with a strategy in which the participant must try to remember all the items present. For example, because the participant does not know which items they will be probed on, they must try to remember all items equally. In the neural data, a similar pattern is found. If one assumes that during the sequential condition participants are attempting to remember all items instead of a subset of items, one would expect greater activity in the sequential than in the simultaneous condition, which is seen in both the evoked and induced power analyses.

Importantly, one potential confound can emerge when examining the induced power as a function of “good” and “bad” trials, particularly in the simultaneous recall condition. For example, during the simultaneous condition, participants might encode two items strongly and two items weakly. These participants may use a strategy in which they report the weakly encoded items first to stave off forgetting. As a result, the participants may then forget the two remaining items that were initially strongly encoded. In the induced power analysis, such trials may be sorted incorrectly as bad trials, despite the participant having successfully encoded two of the items. This would then lead to a confound in the induced power analysis. The best way to correct for this potential confound would be to segment trials based on response order, and examine the frequency tags as a function of first, second, third, or fourth item recalled. However, this analysis cannot be conducted on the present dataset because there are too few trials in which each specific frequency-tagged item is chosen in each specific recall position (e.g., 3 Hz recalled first, second, third, or fourth). Trials must be segmented not only into groups based on frequency, but also into subgroups for each of the four recall positions. In other words, each group used in the evoked analysis must be further broken down into four more groups, which leaves only around five trials in each group. Future work is needed to investigate this potential encoding strategy. These results should be considered in future research when interpreting effects of WM during maintenance. In combination, the insights provided by temporally sensitive evoked and induced EEG analyses are important in understanding the mechanism of VWM encoding and VWM in general.
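The trial-count problem described above can be illustrated with a short sketch that tallies how many trials fall into each combination of tag frequency and recall position. The data layout, the 5 Hz tag, the trial counts, and the minimum-trial threshold are all hypothetical; only the 3 Hz tag is mentioned in the text.

```python
from collections import Counter


def count_trials_per_cell(recalled_items):
    """Count items in each (tag_frequency_hz, recall_position) cell.

    recalled_items: iterable of (tag_frequency_hz, recall_position) pairs,
    one pair per recalled item (hypothetical layout).
    """
    return Counter(recalled_items)


def cells_below(cell_counts, min_trials):
    """Return the cells that contain fewer than min_trials observations."""
    return sorted(cell for cell, n in cell_counts.items() if n < min_trials)


# Made-up counts: a 3 Hz item recalled first on 12 trials, second on 5, etc.
example_items = (
    [(3.0, 1)] * 12 + [(3.0, 2)] * 5 + [(5.0, 1)] * 4 + [(5.0, 3)] * 6
)
cell_counts = count_trials_per_cell(example_items)
# The threshold of 10 is an arbitrary illustrative choice.
sparse_cells = cells_below(cell_counts, min_trials=10)
```

Splitting each frequency group across four recall positions divides the available trials roughly by four, so most cells end up too sparse to support a reliable frequency tag estimate.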

In closing, we built upon past findings showing that encoding-related activity contributes to WM capacity limitations (Gurariy et al., 2016; Peterson et al., 2014). Here, we show that within a given trial, frequency tags are larger for correctly recalled items than for incorrectly recalled items. This suggests that encoding-related resources are allocated in a heterogeneous fashion across stimuli. Those stimuli that receive more resources are more likely to be later remembered. Furthermore, our induced-power results show that encoding-related processes are clearly influenced by task-specific retrieval demands. Taken together, our results demonstrate an important and complex role for encoding-related processes in overall VWM performance.