Attention for action in visual working memory

From the conception of Baddeley's visuospatial sketchpad, visual working memory and visual attention have been closely linked concepts. An attractive model has advocated unity of the two cognitive functions, with attention serving the active maintenance of sensory representations. However, empirical evidence from various paradigms and dependent measures has now firmly established an at least partial dissociation between visual attention and visual working memory maintenance - thus leaving unclear what the relationship between the two concepts is. Moreover, a focus on sensory storage has treated visual working memory as a reflection of the past, with attention as a limiting resource. This view ignores what storage is for: immediate or future action. We argue that rather than serving sensory storage, attention emerges from coupling relevant sensory and action representations within working memory. Importantly, this coupling is bidirectional: First, through recurrent feedback mechanisms, action coupling results in the enhancement of the appropriate sensory memory representation. Under this view, unattended memories are currently not coupled to an action plan, but are not necessarily lost and remain available for future tasks when necessary. Second, through the very same feedback projections, attention serves as the credit assignment mechanism for the action's outcome. When the action is successful, the associated representations are being reinforced, leading to more robust consolidation and more rapid retrieval in the future - thus explaining performance benefits for attended memories without assuming that attention serves as the maintenance mechanism. By firmly grounding VWM in the action system, the new framework integrates a range of behavioural and neurophysiological findings and avoids circularity in explaining the role of attention in working memory.


1. Introduction
Visual working memory (vWM) is the cognitive function that enables the temporary maintenance of visual information that is relevant for an ongoing or imminent task. Imagine assembling a piece of flat pack furniture. We first look in the manual at the image of the screw we need next, after which we momentarily retain that image to look for the same screw in the parts sachet. As goes for all types of working memory, vWM is highly flexible, as it can represent a range of different visual characteristics, each of which can be updated, replaced, recombined or forgotten according to changing task demands. For example, after you have driven in the screw, you move on to the next assembly step, for which you need to remember a different part.
Given the fleeting nature of vWM, theories have assumed that storage requires active maintenance. Attention has then often been invoked as the primary mechanism behind such maintenance. Attention is typically regarded as a limited-capacity mechanism by which the brain enhances relevant sensory representations, by increasing neuronal activity. The idea is that attention can serve the same role within vWM, where it can sustain, or regularly rekindle, the neuronal firing pattern associated with memorized representations. As has been extensively reviewed elsewhere, visual attention and vWM appear to have a lot in common, in terms of functional properties, capacity, and the brain regions involved (Awh, Jonides, & Reuter-Lorenz, 1998; Awh, Vogel, & Oh, 2006; Chun, 2011; Cowan, 1998; Gazzaley & Nobre, 2012; Jonides et al., 2008; Kiyonaga, Egner, & Soto, 2012; Olivers, 2008; Theeuwes, Belopolsky, & Olivers, 2009b), which has led to proposals of close to complete unity of the two cognitive constructs, with attention being directed to either incoming stimuli ("external attention"), or to maintained sensory representations in the absence of such stimulation (i.e., "internal attention", e.g., Chun, 2011; Kiyonaga et al., 2012). There is clear evidence for shared mechanisms. For example, it has been shown that sensory information that matches vWM representations is preferentially processed, both in spatial and non-spatial (feature) domains (e.g., Awh & Jonides, 2001; Olivers, Meijer, & Theeuwes, 2006; Soto, Heinke, Humphreys, & Blanco, 2005). Furthermore, there is ample evidence that attending to specific items within vWM increases their chances of being correctly remembered, suggesting a central role of attention in their retention or protection (Gazzaley & Nobre, 2012; Souza & Oberauer, 2016).
However, proposals that suggest a complete unity of attention and vWM are not without problems (see also Oberauer, 2019). vWM often suffers little, if at all, from unrelated intervening tasks that appear to demand a good deal of visual attention (e.g., Hollingworth & Maxcey-Richard, 2013;Rerko, Souza, & Oberauer, 2014). Similarly, neural activity associated with memorized items may wax and wane without eliminating memory (e.g., Lewis-Peacock, Drysdale, Oberauer, & Postle, 2012;van Moorselaar et al., 2018;Watanabe & Funahashi, 2014). The pattern of findings implies that sustained attention is not a necessary requirement for successful vWM maintenance. This leaves the question as to what attention in vWM is then for, if not just a handy way of boosting a memory.
Moreover, treating attention as the mental spotlight that lights up and sustains the objects on the mnemonic canvas of the past ignores the fact that in everyday life we tend to use working memory for the future: Which goal do we have, which action will we perform, and which task will we do afterwards (something that has been referred to as "premembering"; see also Myers, Stokes, & Nobre, 2017). As the example of the flat pack furniture illustrates, we typically use working memory for a specific purpose, often in the form of some near-future action, whether of cognitive or motoric nature. The most common vWM paradigms do not tap into this aspect of future use. They treat vWM as a perceptual endeavor, in which an observer maintains a mental image of the sensory experience for a later report. Two of the dominant paradigms are delayed match-to-sample, which typically requires a yes/no response, and delayed continuous recall, where observers indicate on a continuous scale what the orientation or color of the memorandum was. Both types of test are typically treated as providing a neutral form of memory read-out. Attention manipulations then typically involve instructing or cueing observers on task-relevant locations, objects, or features on the memory canvas. But here too it is worth pointing out that task-relevance is ultimately determined by the desired outcome, i.e., the participant's response requirements.
Here we aim to provide an integrative theoretical framework that will do justice to the interplay between attention and vWM, while also accommodating the dissociations between attending and storage. In addition, the framework avoids some of the definitional conundrums and circularities by linking attention to clearly defined and neurophysiologically plausible mechanisms, rather than remaining at the level of conceptual labels. Especially within the behavioural vWM literature, more often than not attention is defined as the process that leads to improved performance. Vice versa, if performance is improved, researchers conclude that the item must have been attended. To avoid such circularity, we link attention to recurrent feedback mechanisms, with implications for both activity and plasticity. The core of the proposal is that action plans determine the role of attention in working memory: Instead of a purely perceptual process serving the maintenance of sensory activity, attention is the process - the emergent property - of linking sensory memory and action. Importantly, this link is bidirectional, as not only does successful sensory selection serve a chosen action, but a successful action also serves to strengthen the sensory representation that led up to it, to the benefit of subsequent performance. By integrating attention, working memory, action, and reinforcement learning, the framework can explain the paradoxical findings regarding the role of attention in vWM.

Cortex 131 (2020) 179-194

2. Attention as emerging from sensory-action coupling within working memory
Central to the framework we present here is the tenet that attention reflects the coupling between sensory and action representations within vWM. The sheer intellectual capacities of the human mind may occasionally make us forget what cognition is for: Action. Our cognitive system has evolved because we are moving beings, interacting with our environment (Churchland, Ramachandran, & Sejnowski, 1994; Cisek & Kalaska, 2010; Clark, 2015; Clark & Chalmers, 1998; Engel, Maye, Kurthen, & König, 2013; O'Regan & Noë, 2001).[1] Moreover, for a moving organism to successfully adapt, information flow must be bidirectional: Sensory and memory systems must map onto motor systems, and conversely, successful actions must reinforce the sensory and memory representations that led to them, so that the organism learns to achieve what it needs. Although none of these concepts is new, they only recently gained a foothold in the vWM literature and have so far been left largely unspecified (Myers, Chekroud, Stokes, & Nobre, 2018; van Ede, 2020). Our new contribution here is that this bidirectional coupling provides explanations for the paradoxical relationship between vWM and attention. Specifically, we claim that attention serves two purposes in vWM functioning, selection-for-action and credit assignment in action-based learning, each of which in its own way leads to improved behavioural performance.

2.1. Attention as selection-for-action
Fig. 1 provides an illustration of how attention serves the reciprocal link between memory and action. First, attention reflects the selective coupling of a task-relevant sensory representation (whether from ongoing stimulation, or from existing memory traces) to a task-relevant action plan. Specifically, the link to action creates a long-range recurrent feedback loop which enhances the activity of the selected neuronal representations involved. Exactly how such links are forged (and especially how they can be forged so rapidly and flexibly that they can meet ad hoc changing task demands) is a question beyond the scope of the current paper, but such connections have been both successfully modelled (e.g., Dehaene & Naccache, 2001; Hamker, 2005; Rombouts, Bohte, & Roelfsema, 2015) and observed (e.g., Moore & Armstrong, 2003; Pooresmaeili, Poort, & Roelfsema, 2014; Siegel, Buschman, & Miller, 2015; Zhou & Desimone, 2011). It is important to define what an action is in this context. Here we stay close to what others have referred to as intention (e.g., Andersen & Cui, 2009), and define action as an intentional plan (or program) to initiate the goal-directed motion of an effector (cf. Cisek & Kalaska, 2010). By intentional we mean that the action plan is activated as part of the task and flexibly changes with changing task demands, while by goal-directed we mean that the action is directed towards an object or location. Note that the movement itself need not be executed. In fact, with sufficient experience, humans and potentially other primates may internalize certain action programs and apply them to mentally simulated objects and situations instead (Frick & Möhring, 2016; Frick, Möhring, & Newcombe, 2014; Funk, Brugger, & Wilkening, 2005; Grush, 2004; Ito, 2008; Jansen & Heil, 2010; Jeannerod, 2001; Lehmann, Quaiser-Pohl, & Jansen, 2014). Fig. 1 therefore also contains such "covert" actions.
Within the research field of visual attention, the idea of attention serving action was recognized decades ago, but still has not become part and parcel of the mainstream discourse. Most notably, Allport (1987) proposed a "selection-for-action" framework, arguing that perception is limited by attention not because the brain cannot deal with more information, but because selecting single objects is actually functionally useful given the inherently limited mapping onto bodily responses. Hommel (2009; Hommel, Müsseler, Aschersleben, & Prinz, 2001) has proposed that selected sensory and action features are automatically bound within implicit memory, and that activation of an action may thus automatically invoke the associated visual feature. Earlier, Klein (1980) had already proposed his oculomotor readiness theory of attention, which states that spatial attention is in essence the consequence of the preparation of an eye movement, an idea echoed later in the premotor theory of attention (Rizzolatti, Riggio, Dascola, & Umiltà, 1987). Collectively, these existing theoretical frameworks state that preparing an action towards an object automatically results in perceptual enhancement of that object. Attention thus becomes an emergent property of sensory-action coupling. However, none of these theories say much about the role of attention and action in vWM.[2] Yet, for adaptive behavior, we must also be able to act on sensory information that is no longer there. Interestingly, there is a class of working memory models that does take action into account. These are network models that take storage as a given, and instead seek to explain the control mechanisms that allow the cognitive system to perform complex tasks, in particular the mechanisms that enable the selective encoding, updating, or removal of information from short term memory stores (O'Reilly & Frank, 2006; Todd, Niv, & Cohen, 2009).
Central to these models are gating operations (similar to those in LSTM circuitries in artificial intelligence, Hochreiter & Schmidhuber, 1997), which provide dynamic access to working memories in a manner that depends on the activity in the rest of the network. These gates can shield memory items so that they do not interfere with an ongoing task and also determine whether information is remembered or forgotten. Interestingly, the Prefrontal cortex-Basal Ganglia Working Memory model (O'Reilly & Frank, 2006) sees such gating as internal, covert actions, served by the Basal Ganglia - a structure also closely involved in overt motor selection. While these models do not explicitly assign a role to attention, we propose that these gating operations are in fact intimately related to the proposed roles of attention in working memory: Attention is directed to those items in memory that are used for action control and that therefore will be characterized by enhanced activity. Hence, attention can be seen as an emergent property of selecting an item for a subsequent operation. Furthermore, we argue that the recurrent activity between action selection mechanisms and vWM results in a continued strengthening of the relevant working memory representation, as is addressed next.

[1] A classic but still particularly illustrative example is that of the sea squirt, which eats its own rudimentary eye, spinal cord, and brain when it transitions from its free-swimming larva stage to its stationary vegetative stage.

2.2. Attention as credit assignment
The second way in which attentional effects emerge within our framework is by an influence on plasticity: The very same feedback projections that link the action to the working memory representations also act as the mechanism assigning credit or blame to the representation in working memory and the sensory neurons that gave rise to it, based on their contribution to the selection of the action. Hence, when a particular action is selected, the entire chain from the sensory stimulus through working memory to the selected action becomes eligible for plasticity (Roelfsema & Holtmaat, 2018). This way, after an action has been successful, the associated representations are reinforced, leading to faster selection, more robust memory, and more rapid retrieval in subsequent trials or task steps.
An advantage of action-based models over storage-based models is that correct actions can be rewarded while incorrect actions can be punished. In other words, such models enable learning. For example, the earlier-mentioned PBWM model assumes a dopamine-based learning system in which the basal ganglia are thought to learn the distinction between good and bad actions via dopamine or other neuromodulatory signals upon positive or negative reinforcement. Recent EEG evidence supports a coupling between working memory and reinforcement learning systems (Collins & Frank, 2018). However, such learning poses unique challenges. First, any neural network model has to solve a structural credit assignment problem, where the connections between the lower layers of a network need to be updated in accordance with their contribution to the outcome. Second, in working memory tasks, correct performance by definition relies on information that is no longer available in the outside world. This means that any agent also needs to solve what is known as a temporal credit assignment problem: it has to relate the reinforcement to the selection of stimuli, their storage into working memory, and actions, which all occurred in the past. Although relatively simple for basic delayed match to sample tasks, credit assignment becomes far from trivial for more complex tasks involving multiple inputs and multiple memory operations. In artificial intelligence, and to some extent also in cognitive models like PBWM (O'Reilly, Frank, Hazy, & Watz, 2007), this is solved through omniscient learning algorithms that are instructed by an outside teacher, but this is no solution for biological agents, which often learn by trial and error, and thus rely on reinforcement learning instead. Somehow, biological systems need to find out which connections are eligible for updating based on the neuromodulators that signal reinforcement.

Fig. 1 caption: Here multiple items are represented in visual memory. We propose two key mechanisms: 1. Of the multiple representations in memory, the "attended" or "active" item (indicated by a black arrow in the display at the bottom) is currently subject to sensory-action coupling. This means that the item's neuronal activity is being enhanced through recurrent feedback loops established when coupling representations to overt (dotted loop) or covert (solid loop) action-related systems. Unattended, accessory memories are not coupled to current action programs, and are therefore reduced or silent in terms of activity. 2. Feedback activity additionally makes local synapses eligible for reward. These tagged synapses then interact with global neuromodulatory reward signals such as dopamine, strengthening the connections when the chosen action led to the desired outcome (or weakening them when it did not). This way, attention serves as the credit assignment mechanism that makes the relevant synapses eligible for plasticity.
An elegant way in which a biologically plausible reinforcement learning scheme can solve both temporal and structural credit assignment in WM tasks is by assuming a key role for attention (Churchland et al., 1994). The idea is that while attention is the mechanism that enhances sensory and mnemonic representations so that they become available for action, it at the same time also makes those representations eligible for reward-based plasticity (Roelfsema & Holtmaat, 2018). In Fig. 1 this eligibility is illustrated by labelling the connections of the attended representation with yellow diamonds. These connections then interact with increasing or decreasing global neuromodulatory reward signals (as illustrated by the yellow cloud marked d). This means that if the action outcome was better than expected, the sensory and memory representations involved are selectively being reinforced. Vice versa, if the outcome was disappointing, these representations become weaker, thereby decreasing the probability that the action will be chosen again in the future. Note that functionally, this corresponds to the idea that attention strengthens the consolidation and binding of features in memory (Baddeley, 2000;Jolicoeur & Dell'Acqua, 1998;Ricker & Hardman, 2017;VanRullen, 2009), but the proposed framework provides a mechanism for such attentional benefits.
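The interaction between local eligibility and a global reward signal described above can be written compactly. The following is only a sketch in conventional temporal-difference notation, not a formula taken from the text: the symbols $\beta$ (learning rate), $\gamma$ (discount factor), $Q$ (expected reward of an action), $\delta$ (reward-prediction error), and $\mathrm{Tag}_{ij}$ (eligibility of the synapse from unit $i$ to unit $j$) are all assumptions introduced for illustration.

```latex
% Global neuromodulatory signal: was the outcome better or worse than expected?
\delta = r + \gamma\, Q(s', a') - Q(s, a)

% Local update: only synapses tagged by attentional feedback change.
\Delta w_{ij} = \beta \, \delta \, \mathrm{Tag}_{ij},
\qquad \mathrm{Tag}_{ij} = 0 \ \text{for synapses that did not contribute to the selected action}
```

Under these assumptions, a positive $\delta$ strengthens the tagged connections and a negative $\delta$ weakens them, while untagged (unattended) connections are left unchanged regardless of the outcome.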
That such an attention-based credit assignment mechanism can be implemented as a neurophysiologically plausible and efficient strategy has been demonstrated in the Attention-gated Memory Tagging (AuGMEnT) model, a neural network architecture that learns WM tasks through reinforcement learning (Rombouts, Roelfsema, & Bohte, 2012; Rombouts et al., 2015; see also Roelfsema & Ooyen, 2005; Roelfsema, van Ooyen, & Watanabe, 2010; and Roelfsema & Holtmaat, 2018). In the model, information from sensory input layers and self-recurrent memory layers converges onto an output layer which represents the expected reward value for each of a set of possible actions. The model then typically selects the action with the highest expected reward (and it sometimes explores other actions). Importantly, through top-down feedback projections, chosen actions are linked back to representations at earlier processing levels that contributed to the action. These feedback projections - referred to as attention - 'tag' or 'label' those synapses that led to the response, making them eligible for interaction with global neuromodulatory signals, such as dopamine (Roelfsema & Holtmaat, 2018; Schultz, Dayan, & Montague, 1997). Thus, learning of specific sensory-action couplings occurs through the interaction between global neuromodulatory reward signals and specific attention-based feedback signals. As a result, only sensory neurons involved in the decision change their tuning, whereas the tuning of other neurons remains the same. The attentional feedback signal thus acts as the credit assignment mechanism. In its current form AuGMEnT can perform basic vWM tasks such as the saccade-antisaccade task, delayed match to sample and probabilistic inference. If the learning scheme is applied to complex classification tasks that are in the domain of deep artificial networks, its performance approximates that of state-of-the-art machine learning approaches (Pozzi, Bohté, & Roelfsema, 2018).
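To make the tagging idea concrete, the sketch below implements a heavily simplified attention-gated learner in plain Python. It is not the published AuGMEnT model: it assumes a one-step cue-to-action task without any memory delay, feedback weights that mirror the feedforward weights, and hypothetical names and parameter values (`train`, `greedy_action`, `beta`, `epsilon`, the layer sizes). What it does share with the description above is the core logic: the network selects the action with the highest expected reward (with occasional exploration), feedback from the selected action alone determines which synapses are eligible, and a single global reward-prediction error then updates only those synapses.

```python
import math
import random

N_IN, N_HID, N_ACT = 2, 3, 2  # hypothetical layer sizes


def forward(V, W, cue):
    """Return input, hidden, and action-value vectors (last entries are bias units)."""
    x = [1.0 if i == cue else 0.0 for i in range(N_IN)] + [1.0]
    y = [1.0 / (1.0 + math.exp(-sum(x[i] * V[i][j] for i in range(N_IN + 1))))
         for j in range(N_HID)] + [1.0]
    q = [sum(y[j] * W[j][a] for j in range(N_HID + 1)) for a in range(N_ACT)]
    return x, y, q


def greedy_action(V, W, cue):
    """Action with the highest expected reward for a given cue."""
    q = forward(V, W, cue)[2]
    return max(range(N_ACT), key=lambda a: q[a])


def train(n_trials=5000, beta=0.3, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    V = [[rng.uniform(-0.5, 0.5) for _ in range(N_HID)] for _ in range(N_IN + 1)]
    W = [[rng.uniform(-0.5, 0.5) for _ in range(N_ACT)] for _ in range(N_HID + 1)]
    rewards = []
    for _ in range(n_trials):
        cue = rng.randrange(N_IN)
        x, y, q = forward(V, W, cue)
        # Usually pick the highest-valued action; sometimes explore.
        if rng.random() < epsilon:
            a = rng.randrange(N_ACT)
        else:
            a = max(range(N_ACT), key=lambda k: q[k])
        r = 1.0 if a == cue else 0.0   # reward only when the action matches the cue
        delta = r - q[a]               # global reward-prediction error
        # "Attentional" feedback: only the selected action a sends feedback,
        # so only synapses on the path to that action are tagged and updated.
        for j in range(N_HID):
            feedback = W[j][a] * y[j] * (1.0 - y[j])
            for i in range(N_IN + 1):
                V[i][j] += beta * delta * x[i] * feedback   # tag = x[i] * feedback
        for j in range(N_HID + 1):
            W[j][a] += beta * delta * y[j]                  # tag = y[j], selected action only
        rewards.append(r)
    return V, W, rewards
```

Calling `train()` and then `greedy_action(V, W, cue)` for each cue shows which action the trained network prefers. Synapses onto the non-selected action never change on a given trial, which is the sense in which feedback from the chosen action assigns credit selectively.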
Kruijne, Bohte, Roelfsema, and Olivers (2020) expanded the model with internal gating actions (related to those of the LSTM and PBWM architectures, Hochreiter & Schmidhuber, 1997; O'Reilly & Frank, 2006), which additionally allows for learning of more complex tasks involving selecting, ignoring, and updating information. The modified model, referred to as WorkMATe, can learn to execute the 12-AX continuous performance task (which requires context-dependent updating and ignoring of items) and ordered recognition tasks (in which memory items need to be gated in a certain order, Warden & Miller, 2010). Moreover, the new model generalizes its performance to novel stimuli, a hallmark of working memory. The success of models like PBWM, AuGMEnT, and WorkMATe provides further support for the idea of covert WM operations being closely tied to action systems.
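To illustrate what "context-dependent updating and ignoring" demands, the sketch below solves the 12-AX task with a hand-written gating policy. This is an assumption-laden illustration, not WorkMATe or PBWM: those models learn their gating actions through reinforcement, whereas here the gating rule is hard-coded, and the names `respond_12ax` and `TARGETS` are hypothetical. The task rule assumed is the standard one: respond "R" to an X that immediately follows an A while the most recent digit was 1, or to a Y that immediately follows a B while the most recent digit was 2, and respond "L" otherwise.

```python
# Target triplets: (most recent digit, immediately preceding symbol, current symbol)
TARGETS = {("1", "A", "X"), ("2", "B", "Y")}


def respond_12ax(sequence):
    """Return 'R' for target symbols and 'L' otherwise, using two memory slots."""
    context = None    # slot 1: most recent digit, shielded from intervening letters
    previous = None   # slot 2: immediately preceding symbol, overwritten every step
    responses = []
    for symbol in sequence:
        is_target = (context, previous, symbol) in TARGETS
        responses.append("R" if is_target else "L")
        if symbol in ("1", "2"):
            context = symbol   # gate open: update the context slot
        # For any other symbol the context gate stays closed (content is maintained).
        previous = symbol
    return responses


print(respond_12ax(list("1AXBY2BYAX")))
# ['L', 'L', 'R', 'L', 'L', 'L', 'L', 'R', 'L', 'L']
```

The same letter pair (for example A followed by X) requires a different response depending on a digit that may have appeared many symbols earlier, which is why the context slot must be selectively updated by digits and protected from everything else.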
Thus, combining action-like gating operations with attention-based credit assignment provides for powerful, trainable, yet flexible cognition. Importantly, it also leads to the improved vWM performance that has typically been attributed to attention, without having to invoke maintenance as its sole purpose. Instead, attended representations are stronger than unattended stimuli because of sensory-action coupling, which leads to 1) enhanced neural activity when such links are established through recurrent feedback, and 2) strengthened memory traces afterwards, when attention has guided plasticity-based learning.

3. Empirical evidence
The link to action provides attention with a clear and coherent functional purpose, in which it selects a representation for action, while at the same time it targets that representation for learning. This contrasts starkly with the classic conception of attention as a capacity-limited resource serving vWM maintenance. As we will review next, the framework carries the promise of integrating a diversity of empirical findings on the relationship between attention and working memory.

3.1. Grounding storage in action
As pointed out earlier, the idea that attention emerges from selection-for-action mechanisms is not new (Allport, 1987; Clark & Chalmers, 1998; Hommel, 2009; Rizzolatti et al., 1987), and there is now a large body of evidence from visual attention tasks that supports this view, especially within the spatial domain (e.g., Baldauf & Deubel, 2008; Craighero, Fadiga, Rizzolatti, & Umiltà, 1999; Deubel & Schneider, 1996; Fagioli, Hommel, & Schubotz, 2007; Kowler, Anderson, Dosher, & Blaser, 1995; though see Smith & Schenk, 2012, for a critical review). Many of these studies have shown that programming an eye movement or manual movement to a location invariably involves spatial attention to that location. Note that while classically conceived as a visual attention paradigm, spatial cueing often also involves a visual memory component as one has to remember the cued location. Perhaps not surprisingly then, visuospatial working memory tasks have also often found attention to be residing at remembered locations (Awh et al., 1998; Awh & Jonides, 2001; van Moorselaar et al., 2018). Going beyond that, there is evidence that visuospatial memory may be directly grounded in the oculomotor system, as observers tend to make eye movements to remembered positions even when location is irrelevant to the task (de Vries, van Driel, Karacaoglu, & Olivers, 2018; Postle, 2006; Theeuwes, Belopolsky, & Olivers, 2009a; Theeuwes, Kramer, & Irwin, 2011; Theeuwes, Olivers, & Chizk, 2005; van Ede, Chekroud, & Nobre, 2019). Not only eye movements are effective here, as visual memory also improves for an item when people make a hand movement to that item's original location (Hanning & Deubel, 2018; Heuer, Crawford, & Schubö, 2017).
Also consistent with action-related coding of spatial WM content is the fact that spatial working memory is disrupted by various forms of concurrent motor activity (e.g., Ball, Pearson, & Smith, 2013; Farmer, Berman, & Fletcher, 1986; Hale, Myerson, Rhee, Weiss, & Abrams, 1996; Logie & Marchetti, 1991; Pearson, Ball, & Smith, 2014; Pearson & Sahraie, 2003; Quinn & Ralston, 1986; Smyth, Pearson, & Pendleton, 1988), as well as by progressive supranuclear palsy, a degenerative disease that affects motoric functioning, including the oculomotor system (Smith & Archibald, 2020). Single cell recordings of neurons in posterior parietal cortex in monkey (especially the lateral intraparietal sulcus, LIP) further support the motor view, as they have revealed what appears to be a close coupling between memory and action planning. These cells exhibit persistent activity during the delay period of memory-guided saccade tasks, in which the monkey has to plan, but not yet execute, a saccade to a cued location. Such cells have therefore been thought to reflect the intention to perform an action (specifically the intention to execute a saccade, Andersen & Buneo, 2002; Andersen & Cui, 2009; Mazzoni, Bracewell, Barash, & Andersen, 1996; Quiroga, Snyder, Batista, Cui, & Andersen, 2006; Snyder, Batista, & Andersen, 1997). Others have taken a more general stance, proposing that LIP (in combination with prefrontal areas) serves as a universal spatial priority map, which may then either drive covert attention, spatial working memory, or different forms of overt action (Bisley & Goldberg, 2010; Ikkai & Curtis, 2011). In any case, it is clear that posterior parietal cortex is one of the sites where spatially tuned perceptual, mnemonic, and action-related mechanisms come together, serving sensorimotor transformation, as is consistent with overall dorsal stream functioning (Goodale & Milner, 1992; Gottlieb & Balan, 2010).
Further evidence shows that LIP also encodes the expected value of the reward for a particular movement, suggesting that it integrates reward expectancy into the equation (Platt & Glimcher, 1999;Sugrue, Corrado, & Newsome, 2004;Yang & Shadlen, 2007). Interestingly, the representations that the earlier-mentioned AuGMEnT model forms in its memory layer after reinforcement learning on basic working memory tasks resemble the representations in LIP and PFC, which lends further support to its plausibility. As a final point, although the grounding of spatial memory in motor systems is well supported, relatively little is known about their role in the memory of non-spatial features, such as color or shape. We will return to this topic in Section 4.

3.2. Different states of VWM
Regarding attention as the process of making an item in working memory available for a particular action or operation resonates with the idea that items in working memory can adopt different states of activity, depending on whether they are "in the current focus of attention" (Cowan, 2011). Van Loon et al. (2017) showed that items in working memory result in stronger attentional biases when relevant for the immediately upcoming task than when they are relevant for a prospective task and are only accessory with respect to the current task. Using a similar task structure, but combined with multivariate pattern classification of fMRI measures of brain activity, Lewis-Peacock et al. (2012) found that while representations remembered for the first of two tasks could be reconstructed from the brain activity, the representations relevant only for the second task could not - at least not until they became relevant again. These and other data have led to the idea that while items relevant to an ongoing task are stored in the current "focus of attention" (Cowan, 2011; Oberauer, 2002), accessory, or prospectively relevant items in vWM may be stored silently, through altered connectivity or responsivity rather than activity of neuronal ensembles (Larocque et al., 2014; Rose et al., 2016; Stokes, 2015; Wolff, Jochim, Akyürek, & Stokes, 2017; Lundqvist et al., 2016; Sreenivasan, Curtis, & D'Esposito, 2014), although the evidence for this is not conclusive. Alternatively, such accessory memories may be maintained through still active, but relatively weak levels of neural firing (Kornblith et al., 2017; Schneegans & Bays, 2017). Whether actively or silently represented, presumably such accessory representations are unattended, and as such "off the person's mind" (cf. Mashour, Roelfsema, Changeux, & Dehaene, 2020; Zylberberg, Dehaene, Roelfsema, & Sigman, 2011). Yet recent evidence indicates that this does not necessarily make them vulnerable.
Additional brain networks may be recruited to support the memorized content (Bettencourt & Xu, 2016b; Christophel, Iamshchinina, Yan, Allefeld, & Haynes, 2018) or representations may be actively shielded by coding them in transformed or suppressed patterns (Kornblith et al., 2017; Peters et al., 2012; Rademaker, Chunharas, & Serences, 2019; van Loon et al., 2018; Yu & Postle, 2018). [...] This may occur for example because both sensory and attended mnemonic representations then make use of the same activity-based code, or because attention may momentarily make connections susceptible to overwriting (comparable to similar effects in episodic long term memory, e.g., Hupbach, Gomez, Hardt, & Nadel, 2007). In any case, at a functional level, the important conclusion is that memorized representations can be flexibly made available for, or separated from, the current stimulus-response mapping, without being actually lost from memory, and without becoming particularly vulnerable either, implying a dissociation between working memory storage and attention.
The proposed framework readily maps onto this division between active ("attended") and accessory ("unattended") memory states. Specifically, we argue that active vWM representations have a special status with regard to maintenance: They are memories for which there is a recurrent action link active, and therefore correspond to what is "on the mind" (cf. Mashour et al., 2020; Zylberberg et al., 2011). The recurrent activity accompanying the sensory-action coupling is then expressed as "attention", in that the relevant item both drives behavior (Olivers et al., 2006; Soto et al., 2005; though see Houtkamp & Roelfsema, 2006) and is associated with extra neuronal activity (Desimone & Duncan, 1995; Gayet, Paffen, & Van der Stigchel, 2018; Lamme & Roelfsema, 2000; Roelfsema, 2006). The non-attended items are accessory and stored either as an engram of short-term connectivity changes or as weak activity patterns, and are only revived to a status of higher neural firing rates when needed for the next action (whether overt or covert).

Dual task interference is more in the action
The idea that attention serves action and not maintenance may explain why vWM often suffers little from unrelated, yet visual attention-demanding tasks that observers are asked to do in between. Although vWM has been shown to suffer from interference (Morey, 2018), there are remarkable exceptions. For example, Hollingworth and Maxcey-Richard (2013) asked participants to remember a color for a change detection task, and additionally perform a reasonably taxing visual search for another feature during the delay period, something which would presumably draw attention away from the memory. Memory performance was nevertheless comparable to a condition without an intervening search task. Similarly, Rerko et al. (2014) as well as Souza and Oberauer (2017) found little to no effect on visual memory for color even when observers were required to do attention-demanding color matching or brightness detection tasks during the delay period. In another recent study, Souza and colleagues failed to find detrimental effects of observers attending to distractor objects during the delay period (Souza, Czoschke, & Lange, 2020 (2014) found that memory-related activity of cells was greatly reduced during a visuo-spatial working memory delay period when the monkey was required to direct attention elsewhere in between. Activity then returned when the memory test became imminent, indicating that information was retained across the attention-demanding task. This pattern matches EEG data in humans (van Moorselaar et al., 2018). In another single cell recording study, Konecky et al. (2017) trained monkeys to remember multiple consecutively presented locations or colors, and found increased activity levels for any newly encoded and thus presumably attended item, but not for already stored items, again a pattern that is reminiscent of human data (Greene, Kennedy, & Soto, 2015).
So what does cause interference in vWM? For example, while Souza et al. (2020) failed to find detrimental effects of observers attending to distractor objects during the delay period, they did find interference when the intervening task required response generation. Similarly, in a comprehensive review, Morey (2018) found that vWM suffered just as much from non-visual interference as from visual interference, provided the secondary task involved response generation. And on the basis of a similarly comprehensive review of dual task interference in both the verbal and visual working memory literature, Oberauer (2019) came to the conclusion that "paying attention to an object does not require a resource per se - rather the process of controlling attention in a top-down manner consumes the limited resource" (p. 5; our italics), which then causes the interference. If we classify these control operations as internal actions (as also Oberauer does), then our framework is fully compatible with this account. Although more work needs to be done, the above at least suggests that attention is not necessary for maintenance per se. Instead, attention appears to be necessary for action coupling. As maintenance does benefit from attention, interference with such action coupling will result in performance decreases.

3.4. Retrocueing as rewarded readiness
Not surprisingly, memory encoding is better for attended stimuli than for unattended stimuli (see e.g., Chun & Turk-Browne, 2007, for a review). But memory also benefits from attention being directed only after the stimulus has already vanished, in what has become known as the retro-cueing paradigm (Gazzaley & Nobre, 2012; Griffin & Nobre, 2003; Landman, Spekreijse, & Lamme, 2003; Souza & Oberauer, 2016). In a typical retro-cueing task, observers commit multiple objects to memory, after which one of them is singled out by a cue. When the cued item is then probed at memory test, performance is typically better than for uncued items. Souza and Oberauer (2016) have listed a number of explanations for the retrocueing benefit that have been hypothesized over the last two decades, some of which assume a central role of attention in the refreshing, preventing decay, or otherwise retaining of cued memories. This would be consistent with the idea of attention serving the maintenance of cued items, at the expense of uncued items. However, several studies have found that while a memorandum benefits from being cued (suggesting an attentional benefit), this does not necessarily mean that the uncued items suffer (Gunseli et al., 2019; Gunseli, van Moorselaar, Meeter, & Olivers, 2015; Landman et al., 2003; Myers et al., 2018). For example, Gunseli et al. (2015) always found improvements for cued items, but whether or not uncued items suffered depended on the probability of being probed on those items. When probing probability for uncued items was relatively high (50%), uncued items did not suffer relative to a no cue baseline, while cued items retained an advantage, indicating that participants could simultaneously attend to the cued item and hold on to the uncued items.
In addition, several studies have shown that uncued items may initially suffer, but that the quality of memory can subsequently be restored when attention is cued back to them; thus, the information on uncued items is not actually lost (Murray, Nobre, Clark, Cravo, & Stokes, 2013; Rerko & Oberauer, 2013; van Moorselaar, Olivers, Theeuwes, Lamme, & Sligte, 2015). These findings confirm that although attention can contribute to maintenance, it is not needed for it. We argue here that a selection-for-action perspective provides an explanation for the role of attention in retrocueing. Previous work has often attributed the benefits of retro-cueing to attention per se, in that attention strengthens representations, makes them more robust against interference, and therefore more readily available for retrieval (see Souza & Oberauer, 2016 for a review). Our proposal is more specific in that it not only explains these consequences of attention, but also its purpose, namely the preparation of an action related to the cued item. Which action depends on the task requirements and may for instance be an eye movement, a button press, or turning a dial. The outcome may even be a covert action, such as using the cued item for a mental comparison or transformation (Albers, Kok, Toni, Dijkerman, & de Lange, 2013; Alivisatos & Petrides, 1997; Jordan, Heinze, Lutz, Kanowski, & Jäncke, 2001; Richter et al., 2000; Vingerhoets, De Lange, Vandemaele, Deblaere, & Achten, 2002), or actively thinking about it (Souza, Rerko, & Oberauer, 2015). Under our framework, the cue then has two effects. First, the recurrent connections between sensory and action modules result in momentarily enhanced firing activity, which broadcasts the cued representation and thus makes it available for action (Dehaene & Naccache, 2001). This lasts until the action or operation is completed.
A remnant of this increased firing may then be a temporary increase in the efficacy of the connections, which results in better retrieval the second time the item is called for (typically at test). Second, in retro-cueing experiments the cued item is typically used for a response and its representation may thus undergo plasticity changes when the operation has successfully accomplished the intermediate task goal of selecting the to-be-used item and thus increased reward expectancy. This may lead to the consolidation of the memory trace, and thus more efficient retrieval at a later time point. These two effects combined explain not only the retrocueing benefit per se, but also the finding that once cued, representations remain robust despite attention being directed elsewhere in the meantime (Rerko & Oberauer, 2013; van Moorselaar et al., 2015). Thus, the enhancement in activity that accompanies the successful selection of a cued item reinforces its representation, resulting in better protection against interference, and more rapid retrieval when essentially the same operation has to be performed again at report.
Note that for an effect of retrocueing through action-based reinforcement learning, two conditions need to be met. First, such plasticity changes need to occur rapidly, after a single successful retrocue operation. This condition can be met, as short-term plasticity mechanisms (one shot Hebbian learning, or context-dependent meta-reinforcement learning mechanisms, Botvinick et al., 2019; Fiebig & Lansner, 2017; Sandberg, Tegnér, & Lansner, 2003; Sugase-Miyamoto, Liu, Wiener, Optican, & Richmond, 2008) have been demonstrated. Second, the internal, covert operation of selecting the cued item is, in terms of mechanisms and systems involved, close to overt action. That is, the internal operation of selecting the cue, when successful, may be rewarding in itself, or because it increases the expectancy to obtain a later real reward, and triggers the same neuromodulatory signals as overt actions (cf. Seitz & Watanabe, 2005; Sutton & Barto, 1998). In support of such an action-based account of retrocueing, retrocues evoke activity in striatum and premotor cortex (Chatham, Frank, & Badre, 2014; Tamber-Rosenau, Esterman, Chiu, & Yantis, 2011), as well as motor-related signals in the EEG over sensorimotor cortex (Schneider, Barth, & Wascher, 2017).
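To make the two proposed effects of a retrocue concrete, the following toy sketch in Python may help. It is purely illustrative and is not an implementation of any published model: the class `MemoryItem`, the functions `retrocue` and `release`, and the parameters `gain`, `learning_rate`, and `reward` are all hypothetical names and values chosen for this example. The sketch assumes that coupling a cued item to an action transiently raises its activity, and that a single rewarded selection strengthens only the item that was active at that moment, while uncued items keep their stored trace.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    name: str
    efficacy: float = 1.0  # hypothetical strength of the stored trace
    activity: float = 0.0  # hypothetical momentary firing level


def retrocue(items, cued_name, reward=1.0, gain=2.0, learning_rate=0.5):
    """Couple the cued item to an action and apply a reward-gated update."""
    for item in items:
        if item.name == cued_name:
            # Assumed sensory-action coupling: transient boost in activity.
            item.activity = gain * item.efficacy
            # Assumed one-shot, reward-gated update: only the item that is
            # active during the rewarded selection has its trace strengthened.
            item.efficacy += learning_rate * reward * item.activity
        else:
            # Uncued items are not coupled; their stored trace is untouched.
            item.activity = 0.0
    return items


def release(items):
    """End of the operation: activity returns to baseline, efficacy persists."""
    for item in items:
        item.activity = 0.0
    return items


items = [MemoryItem("red"), MemoryItem("green"), MemoryItem("blue")]
retrocue(items, "green")
release(items)
strengths = {item.name: item.efficacy for item in items}
```

In this sketch the cued item ends up with a stronger trace even after its activity has returned to baseline, whereas the uncued items are unchanged rather than degraded. This mirrors, in a deliberately simplified way, the claim that cueing benefits need not come at the expense of uncued memories.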

Discussion and conclusions
We conclude that the bidirectional role of attention in linking visual memories and actions provides more explanatory power than regarding attention as the de facto mechanism of maintenance. Instead, better maintenance of an item is a consequence of temporarily enhanced recurrent activity due to an item being subject to a covert or overt operation, for as long as that operation lasts. In addition, attention acts as the credit assignment mechanism for reinforcement learning when the operation has either proven successful or disappointing, leading to a reward or punishment.

Relation to other theories
In addition to its connection to aforementioned (pre-)motor theories of attention (Allport, 1987; Klein, 1980; Rizzolatti et al., 1987), the proposed framework is consistent with a wider range of theoretical perspectives. For example, under the global workspace model, it has been proposed that attention is responsible for the widespread broadcasting of sensory information to response systems (among others), and that this process is modulated by reward systems (Dehaene & Naccache, 2001; Duncan, Humphreys, & Ward, 1997; Mashour et al., 2020; Roelfsema, 2006). Conversely, attention is known to gate learning - including the strengthening of relevant sensory representations (e.g., Ahissar & Hochstein, 1993, 2002; Gilbert, Ito, Kapadia, & Westheimer, 2000; Raymond & O'Brien, 2009; Trabasso & Bower, 1975; Vartak, Jeurissen, Self, & Roelfsema, 2017). The idea that attention within vWM emerges as a consequence of implementing a reciprocal link between sensory representations and action programs also bears resemblance to the ideas of Krauzlis, Bollimunta, Arcizet, and Wang (2014), who proposed that attention is the consequence of a decision making process determining which of many competing internal states is allowed to dominate current behavior, where states include sensory inputs, current context, value-based learning, and motor plans. Importantly, such a system does not involve explicitly adding attention as a standalone mechanism; rather, attention-like behaviour is the result of the decision-making process and, hence, action selection. We agree with Krauzlis et al. that we "should be able to explain the functional circuits that give rise to the phenomenology of attention without using the word 'attention'" (p. 461), yet we also believe that the term attention is necessary if the aim is to relate theoretical frameworks to a wealth of experimental results in perceptual psychology.
The term can be used without causing ambiguity if it is well defined. We here proposed that attention corresponds to the recruitment of feedback circuitries that link sensory and working memory representations to action, causing increased representation across the cortex (including subcortical structures; Lamme & Roelfsema, 2000; Mashour et al., 2020; Roelfsema, 2006), and making these representations eligible for action-based learning.
There are also clear links to the embodied cognition view. According to this perspective, action-related machinery is recruited for the representation of working memory content (Ballard, Hayhoe, Pook, & Rao, 1997; Postle, 2006). However, while this view posits that working memory representations are not necessarily perceptual in nature, our framework still allows for the sensory nature of vWM. Instead, it proposes that such representations are boosted by the action coupling. That said, we note that this link between sensory and memory representations and action need only be transient, while the action recruitment may then last. If the stimuli and task allow it, the relevant information may then be fully transferred to motor programs, and the sensory code can be abandoned.

Future directions
We conclude that the bidirectional link to action ties together a number of different existing perspectives and phenomena on the role of attention in working memory. Importantly, it also raises a number of predictions compared to more standard accounts, in which attention serves to selectively enhance sensory representations.
First, an important question is whether there are forms of attention conceivable without a link to action. The strongest prediction from an action-based account is that there is no attention without such coupling. Note that a priori, this may be difficult to test directly. For one, it may prove difficult to measure a pure sensory attention effect without the involvement of action-related brain mechanisms and areas, as these have consistently shown considerable overlap. But even if we were to find a sensory attention effect without an accompanying action effect, we face the difficulty of interpreting a partial null result. As described earlier, there is considerable evidence for a close attention/action coupling in the spatial domain, where action preparation, attention and spatial working memory have been found to overlap in terms of behavior and neurophysiology (see also Heuer, Ohl, & Rolfs, 2020, for a recent review). The evidence for a similar overlap in the non-spatial, feature domain (color, orientation, shape, etc.) is much scarcer. There is evidence for action-driven feature enhancement in visual attention tasks such as visual search (e.g., Bekkering & Neggers, 2002; Freedman & Assad, 2006; Gutteling et al., 2015; Hommel, 2004; Humphreys et al., 2010; Mirabella et al., 2007; Moore, 1999; Moore & Armstrong, 2003). Moreover, there is evidence that merely paying attention to a memorized task rule results in the actual forging of the sensorimotor link (González-García, Formica, Liefooghe, & Brass, 2020). However, still very little is known about the links between feature-based selection and action within vWM, also because techniques like retro-cueing have mostly used space as a cue. There is one study that manipulated both the features to be remembered (color or size), and the actions to be performed (grasping or pointing, .
Interestingly, and consistent with the current proposal, memory for size improved when grasping was the associated action, while for color, pointing worked best. However, there was no manipulation or measurement of attention in this study, and the relationship between feature memory, attention, and action remains to be elucidated further. A recent EEG study by van Ede, Chekroud, Stokes, and Nobre (2019) may provide the strongest evidence yet for a feature-based sensorimotor coupling in vWM (see also Boettcher, Gresch, Nobre, & van Ede, 2020). In their study observers were first given two orientations to remember, followed by a non-spatial retrocue indicating which of the two would be tested. Decoding analyses revealed the cued sensory representation to become active, but virtually at the same time also the associated expected response. In conclusion, we certainly cannot exclude the possibility of purely sensory biasing effects within vWM, but so far the data are insufficient for a critical evaluation of the proposed vWM-attention relationship.
A related prediction is that working memory states ought to evolve when incoming information integrates towards changing action plans. Studies in the parietal (Calton, Dickinson, & Snyder, 2002; Snyder, Batista, & Andersen, 2000; Yang & Shadlen, 2007) and frontal cortex (Hoshi, Shima, & Tanji, 2000; Rao, Rainer, & Miller, 1997; Siegel et al., 2015) of monkeys revealed that the delay activity evolves when action planning requires the integration of pieces of information that are presented only sequentially. Whenever new information relevant to the action plan comes in, the memory state updates, causing changes in the set of neurons with persistent activity and their activity levels. It would be of interest for future studies in humans to also measure the dynamics of working memory states when used for different action outcomes, testing what such transformations in working memory look like and how they are implemented (see Bae & Luck, 2018; van Driel, Gunseli, Meeter, & Olivers, 2017; Fahrenfort, van Leeuwen, Foster, Awh, & Olivers, 2017, for evidence that memory test outcome modulates WM representations - although these tests did not really vary action).
Another prediction from the action-coupling perspective of attention is that attending to an item in vWM should interfere with action selection and, vice versa, action selection should interfere with attention shifts within vWM. As we have laid out in section 3.3, it is noteworthy that vWM often suffers relatively little from interfering visual stimulation (see also Andrade, Kemps, Werniers, May, & Szmalec, 2002; Bettencourt & Xu, 2016a; Rademaker et al., 2019). At the same time, adding internal operations or external actions appears to impair overall memory performance more (Morey, 2018; Oberauer, 2019). However, the extent to which these deficits reflect depletion of some general cognitive resources, or reflect specific interactions between action and attention remains to be determined, and more experiments are necessary.
Shifting our conceptual understanding of attention from a limited resource to an emergent property of sensory-action coupling may also provide insight into why attention is limited in the first place. We conjecture that processing limitations are not due to some central resource depletion, but due to a limitation in the number of sensory-action links one can simultaneously forge or sustain. A large bulk of evidence suggests that for ad hoc (i.e., not overlearned) connections this is at most a handful, and may be fewer than vWM capacity itself (Oberauer, 2009; Pashler, 1994; Welford, 1952; Zylberberg, Ouellette, Sigman, & Roelfsema, 2012; though see Gallivan et al., 2011). As such, the number of active connections can be seen as a resource, and the question then becomes why there is such a strong serial bottleneck in the number of sensory-action links. One compelling reason, as alluded to earlier, is that bodily actions themselves are simply limited (Allport, 1987), and the same therefore goes for associated internalized operations. We only have one pair of eyes that can be directed to only a single point in space, only one pair of hands, and only one mouth to speak with. An optimal system therefore consists of a perceptual system that detects, in parallel, as many relevant objects in as wide an area of visual space as possible, while this perceptual information is then effectively funneled into a limited number of effectors.
But regardless of the number of effectors, another reason for a serial bottleneck may be that it functionally serves action plans, rather than individual actions per se. Even in relatively simple tasks, actions are more often than not part of a sequential procedure. For example, one first needs to pick a spoon before it can be placed in a cup. And, in our flat pack furniture example, one simply needs to assemble one part before the next. Similarly, it has been argued that also internal operations such as those involved in conscious rational thought are implemented in analogy with a serial Turing machine (Zylberberg, Slezak, Roelfsema, Dehaene, & Sigman, 2010; Zylberberg et al., 2011; see also Roelfsema, 2005). That is, complex tasks or thought trains, by their very nature, consist of a number of steps, or production rules, where one procedure needs to wait for the output from an earlier procedure. For example, when computing (3 + 9) * 12, one first needs to complete the addition and memorize the result before it can be used in the subsequent multiplication. The same holds for sensorimotor rules, for example "put the spoon in the cup that was selected". Here too the eventual application of the production rule will have to wait for the perceptual outcome. While each of the steps themselves may occur in parallel (e.g., here the selection of a cup and the detection of the spoon), and may even already start to activate their own associated actions in parallel (here an eye movement towards the cup and a hand movement toward the spoon), for coherent, adaptive behavior, one action will often have to wait for the outcome of another action.
Specifically, according to Zylberberg et al.'s serial router model, selection is mediated by a competition between alternative productions, with the winning production rule being globally broadcast to distant regions, where it can (i) trigger motor actions, (ii) change the state of working memory to initiate a new production rule starting from a different memory state, or (iii) activate and broadcast information that was in a 'latent' state for the later productions (such as sensory traces and synaptic memories). Note that this way, working memory can also be used as a medium for chaining multiple cognitive steps or for temporarily deferring decisions until a final production rule allows it to be converted into overt action (cf. Meyer & Kieras, 1997). The important conclusion is that the transitions from one production rule to the next occur in a series not necessarily because of limitations, but because order is important. Embedding such procedures in action circuitries, which are inherently more serial than perceptual modules, may be the way in which the cognitive system imposes order upon itself.
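The logic of serial chaining through working memory can be illustrated with a minimal Python sketch of the (3 + 9) * 12 example. This is a toy illustration only and not Zylberberg et al.'s model: the function names `add_rule`, `multiply_rule`, and `run_serial_router`, the dictionary used as working memory, and the fixed competition strength of 1.0 are all assumptions made for this example. In each step every production is evaluated against the current memory state, but only a single winner is allowed to update working memory.

```python
def add_rule(wm):
    # Applicable only when both operands are stored and no sum exists yet.
    if "a" in wm and "b" in wm and "sum" not in wm:
        return 1.0, {"sum": wm["a"] + wm["b"]}
    return None


def multiply_rule(wm):
    # Has to wait for the output of the addition step.
    if "sum" in wm and "c" in wm and "result" not in wm:
        return 1.0, {"result": wm["sum"] * wm["c"]}
    return None


def run_serial_router(wm, rules, max_steps=10):
    """Evaluate all rules each step, then let one winner update memory."""
    trace = []
    for _ in range(max_steps):
        candidates = []
        for rule in rules:
            proposal = rule(wm)
            if proposal is not None:
                strength, update = proposal
                candidates.append((strength, rule.__name__, update))
        if not candidates:
            break  # no production applies any more
        # Competition: only the strongest candidate is broadcast per step.
        strength, name, update = max(candidates, key=lambda c: c[0])
        wm.update(update)
        trace.append(name)
    return wm, trace


# The rules are deliberately listed in the "wrong" order.
wm, trace = run_serial_router({"a": 3, "b": 9, "c": 12}, [multiply_rule, add_rule])
```

Even though the multiplication rule is listed first, it cannot fire until the addition has written its result into working memory. In this sketch the order of operations therefore follows from the dependencies between steps rather than from a capacity limit.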
At the level of neural implementation, important other questions remain. Specifically, we propose here that memory-related firing activity represents the recurrent feedback of an active sensory-action connection. Models like AuGMEnT (Rombouts et al., 2012; Rombouts et al., 2015) and WorkMATe (Kruijne et al., 2020) demonstrate that such sensory-action links can be plausibly implemented and learned through recurrent connections. However, these models are neutral about the coding of working memories without a current action link, which are outside the current "focus of attention". For these accessory or prospective memories, silent, synaptic coding is one possibility, as is persistent neural firing at a lower rate. It is even possible that whether a vWM representation counts as silent or not in neurophysiological terms is orthogonal to its influence on perception and action (Stokes, Muhle-Karbe, & Myers, 2020). Additionally, it remains to be established how these different states of representation are being controlled - that is, how the system flexibly juggles between active and inactive working memory states as the task progresses. Here too though, modeling endeavors indicate that this may be solved using action-based learning mechanisms (Kruijne et al., 2020).
Finally, as a field, we need to take steps at conceptual and methodological levels. Defining attention has always been a challenge. We believe that defining attention as the mechanisms involved in the bidirectional link between sensory representations and action plans solves some of these problems, but also potentially creates new ones. For example, calling any cognitive operation with an attention-like effect on behavior (such as retro-cueing) an "internal" action renders the account rather unfalsifiable. This requires clear criteria for when an operation is embedded in action circuitries and when not. We believe neurophysiological measures will be helpful here, but also behavioral measures of mutual interference between attention and action (as addressed above). In any case, we will need clever paradigms that are geared more towards the prospective use of memory items, rather than reflecting the recent past.

Conclusions
The relationship between visual attention and vWM is more complicated than the field hoped for a decade or so ago, when evidence pointed towards models of functional unity. Paradoxically, attention clearly plays an important role in vWM, but it is not essential for the continuous maintenance of information. Instead of a sustained role in visual maintenance, the involvement of attention appears more momentary and dynamic, following task requirements. Here, we have argued that an action-oriented perspective provides a coherent explanation for the role of attention in vWM. Where previously attention has been regarded as the mental spotlight that serves to keep items alive on the mental canvas, according to the current view attention emerges from bridging perception and action, as sensory representations are made available for overt or covert operations, and at the same time become eligible for action-ensuing reinforcement, resulting in performance benefits. The proposed framework thus provides an integrative perspective on attention, working memory, action and learning. At the same time, many important questions remain, which are exciting hallmarks of a burgeoning research field.