Our actions, including eye and hand movements, continually affect the sensory input we receive from the environment. Learning the regularities that connect our movements to their corresponding sensory outcomes allows the perceptual systems to discriminate self-caused sensory events from events caused by external sources (e.g., Hommel, Müsseler, Aschersleben, & Prinz, 2001; O’Regan & Noë, 2001; Shin, Proctor, & Capaldi, 2010). Indeed, immediately after performing an action, a stimulus that matches the learned action outcome typically elicits a weaker behavioral and neural response than does a physically identical stimulus that does not match the learned action outcome (e.g., Blakemore, Wolpert, & Frith, 1998; Bompas & O’Regan, 2006a; Cardoso-Leite, Mamassian, Schütz-Bosbach, & Waszak, 2010). This finding is described as sensory attenuation and has been explained in two different ways: first, in terms of inhibition of the predicted sensory response (Blakemore et al., 1998); second, and paradoxically, in terms of preactivation of the predicted sensory response (Waszak, Cardoso-Leite, & Hughes, 2012). Although these two accounts of sensory attenuation are inconsistent, they both explain a wide range of empirical phenomena (see Footnote 1). Before describing the present study, we begin with a brief review of the phenomenon of sensory attenuation as it is characterized by the two accounts.

Empirical demonstrations of sensory attenuation in the visual domain have been reported by Cardoso-Leite et al. (2010) and Roussel, Hughes, and Waszak (2013), who had participants associate keypress actions with distinct visual action outcomes during an initial acquisition phase (e.g., Gabor stimuli with left or right orientation, respectively, linked to a left or right keypress). Next, in a test phase, keypress actions continued to produce sensory outcomes, although the outcomes were now either congruent or incongruent with the learned contingencies in the acquisition phase. An “action-congruent” stimulus, in the present context, is defined as an outcome whose feature matches the action–outcome association learned during the acquisition phase. In the test phase, participants reported the presence (Cardoso-Leite et al., 2010) or the brightness level (Roussel et al., 2013) of stimuli as a function of action congruency. Both studies showed reduced visual sensitivity for action-congruent stimuli, relative to incongruent stimuli. Using a similar method, Pfister, Heinemann, Kiesel, Thomaschke, and Janczyk (2012) found that even preparing an action, without performing it, can interfere with the detection of learned action outcomes (see also Bompas & O’Regan, 2006a; Müsseler & Hommel, 1997; Stenner, Bauer, Sidarus, Heinze, Haggard, & Dolan, 2014).

The first account of sensory attenuation assumes that the predicted sensory action outcome is inhibited (Blakemore et al., 1998; Miall & Wolpert, 1996). According to this account, a comparator mechanism subtracts the actual sensory outcome from the predicted action outcome. If the actual outcome matches the internally generated prediction, its representation will be weaker. If, on the other hand, the stimulus does not match the internal prediction, its representation will be left intact (Fig. 1a). Psychophysical demonstrations of sensory attenuation generally fit the inhibition account (see, e.g., Cardoso-Leite et al., 2010). Furthermore, the event-related potentials associated with early auditory response (e.g., Bäß, Jacobsen, & Schröger, 2008) and early visual response (e.g., Kimura & Takeda, 2014; Roussel, Hughes, & Waszak, 2014) have been found to be weaker for a learned action outcome than for a stimulus that mismatched the action outcome (see also Hughes & Waszak, 2011). Again, these findings fit the inhibition account, which assumes that action–outcome associative learning enables the inhibition of anticipated action outcomes (Miall & Wolpert, 1996).

Fig. 1

(a) According to the inhibition hypothesis, correct prediction of the cue feature reduces the signal strength caused by the cue, which should reduce the effects of both the valid and invalid cues in our paradigm. (b) According to the preactivation hypothesis, internal prediction increases the baseline activity—that is, the “noise level”—for the predicted feature, which should weaken the effect of invalid cues. Critically, the increased baseline activity also means that the representation of the predicted feature reaches peak activity faster than the representation of a nonpredicted feature, which should increase the effect of valid cues. From “A Preactivation Account of Sensory Attenuation,” by C. Roussel, G. Hughes, & F. Waszak, 2013, Neuropsychologia, 51, 922–929. Copyright 2013 by Elsevier. Reprinted with permission

The second account of sensory attenuation is based on internal preactivation of learned sensory outcomes. According to this account, planning an action is thought to increase the activity of the cells that represent the anticipated sensory outcome (Roussel et al., 2013; Waszak et al., 2012). This sensory preactivation, in turn, limits any further increase in the cell’s activity that is uniquely caused by external stimulation (Fig. 1b). Unlike the inhibition account, the preactivation account is not based on subtraction of two signals, but instead on the difficulty of detecting an external signal that is received during higher baseline activity. Analogously to Weber’s law, a stimulus that is encountered during preactivation evokes a weaker response, relative to a stimulus that is encountered against a lower baseline (i.e., without preactivation; see Footnote 2). Given that the preactivation account also predicts weaker stimulus representation for action outcomes, this account equally well fits the psychophysical observations of sensory attenuation (Bompas & O’Regan, 2006a; Cardoso-Leite et al., 2010; Roussel et al., 2013). The preactivation account can also be reconciled with attenuated electrophysiological response when we consider that stimulus-elicited neural response is measured with the assumption of equal baseline activity levels across conditions. This assumption is precisely what the preactivation account argues against; the baseline activity level is thought to increase for a learned action outcome, which in turn reduces the stimulus-elicited response. The attenuated electrophysiological responses could, therefore, reflect an increase in the baseline activity that is typically neglected in measurement.

Critically, the preactivation account makes the additional prediction that the representation of a sensory action outcome, although attenuated in strength, is formed faster due to the processing head start provided by preactivation (Hughes, Desantis, & Waszak, 2013). The inhibition account, by contrast, predicts no processing speed advantage for action outcomes relative to action-incongruent colors. In the present study, we employed a visual task in which the inhibition and preactivation accounts offer conflicting predictions based on their assumptions with regard to the processing speed of action-congruent and action-incongruent stimuli.

The visual attention task that we used included a salient spatial cue in which the cue colors were learned sensory outcomes of observers’ keypress actions. We should note that previous methods of investigation have typically associated action with a feature of the target that remains central to the task in the test phase. As such, they are primarily sensitive to the strength of stimulus representation (i.e., salience) and how it might change as a function of known action outcomes (e.g., Bompas & O’Regan, 2006a; Cardoso-Leite et al., 2010; Hughes & Waszak, 2011; Stenner et al., 2014). By contrast, by associating a cue feature with actions, the present paradigm affords sensitivity to both the salience of a representation and the speed with which that representation is formed (for comparable experimental designs, see, e.g., Gozli, Goodhew, Moskowitz, & Pratt, 2013; Gozli & Pratt, 2011; Kumar, Manjaly, & Sunny, 2015).

The sensitivity of the present paradigm to both salience and processing rate rests on the fact that we can investigate the effects of cues when they are spatially valid (indicating target location) and when they are spatially invalid (indicating a distractor location). With invalid cues, we can assess feature salience, because the lower salience of a cue feature allows faster attentional disengagement from the cue (Theeuwes, 2010). On the other hand, with valid cues we can assess processing rate, because a faster processing rate of a cue feature allows faster initial selection of the valid cue (cf. Bundesen, 1990). The inhibition account predicts that action-congruent colors will have lower salience than incongruent colors, although this account does not make a clear prediction about variations in processing speed. If anything, the inhibition process would more likely decrease the speed with which the representation of action-congruent cues reached peak activity level. If so, relative to action-incongruent cues, action-congruent cues would be easier to disengage from when invalid, and they would be selected more slowly when valid. By contrast, the preactivation account holds that action-congruent colors would have lower salience, but higher processing rates, than action-incongruent cues. Therefore, this account predicts that action-congruent cues would be easier to disengage from when invalid, and they would be selected faster when valid (Hughes et al., 2013; Waszak et al., 2012).

Similar to previous work on sensory attenuation, the present experiment consisted of an acquisition phase and a test phase. During the acquisition phase, participants’ keypresses determined the color of the cue (one key producing red, the other producing green), whereas the cue location was randomly selected from a set of four placeholders. Next, in a test phase, the same keys were followed by a cue and a search display (Fig. 2). In the test phase, if a red cue were to appear after pressing the “red” key (i.e., the key that had consistently produced red cues during acquisition), the cue color would be regarded as action-congruent, whereas it would be regarded as action-incongruent if it appeared after pressing the “green” key. Unlike in previous studies, in addition to manipulating whether the cue color was congruent with the learned action–outcome associations, we also manipulated the spatial validity of the cues. For one group, the cue was always invalid (indicating a distractor location), whereas for the other group, the cue was always valid (indicating the target location).

Fig. 2

Sequence of events in a sample trial of the test phase in the “invalid-cue” condition

In summary, the inhibition account and the preactivation account both predict a smaller cost for action-congruent cues in the invalid-cue condition, but they make opposite predictions in the valid-cue condition. With valid cues, the inhibition account predicts a smaller cueing benefit with the action-congruent color, whereas the preactivation account predicts a greater cueing benefit with the action-congruent color, due to the speeded processing caused by feature preactivation.
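The contrast between the two accounts can be written down as a small lookup table. The sketch below is only an illustration of the logic summarized above; the sign coding, the names `PREDICTIONS`, `consistent_accounts`, and `diagnostic_conditions`, and the example pattern are assumptions introduced here, not part of the original study.

```python
# Sign convention (an assumption of this sketch): each value is the predicted sign of
# RT(action-congruent cue) - RT(action-incongruent cue); -1 means congruent is faster.
PREDICTIONS = {
    # Lower salience: easier disengagement when invalid, weaker benefit when valid.
    "inhibition": {"invalid": -1, "valid": +1},
    # Lower salience but a processing head start: faster in both conditions.
    "preactivation": {"invalid": -1, "valid": -1},
}


def consistent_accounts(observed):
    """Return the accounts whose predicted signs match `observed` in every listed condition."""
    return [
        account
        for account, predicted in PREDICTIONS.items()
        if all(predicted[condition] == sign for condition, sign in observed.items())
    ]


def diagnostic_conditions():
    """Return the conditions in which the two accounts predict opposite signs."""
    inhibition = PREDICTIONS["inhibition"]
    preactivation = PREDICTIONS["preactivation"]
    return [c for c in inhibition if inhibition[c] != preactivation[c]]


if __name__ == "__main__":
    # Hypothetical pattern: congruent cues speed responses in both conditions.
    hypothetical = {"invalid": -1, "valid": -1}
    print("Diagnostic conditions:", diagnostic_conditions())
    print("Accounts consistent with the hypothetical pattern:", consistent_accounts(hypothetical))
```

Only the valid-cue condition separates the accounts in this coding, which is why cue validity is manipulated in addition to action congruency.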

Method

Participants

Thirty-six University of Toronto undergraduate students participated in the experiment in exchange for course credit (18 per condition). They all reported normal or corrected-to-normal vision and were unaware of the purpose of the study. All experimental protocols were approved by the Research Ethics Board of the University of Toronto.

Apparatus and stimuli

Participants performed the task in dimly lit rooms. Stimuli were presented on 19-in. CRT monitors set at a 1,024 × 768 resolution and 85-Hz refresh rate. Using a head- and chinrest, participants’ distance from the display was fixed at approximately 45 cm. The display structure and a sample sequence of events are shown in Fig. 2. Except for the color cues, all stimuli were presented in white (CIE XYZ = 33.60, 26.59, 93.80) against a black (XYZ = 0, 0, 0) background. The cue colors were red (XYZ = 41.24, 21.26, 1.93) and green (XYZ = 6.72, 13.43, 2.24). The placeholders were four squares (2.4° × 2.4°; frame width = 0.16°) that appeared above, below, to the left, and to the right of the display center (distance from center = 8°). When a placeholder turned into a color cue, its frame width increased to 0.24° of visual angle. The target was a tilted line (“\” vs. “/”; length = 1.4°, width = 0.1°) that would appear inside one placeholder. Each distractor was a letter “X” that appeared in a nontarget placeholder. Participants performed two types of responses. The responses that produced visual effects were performed using the index and middle fingers of the left hand and the “A” and “Q” buttons on the keyboard. We associated left-hand responses with action outcomes on the basis of prior research that had suggested that action–outcome associative learning may be stronger for left-hand responses (Melcher, Weidema, Eenshuistra, Hommel, & Gruber, 2008; Melcher et al., 2013). The search responses were performed with the right hand, using the left and right arrow keys (in response to the “\” and “/” targets, respectively).
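For readers who want to relate the sizes given in degrees of visual angle to physical distances on the screen, the following sketch applies the standard visual-angle geometry at the reported viewing distance of 45 cm. The function names are hypothetical, and the size formula assumes a stimulus centered on the line of sight, so values for the peripheral placeholders are approximations.

```python
import math

VIEWING_DISTANCE_CM = 45.0  # "approximately 45 cm" in the text


def size_deg_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical extent (cm) of a stimulus that subtends `angle_deg`, centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def eccentricity_deg_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Distance (cm) from the display center to a point at `angle_deg` of eccentricity."""
    return distance_cm * math.tan(math.radians(angle_deg))


if __name__ == "__main__":
    # Sizes reported in the text, in degrees of visual angle.
    sizes_deg = {
        "placeholder side": 2.4,
        "placeholder frame width": 0.16,
        "cue frame width": 0.24,
        "target line length": 1.4,
    }
    for name, deg in sizes_deg.items():
        print(f"{name}: {deg} deg is about {size_deg_to_cm(deg):.2f} cm")
    print(f"placeholder eccentricity: 8 deg is about {eccentricity_deg_to_cm(8.0):.2f} cm")
```

Converting further to pixels would require the physical width of the visible screen area, which the text does not report.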

Acquisition phase

Each trial of the acquisition phase began with the presentation of the fixation cross and the four placeholders. After a random delay of 1,000–1,500 ms, the fixation cross flickered (i.e., it disappeared for 100 ms and then reappeared). We instructed participants to press either the “Q” or the “A” key upon noticing the flicker. Moreover, we instructed them to make their selection spontaneously, to try to avoid patterns, and to try to select the two keys equally frequently. As soon as a keypress was recorded, the color cue appeared at one of the placeholders. The color of the cue was determined by the response (red or green after “Q” or “A,” respectively). The location of the cue was randomly chosen, since any of the four placeholders was equally likely to be the cue location. The cue remained on display for 300 ms, after which the next trial began. If participants pressed a key other than “Q” or “A,” or if they pressed more than one key, they received visual feedback (“MISTAKE!”). If the response was faster than 100 ms, they also received visual feedback (“TOO FAST!”). No color cue was presented on error trials. The search task was not included in this phase.

Test phase

Similar to the acquisition phase, every test trial began with the presentation of the fixation cross and the four placeholders (Fig. 2). Participants again performed their first, left-hand response when they noticed the fixation mark flicker. There were three equiprobable types of trials, based on the color cue. The cue could be congruent or incongruent with the response–outcome associations during the acquisition phase, or it could be absent (i.e., no cue/action outcome). After a 100-ms delay, the cue was followed by the appearance of the search items in the placeholders. The cue and the search display remained on screen until a response was recorded. For the search display, we instructed participants to find the tilted line among the distractors (“X”s) and to identify the target tilt using the left/right arrow keys. Upon pressing an incorrect key, or pressing more than one key, participants received visual feedback (“MISTAKE!”). Finally, if the first response was a mistake, participants received visual feedback and no search display or cue was presented.
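The coding of cue color by trial type can be sketched as follows. The key-to-color mapping follows the acquisition phase described above; the function names `cue_color_for_trial` and `classify_cue` are hypothetical and serve only to make the congruency definition explicit.

```python
# Mapping learned in the acquisition phase ("Q" produced red, "A" produced green).
KEY_TO_COLOR = {"Q": "red", "A": "green"}
TRIAL_TYPES = ("congruent", "incongruent", "absent")


def cue_color_for_trial(key, trial_type):
    """Return the cue color shown after `key` on a given trial type (None when no cue is shown)."""
    learned = KEY_TO_COLOR[key]
    if trial_type == "congruent":
        return learned
    if trial_type == "incongruent":
        # With two colors, the incongruent cue is simply the other color.
        return next(color for color in KEY_TO_COLOR.values() if color != learned)
    if trial_type == "absent":
        return None
    raise ValueError(f"unknown trial type: {trial_type!r}")


def classify_cue(key, shown_color):
    """Label a cue as congruent, incongruent, or absent relative to the pressed key."""
    if shown_color is None:
        return "absent"
    return "congruent" if shown_color == KEY_TO_COLOR[key] else "incongruent"


if __name__ == "__main__":
    for key in KEY_TO_COLOR:
        for trial_type in TRIAL_TYPES:
            print(key, trial_type, cue_color_for_trial(key, trial_type))
```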

Design

Participants were randomly assigned to the valid-cue or invalid-cue condition. The two conditions had the same acquisition phase. In the test phase, the cue was either valid or invalid. An invalid cue never coincided with the target location and had to be ignored. A valid cue always coincided with the target location and, thus, had to be selected. Each participant completed 200 trials in the acquisition phase and 128 trials in the test phase. Each phase was preceded by 15 practice trials.

Results

Acquisition

Before calculating the mean response times (RTs), we excluded error trials and trials in which the RT fell 2.5 SDs beyond the total mean. The mean RT and percentage of errors in the acquisition phase were 379 ms (SE = 63) and 3.0% (SE = 0.5%). Furthermore, participants selected the two keys with equal frequencies (51% and 49%, respectively, for the “Q” and “A” keys), t(35) = 1.71, SE = 0.02, p = .10.
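A minimal sketch of this kind of trimming is given below. The text does not specify whether the "total mean" was computed per participant or across the whole sample, nor which SD estimator was used; the sketch assumes one list of RTs (e.g., one participant), the mean and sample SD of correct trials, and a hypothetical function name `trim_rts`.

```python
from statistics import mean, stdev


def trim_rts(rts, correct, criterion=2.5):
    """Drop error trials, then drop RTs more than `criterion` SDs from the mean.

    Assumption of this sketch: the mean and (sample) SD are computed over the
    correct trials in `rts`, before any outlier is removed.
    """
    kept = [rt for rt, ok in zip(rts, correct) if ok]
    if len(kept) < 2:
        return kept  # an SD cannot be estimated from fewer than two trials
    m, sd = mean(kept), stdev(kept)
    return [rt for rt in kept if abs(rt - m) <= criterion * sd]


if __name__ == "__main__":
    # Made-up RTs (ms): twenty ordinary trials, one very slow trial, one error trial.
    rts = [340, 360] * 10 + [2000, 355]
    correct = [True] * 21 + [False]
    trimmed = trim_rts(rts, correct)
    print(f"kept {len(trimmed)} of {len(rts)} trials; mean RT = {mean(trimmed):.1f} ms")
```

Note that with very few trials a single extreme value can never exceed 2.5 sample SDs, so a criterion of this kind is only meaningful with a reasonable number of trials per cell.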

Test

For the first voluntary keypress made with the left hand, the mean RT and percentage of errors in the test phase were 352 ms (SE = 59) and 2.6% (SE = 0.4%). Furthermore, participants continued to select both keys, although they slightly favored the “A” key, corresponding to the green cue, over the “Q” key, corresponding to the red cue (55.2% vs. 44.8%), t(35) = 2.16, SE = 0.05, p = .038.

For the search task, the mean RTs were submitted to a 3 × 2 mixed analysis of variance (ANOVA) with Cue Color (absent, congruent, or incongruent) as the within-subjects factor and Cue Validity (valid vs. invalid) as the between-subjects factor (Fig. 3). This analysis revealed a marginal main effect of cue validity [F(1, 34) = 3.90, p = .056, ηp² = .103], a main effect of cue color [F(2, 68) = 9.28, p < .001, ηp² = .214], and a two-way interaction [F(2, 68) = 27.65, p < .001, ηp² = .449]. The interaction indicated the rather trivial fact that the cueing effect depended on the cue validity. That is, when comparing cue-present trials (including both cue colors; i.e., two thirds of the trials) with cue-absent trials (the remaining one third), we found a 41-ms benefit for valid cues (Cohen’s d = 1.43) and a 17-ms cost with invalid cues (Cohen’s d = 0.72).
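The reported partial eta squared values can be cross-checked against the F ratios and degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below is offered as a reader's check, not as the authors' computation; the function name is hypothetical, and because the F values are rounded to two decimals, agreement is only expected to within rounding.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (effect, F, df_effect, df_error, reported partial eta squared) from the 3 x 2 ANOVA above.
REPORTED = [
    ("cue validity", 3.90, 1, 34, 0.103),
    ("cue color", 9.28, 2, 68, 0.214),
    ("color x validity", 27.65, 2, 68, 0.449),
]

if __name__ == "__main__":
    for effect, f_value, df1, df2, reported in REPORTED:
        recovered = partial_eta_squared(f_value, df1, df2)
        # Tolerance of .001 allows for rounding of the published F values.
        agrees = abs(recovered - reported) < 0.001
        print(f"{effect}: recovered {recovered:.4f}, reported {reported:.3f}, within rounding: {agrees}")
```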

Fig. 3

Response time (RT) data, graphed as a function of cue color (absent, action-congruent, or action-incongruent) and cue validity (valid vs. invalid). The percentage of errors for each condition is presented at the base of the corresponding bar graph. Error bars represent 95% within-subjects confidence intervals

Since our primary interest was in the difference between action-congruent and action-incongruent cue colors, we submitted the RT data to a 2 × 2 ANOVA, leaving out the absent trials. This analysis revealed a significant main effect of cue color [F(1, 34) = 12.43, p = .001, ηp² = .268]. Regardless of the cue validity, an action-congruent cue color (M ± SE = 548 ± 21 ms) led to faster responses, relative to an action-incongruent cue color (562 ± 22 ms; Cohen’s d = 0.59, for the main effect of cue color). Most importantly, we found no two-way interaction [F(1, 34) = 0.008, p = .931, ηp² < .001], showing that congruent colors conferred a benefit both when they were valid and when they were invalid. Finally, we also found a main effect of cue validity [F(1, 34) = 6.68, p = .014, ηp² = .164], based on faster responses with valid cues (512 ± 28 ms) than with invalid cues (597 ± 19 ms).

We submitted the mean percentages of errors from the search task to the same 3 × 2 ANOVA, which did not reveal a significant effect of cue color [F(2, 68) = 0.30] or cue validity [F(1, 34) = 1.59, p = .21, ηp² = .045], nor a two-way interaction [F(2, 68) = 0.29]. Error rates did not differ significantly across trials with congruent (5.1% ± 1.8%) and incongruent (4.5% ± 1.1%, p = .50) cues, inconsistent with the possibility that the RT difference was a speed–accuracy trade-off.

Discussion

Sensory events that are self-caused typically evoke an attenuated response, as compared to physically identical sensory events that are caused by external sources (e.g., Blakemore et al., 1998; Cardoso-Leite et al., 2010; Hughes et al., 2013). In the present study, we investigated two accounts of sensory attenuation (inhibition vs. preactivation) using a visuospatial task, in which the color of a salient cue could be linked to participants’ own actions. In an initial acquisition phase, participants learned that red and green cues consistently resulted from their own keypress actions. In the subsequent test phase, the cues continued to result from participants’ actions, but their colors could be congruent or incongruent with the action–outcome contingencies learned in the acquisition phase. We found that action-congruent cues were easier to ignore in the invalid-cue condition, consistent with both the inhibition account and the preactivation account. Importantly, we found action-congruent cues to be more effective in the valid-cue condition, suggesting that self-caused features can be selected faster. This finding is consistent with the preactivation account, according to which self-caused features receive a processing head start that speeds their processing, despite reducing their salience.

Our findings are consistent with a recent report by Desantis, Roussel, and Waszak (2014). In their study, participants learned to associate their actions with motion signals of a particular direction (Key 1 and Key 2, respectively causing upward and downward motion). After the acquisition phase, participants pressed the same keys and had to discriminate motion directions at a 75% discrimination threshold. The motion direction during the test phase could be congruent or incongruent with what participants learned during the acquisition phase. The results showed enhanced performance with action-congruent motion stimuli, as compared to action-incongruent stimuli. We reason, in agreement with Desantis et al., that although actions attenuate responses to sensory action outcomes, whether the action outcome is at a perceptual advantage or disadvantage ultimately depends on the nature of the task. Desantis et al. made the crucial point that the concept “signal” takes different meanings in detection and discrimination tasks. In a two-alternative forced choice discrimination task, for instance, each perceptual judgment is the result of a competition between two possible sensory states. Therefore, the advantage of each signal is defined in relation to the competing alternative signal. In this context, preactivation would provide the action-congruent sensory state a competitive advantage over the alternative state. By contrast, in a detection task, in which each perceptual judgment depends on the unique contribution of the external stimulus, preactivation would put the action-congruent stimulus at a disadvantage by reducing the unique contribution of the stimulus (see also Roussel et al., 2013). Applying a similar logic to the present study, we argue that action-congruent cues had an initial competitive advantage in visual selection, provided by internal preactivation. This initial advantage increased the benefit of valid cues. The same preactivation, however, also reduced the salience of action-congruent cues, allowing for rapid disengagement in the invalid-cue condition. Thus, whether feature preactivation confers an advantage or disadvantage depends on the specific characteristics of each task.

Our proposal, which is based on variations in both salience and processing speed, should be considered against alternative proposals that are based on either salience or processing speed alone (see Footnote 3). One could argue that an increase in cue salience (due to action congruency), which leads to faster initial selection in the valid-cue condition, also leads to faster disengagement in the invalid-cue condition. In other words, faster initial selection enables faster disengagement. This would only be true if we could assume constancy in attentional dwell times despite variations in salience. This assumption is not easily justified, given that an increase in salience often leads to faster (and more probable) initial selection and slower disengagement, both of which indicate longer attentional dwell times for more salient items than for less salient items (e.g., Anderson, Laurent, & Yantis, 2011; Belopolsky, Schreij, & Theeuwes, 2010; Duncan, Ward, & Shapiro, 1994; Theeuwes, 2010). For this reason, the faster performance in the invalid-cue condition of our experiment cannot be easily explained if we rely solely on the assumption of increased cue salience. Alternatively, one might leave out the consideration of salience and consider only the faster processing for action-congruent cues, which would again fit with faster initial selection and faster disengagement. Although focusing on the overall performance improvement would connect the present finding with the literature on action-induced visual facilitation (e.g., Craighero, Bello, Fadiga, & Rizzolatti, 2002; Lindemann & Bekkering, 2009; Linnell, Humphreys, McIntyre, Laitinen, & Wing, 2005), it would fail to explain the underlying cause of the increase in processing speed in the present context. We would have to invoke an underlying explanation, such as the preactivation account, which would not be silent about changes in salience. Not considering changes in cue salience would also disregard previous reports of sensory attenuation found using similar designs (e.g., Bompas & O’Regan, 2006a, 2006b; Cardoso-Leite et al., 2010; Pfister et al., 2012; Roussel et al., 2013). Given that our experiment was motivated by the repeated demonstration of sensory attenuation (i.e., reduced salience) found with very similar acquisition phases, we cannot disregard variations in salience in our explanation.

The role of action in shaping perception has gained increased recognition (e.g., Hommel, 2009; O’Regan & Noë, 2001; Thomaschke, Hopkins, & Miall, 2012; for a review, see Herwig, 2015; Pratt, Taylor, & Gozli, in press). Actions change the sensory input we receive from the environment, and it is reasonable to suppose that the perceptual system takes advantage of the regularities that connect our actions with their corresponding sensory outcomes (e.g., Elsner & Hommel, 2001; Hommel, 2004; Hommel et al., 2001; O’Regan & Noë, 2001; Shin et al., 2010). Indeed, evidence suggests that preparing an action not only involves activity in motor–cortical areas, but also activity in sensory areas that underlie the expected outcome of the action (e.g., Gutteling et al., 2015). In a study by Kühn, Seurinck, Fias, and Waszak (2010), participants learned that their keypress actions produced house or face images. After learning the action–outcome associations, participants’ actions alone activated the sensory regions corresponding to visual processing of faces or houses. In a similar paradigm, Hughes and Waszak (2014) found modulated activity in posterior visual areas, prior to action execution, depending on whether participants selected a face- or a house-generating action. Thus, sensory brain regions are involved in the coding of actions, through the learned sensory outcomes of the actions.

It is important to consider whether actions are unique in their ability to modulate the effect of visual stimuli (see, e.g., Gozli, Moskowitz, & Pratt, 2014). Is it the intention to perform an action that results in preactivation of the color red, or is it, for instance, the tactile sensation associated with the action that causes the preactivation? An attempt to reduce the effect of actions to the sensory components of the actions remains consistent with the notion that sensory anticipation is essential to action representation (Hommel et al., 2001). According to such a view, the learned sensory outcomes of an action, collectively, constitute the representation of the action (Hommel et al., 2001; Shin et al., 2010). In other words, the features of an action are not inherently distinct from the features of perceptual events. Thus, to ask whether actions are unique in their ability to serve as a source of visual bias goes against this fundamental assumption, by restoring the strong distinction between action- and perception-related features.

The idea that an action is represented in terms of its collective sensory outcomes is consistent with Hoffmann, Lenhard, Sebald, and Pfister (2009). In their study, participants learned that keypress actions were associated with two distinct auditory outcomes. After this acquisition phase, auditory stimuli could generate a bias in selecting their corresponding actions. In order to examine the action components, Hoffmann et al. tested whether the auditory signals were associated with a finger movement (regardless of key) or with pressing a specific key (regardless of finger). Interestingly, neither the finger movements nor the keypresses alone evoked the auditory associations. It was only when the finger–key combinations were preserved that action selection was sensitive to concurrent auditory stimuli. In other words, dissecting the action into components eliminated the action–outcome associations.

It is, furthermore, worth considering whether the same modulated biases could be observed without the involvement of actions. Indeed, in agreement with Waszak et al. (2012), we propose that the action-driven effects in the present study are similar to the effects of repeated exposure to the same feature (e.g., Awh, Belopolsky, & Theeuwes, 2012). As in our findings, ignoring a consistently invalid peripheral cue has been shown to be more efficient if the cue color repeats across trials (cue color being a task-irrelevant feature), whereas selecting a consistently valid cue is also more efficient if the cue color repeats across trials (e.g., Pinto, Olivers, & Theeuwes, 2005; Vatterott & Vecera, 2012). Waszak et al. (2012) attributed the efficiency that comes with repeated exposure to a feature to sensory preactivation, suggesting that preactivation is not unique to action performance.

In conclusion, we argue that actions function as a source of visual bias by virtue of generating sensory preactivation of known action outcomes. In line with sensorimotor accounts of vision, our study suggests that the visual system is sensitive to learned associations that connect dynamic features of the action systems to sensory outcomes.