The attention system tends to avoid repetitively examining previously attended (old) object features, in order to potentially increase the efficiency of visual search (Klein, 2000; Posner & Cohen, 1984). Spatially, this mechanism manifests in Posner’s exogenous spatial-cuing task, in which a peripheral cue is first presented to attract spatial attention to the cued location (Posner & Cohen, 1984; Posner, Rafal, Choate, & Vaughan, 1985). Detection of a target appearing immediately at the cued location, as compared to detection of a target at an uncued location, will be both faster and more accurate—that is, a facilitatory effect will occur. However, if the cue–target stimulus onset asynchrony (SOA) is longer than 300 ms and the cue is not informative with regard to target location, detection of the target at the cued location will be delayed, as compared to detection of the target at an uncued location. The latter inhibitory effect is termed spatial inhibition of return (IOR).

Such repetition-induced inhibition exists not only in the spatial domain but also in nonspatial domains in which responses to old (repeated) nonspatial object features (e.g., color, shape, or line length) are delayed (Fox & de Fockert, 2001; Francis & Milliken, 2003; Grison, Paul, Kessler, & Tipper, 2005; Law, Pratt, & Abrams, 1995; Riggio, Patteri, & Umiltà, 2004; Tipper, Grison, & Kessler, 2003). For example, in a “prime–neutral-cue–target” paradigm, Law et al. presented three consecutive color patches at the same, central location and asked healthy adults to detect the onset of the third color patch. The first color patch (red or blue) served as a prime, which could have either the same color as or a different color from the third color patch (i.e., the target, red or blue). The intervening color patch had a color different from those of both the prime and the target. Detection response times (RTs) to the target were slower when the prime and the target had the same color than when they were of different colors (Law et al., 1995). This nonspatial inhibitory effect disappeared when the neutral cue was not presented (Law et al., 1995), and it turned into a facilitatory effect if the target was to be discriminated unless an appropriate neutral cue was presented between the prime and the target (Fuentes, Vivas, & Humphreys, 1999; Spadaro, He, & Milliken, 2012). For example, in a color-based “target–target” paradigm of repetition inhibition, Spadaro et al. presented two consecutive targets, either of which could be blue or yellow and each of which required a discriminative response with regard to the color. The repetition effect could be either facilitatory or inhibitory, depending on the presence of an intervening stimulus between the two targets. 
When there was no intervening stimulus, responses to a target that had the same (cued) color as the preceding target were faster than responses to a target that had a different (uncued) color—that is, a facilitatory effect. However, when there was an intervening stimulus, which was uninformative with regard to the color of the following target and required a response, the facilitatory effect turned into an inhibitory effect: Responses to a target with the same (cued) color as the preceding target were slower than responses to a target with a different (uncued) color (see also Fuentes et al., 1999, who used a prime–neutral-cue–target paradigm). Together, the results above suggested that the presence of the intervening neutral cue is potentially critical for nonspatial repetition inhibition to occur, especially in more complicated discrimination tasks.

To explain the repetition inhibition, an episodic-retrieval account was proposed (Lupiáñez & Milliken, 1999; Lupiáñez, Milliken, Solano, Weaver, & Tipper, 2001; Milliken, Tipper, Houghton, & Lupiáñez, 2000). According to this account, the onset of a prime is coded as an episodic representation or object file (Kahneman, Treisman, & Gibbs, 1992), and the subsequent onset of the target triggers the retrieval of the previous prime representation. The properties of the target are then compared with those of the prime. If there is no match (uncued), a new episodic representation is created for the target. By contrast, if there is a match between the prime and the target (cued), the target is integrated with the existing episodic representation of the prime. This integration process deprives the target of the benefits of novelty, causing a detection cost. An intervening event between the prime and the target further deprives the integration process of the beneficial effect of retrieval, resulting in an overall cost (see Lupiáñez, 2010, for a full discussion of this account). Furthermore, Grison and colleagues (Grison et al., 2005; Tipper et al., 2003) added inhibitory mechanisms to the episodic-retrieval process: If an intervening stimulus is presented between the prime and the target, a new episodic representation has to be created for it, so that the old episodic representation of the prime is tagged for inhibition (Grison et al., 2005; Tipper et al., 2003). Thus, the onset of the target, which shares object features with the prime, triggers the retrieval of the representation of the prime together with the inhibitory tag, and accordingly slows down responses to the repeated targets (Neill, 1997; Zhou & Chen, 2008).

Spatially, location-based repetition inhibition occurs not only within a sensory modality, but also across different modalities (Spence & Driver, 1998a, 1998b). For example, in an audiovisual spatial-cueing paradigm, Spence and Driver (1998b) presented a spatially uninformative peripheral auditory cue prior to a visual target that required a speeded detection response. A central reorienting cue was presented between the cue and the target. The central cue could be either visual or auditory. Responses to visual targets that appeared at the same (cued) location as the peripheral auditory cue were significantly slower than responses to visual targets at the opposite (uncued) location from the auditory cue only when the central cue was auditory, but not when the central cue was visual or when there was no central cue at all. On the basis of these results, the researchers suggested that the conditional occurrence of cross-modal inhibition was due to the fact that auditory sounds were particularly effective at shifting cross-modal spatial attention (Spence & Driver, 1997), and they claimed that the presentation of an appropriate central reorienting cue played a crucial role in attracting attention away from the auditory cue (Spence & Driver, 1998b).

Although it has been well documented that spatial repetition inhibition occurs cross-modally, it remains unclear whether and how nonspatial repetition inhibition occurs across sensory modalities. In order to investigate these issues, we incorporated nonspatial repetition inhibition in a cross-modal repetition-priming paradigm, with an intervening neutral cue being presented between the prime and the target (see Fig. 1). The three consecutive stimuli (prime, neutral cue, and target) were from the same semantic category (i.e., color), and a two-choice discrimination task was performed on the target. The neutral cue always referred to a semantic identity different from both the prime and the target. The prime and the target could refer to either the same or different semantic identities. Also, the modalities of the prime, the neutral cue, and the target could be either visual or auditory. By adopting the above paradigm, we aimed to answer the following questions: (1) whether and how nonspatial repetition inhibition occurs cross-modally, and (2) whether the nonspatial representations inhibited by nonspatial repetition inhibition are modality-specific or supramodal.

Fig. 1 Experimental paradigm and timing of the stimuli in Experiments 1 and 2. For both experiments, the prime–neutral-cue–target paradigm of nonspatial repetition inhibition was used. Written color words and the corresponding verbal sounds were used as stimuli. A trial with the sequence auditory–auditory–visual is illustrated in the figure as an example.

Since both the modality and the semantic identity of the target were cued by the prime in the present cross-modal paradigm, for our study we used a 2 (target modality: auditory vs. visual) × 2 (prime–neutral-cue [PN] modality repetition: PN modality same vs. different) × 2 (prime–target modality repetition: modality cued vs. uncued) × 2 (prime–target identity repetition: identity cued vs. uncued) within-subjects design. In the framework of the episodic-retrieval account, both the modality (i.e., sensory-processing properties of the inputs) and the semantic identity of the prime were coded as integrated episodic representations. Upon the presentation of the neutral cue, a new episodic representation that integrates both the modality property and the semantic identity is correspondingly built up. With regard to our first research question, we predicted that if both the modality and semantic identity of the prime were tagged for inhibition, regardless of whether or not the new episodic representation of the neutral cue shared its sensory-processing properties (i.e., modality) with the prime, we should observe both modality-based and semantic-based repetition inhibition, independent of the modality relationship between the prime and the neutral cue. If, however, whether or not the modality or semantic identity of the prime was inhibited depended on whether the prime shared sensory-processing properties with the neutral cue, we should observe a significant interaction between prime–neutral-cue modality repetition and prime–target modality/identity repetition. With regard to our second research question, we predicted that if semantic-based repetition inhibition was modality-specific, we should observe significant semantic-based repetition inhibition only when the prime and the target not only referred to the same semantic identity but also shared a sensory modality, but not when they were from different modalities. 
In contrast, if the semantic representations inhibited were supramodal, once a new representation was created for the neutral cue and the old representation of the prime was tagged for inhibition, semantic-based repetition inhibition would be observed, irrespective of whether or not the modality of the target was cued.

Experiment 1

Method

Participants

A group of 25 right-handed undergraduate and graduate students (8 males, 17 females; age range: 18–24 years) participated in Experiment 1. All of them had normal or corrected-to-normal vision and normal hearing, and none of them had a history of neurological or psychiatric disorders. They all gave informed consent prior to the experiment, in accordance with the Helsinki declaration, and were paid for their participation after the experiment. This study was approved by the Academic Committee of the Department of Psychology, South China Normal University.

Stimuli and experimental design

The whole experiment was run in a dimly lighted soundproof room. Participants sat in front of a monitor screen, and the eye-to-monitor distance was 60 cm. All of the visual stimuli measured 2° (horizontally) × 2° (vertically). The default visual display was a box with black frames (2.5° × 2.5°) at the center of the screen with a white background. Visual stimuli were always presented inside the central box. Participants were instructed to fixate the central box throughout the experiment without moving their eyes. The auditory stimuli were voice recordings of a female speaker delivered binaurally via stereo headphones. Headphone volume was adjusted for each participant to a comfortable level. The visual and auditory stimuli of the prime and the target consisted of two written color words and their verbal sounds in Chinese: 红 (\hong\) and 蓝 (\lan\) [“red” (\rεd\) and “blue” (\blu:\) in English]. The visual and auditory forms of the neutral cue were the written color word 绿 (“green” in English) and its verbal sound \lü\ (\gri:n\ in English).

The modality of the prime (visual vs. auditory) was blocked. For the auditory-prime block, the prime of each trial was always auditory; for the visual-prime block, the prime of each trial was always visual. The order of the two types of blocks was counterbalanced across participants. For both types of blocks, at the start of each trial the written word or the verbal sound of one of the target colors (红 or 蓝; i.e., “red” or “blue” in English) was presented for 300 ms as the prime. The identity of the prime was uninformative with respect to that of the target. After an interval of 200 ms, the neutral cue was presented for 300 ms. The neutral cue was either the written word 绿 (“green” in English) or its verbal sound \lü\, which was always different from the identities of both the prime and the target. After another interval of 300 ms, the target was presented for 300 ms. The target was either the written word or the verbal sound of one of the two target colors (i.e., 红 or 蓝). The identity of the target color could be either the same as or different from that of the prime (Fig. 1). Participants were required to discriminate the color represented by the target; they responded with the index fingers of both hands, pressing one button on the response box for 红 (“red”) and the other button for 蓝 (“blue”). The mapping between the two response buttons and the two colors was counterbalanced across participants.
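The timing of a trial can be made explicit with a small calculation. The sketch below is only illustrative: the durations come from the procedure above, but treating the two intervals as blank offset-to-onset gaps is an assumption, and the names `EVENTS`, `onsets`, `PRIME_TARGET_SOA`, and `CUE_TARGET_SOA` are hypothetical.

```python
# Durations in ms, in presentation order, as described in the procedure.
# Assumption: each "interval" is a blank gap from stimulus offset to the
# onset of the next stimulus.
EVENTS = [
    ("prime", 300),
    ("blank_1", 200),
    ("neutral_cue", 300),
    ("blank_2", 300),
    ("target", 300),
]


def onsets(events):
    """Map each event name to its onset time (ms) relative to trial start."""
    elapsed = 0
    result = {}
    for name, duration in events:
        result[name] = elapsed
        elapsed += duration
    return result


ONSETS = onsets(EVENTS)
# Stimulus onset asynchronies (SOAs) implied by these durations.
PRIME_TARGET_SOA = ONSETS["target"] - ONSETS["prime"]
CUE_TARGET_SOA = ONSETS["target"] - ONSETS["neutral_cue"]

print(ONSETS)
print("prime-target SOA (ms):", PRIME_TARGET_SOA)
print("neutral-cue-target SOA (ms):", CUE_TARGET_SOA)
```

Under this reading, the prime–target SOA is well beyond the 300-ms range at which inhibitory effects are typically reported in the cuing literature cited in the introduction.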

Therefore, the experiment had a 2 (target modality: auditory vs. visual) × 2 (PN modality repetition: PN modality same vs. different) × 2 (prime–target modality repetition: modality cued vs. uncued) × 2 (prime–target identity repetition: identity cued vs. uncued) within-subjects four-factorial design. In each of the two types of prime blocks, there were eight experimental conditions and 48 trials under each of the eight conditions. Therefore, participants responded to 384 trials in total in each of the two types of block, and all of the trials within a block were randomly mixed. There was a 2-min rest after every 64 trials for both types of blocks. All of the participants completed a training session of 5 min prior to the formal experiment.
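How the four factors map onto the two prime-modality blocks can be sketched by enumerating the conditions. This is a minimal illustration, not the authors' experiment script; the function `block_conditions` and the dictionary keys are hypothetical names, and only the counts (eight conditions per block, 48 trials per condition) come from the text.

```python
from itertools import product

MODALITIES = ("auditory", "visual")
TRIALS_PER_CONDITION = 48  # from the design description


def block_conditions(prime_modality):
    """List the eight conditions that can occur within one prime-modality block.

    Within a block the prime modality is fixed, so the conditions arise from
    crossing neutral-cue modality, target modality, and identity repetition.
    """
    conditions = []
    for neutral_mod, target_mod, identity_cued in product(
        MODALITIES, MODALITIES, (True, False)
    ):
        conditions.append(
            {
                "target_modality": target_mod,
                "pn_modality_same": neutral_mod == prime_modality,
                "pt_modality_cued": target_mod == prime_modality,
                "pt_identity_cued": identity_cued,
            }
        )
    return conditions


auditory_block = block_conditions("auditory")
visual_block = block_conditions("visual")

trials_per_block = len(auditory_block) * TRIALS_PER_CONDITION
# Pooling both blocks recovers every cell of the 2 x 2 x 2 x 2 design.
all_cells = {tuple(c.values()) for c in auditory_block + visual_block}

print(len(auditory_block), trials_per_block, len(all_cells))
```

The enumeration shows why each block contains only eight of the 16 design cells: with the prime modality fixed, target modality and prime–target modality repetition are no longer free to vary independently.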

Statistical analysis of the behavioral data

For each experimental condition, omissions, incorrect responses, and trials with RTs more than 3 SDs above or below the mean RT of all correct trials were first excluded from further analysis (3.2% of the data points were excluded as outliers in the whole experiment). Mean RTs of the remaining trials were then calculated for each of the experimental conditions. Error rates in each of the experimental conditions were calculated as the number of omissions and incorrect trials divided by the overall number of trials; henceforth, they will be shown as percentages.
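The exclusion rule and the error-rate definition can be sketched as follows. This is a minimal sketch under stated assumptions: the trial format (an RT or `None` for an omission, plus a correctness flag), the function name `summarize_condition`, and the demo data are all hypothetical, and the 3-SD criterion is applied once to the correct trials of a single condition.

```python
from statistics import mean, stdev


def summarize_condition(trials, n_sd=3.0):
    """Summarize one condition of one participant.

    trials: list of (rt_ms, correct) pairs; rt_ms is None for an omission.
    Returns the trimmed mean RT, the error rate in percent, and the number
    of correct trials removed as outliers.
    """
    n_total = len(trials)
    correct_rts = [rt for rt, correct in trials if rt is not None and correct]
    # Omissions and incorrect responses both count as errors.
    error_rate = 100.0 * (n_total - len(correct_rts)) / n_total
    center, spread = mean(correct_rts), stdev(correct_rts)
    kept = [rt for rt in correct_rts if abs(rt - center) <= n_sd * spread]
    return {
        "mean_rt": mean(kept),
        "error_rate": error_rate,
        "n_outliers": len(correct_rts) - len(kept),
    }


# Hypothetical data: 20 ordinary correct trials, one very slow correct trial,
# one incorrect response, and one omission.
demo_trials = [(580, True), (620, True)] * 10
demo_trials += [(2000, True), (450, False), (None, False)]
result = summarize_condition(demo_trials)
print(result)
```

In the demo, the 2,000-ms trial lies more than 3 SDs from the mean of the correct trials and is dropped, while the incorrect response and the omission enter only the error rate.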

In order to avoid complicated four-way interactions, and since we focused on the effects of semantic identity repetition and modality repetition between the prime and the target, and on how the two types of inhibitory effect were modulated by the modality repetition between the prime and the neutral cue, we ran a 2 (modality of target: auditory vs. visual) × 2 (prime–target modality repetition: modality cued vs. uncued) × 2 (prime–target identity repetition: identity cued vs. uncued) three-way repeated measures ANOVA separately for the PN-modality-same condition and the PN-modality-different condition. Significant effects from the reported ANOVAs were examined further by planned t tests on the simple effects (with Bonferroni corrections).
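A simple effect of this kind reduces to a paired comparison of per-participant condition means. The sketch below is an illustration only, not the authors' analysis code: `paired_t` and `bonferroni_alpha` are hypothetical helper names, the four participants' values are invented, and the p value itself is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(cued, uncued):
    """Paired t statistic for cued minus uncued mean RTs.

    Returns (mean difference in ms, t, degrees of freedom). A positive
    difference means slower responses to cued targets (inhibition); a
    negative difference means faster responses (facilitation).
    """
    diffs = [c - u for c, u in zip(cued, uncued)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / sqrt(n))
    return mean(diffs), t_value, n - 1


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Invented per-participant mean RTs (ms) for four participants.
cued_rts = [610, 600, 590, 620]
uncued_rts = [590, 570, 580, 600]

effect, t_value, df = paired_t(cued_rts, uncued_rts)
threshold = bonferroni_alpha(0.05, 2)
print(effect, round(t_value, 3), df, threshold)
```

With 25 participants, as in Experiment 1, the same computation yields the t(24) statistics reported for the simple effects.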

Results and discussion

Table 1 summarizes the mean RTs and error rates in all of the experimental conditions. The between-subjects variability was excluded from the standard errors because of the within-subjects design of the present study (Cousineau, 2005; Morey, 2008). When the prime and the neutral cue were presented in the same sensory modality (PN modality same), the three-way repeated measures ANOVA revealed that the main effect of target modality was significant, F(1, 24) = 30.9, p < .001, indicating that responses were significantly slower to auditory targets (605 ms) than to visual targets (561 ms). The main effect of prime–target modality repetition was significant, F(1, 24) = 36.9, p < .001, indicating that RTs in the modality-cued condition (599 ms) were significantly longer than RTs in the modality-uncued condition (567 ms)—that is, a significant modality-based repetition inhibition effect. The main effect of prime–target identity repetition was also significant, F(1, 24) = 46.2, p < .001, indicating that participants responded significantly more slowly when the identity of the target was cued (595 ms) than when it was uncued (571 ms)—that is, a significant semantic-based repetition inhibition. Neither the two-way nor the three-way interactions were significant, all Fs < 1, suggesting that for both visual and auditory targets, comparable sizes of significant semantic-based repetition inhibition existed when the modality of the target was either cued or uncued, and similarly, comparable sizes of significant modality-based repetition inhibition existed when the identity of the target was either cued or uncued (Table 1 and Fig. 2a).
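Removing between-subjects variability from the standard errors can be sketched as follows. This assumes the usual reading of the cited procedure (each participant's scores are centered on the grand mean, and the resulting standard errors are scaled by a correction factor for the number of conditions); the function name `within_subject_se` and the demo data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def within_subject_se(data):
    """Standard errors with between-subjects variability removed.

    data: one list per participant, holding one mean RT per condition.
    Returns one standard error per condition.
    """
    n_subjects = len(data)
    n_conditions = len(data[0])
    grand_mean = mean(mean(row) for row in data)
    # Normalization step: remove each participant's overall speed.
    normalized = [[x - mean(row) + grand_mean for x in row] for row in data]
    # Correction for the bias introduced by the normalization.
    correction = sqrt(n_conditions / (n_conditions - 1))
    return [
        correction * stdev(column) / sqrt(n_subjects)
        for column in zip(*normalized)
    ]


# Invented data: three participants who differ greatly in overall speed
# but show similar condition effects.
demo = [[500, 520], [600, 640], [700, 720]]
se_within = within_subject_se(demo)
se_raw = [stdev(column) / sqrt(len(demo)) for column in zip(*demo)]
print(se_within, se_raw)
```

In the demo, the conventional standard errors are dominated by overall speed differences between participants, whereas the normalized ones reflect only the consistency of the condition effect, which is what matters in a within-subjects design.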

Table 1 Mean reaction times (in milliseconds) and error rates (%), with standard errors from which between-subjects variability was excluded, in all of the experimental conditions of Experiment 1
Fig. 2 Mean response times (RTs), with standard errors from which between-subjects variability has been excluded, shown as a function of all of the experimental conditions in Experiment 1. (a) RTs in the eight experimental conditions when the prime and the neutral cue were from the same modality (PN modality same). (b) RTs in the eight experimental conditions when the prime and the neutral cue were from different modalities (PN modality different)

When the prime and the neutral cue were from different sensory modalities (PN modality different), the three-way repeated measures ANOVA revealed that the main effect of target modality was significant, F(1, 24) = 56.3, p < .001, showing that responses to the auditory targets (655 ms) were significantly slower than responses to the visual targets (580 ms) (Fig. 2b). The main effect of prime–target modality repetition was significant, F(1, 24) = 25.4, p < .001, indicating that responses to a cued modality (636 ms) were significantly slower than responses to an uncued modality (599 ms)—that is, a significant modality-based repetition inhibition. The main effect of prime–target identity repetition was also significant, F(1, 24) = 7.94, p < .05, showing that RTs in the identity-cued condition (612 ms) were significantly shorter than RTs in the identity-uncued condition (623 ms)—that is, a semantic-based facilitatory effect. The interaction between target modality and prime–target modality repetition [F(1, 24) = 3.50, p = .074], the interaction between target modality and prime–target identity repetition [F(1, 24) = 3.15, p = .088], and the interaction between prime–target modality repetition and prime–target identity repetition [F(1, 24) = 3.25, p = .084] were all marginally significant. The three-way interaction was not significant, F(1, 24) = 2.17, p > .1.

Since the analysis above suggested that in the PN-modality-different condition the effects of modality repetition and identity repetition depended on the modality of the target, we further carried out a 2 (prime–target modality repetition: modality cued vs. uncued) × 2 (prime–target identity repetition: identity cued vs. uncued) repeated measures ANOVA separately for the auditory and the visual targets. For the auditory targets (Fig. 2b, left), the main effect of prime–target modality repetition was not significant, F(1, 24) = 1.53, p > .1. The main effect of prime–target identity repetition was significant, F(1, 24) = 9.67, p < .01, indicating that responses in the identity-cued condition (646 ms) were significantly faster than responses in the identity-uncued condition (664 ms), that is, a significant semantic-based facilitatory effect. The interaction was also significant, F(1, 24) = 4.27, p < .05. Further planned t tests on simple effects showed that this semantic-based facilitatory effect occurred only when the modality of the target was cued [29 ms; t(24) = 3.12, p < .01], but not when it was uncued (6 ms; t < 1). The modality-based inhibitory effect was not significant in the identity-cued condition (6 ms; t < 1), while it was marginally significant in the identity-uncued condition [29 ms; t(24) = 1.75, p = .092] (Fig. 2b, left). For the visual targets (Fig. 2b, right), the only significant effect was the main effect of prime–target modality repetition, F(1, 24) = 23.0, p < .001, indicating that participants responded significantly more slowly to the cued (609 ms) than to the uncued (552 ms) modality, that is, a significant modality-based inhibitory effect. Neither the main effect of prime–target identity repetition nor the interaction was significant, both ps > .1, suggesting comparable sizes of significant modality-based inhibition in both the identity-cued [59 ms; t(24) = 5.03, p < .001] and the identity-uncued [55 ms; t(24) = 3.89, p < .01] conditions (Fig. 2b, right).

Since the analysis of error rates revealed either patterns similar to those from the RTs or null effects, we report error rates under each of the experimental conditions in Table 1 but will discuss them no further.

Our results in Experiment 1 suggested that both semantic-based and modality-based repetition inhibition did occur, but that their occurrence was conditional. Semantic-based repetition inhibition occurred only when the prime and the neutral cue shared sensory-processing properties (i.e., when they were from the same sensory modality), but not when the prime and the neutral cue came from different modalities (Fig. 2b). Moreover, regardless of whether the modality of the target was auditory or visual, and regardless of whether it was cued or uncued, semantic-based repetition inhibition of comparable size was observed in the PN-modality-same condition (Fig. 2a), suggesting that what was inhibited was a supramodal semantic representation independent of the target modality. The occurrence of modality-based repetition inhibition, on the other hand, depended on the modality of the target. Specifically, we observed significant modality-based repetition inhibition in both the PN-modality-same and the PN-modality-different conditions, except for the auditory targets in the PN-modality-different condition (Fig. 2b, left).

Since the modality of the prime was blocked in the present experiment, it is possible that sustained attention to the blocked modality of the prime throughout a task block resulted in the present pattern of results, in which only the presentation of a neutral cue sharing the same sensory properties with the prime could tag the prime for inhibition. To examine whether the conditional occurrences of semantic-based and modality-based repetition inhibition in Experiment 1 changed when the modality of the prime was uncertain, we randomly mixed primes from different modalities in Experiment 2.

Experiment 2

Method

Participants

A new group of 22 right-handed undergraduate and graduate students (9 males, 13 females; age range: 18–24 years) participated in Experiment 2. All of them had normal or corrected-to-normal vision and normal hearing. They all gave informed consent prior to the experiment, in accordance with the Helsinki declaration, and were paid for their participation after the experiment. None of them had a history of neurological or psychiatric disorders. This study was approved by the Academic Committee of the Department of Psychology, South China Normal University.

Stimuli and experimental design

All of the experimental settings, designs, and procedures were the same as those in Experiment 1, except that the modality of the prime was not blocked, and all of the 16 experimental conditions were randomly mixed in the present experiment. Participants rested for 2 min every 64 trials.

Statistical analysis of the behavioral data

The statistical analysis in Experiment 2 was the same as that in Experiment 1, except that this time 6.8% of the overall data points were excluded as outliers in the whole experiment.

Results and discussion

The mean RTs and error rates in all of the experimental conditions of Experiment 2 are reported in Table 2. When the prime and the neutral cue were presented in the same sensory modality (PN modality same), a three-way repeated measures ANOVA revealed that the main effect of target modality was significant, F(1, 21) = 30.7, p < .001, indicating that participants responded significantly more slowly to the auditory targets (638 ms) than to the visual targets (586 ms) (Fig. 3a). The main effect of prime–target modality repetition was significant, F(1, 21) = 23.9, p < .001, showing that RTs in the modality-cued condition (632 ms) were significantly longer than RTs in the modality-uncued condition (592 ms)—that is, significant modality-based repetition inhibition. The main effect of prime–target identity repetition was also significant, F(1, 21) = 32.1, p < .001, indicating that responses to the identity-cued targets (624 ms) were significantly slower than responses to the identity-uncued targets (600 ms)—that is, a significant semantic-based repetition inhibition effect. The interaction between target modality and prime–target modality repetition [F(1, 21) = 4.72, p < .05], the interaction between target modality and prime–target identity repetition [F(1, 21) = 5.31, p < .05], and the interaction between prime–target modality repetition and prime–target identity repetition [F(1, 21) = 4.66, p < .05] were all significant. The three-way interaction was not significant, F(1, 21) = 1.74, p > .1.

Table 2 Mean reaction times (in milliseconds) and error rates (%), with standard errors from which between-subjects variability was excluded, in all of the experimental conditions of Experiment 2
Fig. 3 Mean response times (RTs), with standard errors from which between-subjects variability has been excluded, shown as a function of all of the experimental conditions in Experiment 2. (a) RTs in the eight experimental conditions when the prime and the neutral cue were from the same modality (PN modality same). (b) RTs in the eight experimental conditions when the prime and the neutral cue were from different modalities (PN modality different)

Separate 2 (prime–target modality repetition: modality cued vs. uncued) × 2 (prime–target identity repetition: identity cued vs. uncued) repeated measures ANOVAs were then carried out for the auditory and visual targets. For the auditory targets (Fig. 3a, left), the main effect of prime–target modality repetition was significant, F(1, 21) = 34.9, p < .001, showing that responses were significantly slower when the modality of the target was cued (664 ms) than when it was uncued (612 ms), that is, significant modality-based repetition inhibition. The main effect of prime–target identity repetition was also significant, F(1, 21) = 30.5, p < .001, indicating that participants responded significantly more slowly in the identity-cued condition (653 ms) than in the identity-uncued condition (623 ms), that is, a significant effect of semantic-based repetition inhibition. Moreover, the interaction was significant, F(1, 21) = 4.42, p < .05. Planned t tests on the simple effects showed that the semantic-based inhibition was significantly larger in the modality-uncued condition [41 ms; t(21) = 6.01, p < .001] than in the modality-cued condition [19 ms; t(21) = 2.29, p < .05], a 22-ms difference, t(21) = 2.10, p < .05. Likewise, the modality-based inhibition was also significantly larger in the identity-uncued condition [63 ms; t(21) = 5.76, p < .001] than in the identity-cued condition [41 ms; t(21) = 4.31, p < .001], another 22-ms difference, t(21) = 2.10, p < .05 (Fig. 3a, left).
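The two 22-ms differences are necessarily identical, because a 2 × 2 interaction contrast is the same whichever factor is used to define the simple effects. The sketch below illustrates this with cell means reconstructed from the rounded marginal means and simple effects reported in this paragraph; they are approximations derived here, not values copied from Table 2, and the variable names are hypothetical.

```python
# Approximate cell means (ms) for auditory targets, PN modality same,
# Experiment 2. Reconstructed from the rounded values in the text.
cells = {
    ("modality_cued", "identity_cued"): 673.5,
    ("modality_cued", "identity_uncued"): 654.5,
    ("modality_uncued", "identity_cued"): 632.5,
    ("modality_uncued", "identity_uncued"): 591.5,
}

# Semantic-based inhibition (identity cued minus uncued) at each modality level.
semantic_in_mod_cued = (
    cells[("modality_cued", "identity_cued")]
    - cells[("modality_cued", "identity_uncued")]
)
semantic_in_mod_uncued = (
    cells[("modality_uncued", "identity_cued")]
    - cells[("modality_uncued", "identity_uncued")]
)

# Modality-based inhibition (modality cued minus uncued) at each identity level.
modality_in_id_cued = (
    cells[("modality_cued", "identity_cued")]
    - cells[("modality_uncued", "identity_cued")]
)
modality_in_id_uncued = (
    cells[("modality_cued", "identity_uncued")]
    - cells[("modality_uncued", "identity_uncued")]
)

# The interaction contrast, computed both ways.
contrast_from_semantic = semantic_in_mod_uncued - semantic_in_mod_cued
contrast_from_modality = modality_in_id_uncued - modality_in_id_cued

print(semantic_in_mod_cued, semantic_in_mod_uncued)
print(modality_in_id_cued, modality_in_id_uncued)
print(contrast_from_semantic, contrast_from_modality)
```

The reconstructed cells reproduce the reported simple effects (19 and 41 ms for semantic-based inhibition, 41 and 63 ms for modality-based inhibition), and both ways of forming the contrast give 22 ms, which is why the two comparisons share one test statistic.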

For the visual targets (Fig. 3a, right), the main effect of prime–target modality repetition was significant, F(1, 21) = 7.49, p < .05, indicating that RTs in the modality-cued condition (601 ms) were significantly longer than RTs in the modality-uncued condition (572 ms)—that is, significant modality-based inhibition. The main effect of prime–target identity repetition was also significant, F(1, 21) = 13.5, p < .01, showing that responses to the cued semantic identities (595 ms) were significantly slower than responses to the uncued identities (578 ms)—that is, a significant semantic-based inhibition effect. The interaction was not significant, F(1, 21) = 1.60, p > .1, suggesting that semantic-based repetition inhibition was of comparable sizes in both the modality-cued [13 ms; t(21) = 2.49, p < .05] and the modality-uncued [21 ms; t(21) = 3.41, p < .01] conditions, and similarly, that modality-based repetition inhibition was of comparable sizes in both the identity-cued [25 ms; t(21) = 2.34, p < .05] and the identity-uncued [33 ms; t(21) = 2.87, p < .01] conditions (Fig. 3a, right).

When the prime and the neutral cue were from different sensory modalities (PN modality different), the main effect of target modality was significant, F(1, 21) = 46.7, p < .001, showing significantly slower responses to the auditory targets (713 ms) than to the visual targets (627 ms) (Fig. 3b). The main effect of prime–target modality repetition was marginally significant, F(1, 21) = 3.50, p = .076, indicating a trend for responses to the modality-cued targets (677 ms) to be slower than responses to the modality-uncued targets (663 ms). The main effect of prime–target identity repetition was not significant, F(1, 21) = 1.74, p > .1. The interaction between target modality and prime–target modality repetition [F(1, 21) = 6.01, p < .05] and the interaction between prime–target modality repetition and prime–target identity repetition [F(1, 21) = 4.88, p < .05] were both significant, but the interaction between target modality and prime–target identity repetition was not significant, F(1, 21) = 2.13, p > .1. The three-way interaction was significant, F(1, 21) = 8.45, p < .01.

A 2 (prime–target modality repetition: modality cued vs. uncued) × 2 (prime–target identity repetition: identity cued vs. uncued) repeated measures ANOVA was also carried out for the auditory and visual targets, respectively. For the auditory targets (Fig. 3b, left), the main effect of prime–target modality repetition was not significant, F(1, 21) = 1.36, p > .1. The main effect of prime–target identity repetition, however, was significant, F(1, 21) = 6.39, p < .05, indicating that RTs in the identity-cued condition (707 ms) were significantly shorter than RTs in the identity-uncued condition (720 ms)—that is, a semantic-based facilitatory effect. The interaction was also significant, F(1, 21) = 10.8, p < .01. Planned t tests on the simple effects showed that the semantic-based facilitatory effect occurred only when the modality of the target was cued [38 ms; t(21) = 4.00, p < .01], but not when the modality of the target was uncued [–11 ms; t(21) = 1.20, p > .1], and that significant modality-based facilitation occurred only in the identity-cued condition [45 ms; t(21) = 2.63, p < .05], but not in the identity-uncued condition (–4 ms; t < 1) (Fig. 3b, left). For the visual targets (Fig. 3b, right), the only significant effect was the main effect of prime–target modality repetition, F(1, 21) = 11.8, p < .01, indicating that participants responded significantly more slowly to the cued modality (652 ms) than to the uncued modality (603 ms)—that is, a significant modality-based repetition inhibition effect (Fig. 3b, right). Neither the main effect of prime–target identity repetition (F < 1) nor the interaction [F(1, 21) = 1.32, p > .1] was significant, suggesting that the significant modality-based inhibition was of comparable sizes in both the identity-cued [55 ms; t(21) = 3.71, p < .01] and the identity-uncued [42 ms; t(21) = 2.67, p < .05] conditions (Fig. 3b, right).

Since the analysis of error rates revealed either patterns similar to those from the RTs or null effects, we report error rates in each of the experimental conditions in Table 2 but will discuss them no further.

In Experiment 2, we replicated the conditional occurrences of the semantic-based and modality-based repetition inhibition effects obtained in Experiment 1, except for an enlarged semantic-based inhibitory effect for auditory targets in the PN-modality-same condition when the modality was uncued, as compared to when it was cued (Fig. 3a, left). This interaction was caused by reduced RTs to the auditory targets when both the modality and the identity were uncued. The only difference in experimental setting between Experiments 1 and 2 was the uncertainty about the modality of the prime. Since the modality of the prime in each trial was unpredictable in Experiment 2 (i.e., participants did not know the modality of the prime prior to each trial), attention to the modality of the prime in each trial should have been more transient. Therefore, it should have been easier to create a new episodic representation for the neutral cue and to tag the prime for inhibition, making it faster and more efficient to bias the attention system toward a new target, especially when both its semantic identity and sensory modality were new (uncued). Moreover, the reduced RTs occurred only for the auditory target (Fig. 3a, left), not for the visual target, even when both the semantic representation and the modality of the visual target were uncued (Fig. 3a, right). Since it has been suggested that auditory stimuli have a stronger and more automatic alerting effect than do visual stimuli (Posner, Nissen, & Klein, 1976), only processing of the auditory targets, and not of the visual targets, could benefit from the modality uncertainty of the prime in the present experiment, when both the semantic identity and the modality of the target were uncued.

Besides the general pattern of the conditional occurrences of the semantic-based and modality-based types of repetition inhibition across Experiments 1 and 2, we consistently observed a semantic-based facilitation in both experiments in the PN-modality-different condition for the modality-cued auditory target (Figs. 2b, left, and 3b, left). This semantic facilitation was caused by reduced RTs to the auditory targets in the PN-modality-different condition when both the modality and the semantic identity of the auditory targets were cued, that is, in the auditory–visual–auditory identity-cued condition. However, no facilitated responses were observed for the visual targets in the PN-modality-different condition, even when both the modality and the semantic identity of the visual targets were also cued, that is, in the visual–auditory–visual identity-cued condition (Figs. 2b, right, and 3b, right). As we discussed above, in the PN-modality-different condition (i.e., when the prime and the neutral cue did not share sensory-processing properties), the semantic representation of the prime was not tagged for inhibition. Therefore, depending on the effectiveness of the neutral cue, the building up of a new episodic representation for the neutral cue could either leave some residual attentional resources on the prime, resulting in facilitation for the repeated target, or completely eliminate the attentional resources on the prime, making the activation level the same for the prime representation as for a new (uncued) representation, resulting in a null effect for the repeated target. Since it has been suggested before in the spatial domain that a visual neutral cue was not effective enough to attract attention completely away from the auditory prime (Spence & Driver, 1998b), creating a new episodic representation for the visual neutral cue in the auditory–visual–auditory condition (Figs. 2b, left, and 3b, left) could have left some residual attentional resources on the semantic identity of the auditory prime and caused facilitatory effects when the target identity was the same (cued) as that of the prime. By contrast, creating a new episodic representation for the more effective auditory neutral cue in the visual–auditory–visual condition (Figs. 2b, right, and 3b, right) could completely eliminate the attentional resources on the semantic identity of the visual prime and make its activation level the same as that for a new (uncued) stimulus, resulting in comparable RTs between the identity-cued and the identity-uncued targets.

Additionally, since we used only one level of SOA in the present two experiments, one might argue that semantic-based repetition inhibition was absent in the PN-modality-different condition because it has a different time course when the prime and the neutral cue are cross-modal, so that the constant SOA between the prime and the target in both experiments was not sufficient for it to manifest. In order to test this possibility, we ran a control experiment in which three levels of SOA (800, 1,000, and 1,200 ms) were used and only the PN-modality-different conditions were included. We found a pattern of results similar to that in Experiments 1 and 2: Semantic-based repetition inhibition was not observed in the PN-modality-different conditions at any of the three levels of SOA (see Supplementary Fig. 1). Therefore, the failure to tag the semantic representation of the prime for inhibition was indeed due to the modality difference between the prime and the neutral cue, rather than to a different time course.

General discussion

In the two experiments of the present study, we aimed to answer two questions: (1) whether and how nonspatial repetition inhibition occurs cross-modally, and (2) whether the representations inhibited by repetition are modality-specific or supramodal. Since it has been suggested before that only with the introduction of an appropriate intervening stimulus between the prime and the target will nonspatial repetition inhibition occur in discrimination tasks (Fuentes et al., 1999; Spadaro et al., 2012), we adopted the prime–neutral-cue–target paradigm of nonspatial repetition inhibition in a two-choice discrimination task. No matter whether the modality of the prime was blocked (Exp. 1) or mixed (Exp. 2), our data consistently showed that the occurrence of semantic-based repetition inhibition depended on the modality relationship between the prime and the neutral cue, and that the occurrence of modality-based repetition inhibition depended on the modality of the target. Specifically, semantic-based repetition inhibition occurred only when the prime and the neutral cue shared a sensory modality (Figs. 2a and 3a), but not when the prime and the neutral cue were from different sensory modalities (Figs. 2b and 3b). Modality-based repetition inhibition occurred both when the prime and the neutral cue shared and when they differed in sensory modality, except for auditory targets when the prime and the neutral cue were from different sensory modalities, that is, in the auditory–visual–auditory condition (Figs. 2 and 3).

According to the episodic-retrieval account (Lupiáñez & Milliken, 1999; Lupiáñez et al., 2001; Milliken et al., 2000), both the modality property and the semantic identity of the prime are coded into an integrated episodic representation (Kahneman et al., 1992). Upon the presentation of the neutral cue, a new episodic representation would accordingly be created, and the old episodic representation of the prime would be tagged for inhibition (Grison et al., 2005; Tipper et al., 2003). The question was whether the modality and the semantic identity of the prime were both unconditionally tagged for inhibition, or whether each of them was tagged for inhibition only under certain conditions. Our results indicated that the latter case was true. With regard to semantic-based inhibition, the semantic identity of the prime was tagged for inhibition only when the prime and the neutral cue shared sensory-processing properties (i.e., a modality) (Figs. 2a and 3a). These results suggest that only when the neutral cue could not be easily separated from the prime on the basis of early sensory-processing properties and the old representation of the prime competed for within-modal processing resources with the newly created representation of the neutral cue is the prime tagged for inhibition. By contrast, if the old representation differed from the new representation even in its early sensory-processing properties, the former representation would not be inhibited (Figs. 2b and 3b), probably because the two consecutive representations were clearly separable and did not compete for attentional resources within a certain sensory modality.

With regard to modality-based inhibition, the modality of the prime was inhibited both when the modalities of the prime and the neutral cue were the same and when they were different, except for auditory targets in the PN-modality-different condition. These results suggest that once the sensory modality of the prime has been attended, it will be tagged for inhibition, irrespective of whether the subsequent neutral cue is from the same or from a different modality. In order to build up an accurate episodic representation of a stimulus, the stimulus first has to be fully individualized from other stimuli, both physically and temporally (Kanwisher, 1987, 1991). Especially in the present cross-modal prime–neutral-cue–target paradigm, the early sensory-processing properties may play a critical role in individualizing the neutral cue from the prime. Therefore, in order to efficiently integrate the sensory properties and the semantic identity of the neutral cue into a new episodic representation, the modality property of the previous prime needs to be inhibited, irrespective of the modality relationship between the prime and the neutral cue. One exception, however, was with auditory targets in the PN-modality-different condition (Figs. 2b, left, and 3b, left). For the identity-cued auditory targets in the PN-modality-different condition, modality-based inhibition occurred in neither Experiment 1 nor Experiment 2, because of the reduced (facilitated) RTs in the auditory–visual–auditory identity-cued condition (see the discussion in Exp. 2). On the other hand, for the identity-uncued auditory targets in the PN-modality-different condition, modality-based inhibition occurred only in Experiment 1, but not in Experiment 2. 
The existence of repetition inhibition is determined not only by the physical match between the prime and the target, but also by the attentional set participants have adopted in the tasks at hand (Chen, Fuentes, & Zhou, 2010; Chen, Zhang, & Zhou, 2007; Lupiáñez et al., 2001; Lupiáñez et al., 2007; Milliken et al., 2000). Since the modality of the prime was blocked in Experiment 1, so that the participants were fully aware that the prime modality was uninformative with regard to the target, the participants could adopt a sustained inhibitory tendency toward the blocked modality throughout the task block. Therefore, as long as this modality-based inhibition was not cancelled out by the semantic-based facilitation, as happened in the auditory–visual–auditory identity-cued condition, the modality-based inhibition persisted for the identity-uncued auditory targets in the PN-modality-different condition. By contrast, since the modalities of the primes were randomly mixed in Experiment 2, no inhibitory attentional set could be adopted toward a specific modality. Also, because auditory stimuli have a stronger alerting effect than do visual stimuli (Posner, Nissen, & Klein, 1976), it would be difficult for the visual neutral cue to tag the auditory prime for inhibition (i.e., in the auditory–visual–auditory condition), resulting in a null modality effect for auditory targets.

The second question in the present study was whether the semantic representation inhibited by repetition was supramodal or modality-specific. In the spatial domain, it has been suggested that what is inhibited by spatial IOR is a supramodal spatial representation (Spence et al., 2000; Tassinari & Campara, 1996). For example, Spence et al. (2000) used a target–target paradigm with spatial IOR, by presenting a random sequence of visual, tactile, and auditory targets to either the left or the right side of a central fixation, to investigate whether spatial IOR occurred between consecutive targets from the three different modalities. They found that detection responses were slower if the current target was presented at the same (cued) spatial position as the preceding target, irrespective of the modality of both the present and the preceding target, indicating that spatial IOR is supramodal. These results also provide supporting evidence that, despite the vast differences in their initial stages of spatial processing, different sensory modalities may share a spatial attention system in which spatial positions are coded as common representations across sensory modalities (Driver & Noesselt, 2008; Driver & Spence, 1998a, 1998b). However, in terms of the nonspatial repetition inhibition effect, it remains unclear whether the nonspatial representation inhibited is supramodal or modality-specific. In the present study, our results suggested that the conditional occurrence of semantic-based inhibition in the PN-modality-same condition was independent of both whether or not the modality of the target was cued and whether the modality of the target was visual or auditory. The validity of the modality cue affected the effect size of semantic-based inhibition only under certain conditions (e.g., in the visual–visual–auditory condition of Exp. 2), but not the occurrence of the effect per se. 
These results together suggested that once the semantic identity of the prime was tagged for inhibition, what was inhibited was a supramodal representation independent of both whether the modality of the target was cued and whether the modality of the target was visual or auditory.

Repetition inhibition (i.e., delayed responses induced by repeated stimulus features) has long been observed in both the spatial and nonspatial domains. Although the spatial and nonspatial inhibitory effects show similar patterns, they differ in some respects. First, a spatial inhibitory effect was consistently observed both in detection tasks (Hu, Samuel, & Chan, 2011; Posner & Cohen, 1984) and in discrimination tasks in which participants were required to make a two-alternative forced choice (Hu & Samuel, 2011; Lupiáñez, Milan, Tornay, Madrid, & Tudela, 1997; Lupiáñez et al., 2001), whereas a nonspatial inhibitory effect was often observed in detection tasks (Hu et al., 2011; Law et al., 1995) but turned into a null or facilitatory effect in discrimination tasks (Hu & Samuel, 2011). Second, the introduction of a neutral cue between the prime and the target is not necessary for spatial inhibition to occur, since the spatial inhibitory effect was observed without a central cue in both the detection task (Hu et al., 2011; Prime, Visser, & Ward, 2006) and the discrimination task (Hu & Samuel, 2011). Moreover, the spatial inhibitory effect could be observed even if attention was maintained at the cued location and no central cue was presented (Berger, Henik, & Rafal, 2005; Berlucchi, Chelazzi, & Tassinari, 2000; Chica, Lupiáñez, & Bartolomeo, 2006). Therefore, the role of the central reorienting cue during spatial inhibition should be something other than disengaging attention and returning it to fixation (see Lupiáñez, 2010, for a discussion). For example, Lupiáñez proposed that the onset of the peripheral cue in spatial IOR triggers both the orienting of attention and the activation of an object representation that is used to process the target (cue–target integration). The latter activation will enhance the discrimination of the target (facilitation) but hinder the target’s detection (inhibition). 
The central fixation cue between the peripheral cue and the target serves to stop the cue–target integration process, resulting in the absence of a facilitatory effect and the appearance of an inhibitory effect—that is, spatial IOR (Lupiáñez, 2010). In contrast to the minor role of the central cue for the spatial inhibitory effect, the intervening neutral cue plays a critical role for nonspatial inhibition to occur in discrimination tasks: The nonspatial inhibitory effect did not occur in a discrimination task (Hu & Samuel, 2011) unless an appropriate intervening neutral stimulus was presented between the cue and the target (Fuentes et al., 1999; Spadaro et al., 2012). In the present study, by adopting a cross-modal paradigm and by introducing a neutral cue between the prime and the target in a discrimination task, we further showed that it becomes necessary for selective attention to inhibit the old episodic representation of the prime only when building up a new episodic representation for the neutral cue is more demanding—for example, when the neutral cue shares sensory-processing properties with the prime, making the individualization of the neutral cue more difficult (as in our study), or when a response is required for the neutral cue in a target–target paradigm of repetition inhibition, increasing the processing load of the neutral cue (Spadaro et al., 2012).

In conclusion, by introducing a novel, cross-modal prime–neutral-cue–target paradigm, we revealed that both semantic-based and modality-based repetition inhibition can exist under certain conditions. In our experiments, semantic-based inhibition occurred only when the prime and the neutral cue were from the same modality, suggesting that the semantic identity of the prime was tagged for inhibition only when the consecutive prime and neutral cue were not clearly separable on the basis of their early sensory-processing properties, so that they probably competed for processing resources within a modality. Moreover, once the semantic identity of the prime was tagged for inhibition after the creation of a new episodic representation for the neutral cue, the semantic-based repetition inhibition occurred independent of both whether the modality of the target was cued and whether the modality of the target was visual or auditory, indicating that the semantic representation inhibited by the semantic-based repetition inhibition was supramodal. On the other hand, modality-based inhibition occurred in all of the conditions except the auditory–visual–auditory condition, suggesting that the occurrence of modality-based inhibition might depend on the relative alerting strength between the sensory modalities of consecutively presented stimuli.