Retrieval-induced forgetting refers to the finding that practicing retrieval of particular information decreases the recall of other, nonpracticed information related to the same cue (Anderson, Bjork, & Bjork, 1994; except where noted, all later Anderson studies were authored by M. C. Anderson). The retrieval practice paradigm was developed by Anderson et al. to demonstrate this phenomenon. The paradigm consists of three phases: In the first phase, participants are provided with a list of category–item pairs for study. In the following, retrieval practice phase, half of the items from half of the categories are practiced by presenting the category name and the initial letters of an item as cues. After a distractor task for about 20 min, a final test is given in which all items from all categories are tested using a category-plus-initial-letter cue. The practiced items from the practiced categories (Rp+ items) are recalled better than the nonpracticed items from the nonpracticed categories (Nrp items), demonstrating the positive effects of the retrieval practice. The recall of the nonpracticed items from the practiced categories (Rp– items), on the other hand, is lower than that of the nonpracticed items from the nonpracticed categories (Nrp). The lower recall of the Rp– items as compared with the Nrp items is termed the retrieval-induced forgetting effect (RIF effect).

Anderson et al. (1994) explained retrieval-induced forgetting in terms of inhibition. According to their reasoning, during retrieval practice, all items from the practiced category are activated and compete for recall. In order to overcome the competition from the inappropriate items, an inhibitory control mechanism reduces the activation of these irrelevant items. Since this inhibition is relatively long lasting, later recall of these suppressed items is impaired.

Retrieval-induced forgetting, however, can also be explained by strength-based models (J. R. Anderson, 1983; Mensink & Raaijmakers, 1988; Raaijmakers & Shiffrin, 1981). Strength-based models explain retrieval-induced forgetting in terms of the associative strength between cue and item. When the target items are practiced during the retrieval phase, the association between the cue and the target item is strengthened. In the test phase, where all items have to be recalled, the strengthened items interfere with the recall of the relatively weaker nonpracticed items, leading to impaired recall of the latter.
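The core of the strength-based argument can be illustrated with a deliberately simplified relative-strength (ratio) rule. This is only a sketch: the cited models contain further components (e.g., context cues and a recovery stage) that are omitted here, and the function name and strength values are hypothetical, chosen purely for illustration.

```python
def sampling_probability(strengths, target):
    """Chance of sampling `target` under a simple ratio rule:
    its cue-to-item strength divided by the summed strength of all
    items associated with the same cue."""
    return strengths[target] / sum(strengths.values())

# Hypothetical cue-to-item strengths for two items sharing one category cue.
before_practice = {"rp_plus": 1.0, "rp_minus": 1.0}
# Practice strengthens only the practiced item; the other item is unchanged.
after_practice = {"rp_plus": 3.0, "rp_minus": 1.0}

p_before = sampling_probability(before_practice, "rp_minus")
p_after = sampling_probability(after_practice, "rp_minus")
print(p_before, p_after)
```

Under these assumed values, the nonpracticed item becomes harder to sample at test even though its own strength never changed, which is the kind of impairment that strength-based accounts attribute to competition at test.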

In general, the impaired recall of the nonpracticed items can be explained by both inhibition during the retrieval practice phase and competition during the test phase. Although both theories provide an explanation for the basic finding, certain results appear to be inconsistent with an explanation in terms of strength-based competition, and to be only explainable by the specific assumptions of the inhibition theory.

One of these assumptions is the retrieval specificity assumption. According to this hypothesis, retrieval-induced forgetting only occurs when the practice requires the active retrieval of the target item. Active search for the target item triggers all items related to the cue, and requires the inhibition of the irrelevant exemplars in order to restrict recall to the correct target item. If the target item was already given during the retrieval practice, no such activation and competition of nontarget items occurs, and thus no inhibition of these nonpracticed items is necessary.

Anderson et al. (2000) tested the retrieval specificity assumption using a modified version of the retrieval practice paradigm. In the retrieval practice phase, participants had to recall either the target item, given the category cue (competitive condition), or the category to which the target item belonged, given the target item as cue (noncompetitive condition). Although in both conditions the target items were recalled equally well on the final test, only the competitive condition led to impairment of the nonpracticed items. Hence, this demonstrates that strengthening of the target items by itself does not lead to impaired recall; only active retrieval of the target items, which activates the inhibitory control mechanism, leads to impairment. The observed pattern was assumed to be inconsistent with an explanation in terms of strength-based competition:

The finding that practiced items can be significantly strengthened without impairing related items replicates previous work arguing against an interpretation of retrieval-induced forgetting in terms of strength-dependent competition (Anderson et al., 1994). Rather, the main factor determining retrieval-induced forgetting is the need to resolve competition during retrieval practice. (Anderson et al., 2000, p. 527)

Other studies testing the retrieval specificity assumption have made use of a standard study practice instead of noncompetitive retrieval practice. In all of these studies (e.g., Anderson & Bell, 2001; Bäuml & Aslan, 2004; Ciranni & Shimamura, 1999; Hanslmayr, Staudigl, Aslan, & Bäuml, 2010; Johansson, Aslan, Bäuml, Gäbel, & Mecklinger, 2007; Wimber, Rutschmann, Greenlee, & Bäuml, 2009), similar findings have been obtained, although in some of these studies (Johansson et al., 2007; Wimber et al., 2009) no control condition was present, making it impossible to tell whether there was or was not inhibition in the noncompetitive condition. In other studies (Hanslmayr et al., 2010), performance on the practiced items was higher in the competitive condition; hence, the finding that competitive and noncompetitive forms of practice differ in the size of the RIF effect no longer uniquely favors an interpretation in terms of inhibition.

However, it should be noted that in most of these studies (Ciranni & Shimamura, 1999, being the exception) no feedback was given during the retrieval practice phase. Such a procedure has the effect that the observed probability of recall has little or no relation to the strength of the practiced items. Since no feedback was given during retrieval practice, it is likely that after the retrieval practice in the competitive condition, some items were learned very well, and others not at all: An item that is correctly retrieved on the first practice trial will likely be retrieved again on the second and third trials, while an item that is not retrieved on the first trial will most likely also not be retrieved on the next trials. Since additional retrievals will make the association to the category cue stronger, this procedure of no feedback leads to a bimodal distribution of associative strength, with the recalled items at a very high level of strength and the nonrecalled items at a low level of strength. The additional strength will, however, have little effect on the recall probability (the recalled items are already at ceiling) but will lead to an increase in the amount of interference of these Rp+ items on the corresponding Rp– items.
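A small numerical sketch may help to make this point concrete. It is not the model fitted in the Appendix: the saturating link between strength and recall, the function names, and all strength values are assumptions introduced only for illustration.

```python
import math

def recall_prob(strength):
    """Assumed saturating link between strength and final recall
    (recall approaches a ceiling as strength grows)."""
    return 1.0 - math.exp(-strength)

def strength_for(prob):
    """Inverse of recall_prob: the strength that yields a given recall probability."""
    return -math.log(1.0 - prob)

# Competitive practice without feedback: hypothetical bimodal strengths.
# Two items were retrieved on every practice trial, one was never retrieved.
competitive = [6.0, 6.0, 0.2]
mean_recall = sum(recall_prob(s) for s in competitive) / len(competitive)

# Noncompetitive practice: every item receives the same moderate strength,
# chosen so that mean final recall matches the competitive condition.
noncompetitive = [strength_for(mean_recall)] * len(competitive)

summed_competitive = sum(competitive)
summed_noncompetitive = sum(noncompetitive)
print(round(mean_recall, 3), summed_competitive, round(summed_noncompetitive, 2))
```

With these assumed numbers, mean recall of the practiced items is identical in the two conditions, yet the summed strength of the practiced items, and hence their potential to interfere under a relative-strength rule, is several times larger in the competitive condition.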

Thus, it is likely that in the experiment of Anderson et al. (2000), the Rp+ items were quite a bit stronger in the competitive than in the noncompetitive condition (especially with respect to the category–item associations), despite the fact that average final recall was equivalent in the two conditions. Anderson et al. acknowledged this possibility but considered it unlikely:

More strengthening in the competitive practice condition might be expected, for instance, on the basis of work showing that retrieval practice facilitates later recall more than does simple reexposure of an item. . . . This hypothesis seems unlikely for several reasons. First, even if differences in strengthening went undetected by our final recall test, the substantial and statistically equivalent facilitation that did occur in the noncompetitive practice condition should have caused at least some impairment, but it did not. . . . Furthermore, even given that competitive practice strengthened individual items more, the summed competition exerted by all of the competitively practiced items is not likely to be larger than that exerted by noncompetitively practiced items. (Anderson et al., 2000, p. 528)

As we show in the Appendix, this reasoning is incorrect: It is possible for a standard strength-based competition model to produce an exact fit to the Anderson et al. (2000) data, including the lack of a RIF effect in the noncompetitive condition, in combination with equal recall of the Rp+ items in the two conditions.

Note that these results only show that the typical results found in these experiments are not incompatible with a strength-dependent competition model; they do not imply that the retrieval specificity assumption itself is incorrect. In order to show that, one would have to demonstrate that noncompetitive retrieval practice can also lead to a RIF effect. This is what we intend to show in the experiments reported in this article.

Present study

The present study investigates whether strengthening of the target item without active retrieval can lead to retrieval-induced forgetting. In our experiments, we used a modified version of the retrieval practice paradigm similar to the noncompetitive condition of Anderson et al. (2000): We provided the target item in the retrieval phase, and participants had to recall the category to which the target item belonged. In our version of the paradigm, we made a number of changes to optimize the manipulation (i.e., to increase the learning of the cue–target association). We assumed that the original noncompetitive condition in the experiment of Anderson et al. (2000) was rather easy: Participants might focus more on the item itself, and this might reduce the learning of the cue–target association. In order to make the task more challenging, we grouped the items in terms of properties (e.g., “round”–button) rather than semantic categories, and we selected low-frequency items from the different categories. Since the task with these changes becomes rather difficult, we presented the study list twice. In the retrieval practice phase, we also provided feedback after each trial to make sure that practice of the target items was effective.

According to the inhibition theory, when target items are strengthened without activation of and competition with the nontarget items, no inhibition is necessary, and retrieval-induced forgetting should in turn be eliminated. Consequently, in the present experiment, no retrieval-induced forgetting would be expected from an inhibitory view. On the other hand, the strength-based models still predict impairment for the nonpracticed items, since the cue–target associations are strengthened. If retrieval-induced forgetting is caused by interference from the strengthened target items during the test phase, as the strength-based models claim, then retrieval-induced forgetting should still occur.

One might argue that certain of the alterations that we made in the present experiment would affect the amount of retrieval-induced forgetting. For instance, low-frequency items might not compete during retrieval of the target item (Anderson et al., 1994), or double presentation of the study list might lead to integration of the items (e.g., Anderson & McCulloch, 1999). Note, however, that both low frequency and integration eliminate or reduce retrieval-induced forgetting, and in the present experiment, according to the inhibition account, no retrieval-induced forgetting would occur anyway. Therefore, such alterations in the task should not affect the predictions of the inhibition theory, which we aim to test in the present study.

Experiment 1

Method

Participants

A group of 36 students from the University of Amsterdam participated in the experiment in exchange for course credit or payment. The average age of the participants (13 male and 23 female) was 23 years (range: 18–49 years). All participants were native Dutch speakers.

Design

Retrieval practice status was manipulated within subjects and had three levels. Half of the items from half of the categories were practiced (Rp+ items), and the other half were not practiced (Rp– items). The remaining items from the nonpracticed categories (Nrp items) served as a baseline to measure the effects of retrieval practice and retrieval-induced forgetting. The counterbalancing of the items in the different conditions resulted in 12 lists that were used as a between-subjects variable in the analyses.

Materials

Stimulus selection

Ten categories from the Camp, Jakab, and Raaijmakers (2010) category norms were selected. Eight of these categories (“wood,” “cold,” “loud,” “round,” “red,” “sharp,” “white,” and “soft”) were used as experimental categories, and two categories (“fly” and “swim”) as fillers. The categories grouped the items in terms of features and were unrelated to each other. The category names were unambiguous, were one word long, and had lengths between three and six letters.

Six items were chosen from each of the ten categories. The items that were selected belonged to only one of the categories. For instance, “snow” would not be used, because it could be part of the category “white” but also part of the category “soft.” The items were low-frequency words with a mean taxonomic frequency of M = 78.71 (median = 77). Items were chosen with a length between three and eight letters and between one and three syllables. No two items within a category began with the same initial letter.

Study lists

In the study list, 48 experimental and 12 filler category–item pairs were presented. As in previous experiments (e.g., Anderson et al., 1994), six blocks were created. Each block consisted of one item from each of the eight categories. As in the Jakab and Raaijmakers (2009) experiment, the Rp+ and Rp– items were presented in an alternating order: Half of the practiced categories began with an Rp– item, and the other half with an Rp+ item. Within the block, the items were randomly selected. The study list began and finished with three filler items. The rest of the filler items were presented within two experimental blocks. The study lists were presented twice, resulting in 120 items on each list.

Retrieval practice lists

In order to eliminate competition between the items within a category, the category–target item association was practiced by retrieving the category name given the target item. In the retrieval practice phase, three items from each of the four experimental categories and three items from each of the two filler categories were practiced. Each category–item pair was practiced three times, resulting in 54 exemplars per list. Items were presented in an expanding schedule following the procedure of Anderson et al. (1994). On average, there were 3.7 items presented between the first and second presentations, and 6.7 items between the second and third presentations. No two category members were presented adjacently.

Test lists

In the test list, a category name and the initial letter of the tested item were provided. Each test list began with a filler category, followed by the eight experimental categories. Practiced and nonpracticed categories were tested in an alternating order. After the initial filler category, half of the test lists began with a practiced category, and the other half with a nonpracticed category. Within a list, half of the practiced categories began with a practiced Rp+ item, and the other half began with a nonpracticed Rp– item. In total, 54 category–item pairs were tested.
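The list lengths reported above follow directly from the design, as the following arithmetic check illustrates. The variable names are illustrative only, and the test-list count assumes that all six items of the single filler category at the start of the test were tested, which is inferred from the reported total rather than stated explicitly.

```python
experimental_categories = 8
filler_categories = 2
items_per_category = 6

experimental_pairs = experimental_categories * items_per_category  # 48 pairs
filler_pairs = filler_categories * items_per_category              # 12 pairs

# The study list was presented twice.
study_trials = 2 * (experimental_pairs + filler_pairs)

# Three items from each of four experimental and two filler categories,
# each practiced three times.
practiced_pairs = 4 * 3 + 2 * 3
practice_trials = 3 * practiced_pairs

# One filler category followed by the eight experimental categories
# (assumption: every item of the tested categories appears once).
test_trials = items_per_category * (1 + experimental_categories)

print(study_trials, practice_trials, test_trials)
```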

Procedure

Participants were individually tested on a Pentium G3 computer. E-Prime (Schneider, Eschman, & Zuccolotto, 2002) was used to run the experiment. The procedure followed the retrieval practice paradigm developed by Anderson et al. (1994). The experiment consisted of four phases: the study phase, the retrieval practice phase, a distractor phase, and the test phase. Participants were seated in front of the computer and informed that they were taking part in a memory experiment. The further instructions were displayed on the computer screen. In the study phase, participants were instructed to learn the category–item pairs that appeared on the screen. The study trial started with a fixation point in the middle of the screen for 1,000 ms, followed by a blank screen for 500 ms; then, a category–item pair was presented for 5 s, followed by a blank screen for 500 ms, and the next trial was presented.

In the retrieval practice phase, participants were provided with the item and were instructed to type the category name plus the item. The retrieval practice trial also started with a fixation point for 1,000 ms in the middle of the computer screen, followed by a blank screen for 500 ms. After the blank screen, the target item was provided with an empty square underneath it for 10 s. Participants were instructed to type in the category plus the item they had learned in the study phase and to press enter. After the participant pressed the enter button, the correct answer was presented for 2 s, followed by a blank screen for 500 ms until the next trial began. Note that in order to make the task more challenging, we did not present the initial letter of the category name as an additional cue. However, feedback was given after each retrieval practice trial.

Between the retrieval practice and test phases, an unrelated visual task was presented for 20 min. In the test phase, participants were instructed to complete the item given the category plus an initial-letter cue. After a fixation point presented for 1,000 ms, followed by a blank screen for 500 ms, a category name and the initial letter of an item were presented, and participants had to complete the cue with an item they had learned in the study phase. After the task, participants filled in an on-screen exit interview. The task took about 50 min.

Results and discussion

Retrieval practice phase

In the retrieval practice phase, the category names were correctly recalled in 98.5% of the cases. Most of the errors were made on the first retrieval practice trial (recall on that trial was 96.5%). This retrieval rate is similar to that in the noncompetitive condition of Anderson et al. (2000; M = 99.3%).

Test phase

The recall rates were calculated for the three item types: Rp+, Rp–, and Nrp. Figure 1 shows the recall rates for the different item types. A repeated measures ANOVA was used, in which item type served as within-subjects factor and list as a between-subjects factor. The alpha level of .05 was used for all statistical tests.

Fig. 1 Mean recall percentages (with standard errors) for the different item types in Experiment 1

The main effect of item type was significant, F(2, 48) = 56.74, MSE = 0.012, p < .001, ηp² = .703. A planned comparison revealed that Rp+ items were significantly better recalled (M = 64%) than Nrp items (M = 44%), F(1, 24) = 68.16, MSE = 0.022, p < .001, ηp² = .740. Hence, our retrieval practice was effective and improved the retrieval of the Rp+ items in the test phase. More importantly, the recall of the Rp– items was significantly lower (M = 38%) than that of the baseline items (M = 44%), F(1, 24) = 7.29, MSE = 0.017, p = .013, ηp² = .233. Thus, this demonstrates impairment for nonpracticed items in a noncompetitive condition.
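For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the omnibus test of item type; the function name is illustrative and not taken from any statistics package.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Omnibus effect of item type in Experiment 1: F(2, 48) = 56.74.
omnibus = partial_eta_squared(56.74, 2, 48)
print(round(omnibus, 3))
```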

We also examined the probabilities of recall as a function of testing order. For this purpose, the data were split as a function of whether the item was tested in the first three or the final three test positions. The results of this analysis replicated the findings obtained by Anderson et al. (2000) for their competitive retrieval practice condition: There was a RIF effect for the items tested first (10%), but no RIF effect for the items tested second (2%). As in Anderson et al. (2000), this was mainly due to a decrease in the probability of recall for the Nrp items in the later test positions. These results show that the effect observed was not due to a differential output interference effect.

In summary, using a noncompetitive retrieval task leads to strengthening of the practiced items and also leads to impairment of the nonpracticed items. This result is contrary to the expectations based on the retrieval specificity assumption of the inhibitory view, which would not predict impairment in a noncompetitive condition. Since presenting the target item eliminates competition from other, nontarget items, no inhibition should be necessary. On the other hand, this result is consistent with the expectation from strength-based models that predict that strengthening of the practiced item leads to interference and blocking during the test phase.

A possible criticism of our results is that in the retrieval practice task, we asked the participants to type in both the category and the item. We used that procedure since we wanted to stay as close as possible to the procedure used by Anderson et al. (2000), in which the participants were also required to write down both the category name and the item. However, if the participants were looking down at the keyboard while typing the category name, they might have used an implicit retrieval attempt to recall the item, and this implicit retrieval might have acted like a regular competitive retrieval, leading to inhibition of other members of the category. A similar assumption has been previously made by Bäuml and Aslan (2006) and Aslan, Bäuml, and Grundgeiger (2007) to explain the inhibition observed in part-list cuing experiments. For that reason, we replicated the experiment with one minor change: The participants were no longer required to type in both the category and the item, but only the category.

Experiment 2

Our first experiment provided evidence that retrieval-induced forgetting also occurs without competition, and that strengthening of the target item is enough to cause such impairment. The aim of our second experiment was to show that this effect was not due to implicit retrieval of the target item, leading to inhibition of the other members of that category.

Method

Participants

A group of 24 students from the University of Amsterdam participated in the experiment in exchange for course credit or payment. The average age of the participants (2 male and 22 female) was 23.6 years (range: 18–49 years). All participants were native Dutch speakers.

Design

The design was identical to that of Experiment 1.

Materials and procedure

The same stimulus materials were used as in Experiment 1. The study and test lists were identical to those in Experiment 1. The only change occurred in the retrieval practice phase: In the present experiment, participants were told to type in only the category name, rather than both the category name and the item. The remaining aspects of the procedure were the same as in Experiment 1.

Results and discussion

Retrieval practice phase

In the retrieval practice phase, the category names were correctly recalled in 99% of the cases. All of the errors were made on the first retrieval practice trial (recall on that trial was 97%). This retrieval rate is similar to that in Experiment 1.

Test phase

Figure 2 shows the recall rates for the different item types. A repeated measures ANOVA was used, in which item type served as a within-subjects factor and list as a between-subjects factor. The alpha level of .05 was used for all statistical tests.

Fig. 2 Mean recall percentages (with standard errors) for the different item types in Experiment 2

The results replicated the findings from Experiment 1. The main effect of item type was significant, F(2, 24) = 29.87, MSE = 0.008, p < .001, ηp² = .713. A planned comparison revealed that Rp+ items were recalled significantly better (M = 61%) than Nrp items (M = 47%), F(1, 12) = 28.86, MSE = 0.017, p < .001, ηp² = .691. Hence, the retrieval practice was effective. Once again, a significant RIF effect was obtained: The recall of the Rp– items was significantly lower (M = 41%) than that of the baseline items (M = 47%), F(1, 12) = 6.51, MSE = 0.013, p = .025, ηp² = .352.

As in the first experiment, we also analyzed the probabilities of recall as a function of testing order. The results of this analysis replicated the previous results: There was a larger RIF effect for the items tested first (8%) and a smaller RIF effect for the items tested second (4%). As in the previous experiment and in Anderson et al. (2000), this result was mainly due to a decrease in the probability of recall for the Nrp items in the later test positions. Hence, the RIF effect that was observed was not due to a differential output interference effect.

Hence, we may conclude that the requirement in Experiment 1 to type in both the category name and the item was not responsible for the RIF effect observed with noncompetitive retrieval practice. This is also clear from an ANOVA that was run on the combined data from Experiments 1 and 2. The main effect of the between-subjects factor experiment was not significant, and neither were any of its interactions.

General discussion

The aim of the present study was to investigate whether strengthening the target items without active retrieval would result in retrieval-induced forgetting. According to the retrieval specificity assumption of the inhibition theory, as formulated by Anderson et al. (2000), impaired recall for the nontarget items should only occur if active retrieval of the target items takes place.

In both experiments, we did not find support for such a mechanism. Using noncompetitive retrieval, impairment occurred for the nonpracticed items. This suggests that retrieval-induced forgetting is not restricted to conditions in which active suppression of irrelevant items might occur. Hence, the exclusive role of inhibition in causing the retrieval-induced forgetting effect is not supported.

In Experiment 1, we used a setup similar to that of Anderson et al. (2000), with a noncompetitive condition in which the target item was provided and the category name had to be recalled. In contrast to their findings, impairment for the nontarget items occurred: Recall was lower for the Rp– items than for the Nrp items. In Experiment 2, we changed the procedure slightly to eliminate the possibility that there might have been covert retrieval of the item during the supposedly noncompetitive retrieval practice. Such an assumption was previously proposed by Bäuml and Aslan (2006) and Aslan, Bäuml, and Grundgeiger (2007) in their extension of the inhibition theory to part-list cuing. We did not find any evidence for such covert retrieval: The RIF effect observed was not changed when the participants did not have to type in the item but only the category name.

Hence, we conclude that strengthening the association between cue and target may be sufficient to cause impaired recall of the nonstrengthened items, as predicted by strength-based competition models. A control mechanism to inhibit irrelevant information during practice, as suggested by the inhibition account, is not necessary.

The present result contrasts with the results found by Anderson et al. (2000), who obtained no effect of noncompetitive retrieval practice, although they did find an effect of competitive retrieval practice. Since their study and ours used similar designs, the discrepancy in the observed data patterns might seem surprising. In the following paragraphs, we will explain the main differences between the two studies that might have led to these contradictory results and will give an alternative explanation for the data pattern found by Anderson et al. (2000).

First, the task given in the noncompetitive condition in the Anderson et al. (2000) experiment was very easy and probably did not lead to much additional storage (especially with regard to the category–item associations). It is quite likely that what was learned during the practice trials in this condition was restricted to the context-to-item associations. However, although this would lead to higher recall at the final test (as compared to the Nrp condition), it would not lead to a larger RIF effect, since such increases in context-to-item strengths affect performance on both the Rp– items and the Nrp items. Thus, if the noncompetitive practice did not lead to strengthening of the category–item associations, a strength-based account would not expect a differential interference effect on the Rp– items. Similar reasoning was used by Anderson (2003, p. 428) to explain why there might not be an inhibition effect when the task induces the participants to focus on the item, rather than on the category label.

In our experiments, the noncompetitive retrieval practice was more difficult than in the Anderson et al. (2000) experiment. Not only were the category–item associations that were used by Anderson et al. (2000) rather strong as compared to the stimulus set that was used in our experiments, but the participants were also provided with the first two letters of the category in addition to the item, whereas in our experiment we only provided the item. Both modifications of the procedure used by Anderson et al. (2000) made our practice phase more difficult, and we assumed that these changes would lead to better learning of the stimulus materials. In this way, we obtained a stronger association between the category cue and the practiced target, and thus possibly more interference during the test phase. In order to balance the difficulty caused by these changes in the task, we provided feedback during practice. In this way, we also ensured that all Rp+ items were truly practiced during the retrieval phase.

Second, as we demonstrated in the introduction, the observation that in Anderson et al.’s (2000) experiment (and in a number of other experiments) the recall levels of the Rp+ items were (almost) identical in the competitive and noncompetitive conditions cannot be used to conclude that the Rp+ items in the two conditions should be equally interfering according to a strength-based competition model. Since no feedback was given during retrieval practice, it is likely that after the retrieval practice in the competitive condition, some items were learned very well, and others not at all. That is, their procedure of no feedback during the retrieval practice task would lead to a bimodal distribution of associative strengths, with the recalled items at a very high level of strength and the nonrecalled items at a low level of strength. The additional strength would, however, have little effect on the recall probability (the recalled items were already at ceiling) but would lead to an increase in the amount of interference of these Rp+ items on the corresponding Rp– items.

In summary, it might be the case that in the experiment of Anderson et al. (2000) the Rp+ items were quite a bit stronger in the competitive condition than in the noncompetitive condition (especially with respect to the category–item associations), despite the fact that the average final recalls were equivalent in the two conditions.

Finally, although the present results are not compatible with a strict version of the inhibition account, a version that assumes that strength-dependent competition does not affect retrieval-induced forgetting (at least not when blocking and output interference are controlled), they do not of course rule out the possibility that both inhibition and competition affect the amount of retrieval-induced forgetting observed. For example, one might assume that two factors are responsible for the forgetting that is observed, an inhibition factor that would have its effect during the retrieval practice phase, and a competition factor that would have its effect during the final testing phase. Such an explanation (which is, of course, highly similar to the traditional two-factor theory of forgetting; see Postman, 1961) would be consistent with many of the findings previously reported in support of the inhibition account. On the other hand, such a proposal would be more difficult to test because of its flexibility, and if accepted, it would require a reconsideration of many arguments previously put forward in support of the inhibition account.

Conclusion

The aim of the present study was to investigate the retrieval specificity assumption of the inhibition theory. Retrieval specificity is a crucial property of the inhibitory account, since it differentiates the two approaches that have been proposed to explain retrieval-induced forgetting: the inhibitory view and strength-based accounts. Our study did not find any evidence to support the necessity of an inhibitory control process during the retrieval of the target items, and thus provides support for the assumption that strengthening of cue–target associations without active retrieval is sufficient to cause retrieval-induced forgetting, as proposed by strength-based models.