The list length effect in recognition memory: an analysis of potential confounds

Kinnell, Angela; Dennis, Simon

doi:10.3758/s13421-010-0007-6

Published: 17 November 2010

Memory & Cognition, Volume 39, pages 348–363 (2011)

Abstract

The list length effect in recognition memory refers to the finding that recognition performance for a short list is superior to that for a long list. The list length effect is consistent with the predictions of item noise models, but context noise models predict no effect. Recently, it has been argued that if potential confounds are controlled, the list length effect is eliminated. We report the results of two experiments in which we looked at the role of attention and the remember–know task in the detection of the list length effect. We conclude that there is no list length effect when potential confounds are controlled and that it is the design used to control for attention that is most vital.

The sources of interference that determine performance remain an unresolved issue in our understanding of recognition memory. The task of recognition involves determining whether an item appeared in a given context. On logical grounds, then, prime candidates for sources of interference are the other items that appeared on the list and the other contexts in which the item appeared (Humphreys, Wiles & Dennis, 1994). The item noise approach assumes that it is the other items on the study list that interfere with one's ability to recognize a test probe (Criss & Shiffrin, 2004a). There are numerous mathematical models of recognition memory that adopt this approach, including the global matching models (GMMs), such as the Theory of Distributed Associative Memory (TODAM; Gronlund & Elam, 1994; Murdock, 1982), Minerva II (Hintzman, 1986), the Matrix model (Pike, 1984), and Search of Associative Memory (SAM; Gillund & Shiffrin, 1984), as well as the Retrieving Effectively from Memory model (REM; Shiffrin & Steyvers, 1997) and the Subjective Likelihood model (SLiM; McClelland & Chappell, 1998). Alternatively, interference could arise from the other contexts in which an item has been encountered in the past (Dennis & Humphreys, 2001), and any interference from other items may be negligible (Criss & Shiffrin, 2004a). Context noise models are much fewer in number than item noise models and include the Bind Cue Decide Model of Episodic Memory (BCDMEM; Dennis & Humphreys, 2001) and the model of Anderson and Bower (1972). Furthermore, it is possible that both context and item noise play substantive roles in recognition performance (Cary & Reder, 2003; Criss & Shiffrin, 2004a).

Initial evidence in favor of the context noise approach came from the inability to find a list strength effect. The GMMs predict that if one increases the strength of some items on a list by increasing either the duration or the number of presentations, performance on unstrengthened items should decrease, because the amount of item interference increases (Shiffrin, Ratcliff, & Clark, 1990). However, this does not occur (Ratcliff, Clark & Shiffrin, 1990). Subsequently, item noise models have incorporated some form of differentiation mechanism, so that strengthening an item not only increases its strength, but also simultaneously decreases its similarity to other items, so that there is no net change in interference (McClelland & Chappell, 1998; Shiffrin & Steyvers, 1997). Consequently, the lack of a list strength effect is no longer indicative of the context noise approach, and attention has shifted to the impact of increasing the length of the list.

Item noise models predict that increasing the number of items on the study list should impair performance, since the amount of interference should increase. The existence of the list length effect in recognition has been well documented (e.g., Bowles & Glanzer, 1983; Cary & Reder, 2003; Gronlund & Elam, 1994; Murdock & Kahana, 1993; Murnane & Shiffrin, 1991; Shiffrin, Ratcliff, Murnane, & Nobel, 1993; Strong, 1912; Underwood, 1978), and as a result, it has been widely accepted in the literature (Cary & Reder, 2003; Dennis & Humphreys, 2001).

However, closer inspection of a number of published studies reveals contradictory findings. Schulman’s (1974) results indicated that memory for a particular word was unaffected by the number of words that followed it in study. Buratto and Lamberts (2008) conducted a study involving both list length and list strength manipulations. They did not identify a significant effect of list length on recognition performance. Jang and Huber (2008) tested participants on a series of lists, using the list-before-the-last paradigm of Shiffrin (1970). Participants were tested using both a free recall task and a recognition memory task. Jang and Huber found that the length of the study list (6 vs. 24 items) did not significantly affect recognition performance. Murnane and Shiffrin (1991) found no significant effect of list length in their Experiment 3 when short-list performance was compared with the equivalent portion of the long list. Dennis and Humphreys (2001) argued that previous studies that had identified the list length effect had failed to control for four possible confounds: retention interval, attention, displaced rehearsal, and contextual reinstatement. When they controlled for these confounds, they found no significant difference in recognition performance between a 24-word and a 72-word list and argued that interference does not generate a list length effect in recognition memory.

Four potential confounds of the list length effect

Retention interval

The first potential confound of the list length effect that was outlined by Dennis and Humphreys (2001) is retention interval. The retention interval is the duration of time elapsed between a word being presented at study and the subsequent testing of that item. More time is required to view all of the items on a long list than is needed for the short list, meaning that there is a longer retention interval for long-list items. This confound can be controlled by equating the average retention interval of the short and long lists, using either a retroactive or a proactive design (both designs are illustrated in Fig. 1). In the retroactive design, the short list is followed by a period of filler activity, such that the duration of the short list and filler activity combined is equal to the duration of the long list. Only the items at the start of the long list (the same number as is in the short list) are tested, such that target items from each list have had the same average retention interval (Cary & Reder, 2003; Dennis & Humphreys, 2001). The proactive design is the converse of this, with a period of filler activity preceding the short list. In this case, only the items at the end of the long list are tested, in order to equate the retention interval between lists.

Attention

Another possible confound of the list length effect, first raised by Underwood (1978), is attention. It is likely that participants will tire over the course of the long list to a greater extent than for the short list and, thus, pay comparatively less attention to the items. This is more problematic in the proactive design (Cary & Reder, 2003; Dennis & Humphreys, 2001; Underwood, 1978), in which it is the final items of the long list that are tested and performance on these items is compared with performance on the short list. In the retroactive design, all the targets appear at the beginning of the respective test lists, and the attention paid to the target items in each list should not differ significantly. Having participants perform an encoding task that requires a response during study (such as a pleasantness-rating task; Cary & Reder, 2003; Dennis & Humphreys, 2001) allows for the assumption that all items will have been processed to some level, regardless of fatigue. It should be noted, however, that there may be no way to completely eliminate attentional lapses in the proactive condition, and the larger the difference in length between the short and the long lists, the more likely it is that this confound will play a role in the list length finding.

Displaced rehearsal

Displaced rehearsal refers to differences in the pattern of item rehearsal between the short and long lists and may also confound the list length effect finding. This problem arises when retention interval is controlled and only some long-list items are tested, whereas all the short-list items are included as targets. In this case, any rehearsal of short-list items will benefit performance, since all the studied items are included as targets at test. However, there is no such guarantee with rehearsal of long-list items, since only a subset of the studied items is tested. Thus, rehearsal of nontested items would detract from the rehearsal of tested items, reducing performance on those targets. This would favor performance for the short list and could result in a list length effect (Cary & Reder, 2003; Dennis & Humphreys, 2001).

Furthermore, the issue of displaced rehearsal is exacerbated in the retroactive condition, wherein the period of filler activity follows the short list. This period, despite not being intended as a time of rehearsal, may nonetheless be used by participants as an opportunity for the rehearsal of short-list items. In the long list, items are continually presented, providing less opportunity for rehearsal. This would again favor performance in the short list and, possibly, give rise to the list length effect.

Displaced rehearsal can be controlled in a number of ways. First, it is important to ensure that the filler task is more interesting and stimulating for participants than the study task, thereby encouraging them to focus on the filler, rather than rehearse the items (Cary & Reder, 2003; Dennis & Humphreys, 2001). A second strategy is to include the recognition test as incidental, although this is problematic when a within-subjects design is used, since both tests cannot be incidental (Cary & Reder, 2003).

Contextual reinstatement

An influential view proposes that one component of context can be thought of as a set of elements that vary randomly with the passage of time (Estes, 1955; Mensink & Raaijmakers, 1989). The closer in time two events occur, the more similar their contexts will be. As a result, the current context at test is likely to differ from the current context at study as a function of the amount of time that has elapsed. The more similar the active contextual elements present at test are to those that were encoded during study, the better the performance will be.
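The idea that contextual similarity falls off with elapsed time can be illustrated with a toy calculation. The sketch below is not the Estes (1955) or Mensink and Raaijmakers (1989) model; it assumes, purely for illustration, that context is a set of binary elements and that at each time step every element is independently redrawn at random with a hypothetical probability `resample_prob`. The function name `expected_overlap` is likewise an assumption.

```python
def expected_overlap(elapsed_steps, resample_prob):
    """Expected proportion of binary context elements that match their
    original state after `elapsed_steps` time steps (toy model only)."""
    if elapsed_steps < 0:
        raise ValueError("elapsed_steps must be non-negative")
    if not 0 <= resample_prob <= 1:
        raise ValueError("resample_prob must lie in [0, 1]")
    # Probability that an element was never redrawn during the interval.
    untouched = (1 - resample_prob) ** elapsed_steps
    # A redrawn binary element matches its old value half of the time.
    return untouched + 0.5 * (1 - untouched)


# Overlap shrinks toward chance (0.5) as the study-test gap grows.
for steps in (0, 10, 50, 200):
    print(steps, round(expected_overlap(steps, 0.02), 3))
```

Under these assumptions the overlap decreases monotonically with the study-test gap, which is the property the argument in the text relies on.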

At test, item noise models require a representation of the study context in order to retrieve all the list items associated with that context. Context noise models use the test probe to cue the retrieval of all previous contexts in which that item has been seen before and then compare the retrieved vector with the representation of the study context. Therefore, the reinstatement of the study context at test is important in both item and context noise models of recognition memory. The more accurate this reinstated study context is, the better the recognition performance is likely to be. It is also possible that participants will not attempt to reinstate an earlier study context at test and, rather, will use the current end-of-list context. Because context varies with the passing of time, there will be more scope for variability in the long list, which would negatively impact performance for that list, given that some of its items would have been studied less recently.

In addition, when retention interval controls are implemented, the issue of contextual reinstatement can become a problem, particularly in the retroactive design, when a period of puzzle filler activity follows the short, but not the long, list. The puzzle activity following the short list involves a clear change in context, through both the passage of time and the change of activity. As a result, when it is time for the test list, it is clear that the puzzle context is inappropriate for the memory test, and a reinstatement of the study context is likely to occur. There is no break at the end of the long list before the beginning of the test list and, thus, no clear demarcation that a change in context has occurred. In this situation, participants may rely on an end-of-list context, which may be different from the start-of-list context. In the retroactive condition, it is the first items of the long list that are tested. As such, reinstating the end-of-list context is unlikely to benefit performance for early items. Considered together, these factors would favor performance on the short list, where the study list context is reinstated more accurately following the puzzle activity.

To control for this confound, contextual reinstatement in both length conditions can be encouraged by including an extended period of filler activity after both the long and short lists, in addition to the period of filler activity already included as a control for retention interval (see Fig. 2). This value has typically varied between studies, from just 9 s (Gronlund & Elam, 1994) to 8 min (Dennis & Humphreys, 2001).

Past and current attempts to eliminate list length effect confounds

Following the work of Dennis and Humphreys (2001), Cary and Reder (2003) conducted three experiments that investigated the list length effect. Experiment 1 was the basic list length design with none of the controls outlined by Dennis and Humphreys implemented. List lengths were 16, 32, 48, and 64 words, with a test list immediately following each. A statistically significant effect of list length was identified.

Experiment 2 was identical to Experiment 1, with the exception of the addition of a 5-min word search puzzle following each study list and before the subsequent test list. This was done to decrease performance but could also be seen to function as a control for contextual reinstatement. No other confounds were controlled, and again, a statistically significant effect of list length resulted.

In Experiment 3, controls were implemented for all four of the potential list length effect confounds outlined by Dennis and Humphreys (2001). Contrary to their findings, however, a list length effect was identified. On the basis of this evidence, Cary and Reder (2003) argued that there is a list length effect in recognition memory and that Dennis and Humphreys's design was not strong enough to detect it.

A further analysis of Cary and Reder's (2003) published results suggests that their finding of a list length effect under all conditions may not be so clear. We compared the magnitude of the list length effect in each of Cary and Reder's three experiments. In Experiments 1 and 2, the present analysis involved only the 16-word (short) and 64-word (long) lists, in order to match the 1:4 list length ratio of the 20-word (short) and 80-word (long) lists in Experiment 3. A t-test for unequal samples was carried out to analyze the differences between the d′ scores for the short and long lists in each of the three experiments.

Analysis revealed that the difference between short- and long-list d′ in Experiment 1 was not statistically significantly different from the same comparison in Experiment 2, t(46) = 0.40, p > .05, two-tailed. This result is unsurprising, given the similarity in experimental design. Interestingly, the difference between short- and long-list d′ in each of these experiments was significantly different from the d′ difference in Experiment 3 [t(68) = 2.99, p < .05 (two-tailed) for Experiment 1 vs. Experiment 3, and t(56) = 3.04, p < .05 (two-tailed) for Experiment 2 vs. Experiment 3]. These results are illustrated in Fig. 3. Each bar represents the difference in short- and long-list d′ for each experiment. It is evident that despite the existence of a statistically significant list length effect in Experiment 3, the magnitude of this effect is different from that of the significant effects identified in the previous two experiments. As was previously noted, Experiment 3 involved the introduction of controls for the four potential list length effect confounds identified by Dennis and Humphreys (2001). It is clear in the present analysis that employing these controls reduces the magnitude of the list length effect by a statistically significant amount from the original experiments. The fact that a list length effect is still identified may highlight the difficulty in controlling for the possible confounds.

Furthermore, there were a number of differences between the list length studies of Dennis and Humphreys (2001) and Cary and Reder's (2003) Experiment 3. To begin with, a different list length manipulation was used. Dennis and Humphreys used both a 1:2 (40:80 word) ratio and a 1:3 (24:72 word) ratio, whereas Cary and Reder's Experiment 3 had a 1:4 (20:80 word) ratio. It is more likely that item interference will be detected in the latter experiment, given the stronger manipulation of list length, but there is also more potential for the list length confounds to play a role. The greater the ratio, the more likely it is that the attention paid to the short and long lists will differ. A larger list length ratio also results in a longer period of filler activity following the short list and more opportunity to rehearse list items. Finally, a larger list length ratio would magnify any differences in contextual drift.

There was also a difference in the analysis of the results, with Cary and Reder (2003) combining the results of the retroactive and proactive conditions. It is therefore unclear whether both designs contributed to the list length effect or whether it was primarily the proactive condition, where attentional lapses are more likely to occur.

Cary and Reder (2003) also employed the remember–know (RK) paradigm in their study, as opposed to the traditional yes/no recognition paradigm. It was the first study to use the RK task to investigate the list length effect. Originally developed by Tulving (1985), the RK paradigm has been used as a means of investigating an individual's conscious experience and awareness in the recognition task (Dunn, 2004). The RK task is easily incorporated into the standard yes/no recognition paradigm. The most common method is a two-step procedure. After making a “yes” response, indicating that they have recognized the test probe from the study list, participants are given the additional step of deciding whether that decision was based on a remember or a know judgment. A remember response is said to signify that the participant can consciously recollect the experience of seeing the remembered word during study (Gardiner, 1988). Know responses are thought to be given when there is no such recollection, with the decision based primarily on a general feeling of familiarity with the test probe (Gardiner & Richardson-Klavehn, 2000; Knowlton & Squire, 1995).

The inclusion of the RK task in the study design could potentially confound the list length effect. Remember responses, in that they are based on recollection of the study experience, have been said to involve a recall-like process (Diana, Reder, Arndt & Park, 2006). Participants are no longer just asked whether or not they recognize a particular word, but rather, they are asked to recall elements of the study event. As Diana et al. noted, the use of the RK paradigm may alter the task requirements, such that participants may rely on recollection under those conditions more than they would with the yes/no recognition paradigm. Thus, the use of the RK task may induce recall. Since the list length effect is widely accepted to occur in recall, this may help to explain why Cary and Reder (2003) found a positive list length effect where Dennis and Humphreys (2001) did not.

There was also a difference in the contextual reinstatement control used in each of the experiments. Dennis and Humphreys (2001) included an 8-min period of puzzle filler before each test list in their second experiment, whereas Cary and Reder's (2003) control for contextual reinstatement was a 2-min period of algebra problem solving before each test list. It is possible that 2 min was not a sufficient amount of time to encourage participants to reinstate the study context at test, rather than rely on the end-of-list context. If this was the case, the list length effect may still be identified, despite the controls.

Dennis, Lee and Kinnell (2008) addressed this issue. Their study contained a condition that encouraged contextual reinstatement after both the short and long lists (filler condition) and one that facilitated contextual reinstatement only after the short list (no-filler condition). In both cases, controls were implemented for the other three potential confounds. Dennis et al. found a significant effect of list length in the no-filler condition, the condition in which contextual reinstatement was facilitated only after the short list, by means of puzzle filler activity. Conversely, no significant effect of list length was identified in the filler condition when controls for Dennis and Humphreys's (2001) four confounds were implemented. On this basis, it seems that failure to control for contextual reinstatement could confound the list length effect finding.

The present experiments continued to explore the role of potential confounds of the list length effect. Specifically, the role of attention and the use of the RK task in the detection of the list length effect in recognition memory were investigated.

Experiment 1 - attention

The aim of this experiment was to examine the influence of attention on the detection of the list length effect. More specifically, we wanted to investigate whether there are differences in performance depending on whether a retroactive or a proactive design is adopted. In addition, the difference between a condition in which participants were required to perform a pleasantness-rating task at study and a condition in which there was no such requirement was investigated. The pleasantness-rating task has been used in previous studies in an attempt to control for differential lapses in attention (Cary & Reder, 2003; Dennis & Humphreys, 2001).

Method

Participants

Participants were 160 psychology students from the University of Adelaide. Each received either course credit or a payment of $12 in exchange for their participation. All gave informed consent.

Design

This experiment had a 2 × 2 × 2 × 2 factorial design, with the factors being list length (short or long), word frequency (low or high), attention task (pleasantness rating or read only), and design (retroactive or proactive). List length and word frequency were within-subjects factors, whereas attention task and design were between-subjects manipulations.

The word frequency manipulation was included as a check of the power of the experimental design. The ability to detect a significant word frequency effect in this experiment would indicate that any failure to find a list length effect would not be because the power of the experiment was too poor to detect any effects.

Materials

The stimuli for this experiment were 140 five- and six-letter words from the Sydney Morning Herald Word Database (Dennis, 1995). Half of the words were of high frequency (100–200 occurrences per million), and half were of low frequency (1–4 occurrences per million). All the lists had the same number of five- and six-letter and high- and low-frequency words. All the words were randomly assigned to lists, with no participant seeing the same word twice, except for targets.

Procedure

Participants were first given an overview of the study and were introduced to the filler activity that would be used throughout the experiment. A computerized sliding tile puzzle was used as the filler task. An image of a fractal was split into 12 pieces of equal size and then scrambled. The participants' task was to rearrange the pieces and return the image to its original form.

Participants studied one short (20-word) and one long (80-word) list, the same list lengths as those in Cary and Reder's (2003) study. Each study word appeared for 3,000 ms. Test lists were made up of 20 targets and 20 distractors. All the lists had half high-frequency and half low-frequency words. All the words were presented in lowercase letters in the center of a computer screen in white font on a blue background.

Participants were split equally into two attention task conditions. In the pleasantness-rating condition, participants were asked to rate the pleasantness of each word on the study list on a 6-point Likert scale (1, least pleasant; 6, most pleasant) by clicking the appropriate button while that word was being displayed on screen. Participants were told that if they missed rating one of the words within the 3,000 ms, they should rate the next word instead. In the read-only task condition, participants simply read the words of the study list as they appeared on the screen. No response was required.

Within each condition, the design of the lists was either retroactive or proactive in nature. Participants were again divided equally into these conditions. In the retroactive design, the short list was followed by a 3-min period of sliding tile puzzle filler, and the first 20 words of the long list were included as targets at test. In the proactive design, there was 3 min of puzzle filler before the beginning of the short list, and the last 20 words of the long list were tested.
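The 3-min filler follows from the list lengths and presentation rate given above, and the arithmetic can be checked with a short sketch. The timing values come from the Procedure; the function and constant names are illustrative assumptions, and the sketch ignores any gap between words, which the text does not report.

```python
STUDY_MS_PER_WORD = 3_000   # each study word was shown for 3,000 ms
SHORT_LEN, LONG_LEN = 20, 80


def filler_ms(short_len=SHORT_LEN, long_len=LONG_LEN, per_word=STUDY_MS_PER_WORD):
    """Filler time that makes the short list plus filler as long as the long list."""
    return (long_len - short_len) * per_word


def target_onsets(list_type, design, short_len=SHORT_LEN, long_len=LONG_LEN,
                  per_word=STUDY_MS_PER_WORD):
    """Onset (ms from the start of the study phase) of every tested word."""
    if design not in ("retroactive", "proactive"):
        raise ValueError(f"unknown design: {design}")
    if list_type == "short":
        # Proactive: filler comes first; retroactive: filler comes after the list.
        offset = filler_ms(short_len, long_len, per_word) if design == "proactive" else 0
        return [offset + i * per_word for i in range(short_len)]
    if list_type == "long":
        # Retroactive tests the first words, proactive tests the last words.
        first = 0 if design == "retroactive" else long_len - short_len
        return [(first + i) * per_word for i in range(short_len)]
    raise ValueError(f"unknown list type: {list_type}")


print(filler_ms() / 60_000)  # 3.0 (minutes)
for design in ("retroactive", "proactive"):
    assert target_onsets("short", design) == target_onsets("long", design)
```

Because both study phases last the same total time and the tested words occupy the same time slots, the retention intervals for targets are matched item by item in both designs.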

Participants were given 15-s notice before the onset of the test list, which was in the form of the yes/no recognition paradigm. Each word was presented in the middle of the screen above two response buttons marked “yes” and “no.” Participants were instructed to respond “yes” if they recognized the word from the study list and to respond “no” if they did not recognize that word, by clicking on the appropriate button. The test list was self-paced, and a response was recorded for each test word. The targets were the entire study list (short list), the first 20 words of the long study list (retroactive design), or the last 20 words of the long list (proactive design).
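The target-selection rule just described can be written compactly. This is a minimal sketch with hypothetical function names (`select_targets`, `words_needed`); it covers only which studied words become targets, not how test items were ordered, which the text does not specify.

```python
def select_targets(study_list, list_type, design, n_targets=20):
    """Return the studied words that serve as targets at test."""
    if list_type == "short":
        return list(study_list)                # the entire short list is tested
    if list_type != "long":
        raise ValueError(f"unknown list type: {list_type}")
    if design == "retroactive":
        return list(study_list[:n_targets])    # first 20 words of the long list
    if design == "proactive":
        return list(study_list[-n_targets:])   # last 20 words of the long list
    raise ValueError(f"unknown design: {design}")


def words_needed(short_len=20, long_len=80, n_distractors=20, n_tests=2):
    """Unique words one participant sees: both study lists plus new distractors."""
    return short_len + long_len + n_tests * n_distractors


long_list = [f"word{i}" for i in range(80)]    # placeholder stimuli
print(select_targets(long_list, "long", "proactive")[0])  # word60
print(words_needed())  # 140
```

The total of 140 matches the size of the stimulus pool reported in the Materials section, consistent with no word being repeated except as a target.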

An 8-min period of sliding tile puzzle filler activity was included before each test list. This was done in an attempt to offset potential differences in contextual reinstatement of the short- and long-list study contexts at test.

The experiment was counterbalanced for order; within each condition, half of the participants began with the short list, and the other half began with the long list.

Results and discussion

The analysis of the present results presents a fundamental difficulty. In circumstances in which both the null and alternative hypotheses have theoretical implications, standard approaches to null hypothesis testing, such as an analysis of variance (ANOVA), are not applicable (Rouder, Speckman, Sun, Morey, & Iverson, 2009). Although we appreciate that most readers will be more familiar with ANOVA and, so, we provide this analysis, we also employ the Bayesian analysis introduced by Dennis et al. (2008), which is designed to address within-subjects designs that submit to a signal detection analysis. Under this approach, results are reported as a pair of probabilities—for example, (.05, .81). The first number presented is the probability that at least 90% of participants favor an error-only model; that is, their pattern of responses is better accounted for by a model in which there is no effect, just chance variation, analogous to the null hypothesis. The second number is the probability that at least 90% of participants are better accounted for by an error-plus-effect model; that is, their pattern of responses is better accounted for by a model in which there is both an effect and chance variation, analogous to the alternative hypothesis. Interpretation is straightforward and is not prone to the subtle mistakes that commonly plague the interpretation of p values. Note that the two probabilities do not sum to 1, because there is some probability that neither of these alternatives is the case. For instance, it might be that half of the subjects show an error-only pattern and half show an error-plus-effect pattern.
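How such a pair of probabilities is read, and why it need not sum to 1, can be illustrated with a toy sketch. This is not the Dennis et al. (2008) procedure, which derives its posterior from each participant's responses under a signal detection model. Here a posterior over the proportion of participants who follow the error-only model is simply supplied by hand as hypothetical weights, and the function name `summarize_pair` is an assumption.

```python
def summarize_pair(posterior, threshold=0.9):
    """posterior maps 'proportion of error-only participants' to a weight.

    Returns (P(at least `threshold` are error-only),
             P(at least `threshold` are error-plus-effect)).
    Assumes every participant follows exactly one of the two models.
    """
    total = sum(posterior.values())
    p_error_only = sum(w for prop, w in posterior.items() if prop >= threshold)
    p_effect = sum(w for prop, w in posterior.items() if 1 - prop >= threshold)
    return p_error_only / total, p_effect / total


# Hypothetical weights: most belief on "nearly everyone shows no effect",
# some on a half-and-half population, a little on "nearly everyone shows it".
toy_posterior = {0.95: 6, 0.5: 3, 0.05: 1}
print(summarize_pair(toy_posterior))  # (0.6, 0.1)
```

The remaining .3 of the belief sits on the mixed population, which is the case described in the text where neither alternative holds.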

The Bayesian method has a number of advantages over null hypothesis testing (Dennis et al., 2008). For our purposes, the most critical of these is that evidence can be accumulated in favor of the error-only hypothesis. The first number in the parentheses is the probability that we should favor this hypothesis. In addition, rather than focus on a difference in the means of two distributions over participants, the method makes a statement about the patterns of responses that individuals tend to show. Whereas the comparison of means can lead one to make generalizations to a population when only a minority of participants are affected, the Dennis et al. method cannot be misused in this way. Furthermore, the method can be used iteratively and with arbitrarily small sample sizes, does not require edge corrections to avoid infinite d′s, and is inherently sensitive to the number of targets and distractors that were presented to the participants. For a more detailed discussion of the method and its advantages, see Dennis et al.

List length

Figure 4 shows the d′ results as a function of list length for all four conditions, and Table 1 shows the corresponding hit and false alarm rates. As recommended by Snodgrass and Corwin (1988), the edge correction applied to all hit and false alarm rates for use in d′ calculations was made by adding a value of .5 to the hit and false alarm counts and adding 1 to the number of target and distractor items. For all analyses, F < 1 and p > .05, unless otherwise stated.
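The edge correction described above can be sketched as follows. The correction itself is as stated in the text; the function names are illustrative, and the use of Python's standard-library `statistics.NormalDist` for the inverse normal transform is a choice made for this example.

```python
from statistics import NormalDist


def corrected_rate(count, n_items):
    """Add .5 to the response count and 1 to the number of items."""
    return (count + 0.5) / (n_items + 1)


def d_prime(hits, n_targets, false_alarms, n_distractors):
    """d' = z(hit rate) - z(false alarm rate), using edge-corrected rates."""
    z = NormalDist().inv_cdf
    hit_rate = corrected_rate(hits, n_targets)
    fa_rate = corrected_rate(false_alarms, n_distractors)
    return z(hit_rate) - z(fa_rate)


# Perfect performance on a 20-target, 20-distractor test stays finite,
# because the corrected rates are 20.5/21 and 0.5/21 rather than 1 and 0.
print(d_prime(hits=20, n_targets=20, false_alarms=0, n_distractors=20))
```

Without the correction, a hit rate of 1 or a false alarm rate of 0 would map to an infinite z score, so d′ could not be computed for those participants.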

Table 1 Hit and false alarm rates for each of the four attention conditions

A 2 × 2 × 2 × 2 (length × frequency × task × design) repeated measures ANOVA yielded a nonsignificant effect of list length on d′, F(1,156) = 2.71, p = .1, and the hit rate, F(1,156) = 1.21, p = .27. However, there was a statistically significant effect of list length on the false alarm rate, F(1,156) = 11.01, p = .001, η²_p = .07.

For comparison with the results of Cary and Reder (2003), a 2 × 2 × 2 (length × frequency × design) repeated measures ANOVA was carried out for the pleasantness task condition. Analysis revealed a nonsignificant interaction between list length and design on d′, F(1,78) = 2.09, p = .15, the hit rate, F(1,78) = 3.70, p = .06, and the false alarm rate. Cary and Reder obtained the same result and, on that basis, collapsed the retroactive and proactive conditions. Note that such a procedure relies on the inference that a nonsignificant interaction implies equality across conditions, which, as we will see, is not necessarily the case. In the present analysis, the conditions will remain separated.

Four planned comparisons were also carried out on each of the four subgroups in this experiment: pleasantness ratings in the retroactive condition, read only in the retroactive condition, pleasantness ratings in the proactive condition, and read only in the proactive condition. Both the one-way ANOVAs and Bayesian analyses (see Footnote 1) examined the effect of list length collapsed across word frequency.
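The degrees of freedom reported in this section follow from the design, as the sketch below checks. It assumes the equal split of the 160 participants described in the Method and uses the standard error degrees of freedom for a two-level within-subjects factor in a mixed design (participants minus between-subjects groups); the names are illustrative.

```python
from itertools import product

N_PARTICIPANTS = 160
TASKS = ("pleasantness rating", "read only")
DESIGNS = ("retroactive", "proactive")

# Four between-subjects cells; list length and word frequency vary within each.
cells = list(product(TASKS, DESIGNS))
n_per_cell = N_PARTICIPANTS // len(cells)


def error_df(n_participants, n_groups):
    """Error df for a two-level within-subjects effect across n_groups groups."""
    return n_participants - n_groups


print(n_per_cell)                         # 40
print(error_df(N_PARTICIPANTS, 4))        # 156, full four-way analysis
print(error_df(2 * n_per_cell, 2))        # 78, pleasantness task only
print(error_df(n_per_cell, 1))            # 39, each planned comparison
```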

Retroactive Pleasantness Condition

The retroactive pleasantness condition provided controls for all four potential confounds of the list length effect: retention interval, attention, displaced rehearsal, and contextual reinstatement. Repeated measures ANOVAs showed a nonsignificant effect of list length on d′, the hit rate, and the false alarm rate, F(1,39) = 3.95, p = .054. Similarly, the Bayesian analysis of the d′ values found in favor of the error-only model (.81, .01). Thus, there was no significant list length effect found in this condition, although the effect on the false alarm rate was close to significance.

Retroactive Read Condition

The retroactive read condition controlled for the potential influence of retention interval, displaced rehearsal, and contextual reinstatement. There was no control for attention implemented. Repeated measures ANOVAs in this condition yielded nonsignificant effects of list length on d′, F(1,39) = 3.06, p = .09, and the false alarm rate. There was, however, a statistically significant effect of list length on the hit rate, F(1,39) = 9.95, p = .003, η ²_p = .20. It should be noted, however, that in this condition, performance on the long list was superior to that on the short list, meaning that this result is significant in the direction opposite to that previously identified in the literature. Bayesian analysis of the d′ values again found in favor of the error-only model (.78, .06) and a null list length effect.

Proactive Pleasantness Condition

The proactive pleasantness condition involved controls for the four potential list length effect confounds. However, it remains the case that by the end of the long list, participants may not be paying as much attention to the study words as they were at the start of the list, and so the potential for an attention-induced list length effect remains. In this condition, repeated measures ANOVAs yielded statistically significant effects of list length on both d′, F(1,39) = 11.55, p = .002, η ²_p = .23, and the false alarm rate, F(1,39) = 6.72, p = .013, η ²_p = .15. There was no significant effect on the hit rate. In contrast to the ANOVA d′ results, the Bayesian analysis found in favor of the error-only model (.68, .13).

Proactive Read Condition

Finally, the proactive read condition provided controls for retention interval, displaced rehearsal, and contextual reinstatement. There was no control for differential lapses in attention, and, as was noted in the previous section, the use of the proactive design may have exacerbated this. A repeated measures ANOVA in this condition yielded a significant effect of list length on d′, F(1,39) = 8.26, p < .01, η ²_p = .17. There was, however, a nonsignificant effect of list length on both the hit rate, F(1,39) = 2.40, p = .13, and the false alarm rate, F(1,39) = 3.65, p = .06, although the latter was close to significance. The Bayesian analysis of d′ values was ambiguous for this condition (.46, .27).

Word frequency

A 2 × 2 × 2 × 2 (length × frequency × task × design) repeated measures ANOVA yielded a significant effect of word frequency on d′, F(1,156) = 232.79, p < .001, η ²_p = .60, in the overall data. Furthermore, planned comparisons were carried out on the word frequency data in each of the four conditions, collapsing across list length. A strong word frequency effect was identified under all conditions (see Fig. 5), using both the standard ANOVA analysis and the Bayesian analysis. These findings place a lower bound on the power of the experiment.

The retroactive pleasantness condition involved controls for all the confounds outlined by Dennis and Humphreys (2001), and the use of the retroactive design made it less likely that attention would play a part in a spurious list length effect finding. It should be noted that, in this condition, no list length effect was identified. At the other end of the scale, however, the proactive read condition involved no control for attention under circumstances (proactive design) in which inattention was likely. A positive list length effect was identified in this condition. In addition, it should be noted that in the retroactive read condition, long-list performance was superior to short-list performance. This result is the first example of superior recognition performance for the long list of which we are aware and a reversal of the traditional list length finding, which sees short-list performance exceed long-list performance. A significant reversal of the effect would be problematic for existing models to account for.

The results of this experiment suggest that it is the retroactive versus proactive distinction that is most influential in the detection of the list length effect, rather than the nature of the study task. This is critical to the comparison with Cary and Reder's (2003) Experiment 3, in which the retroactive and proactive design conditions were collapsed for analysis. When the present results were collapsed in the same manner, a positive list length effect was identified. It appears that the design of the experiment is important and that it is the proactive condition that drives the effect.

Experiment 2 – the remember–know task

The aim of Experiment 2 was to investigate the influence of the RK task at test on the list length effect finding, while controlling for the four potential confounds of the list length effect. It was thought that the RK task may induce recall and that a positive list length effect would be identified in that condition as it is in recall, whereas there would be a null list length effect under yes/no task conditions (a condition equivalent to the retroactive pleasantness condition in Experiment 1).