Strong memory illusions can be produced using simple experimental procedures based on lists of associated words. In the well-known Deese–Roediger–McDermott (DRM) procedure (Deese, 1959; Roediger & McDermott, 1995), participants study lists of words that are associates of a nonpresented “critical lure.” For example, participants study words such as “bed,” “dream,” and “wake,” which are associates of the critical lure “sleep.” When asked to recall or recognize the studied words, participants frequently endorse the critical lures as old. Similar effects occur with taxonomic categories (Buchanan, Brown, Cabeza, & Maitson, 1999; DeSoto & Roediger, 2014; Dewhurst, 2001; Dewhurst & Anderson, 1999; Smith, Ward, Tindell, Sifonis, & Wilkenfeld, 2000). Studying lists of exemplars from taxonomic categories leads to false memories of nonpresented exemplars with high output dominance. For instance, presentation of a list composed of exemplars from the category “furniture” (such as table, couch, bed, desk) produces false memories of the most frequently produced exemplar (i.e., the exemplar with the highest output dominance), the nonpresented exemplar “chair.” The higher the output dominance, the more frequently the nonpresented words are falsely remembered (DeSoto & Roediger, 2014), indicating that the probability of false memory depends on the graded structure of the category (i.e., the fact that exemplars differ in how representative they are of the category).

Taxonomic categories tend to have well-established memory representations (e.g., Collins & Loftus, 1975; Medin & Schaffer, 1978; Rosch & Mervis, 1975; Rumelhart, Hinton, & Williams, 1986) and to passively reflect the correlational structure of the environment. For example, if a category member has “feathers,” it is more likely to have “wings” and to “fly” (because these features typically co-occur) than to “swim” and have “gills” (Barsalou, 1985; Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). In this sense, mental representations of taxonomic categories play an important role in organizing information about the world, whereas categorical false memories may be seen as a cognitive side effect of the activation of these representations.

However, people also actively derive new categories in order to attain goals (Barsalou, 1991). These goal-derived categories, obtained by conceptual combination of existing knowledge, are usually less well established in memory than taxonomic categories and may sometimes even be created anew, as in the case of ad hoc categories (Barsalou, 1983, 1985).

Our focus in the present studies is on goal-derived subcategories (henceforth referred to as subcategories). To illustrate, a subcategory of the category “Sports” could be “Sports that are good for backache.” Although all members of a subcategory are also members of the corresponding taxonomic category, the mental representation activated by the subcategory (and the exemplars more representative of it) can be substantially different from the mental representation of the taxonomic category and characterized by different output dominance of the exemplars (Soro & Ferreira, 2017). To what extent these more flexible and fleeting goal-derived representations are capable of producing false memories is the question we address here.

In four studies, we test whether subcategory representations produce false memories, using lists of words composed of exemplars that belong to the same preexisting, better-established taxonomic categories. Specifically, participants were presented with lists from taxonomic categories or from subcategories derived from them, using a DRM-like procedure (e.g., Dewhurst, Bould, Knott, & Thorley, 2009; Smith et al., 2000).

The broader theoretical question that we address is whether the type of relationship between list items affects false-memory production. Previous research by Roediger, Watson, McDermott, and Gallo (2001) showed that backward associative strength (BAS; i.e., the probability with which a list item elicits the critical lure in a free association task) is the main predictor of false memory. Drawing on Collins and Loftus’s (1975) spreading-activation model, Roediger, Balota, and Watson (2001) proposed the dual-process activation-monitoring theory (AMT). According to the AMT, false memories in the DRM paradigm result from activation that spreads from the list items (through semantic and associative networks) and converges on the critical lure, increasing its accessibility. Subsequent source-monitoring errors (Johnson, Hashtroudi, & Lindsay, 1993) then lead to false recall or recognition of the critical lure as one of the previously presented words.
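
To make the BAS index concrete, the sketch below computes it from free-association counts. The counts are invented for illustration and are not taken from any published norms. Averaging BAS over the list items is one common way to summarize a list (some studies sum it instead), and the function names are our own.

```python
# Hypothetical free-association norms: for each cue, how many of 100 norming
# participants gave each response. All counts are invented for illustration.
FREE_ASSOCIATION = {
    "bed": {"sleep": 60, "pillow": 25, "sheet": 15},
    "dream": {"sleep": 30, "night": 50, "nightmare": 20},
    "wake": {"sleep": 45, "up": 55},
}
N_NORMING = 100  # assumed number of norming participants per cue


def backward_strength(cue: str, lure: str) -> float:
    """BAS of one list item: probability that `cue` elicits `lure`."""
    return FREE_ASSOCIATION[cue].get(lure, 0) / N_NORMING


def mean_list_bas(items: list, lure: str) -> float:
    """Average BAS of a study list to its critical lure (one common summary)."""
    return sum(backward_strength(w, lure) for w in items) / len(items)


print(mean_list_bas(["bed", "dream", "wake"], "sleep"))
```

In AMT terms, a higher value means that more activation converges on the lure. An item with no listed association contributes zero. This is the situation described above for the goal-derived lists of Soro et al. (2017), whose items had no preexisting associations with the lures in the norms.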

An alternative explanation of false memories comes from fuzzy-trace theory (FTT; Brainerd & Reyna, 2001, 2002). According to FTT, memory judgments are based on both verbatim and gist representations. The former encode the items’ superficial or perceptual details, whereas the latter encode the semantic features of the item or list. Words that are actually experienced (i.e., studied in lists) evoke both types of representation, so true memories are supported by both verbatim and gist traces. When the words of a list share meaning or are semantically close, as happens in lists of category exemplars, the gist extracted from each word converges on a list gist. Words that are very similar or semantically close to this list gist can be mistakenly remembered as having been presented, so false memories are supported only by gist representations.

Extant research provides converging evidence that associative strength is a better predictor of false memory than is thematic coherence or gist, even when participants are presented with categorically or taxonomically related items (e.g., Buchanan et al., 1999; Knott, Dewhurst, & Howe, 2012). However, false memories are observed in category lists even when BAS is low (Dewhurst et al., 2009), which suggests that other factors, such as semantic relatedness or shared meaning, also contribute to this effect.

List items may be both semantically and associatively related (e.g., cat and dog), only associatively related (e.g., dog and leash), or only semantically related (e.g., dog and goat). It is thus possible to construct lists that dissociate shared meaning from associative relations. Indeed, Coane, McBride, Termonen, and Cutting (2016) compared associative lists whose items shared semantic features with the critical lure (many of which were taxonomic category lists) with lists of items that were associated with the lure but did not share features with it (while controlling for BAS). They showed that shared features contributed to false recognition of critical lures above and beyond associative strength. Such additive effects of shared meaning and associative strength are consistent with the FTT account of false memory, which emphasizes thematic similarity between studied items and critical lures. In the same vein, Soro, Ferreira, Semin, Mata, and Carneiro (2017) showed that lists based on goal-derived categories produced false recognition of their critical lures. These lists were composed of items that had no preexisting associations with the lures (as indexed by the free association norms of Nelson, McEvoy, & Schreiber, 2004). These results challenge an AMT account of false memories. Because goal-derived categories are developed by manipulating existing knowledge and creating new shared meaning among category exemplars in memory, accounts of false memory based on thematic similarity, such as FTT, seem better positioned to accommodate these results.

However, the goal-derived category lists used by Soro et al. (2017) were not drawn from subcategories of taxonomic categories. Instead, they were composed of exemplars from different taxonomic categories. For instance, the category “things to take on a camping trip” is composed of exemplars such as “water,” “tent,” and “matches,” which share few (if any) attributes. Hence, the semantic relations developed or activated via conceptual combination (when participants studied these category lists) were established between words from different taxonomic categories. Therefore, the exemplars of the goal-derived categories studied by Soro et al. (2017) had no other strong preexisting semantic relations among them in long-term memory. In contrast, the subcategories used here are composed of exemplars that are also members of preexisting taxonomic categories. What we explore in the present studies is whether these subcategories can produce goal-derived representations consistent enough to lead to different false memories than those typically produced by the well-established taxonomic representations in which they are embedded. In other words, our goal is to explore whether exemplars that share preexisting semantic relations of a taxonomic nature can form goal-derived representations capable of generating different false memories. More specifically, we compare the frequency of false recognition of the most frequently produced exemplar from two different representations of the same category: the default taxonomic one and a subcategory conditioned by a goal.

As mentioned before, lists of exemplars from taxonomic categories are rich in semantic relations. These categories also have strong representations in long-term memory, as evidenced by how easily participants access the name of the category from its exemplars and vice versa (Barsalou, 1983). This fast access likely occurs via semantic networks of concepts that share many features. In contrast, the subcategories used in the present studies came from Portuguese norms of goal-derived categories (Soro & Ferreira, 2017), and participants are unlikely to produce them frequently. Hence, they tend to have weaker representations in long-term memory.

False memories stemming from “novel” or infrequently processed subcategories would derive mainly from active processing of meaning while a new representation of an already well-represented category is established. A consistently different pattern of false memories for subcategories compared with their superordinate taxonomic categories would be difficult for the AMT to explain. In that case, false memories would change with context instead of merely reflecting preexisting associations or relations between the items.

To manipulate the context, we presented the same lists with either their category name or their subcategory name. Given the notion of flexible conceptual representations of categorical knowledge (Barsalou, 1999; Casasanto & Lupyan, 2015), simply providing subcategory names for lists of exemplars from taxonomic categories might influence the development of new category representations. This could lead to different patterns of false memories, even if the subcategory does not have a stable representation in long-term memory.

In sum, the studies reported here manipulated context by presenting categorical lists with default taxonomic category names or with subcategory names. This manipulation was expected to guide the semantic relations established during study of the categorical lists. The resulting differences in representational structure were expected to be consistent enough to produce different patterns of false memories. Experiment 1 used lists composed of exemplars with high output dominance in taxonomic categories and exemplars with high output dominance in the respective subcategories. This allowed us to examine the impact of manipulating the category names on exactly the same lists. Experiment 2 manipulated not only the category names but also the word lists, which consisted entirely of exemplars with high output dominance either in taxonomic categories or in subcategories. Experiment 3 replicated the results for subcategory lists in Experiment 2. It also tested whether a difference observed in Experiment 2 was due to retrieval monitoring processes (i.e., taxonomic lures being too distinctive to be falsely recognized). That difference was between false recognition of subcategory lures and taxonomic lures in a goal-derived context (i.e., with presentation of subcategory lists and names). Experiment 4 replicated the previous results with a larger sample and with lists in which the output dominance of critical lures was better controlled, providing more statistical power to detect the expected effect.

Experiment 1

The aim of Experiment 1 was to test whether providing different names (default taxonomic names vs. subcategory names) for the same lists of category exemplars would elicit different semantic organizations and consequently influence the pattern of false recognition. Specifically, participants were presented with the same lists of exemplars from taxonomic categories under either a taxonomic name (e.g., “sports”) or a goal-derived subcategory name (e.g., “sports that are good for backache”). A recognition task followed, which included lures related to the taxonomic category representation and lures related to the subcategory representation. A goal-derived subcategory representation created online in response to the subcategory name was expected to produce higher levels of false recognition for subcategory lures than for taxonomic lures. A condition in which no name was presented before each list was also included. In this case, we assumed that taxonomic categories would work as the default representations of category organization, because they are closer to a basic level of classification than are subcategories (Rosch et al., 1976). Thus, the no-name condition was expected to produce a pattern of false recognition similar to the one produced when taxonomic names are presented.

Method

Participants

Seventy-five undergraduate students from the University of Lisbon (mean age = 19.73 years, SD = 3.05 years, 67 females) participated in the experiment in exchange for course credit.

Material

Ten lists of mixed category representations were created. Each list included five frequently produced exemplars from taxonomic representations (e.g., “sports”) and five frequently produced exemplars from subcategory representations (e.g., “sports that are good for backache”). We ensured that the five exemplars from one structure were also produced in the other structure and that the average output dominance for the 10 exemplars was similar between structures (see Table 1). The exemplars were obtained from Portuguese norms for taxonomic categories, ad hoc categories, and ad hoc subcategories (Soro & Ferreira, 2017). Exemplars were ordered by including first the most frequently produced exemplar from the subcategory followed by the most frequently produced exemplar from the taxonomic category, followed by the second most frequently produced exemplar from the subcategory, and so forth.
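
The ordering rule just described alternates the two structures by rank. A minimal sketch follows; the exemplar names are placeholders, not items from Table 1, and the function name is our own.

```python
from itertools import chain


def interleave(sub_ranked, tax_ranked):
    """Order a mixed list: 1st subcategory exemplar, 1st taxonomic, 2nd subcategory, ..."""
    if len(sub_ranked) != len(tax_ranked):
        raise ValueError("both structures must contribute the same number of exemplars")
    return list(chain.from_iterable(zip(sub_ranked, tax_ranked)))


# Placeholders ranked by output dominance within each structure.
sub = ["S1", "S2", "S3", "S4", "S5"]
tax = ["T1", "T2", "T3", "T4", "T5"]
mixed = interleave(sub, tax)
print(mixed)  # ['S1', 'T1', 'S2', 'T2', 'S3', 'T3', 'S4', 'T4', 'S5', 'T5']
targets = [mixed[p - 1] for p in (1, 6, 10)]  # test positions used in Experiment 1
print(targets)  # ['S1', 'T3', 'T5']
```

If the ordering was applied exactly as described, the three target positions used at test (first, sixth, and tenth) would yield one subcategory exemplar and two taxonomic exemplars per list.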

Table 1 Example of list used in Experiment 1, with output dominance of critical lures and list exemplars from taxonomic and subcategory categorical representations

In the recognition task, participants were presented with 60 words:

- 30 targets (the exemplars in the first, sixth, and tenth positions of each presented list).
- 20 critical lures (10 from subcategories and 10 from taxonomic categories).
- 10 unrelated lures from nonpresented category lists (taken from Pinto’s, 1992, output dominance norms).

The subcategory and taxonomic critical lures were selected so that a critical lure from one structure had low or zero frequency of production in the other. At the same time, each lure’s output dominance in its own category was as high as possible and as similar as possible across the two types of lures (i.e., a subcategory critical lure in its subcategory and a taxonomic critical lure in its taxonomic category). Take the taxonomic category “sports” and the subcategory “sports that are good for backache.” The taxonomic critical lure “basketball” had an output dominance of .57 in the taxonomic category and .03 in the subcategory. The subcategory critical lure “yoga” had an output dominance of .59 in the subcategory and .02 in the taxonomic category. The full lists of exemplars and their output dominance across both types of categories are given in Table S1 in the Supplemental Material. Unrelated lures were selected from nonpresented category lists so that their average frequency of production would be similar to that of the subcategory and taxonomic lures in their respective categories.
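
The lure-selection criterion above is stated qualitatively. The sketch below turns it into a rule under an assumed numeric cutoff. It keeps exemplars whose output dominance in the other structure is at most .05 (our assumption, not a value given in the text) and picks the one with the highest dominance in its own structure. The values for “basketball” and “yoga” are those reported above. The other rows and the function name are hypothetical.

```python
# Output dominance (proportion of norming participants producing the exemplar)
# as (taxonomic category, subcategory). "basketball" and "yoga" use the values
# reported for "sports" / "sports that are good for backache"; the remaining
# rows are hypothetical.
TAX, SUB = 0, 1
DOMINANCE = {
    "basketball": (0.57, 0.03),
    "yoga": (0.02, 0.59),
    "swimming": (0.45, 0.70),  # hypothetical: high in both, so never a lure
    "tennis": (0.40, 0.01),    # hypothetical
    "pilates": (0.01, 0.35),   # hypothetical
}


def pick_critical_lure(dominance, own, other, max_other=0.05):
    """Highest-dominance exemplar in `own` whose dominance in `other` stays
    at or below `max_other` (an assumed cutoff, not one given in the text)."""
    eligible = [(scores[own], word) for word, scores in dominance.items()
                if scores[other] <= max_other]
    if not eligible:
        raise ValueError("no exemplar satisfies the cutoff")
    return max(eligible)[1]


print(pick_critical_lure(DOMINANCE, TAX, SUB))  # basketball
print(pick_critical_lure(DOMINANCE, SUB, TAX))  # yoga
```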

Design

The presentation of list names was manipulated between subjects. One group of participants studied the lists under taxonomic names (N = 21), a second group studied them under subcategory names (N = 27), and a third group studied them without any list names (N = 27). The dependent variables were recognition proportions for targets, subcategory critical lures, taxonomic critical lures, and unrelated lures. A sensitivity analysis with 21 to 27 participants per condition, at a .05 significance level and power of .80, showed that the smallest effect size the design could reliably detect was ηp2 = .12.
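
The paper does not state which software or test family was used for the sensitivity analyses. The sketch below makes the following assumptions:

- a fixed-effects between-subjects F test with noncentrality λ = f²N, the convention G*Power uses for between-subjects ANOVA;
- numerator df = 2, as for the three-level name factor;
- conversion of the minimum detectable f² to ηp2 = f²/(1 + f²).

It uses only the Python standard library, so the regularized incomplete beta function is implemented directly. All function names are our own.

```python
import math


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta I_x(a, b) via a continued fraction (modified Lentz)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - reg_inc_beta(b, a, 1.0 - x)  # symmetry: faster convergence
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return math.exp(log_front) * h / a


def critical_x(df1, df2, alpha):
    """Upper-alpha critical F, expressed on the beta scale x = df1*F / (df1*F + df2)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if reg_inc_beta(df1 / 2, df2 / 2, mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def power_f(lam, df1, df2, x_crit):
    """Power of the F test: noncentral F CDF as a Poisson(lam/2) mixture of beta CDFs."""
    half, cdf = lam / 2, 0.0
    for j in range(2000):
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1)) if half > 0 else float(j == 0)
        cdf += w * reg_inc_beta(df1 / 2 + j, df2 / 2, x_crit)
        if j > half and w < 1e-15:
            break
    return 1.0 - cdf


def min_detectable_eta2(n_total, n_groups, df1, alpha=0.05, power=0.80):
    """Smallest partial eta squared detectable with the requested power (lambda = f^2 * N)."""
    df2 = n_total - n_groups
    x_crit = critical_x(df1, df2, alpha)
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the noncentrality parameter
        lam = (lo + hi) / 2
        if power_f(lam, df1, df2, x_crit) < power:
            lo = lam
        else:
            hi = lam
    f2 = hi / n_total  # Cohen's f squared
    return f2 / (1 + f2)


print(round(min_detectable_eta2(75, 3, 2), 3))   # Experiment 1: N = 75, three groups
print(round(min_detectable_eta2(148, 6, 2), 3))  # Experiment 2: N = 148, six groups
```

Under these assumptions, the two calls should return values close to the ηp2 = .12 and .06 reported for Experiments 1 and 2. Treating the Experiment 2 analysis as having numerator df = 2 is itself an assumption.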

Procedure

Participants were instructed to memorize the words presented on the computer screen for a subsequent memory task. Before each list, a screen announcing the beginning of a new list was displayed for 5 s. In the conditions in which lists were preceded by a name (either a subcategory or a taxonomic name), this screen also contained the list’s name. Each word was presented individually in the center of the screen for 1.5 s, with a 1-s blank screen between words. The presentation order of the lists was randomized. After the study phase, participants played Tetris for 3 minutes as a distractor task, followed by the instructions for the recognition task.
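
The study-phase timing above can be laid out as a per-participant timeline. In this minimal sketch, the 5-s header, the 1.5-s words, and the 1-s blanks come from the text. The header wording, the function name, and a blank after the last word of each list are our assumptions.

```python
import random


def study_schedule(lists, names=None, header_s=5.0, word_s=1.5, blank_s=1.0, seed=None):
    """Build one participant's study-phase timeline as (onset in s, screen) pairs.

    `lists` maps a list key to its ordered words; `names` maps the same keys to
    the category or subcategory name shown on the header screen (None for the
    no-name condition). Assumes a 1-s blank after every word, including the
    last word of a list; the header text itself is a placeholder.
    """
    rng = random.Random(seed)
    order = list(lists)
    rng.shuffle(order)  # list order is randomized for each participant
    t, events = 0.0, []
    for key in order:
        header = "New list" if names is None else f"New list: {names[key]}"
        events.append((t, header))
        t += header_s
        for word in lists[key]:
            events.append((t, word))
            t += word_s + blank_s
    return events, t


demo_lists = {f"list{i}": [f"w{i}_{j}" for j in range(10)] for i in range(10)}
events, total_s = study_schedule(demo_lists, seed=1)
print(len(events), total_s)  # 110 300.0
```

Under these assumptions, 10 lists of 10 words take 5 minutes to present.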

In the recognition task, the words were presented individually in random order, and for each word participants indicated whether it was old (presented in the studied lists) or new (not presented in the studied lists). A remember–know judgment was also included in the recognition task, but it is not the focus of the present paper. For this reason, remember–know responses are not included in the analyses presented here. Their response proportions are available in the Supplemental Material (see Table S8).
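
Hit and false-alarm rates like those reported below are, under the usual convention, proportions of “old” responses computed per participant and item type. The sketch assumes that convention. The responses and the function name are hypothetical.

```python
from collections import defaultdict


def endorsement_rates(responses):
    """Proportion of "old" responses per item type for one participant.

    `responses` is an iterable of (item_type, said_old) pairs. The rate for
    targets is the hit rate; the rate for each lure type is its false-alarm rate.
    """
    old, total = defaultdict(int), defaultdict(int)
    for item_type, said_old in responses:
        total[item_type] += 1
        old[item_type] += int(bool(said_old))
    return {item_type: old[item_type] / total[item_type] for item_type in total}


# Hypothetical participant on the 60-item test of Experiment 1.
demo = ([("target", True)] * 24 + [("target", False)] * 6
        + [("taxonomic lure", True)] * 4 + [("taxonomic lure", False)] * 6
        + [("subcategory lure", True)] * 2 + [("subcategory lure", False)] * 8
        + [("unrelated lure", True)] * 1 + [("unrelated lure", False)] * 9)
print(endorsement_rates(demo))
# {'target': 0.8, 'taxonomic lure': 0.4, 'subcategory lure': 0.2, 'unrelated lure': 0.1}
```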

Results

Table 2 displays the proportions of presented (targets) and nonpresented (lures) words recognized in the three name conditions. Overall, hit rates were higher than false-alarm rates, and false recognition was more frequent for taxonomic lures than for subcategory lures. This difference was observed for lists presented with taxonomic names and with no names; however, there was no difference when lists were presented with subcategory names.

Table 2 Proportions of veridical and false recognitions under each list name presentation condition in Experiment 1

Hit rates

A one-way ANOVA, with name (subcategory name, taxonomic name, no name) as a between-subjects factor was performed for veridical recognitions and showed no significant differences, F < 1.

False-alarm rates

A 2 × 3 ANOVA was performed for false recognitions (see Fig. 1), with lure (taxonomic lures, subcategory lures) as a within-subjects factor and name (subcategory name, taxonomic name, no name) as a between-subjects factor. A main effect of lure was observed, F(1, 72) = 9.40, p = .003, ηp2 = .11, with taxonomic lures showing higher levels of false recognition than subcategory lures. There was a marginally significant main effect of name, F(2, 72) = 2.99, p = .056, ηp2 = .08. False-alarm rates tended to be highest in the no-name condition (M = .30, SD = .22), followed by the taxonomic name (M = .23, SD = .21) and subcategory name conditions (M = .18, SD = .15). The interaction between lure and name was also significant, F(2, 72) = 8.10, p < .001, ηp2 = .18. Taxonomic lures were more falsely recognized than subcategory lures under taxonomic names, F(1, 72) = 4.17, p = .045, ηp2 = .05, and in the no-name condition, F(1, 72) = 19.75, p < .001, ηp2 = .21. Under subcategory names, there was no significant difference between false recognition of taxonomic lures and subcategory lures, F(1, 72) = 1.44, p = .234, ηp2 = .02.
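
Each ηp2 above can be recovered from its F ratio and degrees of freedom, because ηp2 = SS_effect/(SS_effect + SS_error) = F·df_effect/(F·df_effect + df_error). F is reported to two decimals and ηp2 to two, so a recomputed value can differ from the reported one in the last digit. For example, F(1, 72) = 9.40 gives about .115. The function name is our own.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """eta_p^2 = F * df_effect / (F * df_effect + df_error)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Lure x Name interaction in Experiment 1: F(2, 72) = 8.10
print(round(partial_eta_squared(8.10, 2, 72), 2))  # 0.18
```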

Fig. 1

Proportions of false recognition with standard error bars for taxonomic and subcategory lures under different names in Experiment 1

Item analysis

Item analysis showed that, in all conditions, both types of critical lures varied considerably in the rate of false recognition they produced (see Table 3). This variability could indicate that other semantic variables influence false recognition, beyond lure activation via associative relations and/or meaning shared between list items and critical lure. One relevant difference concerned word frequency: on average, taxonomic lures had a higher word frequency (see Footnote 1; M = 37.36) than did subcategory lures (M = 15.31). Effects of word frequency on false recognition are not straightforward. Roediger et al. (2001) found no effect of word frequency on false recognition. By contrast, Anaki, Faran, Ben-Shalom, and Henik (2005) found that low-frequency lures produce more false recognitions than do high-frequency lures. Göz (2005), on the other hand, found that high-frequency lures have higher rates of false recognition than do low-frequency lures. In the present study, we found no significant correlation between false recognition of either type of critical lure and word frequency. The correlation coefficients for subcategory lures ranged from −.13 to .21 (ps > .562), and those for taxonomic lures ranged from −.21 to −.17 (ps > .568; see Footnote 2).
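
The item-level correlations above relate each lure’s false-recognition rate to its word frequency. In the minimal sketch below, the ten frequency and false-alarm values are invented, not the study’s data. The t statistic on n − 2 df is the usual test of r against zero, and its p value would come from a t distribution in any statistics package. Because word-frequency distributions are strongly skewed, a log-transformed frequency is also shown. The paper does not say whether frequencies were transformed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t with n - 2 degrees of freedom for testing r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical values for ten lures in one condition (not the study's data).
frequency = [5.2, 12.0, 3.1, 40.5, 8.8, 15.3, 2.4, 22.1, 9.7, 30.0]
false_alarm = [0.30, 0.22, 0.41, 0.26, 0.33, 0.19, 0.37, 0.30, 0.26, 0.22]
r_raw = pearson_r(frequency, false_alarm)
r_log = pearson_r([math.log(f) for f in frequency], false_alarm)
print(round(r_raw, 2), round(r_log, 2), round(t_statistic(r_raw, len(frequency)), 2))
```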

Table 3 Proportions of false recognitions for each lure in each list name condition in Experiment 1

Discussion

The same false-recognition pattern for lists presented with a taxonomic name and with no name suggests that participants in these conditions perceived the mixed lists as taxonomic categories. The presentation of subcategory names seems to have provided a strong enough context for one of two effects. It may have disrupted the taxonomic organization of the lists, or it may have made the taxonomic lures distinctive and thus more readily rejected by retrieval monitoring processes. However, it did not create a subcategory representation strong enough to increase false recognition of subcategory lures.

Studying lists of high-dominance subcategory exemplars together with the subcategory names may be necessary to obtain the expected changes in categorical representations and, consequently, in the pattern of false memories. Experiment 2 tested this possibility by using separate lists of exemplars from subcategories and from taxonomic categories.

Experiment 2

In Experiment 2, the mixed lists were replaced by lists composed of high output dominance exemplars for subcategories and lists composed of high output dominance exemplars for taxonomic categories. These lists were presented with a taxonomic name, a subcategory name, or no name. Our goal was to test whether the combined influence of manipulating the category name and the list of exemplars would be enough to trigger a new subcategory representation strong enough to overcome the dominance of the taxonomic structure.

In two conditions, lists and category names were crossed, so that participants were presented with lists of frequently produced subcategory exemplars under taxonomic names, or vice versa. This mismatch was expected to disrupt the activation of the representational structure.

Method

Participants

One hundred and forty-eight undergraduate students from the University of Lisbon (mean age = 21.37 years, SD = 6.87 years, 107 females) participated in the experiment in exchange for course credit.

Material

Fourteen lists were used. Half were composed of exemplars with high output dominance in taxonomic categories (e.g., “sports”). The other half were composed of exemplars with high output dominance in subcategories based on the same taxonomic categories (e.g., “sports usually played by rich people”). The subcategory lists were selected from the same Portuguese production frequency norms (Soro & Ferreira, 2017) used in Experiment 1. Both types of lists were composed of the 10 most frequently produced exemplars, presented in decreasing order of output dominance, except for the most frequently produced exemplar, which was selected as the critical lure. Critical lures appeared neither in the list from which they came nor in the alternative representation’s list. For example, “soccer,” the critical lure for the category “sports,” appeared neither in the taxonomic list “sports” nor in the subcategory list “sports usually played by rich people.” The lists of exemplars and their output dominance are presented in the Supplemental Material (Table S2).
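
The list-construction rule can be sketched as a filter over a ranked production norm. The description above leaves one detail open: whether a list holds 10 items after removing the lure or the remaining items among the top 10. The sketch assumes 10 items after all critical lures are removed. Apart from “soccer,” the entries are placeholders, and the helper name is hypothetical.

```python
def build_study_list(ranked, critical_lures, length=10):
    """Study list for one structure (hypothetical helper).

    `ranked` holds the structure's exemplars in decreasing output dominance.
    Every critical lure (the structure's own top exemplar and the lure of the
    alternative representation) is removed, and the next `length` exemplars
    are kept in order. Reading "the 10 most frequently produced exemplars" as
    10 items after removing the lures is our assumption.
    """
    kept = [word for word in ranked if word not in critical_lures]
    if len(kept) < length:
        raise ValueError("norms are too short for the requested list length")
    return kept[:length]


# "soccer" is the taxonomic lure for "sports" (from the text); the other
# entries, including the subcategory lure, are placeholders.
sports_ranked = ["soccer", "T2", "T3", "SUB_LURE", "T5", "T6", "T7", "T8",
                 "T9", "T10", "T11", "T12", "T13"]
study_list = build_study_list(sports_ranked, {"soccer", "SUB_LURE"})
print(study_list)
```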

The recognition task had a total of 49 items:

- 14 targets (study words taken from the first and fifth positions of the presented lists).
- seven subcategory lures (the most frequently produced exemplar for each subcategory).
- seven taxonomic lures (the most frequently produced exemplar for each taxonomic category).
- 21 unrelated lures from seven nonpresented taxonomic category lists (the first, second, and fifth most produced exemplars according to Pinto’s, 1992, output dominance norms).
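
The item counts above can be checked with a small test-assembly sketch. List type was manipulated between participants, so we assume that each participant studied seven lists. Two target positions per list then give the 14 targets. All words are placeholders, and the helper name is hypothetical.

```python
import random
from collections import Counter


def build_recognition_test(study_lists, sub_lures, tax_lures, unrelated_lists, seed=None):
    """Assemble a shuffled old/new test as (word, item type) pairs (hypothetical helper)."""
    targets = [lst[i] for lst in study_lists for i in (0, 4)]           # 1st and 5th positions
    unrelated = [lst[i] for lst in unrelated_lists for i in (0, 1, 4)]  # ranks 1, 2, and 5
    items = ([(w, "target") for w in targets]
             + [(w, "subcategory lure") for w in sub_lures]
             + [(w, "taxonomic lure") for w in tax_lures]
             + [(w, "unrelated lure") for w in unrelated])
    random.Random(seed).shuffle(items)
    return items


# Placeholder materials with the Experiment 2 counts: 7 studied lists,
# 7 lures of each type, and 7 nonpresented category lists.
studied = [[f"s{i}_{j}" for j in range(10)] for i in range(7)]
unrelated_norms = [[f"u{i}_{j}" for j in range(10)] for i in range(7)]
test_items = build_recognition_test(studied, [f"sub{i}" for i in range(7)],
                                    [f"tax{i}" for i in range(7)], unrelated_norms, seed=3)
print(len(test_items), Counter(kind for _, kind in test_items))
```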

Design

List type and presentation of category name were manipulated between participants, creating six conditions. Half of the participants studied taxonomic lists under taxonomic names (N = 25), subcategory names (N = 25), or no names (N = 24). The other half studied subcategory lists under taxonomic names (N = 25), subcategory names (N = 24), or no names (N = 25). The dependent variables were the proportions of recognition for targets, taxonomic critical lures, subcategory critical lures, and unrelated lures. A sensitivity analysis with 24 to 25 participants per condition, at a .05 significance level and power of .80, showed that the smallest effect size the design could reliably detect was ηp2 = .06.

Procedure

Participants were randomly assigned to one of the six conditions. The procedure was the same as in Experiment 1 with two exceptions: the distractor task (5 minutes of sudoku) and the inclusion of instructions for guess responses in the remember–know task. As in Experiment 1, remember–know results are not analyzed here, but their response proportions are available in Table S9 of the Supplemental Material.

Results

Table 4 displays the proportions of recognition for targets and lures in the six conditions. Proportions aggregated across subcategory and taxonomic lists are displayed in Table 5. Overall, hit rates were higher than false-alarm rates for critical lures, which in turn were higher than false-alarm rates for unrelated lures.

Table 4 Proportions of veridical and false recognitions under the different conditions of name and list presentation in Experiment 2
Table 5 Proportions of veridical and false recognitions for each name presentation condition in Experiment 2

Our main interest is in the differences in false recognition between taxonomic and subcategory lures across conditions. False recognition of taxonomic lures was generally higher than that of subcategory lures, especially when the presented lists were taxonomic. However, this pattern was reversed when subcategory lists were presented with subcategory names. In this case, false recognition of subcategory lures was higher than false recognition of taxonomic lures.

Hit rates

A 3 × 2 ANOVA, with name (subcategory name, taxonomic name, no name) and list (subcategory list, taxonomic list) as between-subjects factors, was performed for veridical recognitions. There was only a main effect of name, F(2, 142) = 5.52, p = .005, ηp2 = .07 (see Table 5). A post hoc Tukey test revealed that target recognition was lower in the no-name condition than in the subcategory name condition, p = .003, d = .65, and marginally lower than in the taxonomic name condition, p = .060, d = .45. This suggests that subcategory and taxonomic names provided an organizational advantage at encoding, which later helped participants recognize targets.

False-alarm rates

A 2 × 2 × 3 ANOVA, with lure (subcategory lure, taxonomic lure) as a within-subjects factor and both list (subcategory list, taxonomic list) and name (subcategory name, taxonomic name, no name) as between-subjects factors was performed for false recognitions (see Fig. 2). There was a main effect of lure, showing that taxonomic lures had more false recognitions than subcategory lures did, F(1, 142) = 7.60, p = .007, ηp2 = .05. There was an interaction between lure and list, F(1, 142) = 11.63, p < .001, ηp2 = .08, indicating that taxonomic lures were more falsely recognized for taxonomic but not for subcategory lists (see Table 4). There was also an interaction between lure and name, F(2, 142) = 6.64, p = .002, ηp2 = .08. Taxonomic lures were more falsely recognized than subcategory lures for taxonomic names and no names, but not for subcategory names (see Table 5).

Fig. 2

Proportions of false recognition with standard error bars for subcategory and taxonomic lures for different lists under different names in Experiment 2

More importantly, planned comparisons tested for differences between false recognition of taxonomic and subcategory lures in each condition. Taxonomic lures were more falsely recognized than subcategory lures when both list and name were taxonomic, F(1, 142) = 7.10, p = .009, ηp2 = .05, and for taxonomic lists with no names, F(1, 142) = 17.25, p < .001, ηp2 = .11. However, there was no difference in false recognition between critical lures when list and name were mismatched. This held both when subcategory lists were presented with taxonomic names, F(1, 142) = 1.26, p = .263, ηp2 = .01, and when taxonomic lists were presented with subcategory names, F < 1. Subcategory lists presented with no name also did not show differences in false recognition between taxonomic and subcategory lures, F < 1. Although neither the subcategory name nor the subcategory list produced a substantially different pattern of false recognition by itself, the combination of both did. When subcategory lists were presented with subcategory names, subcategory lures were falsely recognized more often than taxonomic lures, F(1, 142) = 6.64, p = .011, ηp2 = .05. This suggests that a cohesive gist from a novel subcategory representation was consistently evoked only by the presence of both the subcategory structure and the subcategory name.

Item analysis

As in Experiment 1, subcategory and taxonomic lures varied considerably in their rates of false recognition in all conditions (see Table 6). Taxonomic lures had a higher average word frequency (M = 96.87) than did subcategory lures (M = 4.88). For both subcategory and taxonomic lures, there were no significant correlations between false recognition and word frequency. For subcategory lures, the correlation coefficients were low to moderate, ranging from .02 to .38 (ps > .406) across all conditions. For taxonomic lures, by contrast, the correlation coefficients were all negative and larger than .40 in absolute value (ps > .200). The exception was the condition of taxonomic lists presented with no name, r = .11, p = .818. Such negative correlations for taxonomic lures might occur because these lures are activated during the study phase and thus behave like presented items in the recognition task. They would then be recognized more frequently the lower their word frequency, as predicted by the mirror effect (Anaki et al., 2005; see Footnote 3).

Table 6 Proportions of false recognitions for each lure in each list and name condition

Discussion

The same pattern of false recognitions found for lists presented with taxonomic names and no names indicates a tendency for all lists to be encoded as taxonomic categories. However, the subcategory lists alone caused some disruption in the taxonomic representations, as evidenced by the similar levels of false recognitions for subcategory and taxonomic lures when the list names were not presented (which is not observed for taxonomic lists presented with no name). This disruption is maintained even when a taxonomic name is presented, which could mean that the subcategory list breached the semantic organization induced by the taxonomic name. The subcategory name alone also produced some disruption of the encoding and representation of taxonomic lists, leading to similar levels of false recognition between both lures (possibly due to greater perceived distinctiveness of common lures).

The relative superiority of subcategory false memories when compared with taxonomic false memories emerged only when the subcategory lists were accompanied by their corresponding names. Apparently, the expected online establishment of new subcategory (or activation of rarely used subcategory) representations depends on the presence of both the subcategory name and a list composition that reinforces this name by presenting high output dominance exemplars of the subcategory. It is worth noting that in this condition, the level of false recognition of subcategory lures is close to the level of false recognition of taxonomic lures for taxonomic lists with taxonomic names. This indicates that representations of new subcategories were consistent enough to elicit false recognitions as intrusive as the ones produced for taxonomic categories.

The decrease in false recognition of taxonomic lures relative to subcategory lures in a subcategory context (i.e., subcategory lists under subcategory names) could also be interpreted as deriving from strategic processing during recognition, such as retrieval monitoring (Gallo, 2006, 2010). Taxonomic lures may become highly distinctive when presented at test after participants have studied subcategory lists with subcategory names, and such distinctiveness could then be used to reject these lures as new items that were not presented in the study lists. The lower false recognition of unrelated lures when subcategory names accompanied subcategory lists is consistent with this possibility. Experiment 3 was aimed at clarifying this issue.

Experiment 3

The main goal of Experiment 3 was to test whether the false recognition pattern observed in Experiment 2 could result from strategic retrieval monitoring and distinctiveness effects rather than from the establishment of a subcategory concept more consistent than the preexistent taxonomic representation in which the subcategory is embedded. The same subcategory lists as in Experiment 2 were presented in the study phase and were followed, in one condition, by a speeded recognition task. Time pressure at test has been shown to hamper strategic memory-editing processes at retrieval, reducing distinctiveness effects (Dodson & Hege, 2005) and increasing the use of familiarity as a criterion for recognition (Benjamin, 2001). The other condition used the standard (self-paced) recognition task of Experiments 1 and 2.

Method

Participants

One hundred and eighty-three participants, undergraduates from the University of Lisbon (mean age = 24.75 years, SD = 5.07 years, 128 females), participated in the experiment in exchange for gift vouchers.

Material

Experiment 3 used the same seven subcategory lists and the same words that were used in the recognition task of the subcategory list condition of Experiment 2.

Design

Type of name associated with the presented lists (subcategory name, taxonomic name, no name) and type of recognition (self-paced, speeded) were both manipulated between participants: half of the participants performed a self-paced recognition task after studying subcategory lists presented with subcategory names (N = 32), taxonomic names (N = 31), or no names (N = 32), and the other half performed a speeded recognition task after studying lists presented with subcategory names (N = 32), taxonomic names (N = 31), or no names (N = 31). The dependent variables were recognition proportions for targets, subcategory lures, taxonomic lures, and unrelated lures. A sensitivity analysis with 31 to 32 participants per condition, at a .05 significance level and power of .80, showed that the smallest effect size the design could reliably detect was ηp2 = .05.
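The sensitivity analyses report the smallest detectable effect as ηp2, whereas power software commonly parameterizes ANOVA effect sizes as Cohen's f. The article does not state which tool was used; assuming one that takes f (G*Power, for instance), the standard conversion is f = sqrt(ηp2 / (1 − ηp2)). A minimal sketch with hypothetical function names:

```python
import math


def eta_p2_to_f(eta_p2):
    """Convert partial eta squared to Cohen's f, the effect-size metric
    that ANOVA power calculators typically ask for."""
    if not 0 <= eta_p2 < 1:
        raise ValueError("partial eta squared must be in [0, 1)")
    return math.sqrt(eta_p2 / (1 - eta_p2))


def f_to_eta_p2(f):
    """Inverse conversion: Cohen's f back to partial eta squared."""
    return f ** 2 / (1 + f ** 2)


print(round(eta_p2_to_f(0.05), 3))  # 0.229
print(round(f_to_eta_p2(eta_p2_to_f(0.05)), 3))  # 0.05
```

The noncentral-F power computation that turns f, alpha, power, and group sizes into a detectable effect is not reproduced here.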

Procedure

In the self-paced recognition condition, the procedure was the same as in Experiment 2.Footnote 4 In the speeded condition, participants were instructed to respond as fast as possible. They first performed a short practice task in which the words YES or NO were presented on the screen and they responded by pressing the keys “y” and “n,” respectively, to familiarize themselves with the response time frame. They then performed the recognition task. Each word was presented for 250 ms, after which participants had 500 ms to respond. If the response was given after this 500-ms window, a message asking them to respond faster was displayed. If no response was given within 1,500 ms after the end of the response window, the trial ended, and a message instructing participants to respond faster on subsequent trials was displayed.

Results

In the speeded conditions, only responses given within 1,000 ms of word onset (i.e., the 250 ms of word presentation, the 500-ms response window, and up to 250 ms after the response window) were included in the analyses.Footnote 5 In total, 3.83% of the responses were removed from the analyses, either for being slower than 1,000 ms (3.59%) or for not being given at all (0.25%). Mean reaction time was 1,932.24 ms (SD = 548.22) in the self-paced condition and 635.64 ms (SD = 64.68) in the speeded condition.
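The exclusion rule for speeded trials can be expressed as a simple filter. This sketch assumes that reaction times are recorded in milliseconds from word onset and that a missing response is stored as None; the record format, constant, and function name are hypothetical, and a response at exactly 1,000 ms is assumed to be kept.

```python
# Hypothetical cutoff: 250 ms word + 500 ms response window + 250 ms grace.
CUTOFF_MS = 1000


def trim_speeded(rts):
    """Split speeded-recognition trials into kept RTs, too-slow trials,
    and trials with no response (stored as None).

    Returns the kept RTs and the percentage of all trials excluded for
    each reason."""
    if not rts:
        raise ValueError("no trials to trim")
    kept = [rt for rt in rts if rt is not None and rt <= CUTOFF_MS]
    slow = sum(1 for rt in rts if rt is not None and rt > CUTOFF_MS)
    missing = sum(1 for rt in rts if rt is None)
    n = len(rts)
    return kept, 100 * slow / n, 100 * missing / n


# Hypothetical trials for one participant.
kept, pct_slow, pct_missing = trim_speeded([612, 704, 1180, None, 655])
print(kept, pct_slow, pct_missing)  # [612, 704, 655] 20.0 20.0
```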

Table 7 displays proportions of recognition for targets and lures in the six conditions, and Table 8 displays proportions aggregated across the self-paced and speeded conditions. As in Experiment 2, false-alarm rates were higher for taxonomic lures than for subcategory lures, except when the subcategory lists were presented with subcategory names, in which case the pattern was reversed.

Table 7 Proportions of veridical and false recognitions under different name presentation conditions in Experiment 3
Table 8 Proportions of veridical and false recognitions under different name presentation conditions in Experiment 3, aggregated across self-paced and speeded recognition

Hit rates

A 3 × 2 ANOVA, with name (subcategory name, taxonomic name, no name) and recognition (self-paced, speeded) as between-subjects factors, was performed for veridical recognitions. A main effect of recognition showed more hits in self-paced than in speeded recognition, F(1, 177) = 15.51, p < .001, ηp2 = .08, replicating previous results (e.g., Benjamin, 2001; Carneiro et al., 2012; Dodson & Hege, 2005). There was also a main effect of name, F(2, 177) = 4.29, p = .015, ηp2 = .04 (see Table 8). A post hoc Tukey test showed that target recognition was more frequent under subcategory name than under no name, p = .010, d = .53. This suggests an organizational advantage of name presentation for subcategory structures (as in Experiment 2).

False-alarm rates

Proportions of false alarms for unrelated lures differed noticeably between recognition conditions (see Table 7), which could indicate differences in response bias (i.e., in participants' general tendency to accept words as old). For this reason, false recognition rates for taxonomic and subcategory lures were corrected before analysis by subtracting each participant's false-alarm rate for unrelated lures.
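The correction can be sketched per participant as follows. The data structure and field names are hypothetical, and the counts in the example are invented for illustration; corrected rates can be negative when a participant endorses unrelated lures more often than critical lures.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Proportions of "old" responses per lure type (hypothetical fields).
    subcategory_lures: float
    taxonomic_lures: float
    unrelated_lures: float


def corrected_rates(p: Participant) -> dict:
    """Baseline-correct critical-lure false alarms by subtracting this
    participant's own false-alarm rate for unrelated lures."""
    return {
        "subcategory": p.subcategory_lures - p.unrelated_lures,
        "taxonomic": p.taxonomic_lures - p.unrelated_lures,
    }


# Hypothetical participant: 4/7 subcategory and 5/7 taxonomic lures
# endorsed, and 2/20 unrelated lures endorsed.
rates = corrected_rates(Participant(4 / 7, 5 / 7, 2 / 20))
print({k: round(v, 3) for k, v in rates.items()})
# {'subcategory': 0.471, 'taxonomic': 0.614}
```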

A 2 × 3 × 2 ANOVA, with lures (subcategory lures, taxonomic lures) as a within-participants factor, and name (subcategory name, taxonomic name, no name) and recognition (self-paced, speeded) as between-participants factors, was performed on corrected false recognitions (see Fig. 3). There was a main effect of recognition, F(1, 177) = 9.55, p = .002, ηp2 = .05, showing that false recognition was more frequent under speeded than under self-paced recognition. There was an interaction between lure and recognition, F(1, 177) = 8.25, p = .005, ηp2 = .04, showing that speeded recognition increased false recognitions of taxonomic lures compared with subcategory lures, whereas there was no difference between lures in self-paced recognition. There was also an interaction between lure and name, F(2, 177) = 26.31, p < .001, ηp2 = .23, indicating that subcategory names led to higher rates of false recognition for subcategory lures than for taxonomic ones, while the opposite was found for taxonomic names and no names.
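Reported effect sizes can be checked against the F statistics, since partial eta squared follows directly from F and its degrees of freedom: ηp2 = F·df_effect / (F·df_effect + df_error). Applied to the three effects just reported, this minimal sketch (hypothetical function name) reproduces the reported values of .05, .04, and .23.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from a reported F ratio:
    eta_p^2 = F * df_effect / (F * df_effect + df_error)."""
    num = f_value * df_effect
    return num / (num + df_error)


# F values and degrees of freedom reported above for Experiment 3.
for label, f, df1, df2 in [
    ("recognition", 9.55, 1, 177),
    ("lure x recognition", 8.25, 1, 177),
    ("lure x name", 26.31, 2, 177),
]:
    print(label, round(partial_eta_squared(f, df1, df2), 2))
# recognition 0.05
# lure x recognition 0.04
# lure x name 0.23
```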

Fig. 3

Proportions of corrected false recognition with standard error bars of subcategory and taxonomic lures for different recognition tasks under different name presentation conditions in Experiment 3

More importantly, planned comparisons used to test for differences between false recognition of taxonomic and subcategory lures in the different conditions show that taxonomic lures were more falsely recognized than subcategory lures when no names were presented, in both speeded recognition, F(1, 177) = 15.07, p < .001, ηp2 = .08 and self-paced recognition, F(1, 177) = 4.30, p = .039, ηp2 = .02. When taxonomic names were presented, this difference occurred only in speeded recognition, F(1, 177) = 15.47, p < .001, ηp2 = .08, but not in self-paced recognition, F(1, 177) = 1.98, p = .161, ηp2 = .01. However, when subcategory names were presented, subcategory lures were more falsely recognized than taxonomic lures, in both speeded recognition, F(1, 177) = 4.36, p = .038, ηp2 = .02, and self-paced recognition, F(1, 177) = 23.43, p < .001, ηp2 = .12.

Item analysis

Compared with the previous experiments, the lists in the present study showed higher rates of false recognition (see Table 9). The materials were the same as in Experiment 2, so the differences in word frequency remain. The correlations between false recognition and word frequency were statistically nonsignificant. False recognition of subcategory lures was positively correlated with word frequency in self-paced recognition with subcategory names (r = .68, p = .091) and with no names (r = .55, p = .198), as well as in speeded recognition with subcategory names (r = .46, p = .296). For taxonomic lures, the correlations were negative in all conditions (r < −.44, ps > .108). One potential explanation is that taxonomic lures are produced during the study phase, behaving as presented items and producing a mirror effect in comparison with subcategory lures.Footnote 6

Table 9 Mean false recognitions for each lure in each list name condition from each category

Discussion

Whereas in the self-paced condition taxonomic and subcategory lures generally produced similar levels of false recognition, in the speeded condition participants produced more false recognitions of taxonomic lures than of subcategory lures. This pattern suggests that retrieval monitoring in self-paced recognition had a greater effect on taxonomic lures, potentially decreasing their false recognition when a subcategory list was accompanied by a subcategory name (perhaps owing to their distinctiveness in this context). However, even when retrieval monitoring was limited by time pressure, subcategory lures showed higher levels of false recognition than taxonomic lures in a subcategory context (i.e., subcategory lists presented with subcategory names), which suggests that subcategories can produce substantial false memories as long as they are properly contextualized.

Experiment 4

According to the sensitivity analyses, the minimum effect that could be reliably detected was relatively large for Experiment 1 (ηp2 = .12) and moderate for Experiment 2 (ηp2 = .06) and Experiment 3 (ηp2 = .05). The goal of Experiment 4 was to examine the pattern of false recognition for taxonomic and subcategory lures with a larger sample, providing greater statistical power. Experiment 4 used only subcategory lists, presented under different name conditions. The subcategories were the same as those used in Experiment 1 because they allow better control over the production frequency of subcategory and taxonomic lures.

Method

Participants

One hundred and ninety-one participants, undergraduates from the University of Lisbon (mean age = 24.93 years, SD = 6.90 years, 145 females), participated in the experiment in exchange for gift vouchers. Assuming a medium effect size (ηp2 = .06) for the expected pattern of false recognitions, the design would require a total sample of 158.

Material

Experiment 4 used the same subcategories as Experiment 1, with one exception: “fruits that can be played as marbles” was replaced by “fruits that can be thrown at other people.” The presented lists included the exemplars with the highest output dominance in the subcategory representation. The critical lures (one from each taxonomic category and one from each subcategory) were the same as in Experiment 1 and were selected so that their output dominances were as close as possible (lists of exemplars and critical items, together with their output dominance, are presented in Supplemental Material, Table S3). The recognition task was composed of the two critical lures (one taxonomic and one subcategory lure), two exemplars from the presented lists (the first and the fifth in presentation order), and two exemplars from 10 nonpresented category lists (the first and the fifth most frequently produced).

Design and procedure

The same lists were presented to all participants. Presentation of list names was manipulated between participants: one group studied the lists under no names (N = 63), another under taxonomic names (N = 64), and a third under subcategory names (N = 64). The dependent variables were recognition proportions for targets, subcategory lures, and taxonomic lures. The procedure was the same as in Experiment 2.Footnote 7 A sensitivity analysis with 63 to 64 participants per condition, at a .05 significance level and power of .80, showed that the smallest effect size the design could reliably detect was ηp2 = .05.

Results

Recognition proportions of targets, critical lures, and unrelated lures are presented in Table 10. The table also presents the overall recognition means for items and for name presentation condition.

Table 10 Proportions of veridical and false recognitions under each list name presentation condition in Experiment 4

Hit rates

A one-way ANOVA with name (subcategory name, taxonomic name, no name) as a between-subjects factor was performed on veridical recognitions. Target recognition did not differ significantly between list name conditions, F < 1.

False-alarm rates

A 2 × 3 ANOVA, with lure (taxonomic lures, subcategory lures) as a within-subjects factor and name (subcategory name, taxonomic name and no-name) as a between-subjects factor was performed on false recognitions (see Fig. 4). The only significant result was a Lure × Name interaction, F(2, 188) = 23.53, p < .001, ηp2 = .20, in which taxonomic lures were falsely recognized more often than subcategory lures under both taxonomic names, F(1, 188) = 13.53, p < .001, ηp2 = .07, and under no names, F(1, 188) = 8.21, p = .005, ηp2 = .04. Under subcategory names, however, subcategory lures were falsely recognized more often than taxonomic lures were, F(1, 188) = 26.01, p < .001, ηp2 = .12.

Fig. 4

Proportions of false recognition with standard error bars for taxonomic and subcategory lures under different names in Experiment 4

Item analysis

As in the previous experiments, false recognition of taxonomic and subcategory lures varied considerably between lists (see Table 11). Mean word frequency was higher for taxonomic lures (M = 31.41) than for subcategory lures (M = 16.08). Correlations between false recognition and word frequency were not significant; nonetheless, subcategory lures showed positive correlations with word frequency under no name (r = .52, p = .120) and under subcategory names (r = .55, p = .101). Taxonomic lures showed negative correlations with word frequency under no name (r = −.39, p = .267) and under subcategory names (r = −.48, p = .160), but not under taxonomic names (r = .11, p = .751).Footnote 8

Table 11 Proportions of false recognitions for each lure in each list name condition in Experiment 4

Discussion

Experiment 4 produced a pattern of results similar to that of the previous experiments. In the appropriate context (a list of exemplars that could be associated with the goal-derived subcategory, presented with its name), false recognition of subcategory lures from goal-derived subcategories was consistently higher than false recognition of taxonomic lures from default category representations. This pattern was observed with materials better controlled for the output dominance of the critical lures and with a sample size providing greater power to detect the expected effect.

General discussion

In four experiments, we found that subcategories of broader taxonomic categories (Barsalou, 1985) generated goal-derived semantic relations capable of interfering with the false memories induced by preexistent relations within these taxonomic categories. In Experiments 2–4, these goal-derived semantic relations were consistent enough to reverse the pattern of false memories: when subcategory lists were presented with their names, false recognitions produced by subcategories became more frequent than those produced by taxonomic categories.

In Experiment 1, we used hybrid lists, such that half of each list of exemplars was composed of high output dominance exemplars from a taxonomic category, whereas the other half corresponded to high output dominance exemplars from the corresponding subcategory. Participants studied these lists under taxonomic names, subcategory names, or with no name introducing each list. Results showed substantially more false recognitions for taxonomic lures than for subcategory lures, unless they were presented with subcategory names, in which case there was no significant difference in false recognition between the two types of lures.

Experiment 2 followed the procedure of Experiment 1, except that each list was composed exclusively of exemplars of either a taxonomic category or a subcategory. A clear pattern of context-specific representation emerged for subcategory lists when subcategory names were presented, such that, in these cases, subcategory lures were more falsely recognized than taxonomic lures.

Experiment 3 replicated Experiment 2 results and included a speeded recognition task to test whether the false recognition pattern observed in Experiment 2 could stem from strategic retrieval monitoring and distinctiveness effects rather than the establishment of goal-derived subcategories. The comparison between the self-paced and the speeded recognition conditions revealed that retrieval monitoring processes might have affected results by decreasing the rate of false recognition of taxonomic lures. Nonetheless, even when processes of retrieval monitoring were hindered by speeded recognition, subcategory lures were more falsely recognized than taxonomic lures in a subcategory context. This pattern of false recognitions reinforces the notion that novel (goal-derived) semantic relations between concepts that already share other preexistent and stable (taxonomic) semantic relations can produce specific memory intrusions.

The decrease in taxonomic false recognitions found for subcategory lists (Experiment 2) is likely to also result from a corresponding decrease in the output dominance of these lists’ exemplars in the graded structure of the broader taxonomic categories in which the subcategory lists were embedded. In accordance with this interpretation, studies of category learning show that when a more diverse sample of the category is presented from the beginning (compared with when the most typical exemplars of a category are presented together at the beginning of the task), subjects identify new exemplars of the category less accurately and make less extreme typicality ratings (Elio & Anderson, 1984).Footnote 9 Importantly, this decrease in false recognitions also implies category malleability, in the sense that false memories for common categories are not the product of a default representation activated by the presented gist (the category’s name), but stem from malleable categorical representations that change according to the structure of the encoded stimuli.

Experiment 4 used a better-controlled set of materials and a larger sample to provide more adequate power for detecting the expected effect size. Results showed higher levels of false recognition for taxonomic lures than for subcategory lures under the taxonomic name and no-name conditions. However, under subcategory names, subcategory lures were falsely recognized more often than taxonomic lures were. In other words, the presentation of subcategory names was enough not only to disrupt the use of default taxonomic organizations (as happened in Experiment 1) but also to create a new subcategory representation cohesive enough to increase false recognition of subcategory critical lures.

The obstacles to developing new semantic subcategory representations in the presence of well-established taxonomic ones, observed in the present experiments, bear similarities to results concerning episodic priming effects in newly acquired associations between items (Dagenbach, Horst, & Carr, 1990). These authors found evidence that episodic priming is less likely to occur between words that belong to preexisting associative networks than between words that were previously unrelated. Similarly, in our studies, a consistent representation of a “new” goal-derived subcategory strong enough to create false memories would have to bypass, or at least prevail over, the preexistent default semantic relations entailed by the items’ status as members of a taxonomic category. Given these obstacles, it is notable how consistently items from the same taxonomic category produced patterns of false recognition different from those predicted by the common taxonomic representation of the category. In the specific settings of our manipulations of list names and list structure, subcategories reliably produced false memories across three studies, with false recognition rates for subcategory lures comparable with those found for taxonomic lures.

The goal-derived nature of the subcategories used here and the different patterns of false recognition produced under different category labels suggest that these false recognitions are not solely due to the automatic and relatively “passive” activation of preexistent associations. They more likely stem from convergent conceptual processing of the material (prompted by the implicated goals), which led to the extraction of “new” or less accessible semantic features of the lists’ exemplars. Certainly, because the exemplars came from the same taxonomic category, some level of associative relation between the items on the lists is expected, but, according to the AMT, this activation should converge on the most frequently produced exemplar of the taxonomic category. Thus, whereas fluctuations in false recognition of taxonomic lures are expected when the composition of the list changes, the emergence of new false recognitions of subcategory lures is not. In other words, the AMT’s account of categorical false memories does not seem sufficient to explain the patterns of false recognition of subcategory lures. FTT, on the other hand, can more easily accommodate these results, because it focuses on the convergence of conceptual meaning through extraction of the gist of the exemplars (and ultimately of the list), which is what happens in the case of the subcategory lists. In this case, however, the subcategories’ gist is not spontaneously extracted, owing to the prevalence of the taxonomic representation; the establishment of specific subcategory representations seems to depend on context cues (subcategory lists and labels). While this may seem too narrow a scenario for the emergence of these false memories, the reported effects are nevertheless compelling because they stem from subcategories that often serve the pursuit of people’s objectives in everyday contexts. Under such circumstances, people are more likely to be aware of the concept behind the subcategory (its name).

By broadening the scope of the occurrence of false memories to goal-derived semantic organizations, these findings support a more dynamic and flexible view concerning the origins of false memories. The goal-derived aspect of the subcategories makes them more similar to representations used in real-world environments, allowing the exploration of false memories with extended implications for everyday life contexts while taking advantage of the simplicity and controllability of a DRM-like paradigm. The results also provide further evidence of the impact that meaning extraction processes can have on the production of false memories, in opposition to purely associative ones.

In sum, false memories are based not only on associative strength in lexical networks and similarity in semantic networks but may also be the result of the categories people derive when making plans and pursuing goals in their daily lives. In this sense, false memories are not necessarily the hallmarks of well-established representations. They may often be the cognitive costs of active planning carried out by the cognitive system.

Limitations and future research

Subcategory representations are narrowly identified in the present experiments, as they hinge on the proportion of false memories for a single word (selected to be the critical lure) from each list. Future research would benefit from using other measures to capture the emergence of new subcategory structures. For instance, new studies could use a larger variety of critical lures, include free recall tests, and assess the subcategories’ graded structure through measures other than output dominance (e.g., typicality or ideals). These measures could help to determine whether processes other than gist meaning, associations, or semantic relationships are involved in the production of false memories for subcategories, and to capture consistent variations in the representation of taxonomic categories as a function of the exemplars presented during encoding.

In all the presented studies, the lists varied in how frequently they produced false recognitions, which invites further exploration of the item features that might give rise to such variability. Indeed, the materials were not controlled for other variables that may affect (false) recognition. Correlations between word frequency and false recognition of the lures did not reach statistical significance. Nevertheless, the higher word frequency of taxonomic lures relative to subcategory lures could be affecting the results. The interactions involving name presentation condition could not be explained by word frequency or any other semantic variable alone, but these variables could affect the size of the differences found in each condition. Further studies should better control word frequency in the materials used and explore other item features that may affect false memories for goal-derived categories.

The manipulation of name presentation for the lists was quite simple and straightforward. More engaging and goal-oriented context manipulations could lead to the development of clearer conceptual structures and, as a result, clearer changes in the frequency of memory illusions. For example, requesting participants to actively imagine planning a picnic before list presentation could activate schematic knowledge such as “where to go,” “what to take,” and “how to get there,” potentially increasing the number of specific false memories about subcategories like “places to have a picnic,” “food usually taken for a picnic,” and “tools useful in a picnic.” More generally, priming a goal-derived scenario that activates the representation of the subcategories presented in the encoding phase should increase memory illusions.

Conclusion

Semantic relations established during the study of category lists have the capacity to affect memory illusions despite the preexistent semantic relations among the same stimuli. This suggests that the constructive nature of memory builds on dynamic categorical relations that are instantiated in flexible and adaptive ways to serve new goals. By exploring such psychological processes of meaning making, our goal was to pave the way for future research that may further close the gap between fundamental research on categorical false memories and the practical use people make of categories.