Practice makes imperfect: Working memory training can harm recognition memory performance

Matzen, Laura E.; Trumbo, Michael C.; Haass, Michael J.; Hunter, Michael A.; Silva, Austin; Stevens-Adams, Susan M.; Bunting, Michael F.; O’Rourke, Polly

doi:10.3758/s13421-016-0629-4


Published: 05 July 2016

Volume 44, pages 1168–1182, (2016)

Memory & Cognition




Abstract

There is a great deal of debate concerning the benefits of working memory (WM) training and whether that training can transfer to other tasks. Although a consistent finding is that WM training programs elicit a short-term near-transfer effect (i.e., improvement in WM skills), results are inconsistent when considering persistence of such improvement and far transfer effects. In this study, we compared three groups of participants: a group that received WM training, a group that received training on how to use a mental imagery memory strategy, and a control group that received no training. Although the WM training group improved on the trained task, their posttraining performance on nontrained WM tasks did not differ from that of the other two groups. In addition, although the imagery training group’s performance on a recognition memory task increased after training, the WM training group’s performance on the task decreased after training. Participants’ descriptions of the strategies they used to remember the studied items indicated that WM training may lead people to adopt memory strategies that are less effective for other types of memory tasks. These results indicate that WM training may have unintended consequences for other types of memory performance.


Working memory (WM) refers to the brain system used for storage and manipulation of transitory information necessary for complex tasks such as learning, reasoning, and language comprehension (Becker & Morris, 1999). Recent research has indicated that WM training, where people repeatedly practice increasingly difficult WM tasks, can improve both WM capacity and other aspects of cognitive performance (cf. Jaeggi, Buschkuehl, Jonides, & Perrig, 2008). Improved performance on a trained task should bolster performance for additional domains and tasks to the extent that they rely on overlapping cognitive abilities or share neural systems (Dahlin, Neely, Larsson, Bäckman, & Nyberg, 2008). Because WM has been identified as a central component of general cognition (Engle, Tuholski, Laughlin, & Conway, 1999), the conjecture is that improvement on the trained WM task will result not only in near transfer (improvement on other WM tasks, such as a spatial WM task following training of a verbal WM task, indicative of heightened WM capacity) but also in far transfer (improvements in other domains, such as fluid intelligence tests).

Fluid intelligence refers to those aspects of intelligence that allow for adaptive reasoning and problem solving (Carpenter, Just, & Shell, 1990), and is a construct predictive of academic achievement (Rohde & Thompson, 2007). Previous research has demonstrated that WM capacity and fluid intelligence are strongly related constructs, sharing approximately 50% of their variance (Kane, Hambrick, & Conway, 2005). Fluid intelligence had largely been thought to be immutable, but a study by Jaeggi and colleagues (2008) showed that WM training could improve fluid intelligence performance. This finding is significant from a theoretical as well as a practical perspective, and many additional studies have investigated the benefits of WM training for fluid intelligence and other types of cognitive processes. These studies have shown that WM training can improve episodic memory (Rudebeck, Bor, Ormond, O’Reilly, & Lee, 2012) and attention (Chein & Morrison, 2010), and can provide general cognitive enhancement for children (Alloway, Bibile, & Lau, 2013). Additionally, WM training has been proposed as a remediating intervention for populations such as adults with amnestic mild cognitive impairment (Carretti, Borella, Fostinelli, & Zavagnin, 2013) and children with dyslexia (Luo, Wang, Wu, Zhu, & Zhang, 2013) or ADHD (Holmes et al., 2010).

However, other research paints a less optimistic view of the benefits of WM training. The idea that WM training can improve fluid intelligence runs contrary to more than a century of research on cognitive training within psychological and educational science. Numerous studies have demonstrated that although task-specific performance commonly increases with training, transfer of this learning to other tasks or domains is rare (Chase & Ericsson, 1981; Ericsson & Delaney, 1998; Healy, Wohldmann, Sutton, & Bourne, 2006; Singley & Anderson, 1989; Thorndike & Woodworth, 1901). Several recent studies of WM training have failed to find near or far task transfer (Chooi & Thompson, 2012; Redick et al., 2013; Thompson et al., 2013), and a recent meta-analysis concluded that while WM training consistently produces near-transfer effects, these effects tend to be short-lived, and improvement fails to generalize to other domains (Melby-Lervåg & Hulme, 2013). Furthermore, researchers have cited a number of methodological concerns within the WM training field, including use of single tasks to define WM change, inconsistent use of valid WM tasks, comparison of trained groups to a no-contact control group, and subjective measurements of change (Shipstead, Redick, & Engle, 2012).

Given the mixed results regarding the transfer of WM training to other tasks and the ability of WM training to improve fluid intelligence, it seems prudent to approach this topic with a dose of skepticism. In this study, we sought to examine the effects of WM training on both untrained WM tasks and on a verbal recognition memory task. Although WM training has been touted as improving academic success, there has been little research on how WM training impacts other types of memory that are also crucial for learning, such as recognition memory. In one of the few studies to test the effects of WM training on recognition memory, Rudebeck and colleagues (2012) showed that spatial WM training can benefit performance on visual recognition memory tasks. However, it remains unclear whether WM training can improve verbal recognition memory, a type of memory that is particularly important in most educational settings.

With this study, we also sought to address some of the methodological issues that have made it difficult to interpret the results of prior WM training studies (cf. Redick et al., 2013; Shipstead, Redick, & Engle, 2010). Specifically, this study used both a no-contact control group and an active control group in which participants were trained to use mental imagery as a memory strategy. Participants were assigned semirandomly to the three experimental groups (the groups were balanced by gender and age). We selected mental imagery training as the active control condition because mental imagery is long-established as an effective technique for improving recognition memory performance (Paivio, 1971; Prestianni & Zacks, 1974). In theory, both training techniques should improve recognition memory performance, but it was unclear how they would stack up against one another in practice. The two types of training are fundamentally different in terms of the role of memory strategies in the training tasks. Using mental imagery as a memory aid is clearly a strategy, whereas there is debate about the role of strategy in WM tasks, which are intended to be process training, rather than strategy training (Lövdén, Bäckman, Lindenberger, Schaefer, & Schmiedek, 2010). Although several studies have found that strategy use improves performance on WM tasks (McNamara & Scott, 2001; Turley-Ames & Whitfield, 2003), the fundamental goal of WM training is to enhance a basic cognitive ability that can translate to improved performance in tasks that were not trained. Thus, adaptive WM training regimens intentionally discourage participants from developing task-specific memory strategies (e.g., Jaeggi et al., 2008). In this study, we sought to explore hypotheses about transfer of training from strategy-based versus process-based memory training regimens.

Task selection and hypotheses

All participants completed the same battery of memory tasks before and after training. The battery included a verbal WM task (listening span), a spatial WM task (rotation span), and a verbal recognition memory task. Participants were semirandomly assigned to one of three memory training groups. The participants in the control group received no memory training. The participants placed in the WM training group completed a series of training sessions that consisted of an adaptive n-back task and an adaptive symmetry span task. These tasks are similar to those that have been used in prior WM training studies (cf. Melby-Lervåg & Hulme, 2013) and were intended to target both verbal and spatial WM. In accordance with past research and with commercial memory training programs, the tasks adapted their difficulty based on the participant’s performance.

The participants in the mental imagery training group were trained to create vivid mental images as an aid for memorizing lists of words. They were shown examples using both concrete and abstract words, and practiced using a mental imagery strategy on a series of short recall tasks. The imagery training was intended to improve participants’ memory strategies by teaching them to associate the to-be-remembered items with concrete, vivid, and meaningful or bizarre mental images, all qualities that should improve subsequent memory performance (Baddeley & Andrade, 2000; Nelson & Schreiber, 1992; Paivio, 1965; Paivio, Walsh, & Bons, 1994; West & Holcomb, 2000).

We hypothesized that the participants in the control group would not exhibit any significant differences in performance between the pre- and posttraining baseline tasks. For the participants in the imagery training group, we expected that using a mental imagery memory strategy would lead to a general improvement on the recognition memory task after training. In addition, because strategy use has been shown to improve performance on WM tasks (McNamara & Scott, 2001; St. Clair-Thompson, Stevens, Hunt, & Bolder, 2010; Turley-Ames & Whitfield, 2003), we hypothesized that the imagery training group’s performance on the WM baseline tasks would also improve after training. Because the imagery training involved memorizing lists of words, the recognition memory task can be thought of as a near-transfer task and the WM baseline tasks as far-transfer tasks for participants in the imagery training group. For the WM training group, we expected to see near transfer in which participants improved their performance on the WM baseline tasks after training. We also predicted that there would be far transfer of the WM training to the recognition memory task. Specifically, if WM training improves WM capacity, we would expect to see higher performance for the once-presented and repeated words, which may be encoded better after spending more time in WM (Braun & Rubin, 1998).

The recognition memory task included several conditions that allowed us to investigate the impact of WM training on repetition effects, spacing effects, and testing effects. Analyses of these effects before and after training are presented in the Supplemental Materials.

Method

Each participant in the experiment completed tasks over the course of a 5-week period. During the first week, participants completed pretraining baseline memory tasks that included a verbal WM task (listening span), a spatial WM task (rotation span), and a verbal recognition memory task. During the next 3 weeks, participants completed memory training sessions that differed based on the training group to which they were assigned. Participants assigned to the mental imagery training group completed three training sessions (one per week), and participants in the WM training group completed 14 training sessions (four to five per week) during the 3-week training period. Participants assigned to the control group did not complete any tasks during the training period. At the end of the training period, all participants completed the same baseline tasks for a second time. Each of the baseline and training tasks is described in detail below.

Participants

Eighty-six participants recruited from the employee population of Sandia National Laboratories participated in this experiment and were paid for their time. All were right-handed, had no early exposure to languages other than English, and had no history of neurological disease or defect. Participants were assigned semirandomly to one of the three training groups (each participant’s age and gender were taken into account in group assignment to keep the groups as demographically balanced as possible). Eight participants dropped out of the study before completing all of the sessions (one from the control group, four from the mental imagery group, and three from the WM training group), and four additional participants (two each from the control group and WM training group) failed to follow instructions and were excluded from the data analysis. Of the remaining 74 participants, 25 (12 female) were in the control group, 24 (10 female) were in the imagery training group, and 25 (13 female) were in the WM training group. The mean age for all of the participants was 37 years (range 18–63 years). The mean ages for each group were 37 years for the control group (range 18–61 years), 39 years for the imagery training group (range 18–63 years), and 35 years for the WM training group (range 20–63 years). Figure S1 in the Supplemental Materials shows the distributions of age and educational background for the participants in this study.

Baseline tasks

Listening span task

Based on Daneman and Blennerhassett (1984), the listening span task required participants to recall a sequence of symbols in the order in which they were presented. The presentation of the symbols was interleaved with the auditory presentation of sentences. Participants had to indicate whether the sentences made sense or not. Participants practiced the two tasks separately, and then performed both in the dual-task phase.

Materials

Materials for the memory task were nine black Wingdings symbols in size 24 font presented against a white background (the symbols themselves are not reproduced here).

The secondary task consisted of 110 spoken sentences, half of which were sensible (“They gave the waiter a tip even though he was rude”), and half of which were not (“The children were summer and wanted their parents to come home”). The same speaker was used for all sentence recordings.

Procedure

The listening span task had three phases, the first of which was a memory task involving the sequences of symbols. Participants saw a series of symbols that were presented for 1,000 ms each in the center of a computer screen. They were then shown a recall screen that displayed all of the symbols and were asked to select the symbols in the order in which they had appeared. Participants could edit their selections and insert “blanks” into the sequence in place of symbols that they could not recall. Following each response screen, participants saw feedback on their performance (i.e., “You recalled X of Y items correctly”).

The second task required participants to listen to sentences and judge their sensibility. During sentence presentation, participants received instructions to click the mouse once they could tell whether the sentence made sense or not, at which point the sentence stopped playing and they responded by clicking either the yes or no button on the screen. Participants received feedback on their accuracy after each response.

During the dual-task phase, participants saw a new symbol after judging each sentence. The recall screen appeared after a sequence of four to eight symbols had accrued. Each participant saw two sequences each of four and five symbols and three sequences each of six, seven, and eight symbols. The different sequence lengths were randomly ordered for each participant and for each session (pre- and posttraining).

Rotation span task

Based on Shah and Miyake (1996), this task required participants to recall sequences of arrows of varying length and orientation. The presentation of the arrows was interleaved with the presentation of letter characters. Participants had to judge whether each letter appeared normally or backward. After each block, participants were asked to recall the sequence of arrows. As in the listening span task, participants practiced the two tasks separately, and then performed both in the dual-task phase.

Materials

For the memory task, the item set consisted of pictures of long and short arrows at eight orientations (0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°). The secondary task used five letters (R, L, J, G, and F) at the same eight orientations, with both normal and backward versions (flipped along the vertical axis). All of the stimuli were white and were presented on a black background.

Procedure

On each trial, participants saw a sequence of rotated letters, each of which was followed by an arrow. For each letter, the participants had to press a key on the keyboard to indicate whether the letter was presented normally or backward. The letter remained on the screen until the participant made a response. Then an arrow was presented for 1,000 ms. The trials varied in length and contained between two and five arrows. After the last arrow was presented, participants were asked to recall the sequence of arrows they had seen in that trial. The recall screen showed all 16 possible arrows (long or short arrows at each of eight orientations). Participants clicked on the arrows to indicate which arrows had appeared in the previous sequence, in the order they appeared. As in the listening span task, participants could edit their response and insert blanks in the place of arrows they had forgotten. When the participants completed their responses for this trial, they clicked the “next” button to advance to the next trial.

Recognition memory task

In the recognition memory task, participants were shown a list of common English nouns and were asked to memorize them for a subsequent recognition test. The task was designed to include several conditions with varying levels of difficulty. There were words that were studied only once, words that were repeated at short (one intervening item) and long (nine intervening items) lags, and words that were quizzed at short or long lags within the study block. We expected that the testing effect (Karpicke & Roediger, 2008) would make the quizzed words easiest to remember on the subsequent memory test, whereas the words that were repeated but not quizzed would be more difficult to remember. For the repeated words, we expected that the spacing effect (Melton, 1970) would lead to better performance for the words that were repeated after a longer lag. We expected the poorest memory performance for the words that were studied only once.

Materials

The recognition memory task used a list of 1,344 words, all of which were common English nouns. The average length of the nouns was five letters, and their average frequency was 55.67 (based on the Kucera & Francis, 1967, norms included in Balota et al., 2002). The words were assigned to counterbalanced experimental lists such that every word appeared in every study and test condition across lists.

The experimental lists were divided into six study-test blocks with equal numbers of each item type in each block. The words were placed in a pseudorandom order within the blocks such that no more than three items in the same condition appeared in sequence. Within each study-test block, there were 28 words that were studied once, 14 words that were studied twice with a short lag between repetitions, 14 words that were studied twice with a long lag between repetitions, 14 words that were studied and then quizzed after a short lag, and 14 words that were studied and then quizzed after a long lag. In addition to the studied items, there were 28 words that served as new items for the quizzes within the study blocks (these words were quizzed but had not been studied) and 84 words that served as new, unstudied items in the subsequent recognition test. In total, each study block contained 112 study words (including repeated study words) and 56 quizzed words. Each test block contained 168 test words, half of which had been studied and half of which were new.
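The block totals stated above can be checked with a few lines of arithmetic. The following is only a sketch that recomputes the totals from the per-condition counts given in the text; the variable names are illustrative and are not taken from the study materials.

```python
# Per-block item counts as stated in the Materials paragraph.
studied_once = 28
repeated_short = 14   # studied twice, short lag
repeated_long = 14    # studied twice, long lag
quizzed_short = 14    # studied once, then quizzed after a short lag
quizzed_long = 14     # studied once, then quizzed after a long lag
new_quiz_items = 28   # quizzed but never studied
new_test_items = 84   # unstudied items in the final recognition test

# Repeated words contribute two study presentations each.
study_presentations = (
    studied_once
    + 2 * (repeated_short + repeated_long)
    + quizzed_short
    + quizzed_long
)

# Quiz trials within the study block: previously studied plus new quiz items.
quiz_presentations = quizzed_short + quizzed_long + new_quiz_items

# Distinct studied words that can appear as "old" items on the test.
studied_words = (
    studied_once + repeated_short + repeated_long + quizzed_short + quizzed_long
)
test_words = studied_words + new_test_items

print(study_presentations, quiz_presentations, test_words)  # 112 56 168
```

The recomputed values agree with the totals reported in the text: 112 study presentations, 56 quizzed words, and 168 test words, half of them studied.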

Three of the study-test blocks were presented to each participant during the pretraining baseline session, and the other three were presented during the posttraining session. The placement of the blocks (pre or posttraining) was counterbalanced across participants.

Procedure

The participants were instructed that they would be tested on their memory for a list of study words. Throughout each task, a fixation cross was shown in the center of the screen. Prior to the presentation of each word, a yellow or red dot appeared on the screen immediately above the fixation cross. Participants were instructed that the yellow dot indicated that the next word was a study word and that they should silently read that word and try to remember it for later. They were told that the red dot indicated that the next word was a quiz word, and that following the word they should press a button to indicate whether or not they had studied that word earlier in the session. The study or quiz word was presented 600–800 ms after the dot disappeared and remained on the screen for 1 second. The words were presented immediately above the fixation cross. If the word was a quiz word, it was followed by a red question mark that remained on the screen until the participant pressed a response button. Participants pressed one of two buttons on a game controller, labeled yes and no, to indicate whether they remembered studying that word. They did not receive feedback about the accuracy of their responses.

At the end of each study block, participants took a short break before beginning the test block. All of the words in the test block were presented in the same way as the quizzed items from the study block. Each test word was preceded by a red dot and followed by a question mark. While the question mark was on the screen, participants pressed the yes or no button to indicate whether they remembered studying that word during the study block. As in the study block, they did not receive feedback about the accuracy of their responses. It took participants approximately 10 minutes to complete each study block and 12 minutes to complete the corresponding test block.

Training tasks

Mental imagery training

In the 3 weeks between the pretraining and posttraining baseline sessions, 24 of the participants completed three memory training sessions in which they practiced using a mental imagery strategy to remember word lists for a free recall test. The training sessions became more difficult as the participants progressed by using longer word lists, shorter encoding times, and more words with low imagability.

Materials

The memory tests used in the mental imagery training consisted of 168 nouns. Care was taken to ensure that none of the words used in the training sessions appeared in any of the pretraining or posttraining baseline tasks. Of the 168 nouns, 49 had low imagability (ratings below 400 in the norms included in the MRC Psycholinguistic Database; Wilson, 1988) and the remainder had high imagability (ratings above 550).

Procedure

The mental imagery training consisted of three sessions, and participants were asked to complete a session once a week for 3 weeks, for a total of 90 minutes of training (assuming completion of all three sessions). Each session took approximately half an hour to complete. During the first training session, the participants were given examples of mental imagery. The examples, which included both concrete and abstract concepts, explained that creating detailed and unusual mental images could be helpful for remembering information. After a short practice session in which participants were asked to generate and describe mental images for a short list of words, the training provided examples of grouping several mental images into one scene to increase their memorability. The participants were then asked to practice the mental imagery strategy by memorizing two lists of words, each of which was followed by a recall test. During the first practice list, the participants controlled how long the study words were presented. For the second practice list, each word was presented for 3 seconds. Each practice list contained 10 words, and participants had 10 chances to enter the words during the recall test. Participants received feedback after each entry. After the memory test, the participants were asked to describe the mental images that they had generated for the word list and to rate the effectiveness of the mental imagery strategy.

In the second and third training sessions, participants saw a brief review of the examples of mental imagery that were presented in the first session and were then asked to practice the mental imagery strategy while completing memory tests with the same format as the tests used in the first training session. As the training progressed, the study lists became longer and included more words with low imagability. The encoding time per word also decreased to 2 seconds. The structure of each study list is shown in Table S1 in the Supplemental Materials.

Working memory training

The 25 participants in the WM training group were trained on two tasks, the adaptive n-back task and the symmetry span task. Participants were loaned a laptop containing the two tasks and were asked to do each task once on every business day for 3 weeks. They were told that they could skip the training on one day of their choice, for a total of 14 sessions. Each training session lasted approximately 25 minutes, for a total of 350 minutes of training (assuming completion of all 14 sessions).

Adaptive n-back task

In the adaptive n-back task, sequences of 25 single letters appeared one at a time on the screen, and participants were asked to indicate with a button press whether the current letter matched the letter that appeared n items earlier. For example, if subjects were shown the sequence A-B-C-B in a two-back task, they would indicate that the second “B” was a target because it matched the letter that had appeared two letters back. They would respond “nontarget” to the other items in the sequence. Within each sequence of 25 items, five were targets and up to five items were lures, whereas the rest of the items were nontargets (see also Sprenger et al., 2013). The lures were letters that matched a letter that had appeared in position n - 1 or n + 1 (Kane, Conway, Miura, & Colflesh, 2007). For example, in the sequence A-B-A-C-D in a three-back task, the second A is a lure (an n - 1 lure) because it repeats a letter that appeared recently, but it is not a correct match for the three-back task. There were three lure conditions that corresponded with three levels of difficulty (no lures, n + 1 lures only, and both n + 1 and n - 1 lures). The difficulty level of the task changed based on the participant’s performance. When participants achieved at least 85% accuracy for one sequence of 25 letters, difficulty was increased for the next sequence by increasing the lure difficulty level. Once participants achieved at least 85% accuracy for a given n at the highest level of lure difficulty, n was increased by one, and the level of lure difficulty was reset to the lowest level (no lures). If accuracy fell below 65%, task difficulty decreased for the next sequence, first by decreasing lure difficulty and then by decreasing n. Task difficulty, therefore, represented both the value of n and the lure difficulty level. Each training session consisted of nine sequences of letters and lasted for approximately 10 minutes. Performance was scored by computing the response accuracy for each level of n and the mean level of n achieved during the training session. All participants started the first training session at a difficulty level of n = 2 with no lures.
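The item types and the adaptive rule described for the n-back task can be sketched in code. This is a minimal illustration, not the software used in the study: the names `classify`, `Difficulty`, and `next_difficulty` are hypothetical, and two details that the text does not specify are marked as assumptions (a floor of n = 1, and the lure level used after n is lowered).

```python
from dataclasses import dataclass

MAX_LURE_LEVEL = 2  # 0 = no lures, 1 = n + 1 lures only, 2 = both n + 1 and n - 1 lures
MIN_N = 1           # assumption: the text does not state a lower bound for n


def classify(seq, i, n):
    """Label item i of seq as 'target', 'lure', or 'nontarget' for an n-back task."""
    if i >= n and seq[i] == seq[i - n]:
        return "target"
    # A lure matches the item n - 1 or n + 1 positions back, but not n back.
    for lag in (n - 1, n + 1):
        if lag >= 1 and i >= lag and seq[i] == seq[i - lag]:
            return "lure"
    return "nontarget"


@dataclass(frozen=True)
class Difficulty:
    n: int = 2           # all participants started at n = 2
    lure_level: int = 0  # ... with no lures


def next_difficulty(d, accuracy):
    """Return the difficulty for the next 25-letter sequence given the last accuracy."""
    if accuracy >= 0.85:
        if d.lure_level < MAX_LURE_LEVEL:
            return Difficulty(d.n, d.lure_level + 1)
        return Difficulty(d.n + 1, 0)  # raise n and reset to no lures
    if accuracy < 0.65:
        if d.lure_level > 0:
            return Difficulty(d.n, d.lure_level - 1)
        if d.n > MIN_N:
            # assumption: the lure level after lowering n is not stated; keep "no lures"
            return Difficulty(d.n - 1, 0)
    return d  # between 65% and 85%: difficulty unchanged


print([classify(list("ABCB"), i, 2) for i in range(4)])
print(classify(list("ABACD"), 2, 3))
print(next_difficulty(Difficulty(2, 2), 0.90))
```

Applied to the two examples in the text, the sketch labels the second B in A-B-C-B as a target in a two-back task and the second A in A-B-A-C-D as a lure in a three-back task.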

Symmetry span task

The symmetry span task required participants to remember the locations of a sequence of blocks that appeared in a 4 × 4 grid, in the order in which they were presented. The blocks were presented serially, for 650 ms each. The presentation of the blocks was interleaved with the presentation of a design on an 8 × 8 grid. Participants had to determine if the design was symmetrical across the vertical axis. At the end of a series of these presentations, participants saw a blank 4 × 4 grid and were asked to recall the positions and order of the to-be-remembered blocks by clicking on the grid. The score was the number of blocks recalled in the correct serial order. The difficulty of this task was adjusted by changing the number of blocks that the participant needed to remember. Performance was evaluated after four sets of memory responses. If the participant got three or more correct, the sequence length increased by one for the next set of four. Conversely, if performance fell below two correct, the sequence length decreased by one for the next set. All participants started with a sequence length of three blocks. Each training session lasted for 15 minutes, and the number of sets included in each session varied depending on the sequence length and how many sequences the participants were able to complete in 15 minutes.
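The adjustment rule for the symmetry span task can be written as a small function. This is a sketch under stated assumptions: the function name and the `min_length` floor are hypothetical (the text gives no lower bound), and "correct" is read as the number of correct memory responses in the last set of four.

```python
def next_sequence_length(length, n_correct, min_length=1):
    """Return the sequence length for the next set of four memory responses.

    n_correct is the number of correct responses (0 to 4) in the last set.
    min_length is an assumed floor; the text does not specify one.
    """
    if not 0 <= n_correct <= 4:
        raise ValueError("n_correct must be between 0 and 4")
    if n_correct >= 3:
        return length + 1                    # three or more correct: harder
    if n_correct < 2:
        return max(min_length, length - 1)   # fewer than two correct: easier
    return length                            # exactly two correct: unchanged


# All participants started with a sequence length of three blocks.
length = 3
for n_correct in (4, 3, 2, 1):
    length = next_sequence_length(length, n_correct)
print(length)  # 4
```

Under this reading, exactly two correct responses leave the sequence length unchanged, because the text names only the "three or more" and "below two" cases.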

Memory strategy survey

At the end of their participation in the study, the participants were asked to complete a follow-up questionnaire about their use of various memory strategies in the baseline memory tasks. Fifty-six of the participants completed the survey (18 from the control group, 19 from the mental imagery training group, and 19 from the WM training group). The first part of the survey asked participants to describe their memory strategy for each task. The second part gave examples of different memory strategies (mental imagery, generating sentences or stories, linking items to one another, rehearsal and self-quizzing) and participants were asked if they had used those strategies on the pre- and posttraining memory tasks.

Results

Mental imagery training

Twenty-three of the 24 participants completed all three of the imagery training sessions, and one participant completed only two of the training sessions. The average number of words recalled by the participants remained fairly consistent across the 14 memory tests used in the imagery training sessions, even as the encoding task became more difficult (longer word lists, shorter encoding times, more abstract words). Figure 1 shows the average number of words recalled on each memory test. Participants recalled an average of 8.33 words per test list during the first training session, 7.42 words per list during the second training session, and 8.04 words per list during the third training session. A repeated-measures ANOVA showed that there was a significant effect of test number on the number of words recalled on each test, F(13, 284) = 3.68, p < .001, η_p² = 0.14. Paired t tests comparing average performance in each of the three training sessions showed that participants performed significantly worse during the second training session relative to the first session, t(23) = 2.87, p < .01, Hedges’s g_av = 0.63, and significantly better during the third training session relative to the second training session, t(22) = 2.67, p < .01, Hedges’s g_av = 0.38. As the memory tests became more difficult, participants reported that it became more difficult to create mental images for the word lists, and they felt that the imagery strategy was less effective for the more difficult lists. Additional analyses are presented in the Supplemental Materials.

Working memory training

Twenty-four of the 25 participants in the WM training group completed at least 12 of the 14 WM training sessions, and one participant completed nine of the training sessions. The participants who completed at least 12 of the WM training sessions were included in the analysis of the WM training. On average, the participants’ performance improved across the training sessions for both training tasks. During the first training session, the participants had an average n-back level of 1.81 and an average symmetry span difficulty level of 3.77. On the 12th training session, the participants’ average n level was 4.23 and their average symmetry span difficulty level was 5.43. However, there was a great deal of variability across participants. The average n level achieved by each participant on the 12th training session ranged from 1.0 to 8.83. Similarly, the average level of difficulty achieved by each participant for the 12th session of the symmetry span task ranged from 3.18 to 7.33. Figure 2 shows the changes in performance across the WM training sessions. Repeated-measures ANOVAs showed that there were significant effects of training session on scores for both the symmetry span task, F(11, 242) = 14.76, p < .001, η_p² = 0.40, and the n-back task, F(11, 231) = 32.74, p < .001, η_p² = 0.61. Paired t tests comparing the first and 12th training sessions showed that participants scored significantly higher on the 12th session in both the symmetry span task, t(22) = 8.56, p < .001, Hedges’s g_av = 1.50, and the n-back task, t(21) = 7.64, p < .001, Hedges’s g_av = 1.93.
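The partial eta squared values reported for the training ANOVAs can be recovered from the F statistics and their degrees of freedom, because partial eta squared equals F × df_effect / (F × df_effect + df_error). The sketch below applies that identity to the three reported tests; the function and dictionary names are illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported in the training results.
reported_tests = {
    "imagery recall across tests": (3.68, 13, 284),
    "symmetry span across sessions": (14.76, 11, 242),
    "n-back across sessions": (32.74, 11, 231),
}

effect_sizes = {
    name: round(partial_eta_squared(*stats), 2)
    for name, stats in reported_tests.items()
}

for name, value in effect_sizes.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the computed values are 0.14, 0.40, and 0.61, which match the effect sizes reported for the imagery training and the two WM training tasks.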

Baseline memory tasks

Although participants generally improved their performance on the tasks on which they were trained, the key question was whether their training would affect their performance on untrained memory tasks. To address this question, we compared the three training groups’ changes in performance on the three pre- and posttraining baseline tasks. Given the large age range of the participants, the statistical tests were run with and without including age as a covariate. Including age did not change the results of the tests unless otherwise noted.