Text processing is a highly complex cognitive task in that its successful completion depends on the coordinated processing and maintenance of several different kinds of information, including words and the concepts they denote, the syntactic and semantic relationships between words within a sentence, the semantic relationships between ideas within and between sentences, and general world knowledge. Indeed, a widespread assumption among contemporary models of text comprehension is that successful comprehension depends heavily on the effective use of limited working memory (WM) resources (e.g., Goldman, Varma, & Coté, 1996; Graesser, Gernsbacher, & Goldman, 2003; Just & Carpenter, 1992; Kintsch, 1998; Perfetti, 1988; van den Broek, 2010).

Consistent with this assumption, research has repeatedly demonstrated that individual differences in WM predict variance in comprehension. The most common approach to measuring WM has involved complex span tasks (see Conway et al., 2005; Kane et al., 2004). For example, Daneman and Carpenter’s (1980) original reading span (RSPAN) task required participants to read a series of unrelated sentences aloud, after which they were asked to recall the last word of each sentence. Given concerns that sentence-final words were being generated from recall of the gist of the sentence, rather than from maintenance in working memory (Conway et al., 2005), more recent versions of the RSPAN task typically present an unrelated target (e.g., a word or letter) after each sentence to be maintained for subsequent recall after the end of the sentence set. The operation span (OSPAN) task is conceptually similar, requiring the processing of arithmetic equations along with maintenance of unrelated targets that follow each equation. More generally, complex span measures include both processing and maintenance task components. Of greatest interest for present purposes, a key feature of these measures is that the content to be processed and the content to be maintained are independent of one another. Even Daneman and Carpenter’s original version of the RSPAN task involved maintenance of information that was irrelevant to subsequent processing (preceding sentence-final words were irrelevant to processing the content of subsequent sentences).

A second category of WM measures includes tasks, referred to here as content-embedded tasks, that also require processing and maintenance of information to perform successfully (Cowan, 1999; Kyllonen & Christal, 1990; Woltz, 1988). Content-embedded measures differ critically from complex span tasks in one key respect: The content to be maintained is processing relevant rather than processing irrelevant. As was previously described, in complex span tasks, the information that must be maintained is extraneous to the processing task. In content-embedded tasks, the information that is being processed or updated in WM is also the information that must be maintained for subsequent output. For example, in the digit-recoding task, participants are presented with a string of numerals that they are required to maintain in WM and process (e.g., if the digit string was 7 3 8 4 2 5, questions might include, “What number came after 4?” and “What is the difference between the third and last numbers?”). Thus, in contrast to complex span tasks, the content that must be maintained for output in content-embedded tasks is processing relevant rather than extraneous.

Although both content-embedded tasks (e.g., Was & Woltz, 2007) and complex span tasks (e.g., Engle, Carullo, & Collins, 1991) correlate with measures of comprehension, complex span tasks have been the WM measure of choice. Indeed, the volume of studies that have examined associations between complex span tasks and comprehension is too vast to be summarized here (for recent reviews, see Carretti, Borella, Cornoldi, & De Beni, 2009; Daneman & Merikle, 1996; Unsworth & Engle, 2007b), whereas Was and Woltz reported the only prior study to examine the association between content-embedded tasks and comprehension. Most important, no prior research has directly compared the association between comprehension and WM as measured by complex span tasks versus content-embedded tasks.

This direct comparison is informative because comprehension theory predicts that content-embedded tasks will be more effective than complex span tasks for capturing the association between WM and comprehension. A core assumption of models of comprehension is that comprehension relies on concurrent maintenance and processing of task-relevant content. For example, according to the construction-integration (CI) model (Kintsch, 1988, 1998), the conundrum of comprehension is that all of the information in a text far exceeds the capacity of WM and, thus, cannot be processed concurrently, yet comprehension requires integration of information across sentences and sections. According to the CI model, this problem is solved by processing text in cycles in which comprehension processes operate on one segment of text at a time (roughly corresponding to a sentence). In brief, each cycle involves temporarily maintaining a small amount of information from the previous cycle, inputting new information from the current text segment, and then integrating the old and new information. To maintain coherence across segments, a subset of the most central information is maintained in WM to participate in the next processing cycle. Thus, comprehension is heavily dependent on maintaining task-relevant content in WM both within and across processing cycles. The key point here is that whereas complex span tasks do not require maintenance of process-relevant information in WM, content-embedded tasks of WM do require such maintenance and, thus, are predicted to show stronger associations with comprehension.

In the present investigation, we used three content-embedded tasks, three complex span tasks, and three measures of comprehension to derive latent factors capturing individual differences in each of these constructs. Because the content-embedded tasks required individuals to process and update information actively maintained in WM, as does comprehension, we expected content-embedded tasks to account for a greater amount of unique variance in comprehension than would complex span measures of WM that relied on the active maintenance of content that was extraneous to the processing task.

Method

Participants

Two hundred sixty-one undergraduates at a large Midwestern state university participated as part of a larger study and received either partial course credit or monetary compensation.

Materials and procedure

Complex span tasks

All three complex span tasks were modified versions of those described in Kane et al. (2004): OSPAN, RSPAN, and counting span (CSPAN). We selected these three measures on the basis of convention, in that these complex span measures are the most commonly used to create the latent WM construct in factor analyses and structural equation models (e.g., Colom, Rebollo, Abad, & Shih, 2006; Conway, Cowan, Bunting, Therriault, & Minkoff, 2002; Engle, Kane, & Tuholski, 1999a; Lépine & Barrouillet, 2005; Mogle, Lovett, Stawski, & Sliwinski, 2008; Unsworth & Engle, 2007a). Conway et al. (2005) provided a methodological guide and review of complex span measures, suggesting and providing evidence that CSPAN, OSPAN, and RSPAN are valid and reliable measures of WM. Performance on each task was computed using partial-credit unit scoring (for details, see Conway et al., 2005).
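As an illustration of partial-credit unit scoring, the following minimal sketch scores each trial as the proportion of targets recalled in the correct serial position and then averages those proportions across trials. The function name, the data layout, and the words "chair" and "river" are hypothetical, and the position-by-position matching rule is an assumption; the exact scoring details used in the study are those given by Conway et al. (2005).

```python
def partial_credit_unit_score(trials):
    """Mean, across trials, of the proportion of targets recalled
    in the correct serial position (assumed matching rule)."""
    proportions = []
    for targets, recalled in trials:
        hits = sum(
            1
            for position, target in enumerate(targets)
            if position < len(recalled) and recalled[position] == target
        )
        proportions.append(hits / len(targets))
    return sum(proportions) / len(proportions)


# Two hypothetical trials: (presented targets, participant's recall).
trials = [
    (["home", "apple"], ["home", "apple"]),                      # 2 of 2 correct
    (["eagle", "chair", "river"], ["eagle", "river", "chair"]),  # 1 of 3 correct
]
score = partial_credit_unit_score(trials)  # (1 + 1/3) / 2
```

Under this scheme, every trial contributes equally to the score regardless of its set size, which is why a participant receives partial credit for a partly recalled long trial.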

In the OSPAN task, participants read a mathematical operation aloud (e.g., “Is (4 × 2) + 3 = 12?”), reported whether it was correct, and then read a target word aloud (e.g., home). Immediately thereafter, the experimenter pressed a key to present the next operation–word pair onscreen [e.g., “Is (9 ÷ 3) + 4 = 7? APPLE”]. Following the final pair of the trial, participants recalled the target words in serial order (e.g., home, apple). The OSPAN task consisted of 15 experimenter-paced trials that ranged from two to six operation–word pairs. The words and the order of set sizes were initially randomized, and that order was used for all participants.

In the RSPAN task, participants read a sentence aloud (e.g., “Mr. Owens left the lawnmower in the lemon”), reported whether it made sense, and then read an unrelated word aloud (e.g., eagle). Once the word was read aloud, the experimenter pressed a key to present the next sentence–word pair, and so on. After the final pair of each trial, participants wrote the target words in serial order. The RSPAN task consisted of 15 experimenter-paced trials that ranged from two to six sentence–word pairs presented in a fixed random order.

In the CSPAN task, participants were presented with a random array of shapes, each of which contained from three to nine dark blue circles, as well as a varying number of light blue circles and dark blue squares. Participants were asked to count the number of dark blue circles, to click on each one using the mouse (a checkmark appeared on the dark blue circle once they clicked on it), and to memorize the total number for a later recall test. After clicking on the last dark blue circle within an array, a new array appeared onscreen. After participants completed the final array, a recall cue appeared, and they recalled the total number of dark blue circles from each array in that trial in serial order. For instance, if the first array had three dark blue circles, the second had eight, and the third had two, the participant would type “3, 8, 2.” Again, the task consisted of 15 trials that ranged from two to six arrays (i.e., two to six to-be-remembered numbers) presented in a fixed random order.

Content-embedded tasks

Variants of all three content-embedded tasks have been used in previous research (Ackerman, Beier, & Boyle, 2002; Kyllonen & Christal, 1990; Was & Woltz, 2007; Woltz, 1988).

On each of the 18 trials in the alphabet WM task, participants were presented with either one or two nonadjacent letters from the alphabet for 2.5 s, followed by a transformation direction and number (−3, −2, −1, +1, +2, +3). Participants were instructed to increment or decrement each stimulus letter according to the transformation value (e.g., ME . . . −2 = KC). The transformation value remained on the screen until the participant was ready to respond. Participants were instructed to solve the problem before advancing to the response alternative screen. Once participants advanced to the response alternative screen, they were given 6 s to choose an option by pressing a number key from 1 to 8. The time limit was imposed to prevent participants from solving the problems while examining the alternatives in the response window.

The 18 trials occurred in two blocks of 9 trials. The trials of each block represented a 2 × 2 × 3 design with number of stimulus letters (one or two), forward or backward recoding direction, and recoding distance (one, two, or three) as the design facets. The order of trials within each block was randomized for each participant.
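The recoding operation in the alphabet WM task can be sketched as a fixed shift applied to each stimulus letter. This is only an illustration of the transformation rule described above: the function name is hypothetical, and because the task description does not say how shifts past either end of the alphabet were handled, the sketch simply rejects them.

```python
import string

ALPHABET = string.ascii_uppercase


def recode(letters, shift):
    """Shift each stimulus letter by `shift` positions in the alphabet."""
    recoded = []
    for letter in letters:
        index = ALPHABET.index(letter) + shift
        # Assumption: stimuli never run off the ends of the alphabet.
        if not 0 <= index < len(ALPHABET):
            raise ValueError(f"{letter!r} cannot be shifted by {shift}")
        recoded.append(ALPHABET[index])
    return "".join(recoded)


answer = recode("ME", -2)  # the example from the task description
print(answer)
```

For the example stimulus, M and E each move back two positions, giving K and C, so the participant must hold both letters while transforming each one.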

In each of the 24 trials in the ABCD WM task, participants interpreted three statements that, together, defined the order of the letters A, B, C, and D. One statement defined the order of A and B (e.g., “B comes after A,” interpreted as AB). Another statement defined the order of C and D (e.g., “D comes before C,” interpreted as DC). The third statement defined the order of AB relative to CD (e.g., “Set 1 comes after Set 2,” interpreted as DCAB). The order of the three statements and the ordering operations in each statement were varied across trials. Processing time for each statement was self-paced, with a limit of 20 s. After interpreting all three statements, participants selected a response from an alphabetized list of eight possible orders. The 24 trials were divided into two 12-trial blocks.
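The logic of an ABCD trial can be sketched as combining three binary decisions into one four-letter order. The function and argument names below are hypothetical, and the sketch assumes (following the example above) that Set 1 denotes the A/B pair and Set 2 the C/D pair.

```python
from itertools import product


def resolve_order(ab_pair, cd_pair, set1_after_set2):
    """Combine the interpreted statements into a single letter order.

    ab_pair: "AB" or "BA"; cd_pair: "CD" or "DC".
    set1_after_set2: True if the A/B pair follows the C/D pair.
    """
    return cd_pair + ab_pair if set1_after_set2 else ab_pair + cd_pair


# Worked example from the task description.
example = resolve_order("AB", "DC", set1_after_set2=True)

# Three binary choices yield the eight response alternatives.
all_orders = sorted(
    resolve_order(ab, cd, flag)
    for ab, cd, flag in product(("AB", "BA"), ("CD", "DC"), (False, True))
)
print(example, all_orders)
```

The enumeration shows why the response list contains exactly eight alternatives: each of the three statements has two possible interpretations.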

On each trial in the digit-recoding task, participants were presented with six digits at a rate of 2.25 s per digit. Participants then answered two questions presented one at a time about the order of the numbers (e.g., if the digit string was 9 3 4 6 2 5, questions might include, “What number precedes 2?” and “What is the difference between the first and last numbers?”). All answers were numeric, and participants entered them on the keyboard number pad. The 24 trials were divided into two 12-trial blocks.
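The two example questions can be sketched as simple lookups over the maintained digit string. The helper names are hypothetical, and two details are assumptions rather than facts from the task description: that no digit repeats within a string, and that "difference" means the absolute difference.

```python
def digit_before(digits, target):
    """Return the digit presented immediately before `target`."""
    position = digits.index(target)  # assumes `target` occurs once
    if position == 0:
        raise ValueError("the first digit has no predecessor")
    return digits[position - 1]


def difference(digits, first_position, second_position):
    """Absolute difference between the digits at two 1-based positions."""
    return abs(digits[first_position - 1] - digits[second_position - 1])


digits = [9, 3, 4, 6, 2, 5]
preceding = digit_before(digits, 2)            # "What number precedes 2?"
first_last = difference(digits, 1, len(digits))  # first versus last digit
print(preceding, first_last)
```

Both questions operate on the same string that must be held in WM, which is the sense in which the maintained content is processing relevant.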

Comprehension measures

Our three comprehension measures included (1) the reading comprehension task from the Air Force Officer Qualifying Test (see Kane et al., 2004), (2) the Shipley Vocabulary Test (Zachary, 1986), and (3) the ACT (previously, American College Testing Program, Inc.) assessment reading scores (participants granted consent for their ACT scores to be accessed from the registrar).

Results

Four of the 261 participants had accuracies of zero for one or more of the WM measures. These participants’ data were eliminated from analyses. Table 1 displays the means, standard deviations, reliability estimates, and intercorrelations among the nine observed variables. Two features of the data are noteworthy. First, reliability estimates for all tasks are acceptable (values displayed on the matrix diagonal). Second, the tasks representing each construct had reasonably high correlations with one another. One concern about the intercorrelations is that CSPAN was more strongly correlated with the content-embedded tasks than with the other span tasks. This issue is addressed further below.

Table 1 Means, standard deviations, and correlations of the nine observed variables

Structural equation modeling

Figure 1 presents the hypothesized model with standardized path coefficients represented in the model (the standardized coefficients are shown in boldface, and estimated factor correlations are shown in parentheses). Analysis of the structural equation model indicated that the model was a good fit to the data. Fit indices for the model are as follows: χ2(24, N = 256) = 65.67, p < .001, CFI = .93, and RMSEA = .08. Although the χ2 statistic is significant, values for CFI and RMSEA indicated that the model provides an adequate fit of the data. The correlation between the latent factors of span and content-embedded WM indicates that although the two factors share common variance, the majority of the variance in the two factors (approximately 56%) is not shared.
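The reported RMSEA can be checked against the χ2 value, degrees of freedom, and sample size with the conventional point-estimate formula, RMSEA = sqrt(max(χ2 − df, 0) / (df × (N − 1))). The sketch below is only a consistency check under that assumption; some software divides by N rather than N − 1, and the text does not state which variant was used. The factor correlation computed at the end is back-calculated from the "approximately 56% not shared" statement and is not a value reported in the text.

```python
import math


def rmsea(chi_square, df, n):
    """Conventional RMSEA point estimate (N - 1 variant)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Fit statistics reported for the hypothesized model.
model_rmsea = rmsea(65.67, 24, 256)
print(round(model_rmsea, 2))

# If 56% of the variance is unshared, the implied squared factor
# correlation is 1 - .56; this is an inference, not a reported value.
implied_r = math.sqrt(1 - 0.56)
print(round(implied_r, 2))
```

The computed RMSEA rounds to the reported value of .08, so the χ2, degrees of freedom, and sample size are mutually consistent for this model.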

Fig. 1 Structural equation model with standardized parameter estimates. Estimates contained in parentheses represent factor correlations. All paths, with the exception of the direct effect of complex span on comprehension, are significant at the .01 level. Error variance is not represented.

The primary focus of the present investigation was to determine whether the content-embedded measures of WM account for a greater amount of unique variance in comprehension than do complex span measures of WM. In the tested model, the estimated standardized total effect of span on comprehension was β = .15, and the total effect of content-embedded WM on comprehension was β = .56. These results indicated that the content-embedded factor accounted for 31% of unique variance in comprehension, whereas complex span accounted for only 2% of unique variance in comprehension.
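The reported percentages are consistent with squaring each standardized effect. The text does not state how the percentages were derived, so the short check below rests on that assumption; the function name is hypothetical.

```python
def percent_variance(beta):
    """Squared standardized effect, expressed as a whole percentage."""
    return round(100 * beta ** 2)


content_embedded = percent_variance(0.56)  # .56 squared is about .31
complex_span = percent_variance(0.15)      # .15 squared is about .02
print(content_embedded, complex_span)
```

On this reading, the content-embedded factor accounts for roughly 14 times as much variance in comprehension as the complex span factor (.3136 versus .0225).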

Table 1 shows that CSPAN correlated more highly with the content-embedded tasks than with the other span tasks. In contrast to RSPAN and OSPAN, in which the memoranda are completely unrelated to the processing stimuli, in CSPAN the memoranda are related to the processing stimuli. Our decision to include CSPAN as a complex span task in the initial hypothesized model was based on convention, given the history of previous research in which these three tasks have been combined to create a latent WM factor (e.g., Engle et al., 1999a). However, CSPAN arguably aligns conceptually with content-embedded tasks as defined here. Accordingly, we also tested a model in which CSPAN was loaded on the content-embedded factor, and this modification improved model fit, χ2(24, N = 256) = 31.12, p = .01; CFI = .99, RMSEA = .03. Eliminating CSPAN from the model altogether also did not qualitatively alter the conclusions supported by the conventionally motivated model reported in Fig. 1.

Discussion

Reading comprehension (and many other complex cognitive tasks more generally) requires one to process task-relevant information that is actively maintained in WM. Therefore, a theoretically motivated prediction is that individual differences in comprehension will be predicted better by WM tasks that capture this concurrent demand than tasks that do not. The present results confirm this hypothesis by demonstrating that content-embedded tasks of WM are a superior measure for capturing the association between comprehension and WM.

The present data also provide evidence that complex span tasks and content-embedded WM tasks reflect related but separable WM processes. One interpretation is that complex span measures more heavily reflect individual differences in the ability to control attention to actively maintain memory elements in the face of interference or distraction (Conway & Engle, 1994; Engle et al., 1999a). For example, processing the sentences or equations in the RSPAN and OSPAN tasks may interfere with maintaining the extraneous word lists, which is the goal of the task.

In contrast, the content-embedded WM tasks provide a more direct measure of an individual’s ability to maintain information in WM that is relevant to the cognitive process being performed. To successfully complete the content-embedded tasks, one must process information currently active in working memory, which is akin to what occurs during comprehension. These tasks also may require controlled attention, but the key difference between content-embedded and complex span tasks is that, for content-embedded tasks, one must continually update processing-relevant information that is being maintained in WM, whereas for complex span tasks, one must simply keep information active in WM while completing the processing required for an unrelated task.

The stronger relationship between the content-embedded factor and comprehension may also indicate that this factor reflects general cognitive ability better than does the complex span factor. The complex span tasks are all structured in a similar manner, whereas successful completion of each of the content-embedded tasks requires different processing and memory load demands. The methodological differences among the observed variables, together with the cohesion of the latent content-embedded factor, suggest that this factor represents a cognitive ability more general than the more specific ability to maintain a memory load during interference. An interesting direction for future research will be to explore the associations between content-embedded tasks, complex span tasks, and conventional measures of general cognitive ability.

One possible criticism of the present investigation concerns the use of CSPAN as a complex span task. As was described earlier, our inclusion of CSPAN as an indicator of complex span was motivated by a sizable number of prior studies that have used CSPAN, OSPAN, and RSPAN to capture complex span. One possible explanation for the larger correlations between CSPAN and the content-embedded measures is that CSPAN is not as reliable a measure of complex span as OSPAN and RSPAN. Indeed, in some investigations in which the same three complex span measures have been used, CSPAN has been found to produce higher zero-order correlations with tasks representing other constructs than do RSPAN and OSPAN (e.g., Engle, Tuholski, Laughlin, & Conway, 1999b; Mogle et al., 2008). Nonetheless, the most important point for present purposes is that all variants of the model (CSPAN loading on the complex span factor, CSPAN loading on the content-embedded factor, or CSPAN removed from the model) support the same qualitative conclusions.

A second possible reason that CSPAN correlates with the content-embedded tasks is that CSPAN requires participants to remember information related to the processing component of the task (to revisit, CSPAN involves counting the number of dark blue circles in a series of arrays and maintaining those numbers in memory for later recall). In general, complex span tasks can be content-embedded tasks under conditions in which the processing component is content congruent with the memory component. Although the most frequently used complex span tasks (RSPAN and OSPAN) do not meet these requirements, other span tasks, such as CSPAN, may do so.

In sum, the present investigation shows that individual differences in comprehension are predicted better by content-embedded measures than by complex span measures of WM. The present evidence supports the theoretical hypothesis that the coordination of interrelated content within a limited capacity system is particularly important to individual differences in comprehension. Complex span measures provide valuable insights regarding individual differences in the ability to actively maintain elements in WM in the face of distraction or during unrelated processing. However, we recommend that researchers interested in exploring the relationship between WM and comprehension use content-embedded tasks, which capture more of that relationship and, hence, should provide a more useful measure of WM. More generally, comprehension is just one among many complex cognitive processes that involve high content congruency between memory elements and processing, and thus we would also recommend further exploration and use of content-embedded tasks to capture the relationship between WM and other complex cognitive processes of this sort.