Individual differences in working memory and processing speed predict anticipatory spoken language processing in the visual world

ABSTRACT Several mechanisms of predictive language processing have been proposed. The possible influence of mediating factors such as working memory and processing speed, however, has largely been ignored. We sought to find evidence for such an influence using an individual differences approach. 105 participants from 32–77 years of age received spoken instructions (e.g. “Kijk naar de_COM afgebeelde piano_COM”, look at the displayed piano) while viewing 4 objects. Articles (Dutch “het” or “de”) were gender-marked such that the article agreed in gender only with the target. Participants could thus use article gender information to predict the target. Multiple regression analyses showed that enhanced working memory abilities and faster processing speed predicted anticipatory eye movements. Models of predictive language processing therefore must take mediating factors into account. More generally, our results are consistent with the notion that working memory grounds language in space and time, linking linguistic and visual–spatial representations.

It is noteworthy that so far very little research has investigated the influence of mediating factors on prediction in language processing. The role of working memory and general processing speed in anticipatory language processing, for instance, has hardly been examined. There are several reasons why ignoring mediating factors is problematic. First, it is very likely that mediating factors modulate anticipation in some situations but less in others. Anticipatory eye movements in the visual world (which occur in many real-world situations such as when we give and receive directions or comment on the state of our visual surroundings), for example, require the building of a representational network (i.e. online models) allowing for visual objects to be linked to unfolding linguistic information, places, times, and each other. It is therefore likely that working memory is particularly important for anticipatory processing in a situation when spoken language is used in relation to a co-present visual environment. Second, it is conceivable that mediating factors interact in particular ways with different mechanisms of prediction. Perhaps working memory capacity is more important for combinatorial mechanisms of prediction (cf. Kuperberg, 2007) than for simple associative mechanisms (cf. Kuperberg, 2007; Pickering & Garrod, 2013). Third, theoretical models of predictive language processing will only be complete if they can fully account for the interplay of individual differences in the mediating factors and the mechanisms of prediction.
Indeed, the idea that working memory may be important for language-vision interactions is not new (Huettig, Olivers, & Hartsuiker, 2011;Knoeferle & Crocker, 2007; cf. Spivey, Richardson, & Fitneva, 2004). Huettig, Olivers, et al. (2011) propose that working memory grounds language in space and time, allowing for short-term connections between objects and linking linguistic and visuospatial representations. Similar proposals have been made in vision research. Vision researchers have suggested that visual objects are "instantiated" and that type representations are bound to a specific location and moment in time (e.g. an object file, token, or index, Kahneman, Treisman, & Gibbs, 1992;Kanwisher, 1987;Pylyshyn, 2001). Huettig, Olivers, et al. (2011) suggest that working memory enables us to link language to the here and now, or, when anticipating things, the there and then. If this notion is correct, then individual differences in working memory should influence language-mediated anticipatory eye movements. In the present study, we sought to find evidence for such an influence of working memory using an individual differences approach.
The notion that general processing speed and cognitive efficiency may affect anticipatory language processing is also not entirely new. General processing speed has been linked to individual differences in many cognitive tasks (e.g. Kail & Salthouse, 1994; Salthouse, 1996). Furthermore, age-related cognitive decline has been accounted for as an effect of slowing of the speed with which processing operations can be executed (Salthouse, 1996). Processing speed may play a role in sentence processing if it indexes the speed with which information can be retrieved from long-term memory, and the speed with which unfolding information can be integrated into a representation of the sentence meaning. Good connectivity between different neural processing regions may contribute to a high speed of information integration. Peelle, Troiani, Wingfield, and Grossman (2010), for example, investigated neural connectivity in younger and older adults and found that, compared to younger adults, neural connectivity in the older adults was reduced. Peelle et al. (2010) argued that the pattern of reduced coordination of activity between processing regions may relate to older adults' difficulty with sentence comprehension in certain situations (e.g. difficult listening conditions).
An important concern when assessing performance on any task is of course that participants in the experiments are a representative sample of the whole population. It has been argued that the student participants used in most experiments in experimental psychology are the WEIRDest (Western Educated Industrialized Rich Democratic) people in the world from which to draw general conclusions about human behaviour (Henrich, Heine, & Norenzayan, 2010; see also Arnett, 2008). In the present study, we made an explicit attempt to get a more heterogeneous sample of participants. Our adult participants were of varying ages and educational backgrounds, such that the sample is potentially more variable in working memory and processing speed than in typical university student samples. Working memory and speed decline with advancing adult age (Hultsch, Hertzog, Small, McDonald-Miszczak, & Dixon, 1992;Park et al., 2002). Differences in working memory have been linked to individual differences in speech perception in noisy conditions (Akeroyd, 2008;Rönnberg et al., 2013). Additionally, verbal working memory ability has been shown to predict older adults' ability to use context information for the recognition of upcoming words in listening to sentences (Janse & Jesse, 2014). These results suggest that those with better working memory are better able to keep and update a coherent representation of the sentence content in working memory.
Predictive language processing in older adults has mainly been investigated in reading studies. Rayner, Reichle, Stroud, Williams, and Pollatsek (2006) suggested that older readers adopt a "riskier" reading strategy than younger adult readers, with older readers more often skipping words, possibly on the basis of their guess of what the next word will be. This finding could be interpreted as indicating that older adults predict more than younger adults to compensate for age-related cognitive decline. It should be noted, however, that the older adults also regressed more to earlier words, and that predictability effects on older readers' fixation data were found to be equally large as for younger readers (Rayner et al., 2006). Another line of research has measured ERPs and used word-by-word reading to investigate younger and older adults' semantic integration of final words in sentences varying in semantic constraint (e.g. Federmeier & Kutas, 2005). Federmeier and colleagues repeatedly found that older adults showed smaller and delayed effects of contextual constraint compared to young adults, which was attributed to decreased reliance on predictive processing in older age (Federmeier, Kutas, & Schul, 2010;Huang, Meyer, & Federmeier, 2012;Wlotko & Federmeier, 2012). Wlotko and Federmeier (2012) speculate, in line with the argument put forward by Peelle et al. (2010), that older adults' decreased predictive processing may be due to less-efficient functional connectivity, or that predictive processing has become too costly or inefficient for older adults due to decreased availability of neural resources.

Present study
We conducted an eye-tracking experiment and administered several control tasks to assess whether working memory and processing speed independently contribute to language-mediated anticipatory eye movements in adult native speakers of Dutch. In order to reduce the likelihood that anticipation would be driven by semantically fitting thematic referents (Kamide, Altmann, & Haywood, 2003; cf. Huettig & Altmann, 2005) and/or simple word associations (e.g. between verb and noun, such as censure and newspaper in "to censure the newspaper", or weave and cloth in "to weave the cloth"), we presented participants with simple Dutch spoken instructions such as "kijk naar de afgebeelde piano" (look at the displayed piano) as they were looking at the target object (e.g. piano) and three unrelated distractor objects. This way the Dutch article "de" or "het" was the only cue in the sentence that could be used for anticipation. Note that Dutch has a two-way grammatical gender system and makes a distinction between common and neuter gender. Gender is marked on a number of agreeing elements accompanying the noun or referring to it such as determiners, adjectives, demonstratives, and pronouns (see Blom, Polišenská, & Weerman, 2008, for further discussion). For the present study we used definite nouns, which are preceded by the definite determiner de (common nouns), as in de piano "the piano", or by the definite determiner het (neuter nouns), as in het paard "the horse". Participants therefore could use gender information from the article for prediction of the upcoming noun, as targets but not unrelated distractors agreed in gender with the article presented in the spoken sentence.
We assessed individual differences in working memory using an auditory nonword repetition (NWR) task as an index of verbal/phonological short-term memory (Gathercole & Baddeley, 1996; Thorn & Gathercole, 1999) and a backwards digit span task, which is better suited to measuring the manipulation of items in memory than their storage and reproduction. In order to assess spatial working memory, we used the Corsi block tapping task (Corsi, 1972). To measure general processing speed, we used a digit-symbol substitution (DSS) test and a letter comparison task. Finally, to make sure that any potential individual differences would not just reflect non-verbal intelligence (often referred to as the "g-factor"), we also administered Raven's progressive matrices to participants.

Participants
One hundred and five participants (21 of whom were male) were drawn from the Max Planck Institute for Psycholinguistics participant pool and were paid for their participation. They were all native speakers of Dutch, and none of them wore hearing aids. Their age varied from 32 to 77 years, with a mean age of 55.75 (SD = 9.81). Twenty-nine of these participants were aged over 60. Participants who reported impaired vision were tested with their glasses or contact lenses; all others were tested without glasses. Participants were also given ample preview of the visual objects (4000 ms) before the onset of the spoken instruction so that they could recognise the objects for what they were.
Hearing sensitivity was assessed with a Maico ST20 portable audiometer (air conduction thresholds only) for both ears at octave frequencies from 250 Hz to 8 kHz. As the majority of our participants were aged below 60, we did not use the high-frequency hearing loss index (PTA_high, averaged over 1, 2, and 4 kHz, which is often used to index age-related hearing loss) but used the pure-tone average (PTA) threshold in the participant's better ear averaged over 0.5, 1, and 2 kHz (the standard PTA). However, for logistical reasons (availability of the audiometer), hearing data were not available for all participants. Hearing data are missing for 25 participants, all aged below 60, and all reported to have normal hearing. For the remaining 80 participants, mean PTA was 12.73 dB (SD = 6.41, range = 0–33; higher values indicate poorer hearing). PTA values between 26 and 40 dB represent mild hearing loss, such that all participants in the sample had relatively normal hearing.
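The better-ear PTA described above can be illustrated with a minimal sketch. The function name, the dictionary layout, and the threshold values below are hypothetical and chosen only for illustration; they are not the authors' scoring script or data.

```python
def better_ear_pta(left, right, freqs_hz=(500, 1000, 2000)):
    """Return the pure-tone average (in dB) of the better ear.

    `left` and `right` map test frequency (Hz) to threshold (dB).
    The better ear is the one with the lower (i.e. better) average.
    """
    def pta(ear):
        return sum(ear[f] for f in freqs_hz) / len(freqs_hz)

    return min(pta(left), pta(right))


# Hypothetical thresholds for one participant (not data from the study).
left_ear = {250: 10, 500: 10, 1000: 15, 2000: 20, 4000: 30, 8000: 40}
right_ear = {250: 5, 500: 5, 1000: 10, 2000: 15, 4000: 25, 8000: 35}

pta = better_ear_pta(left_ear, right_ear)
# Values between 26 and 40 dB are treated as mild hearing loss in the text.
has_mild_loss = 26 <= pta <= 40
```

Passing `freqs_hz=(1000, 2000, 4000)` would give the high-frequency index mentioned in the text, under the same assumptions.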

Materials and design
Participants received 40 spoken instructions (e.g. "kijk naar de afgebeelde piano", look at the displayed piano; or "kijk naar het afgebeelde paard", look at the displayed horse). Twenty of the instructions contained common gender words and 20 others neuter gender words. For example, on one trial (Figure 1(a)) the target object piano was a "de word" but the three unrelated distractors (pig, paper, and plate) were neuter gender ("het") words. Conversely, on some other trials (Figure 1(b)) the target was a neuter gender ("het") word (e.g. "paard", horse) but the three unrelated distractors (scissors, shark, and screwdriver) were "de words". Instructions containing common and neuter gender nouns were randomised. The word "afgebeelde" (displayed) was inserted between article and noun in the spoken instructions to ensure participants had ample time to anticipate the target object. The instructions were read aloud with a neutral intonation contour by a female native speaker of Dutch in a sound-damped booth. Digital recordings (sample rate 44.1 kHz, 16 bit sampling resolution) were stored on a computer. The average noun onset occurred 2009 ms after article onset.
Pictures were line drawings taken from the Severens, Van Lommel, Ratinckx, and Hartsuiker (2005) set and were matched for CELEX word frequency, number of picture names, h-statistic (which compensates for overestimating name agreement when participants assigned many different names infrequently and one single name very frequently), and picture naming time.
Figure 1. Example displays for a "de" trial (a), with the target de piano and three "het" unrelated distractors, and a "het" trial (b), with the target het paard (horse) and three "de" unrelated distractors.
The NWR task consisted of the presentation of 50 nonwords, all of which were phonotactically legal in Dutch (de Jong & van der Leij, 1999). The nonword items were presented over headphones at a fixed mean presentation level of 70 dB SPL for all participants. Participants were seated in a sound-attenuating booth. Each nonword was presented only once, after which participants were asked to repeat the nonword. Inter-trial time was three seconds. Nonwords of different syllable lengths (two to five syllables long) were presented intermixed, but the order in which they were presented was kept constant for all participants. Due to technical failure, NWR data for two participants were missing. Mean score for this task was 38.55 (out of a total of 50, SD = 4.14), and ranged from 23.32 to 46.77. Higher scores reflected better auditory verbal short-term memory.
Working memory: backward digit span
A backwards-recall digit span task (a subpart of the Wechsler Adult Intelligence Test, 2004) was used to measure individual working memory capacity. Since the backwards-recall variant of this task requires manipulation of presented materials, rather than storage and reproduction, the task is considered to be an index of working, rather than short-term, memory (Baddeley, 2006). We added a general working memory task such as digit span because we were interested in the influence of general working memory capacity in addition to the influence of verbal/phonological working memory (as assessed by NWR, which is more directly related to the spoken language input) and visuospatial working memory (cf. the Corsi blocks measure below, which is more directly related to the visual input). In the computerized variant of this task used here, a series of digits was shown sequentially in the centre of the computer screen. Each digit was presented for one second, with one second in between consecutive digits. Digits were presented in a large white font (Arial, font size 100) against a black background. After presentation of the digit sequence (e.g. 3 6 2), the participant was prompted to recall the digits in the reverse order (e.g. 2 6 3). Participants typed in their responses with a computer keyboard. Participants were first presented with two three-digit trials to become familiarized with the task. They were then tested on digit sequences of two up to eight digits (two trials for each sequence length, making up 14 trials in total). Note that participants were asked to proceed up to eight-digit sequences, regardless of their performance on shorter digit sequences. Data for one participant were missing due to technical failure. Individual working memory performance was operationalised as the number of correctly recalled digit sequences (out of 14 test trials). Mean number of correct trials in this task was 6.92 (SD = 2.28, range 2–12); higher scores indicate better working memory.

Spatial working memory: Corsi block task
The visual Corsi block tapping task (Corsi, 1972) was used as a measure of visuospatial short-term memory performance. A computerized variant of this task was used here, in which participants saw a pattern of nine identical blocks, irregularly positioned on the computer screen. A number of these blocks are then highlighted at a rate of one block per second. The participant's task is to click the same blocks in their order of highlighting. The task gradually becomes more challenging as the length of the block sequences increases from 2 to 9, with two trials for each block sequence length (resulting in 16 trials in total). Individual performance on this task was coded as the total number of correctly imitated trials (out of the total of 16 trials): the higher the score, the better spatial short-term memory. Mean number of correct trials was 7.05 (SD = 2.24) and ranged from 0 to 12.
Processing speed: DSS
Participants performed the DSS test, which is a pencil-and-paper subtest of the Wechsler Adult Intelligence Test (2004). Participants are provided with a key of 9 different symbols, each symbol paired with one of the numbers from 1 to 9. Participants then perform an assignment in which they have to substitute rows of numbers for their corresponding symbols, by writing the symbol below each number. Participants get 90 seconds to substitute as many digits for symbols as possible (the maximum number of digits to be substituted is 133). DSS test scores have been related to perceptual speed or processing speed (Hoyer, Stawski, Wasylyshyn, & Verhaeghen, 2004; Salthouse, 2000). Substitution time per symbol was calculated by dividing 90 seconds by the number of symbols the participant had coded, such that higher scores indicate poorer (i.e. slower) performance. Mean substitution time per symbol was 1.67 s (SD = 0.41), and ranged from 0.97 to 3.21 s.
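As a small worked example of the DSS score described above, the following sketch divides the 90-second limit by the number of symbols coded. The function name and the example count of 54 symbols are hypothetical and used only for illustration.

```python
def substitution_time_per_symbol(n_coded, time_limit_s=90.0):
    """Seconds per substituted symbol; higher values mean slower performance."""
    if n_coded <= 0:
        raise ValueError("at least one symbol must have been coded")
    return time_limit_s / n_coded


# A hypothetical participant who coded 54 symbols within the 90 seconds.
example_time = substitution_time_per_symbol(54)
```

Under this scoring, 54 coded symbols give about 1.67 s per symbol, which happens to equal the sample mean reported above.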
Processing speed: letter comparison
This task was based on a paper-and-pencil task thought to index processing speed as described in Earles and Salthouse (1995) and Salthouse (1996). Participants were presented with two letter strings (all consonants) on a computer screen: one centred in the top half of the screen, and one centred in the lower half. Participants were asked to decide as quickly and as accurately as possible whether the two letter strings were the same or different by pressing buttons on a response button box labelled "same" or "different". The first experimental block consisted of letter strings made up of three letters (e.g. TZF). The second block consisted of six-letter strings (e.g. RNHKTG) to be compared. Letters were presented in a large black font (Arial 60) against a white background. Within each block, 12 trials contained identical strings and 12 trials contained different strings, making up 48 trials in total. If the strings were different, they would only differ in one letter in any of the three (for the three-letter strings) or six (for the six-letter strings) positions. Participants were first presented with six practice trials representing three "same" and three "different" trials. Each trial started with the presentation of a fixation cross, which stayed on the screen for 500 ms. After another 100 ms, the two letter strings were presented and stayed on the screen until the participant had responded. The next trial was presented after an inter-trial time of 1000 ms. Mean accuracy proportion over the 48 test trials was 0.95 (SD = 0.04, range 0.83–1). For all correct responses, a reaction time (RT) cutoff criterion of 3 SDs above the grand mean was calculated. This led to a further exclusion of 1.3% of the data points (for each participant at least 37 out of 48 trials were left; overall 93.3% of the original trials). On the basis of this dataset (incorrect and extremely slow responses excluded), each participant's mean RT was calculated and was entered as individual speed. Mean letter comparison RT was 1526 ms (SD = 285), and RTs ranged from 1002 to 2302 ms. A higher RT value corresponds to a lower processing speed.
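The RT trimming procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script: the cutoff is computed over all correct responses pooled across participants, the sample standard deviation is used (the text does not say which SD estimator was applied), and the function name and trial data are hypothetical.

```python
from statistics import mean, stdev


def mean_rt_per_participant(trials, n_sd=3.0):
    """Return ({participant: mean RT}, cutoff) after trimming.

    `trials` is a list of (participant_id, rt_ms, is_correct) tuples.
    Incorrect responses are dropped first; then correct responses slower
    than grand mean + n_sd * SD are dropped; then RTs are averaged.
    """
    correct = [(pid, rt) for pid, rt, ok in trials if ok]
    all_rts = [rt for _, rt in correct]
    cutoff = mean(all_rts) + n_sd * stdev(all_rts)  # sample SD (assumption)

    kept = {}
    for pid, rt in correct:
        if rt <= cutoff:
            kept.setdefault(pid, []).append(rt)
    return {pid: mean(rts) for pid, rts in kept.items()}, cutoff


# Hypothetical data: p1 has one error trial, p2 has one extremely slow trial.
example_trials = (
    [("p1", 1000.0, True)] * 10
    + [("p1", 800.0, False)]
    + [("p2", 1400.0, True)] * 9
    + [("p2", 9000.0, True)]
)
mean_rts, rt_cutoff = mean_rt_per_participant(example_trials)
```

In this toy dataset the 9000 ms response lies above the cutoff and is excluded, so the per-participant means are 1000 ms and 1400 ms.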
Non-verbal intelligence: Raven's matrices
Raven's advanced progressive matrices were administered as a measure of non-verbal intelligence. In the computerized version we used here, participants had to indicate which out of eight possible shapes completed a matrix of geometric patterns by clicking on it with a computer mouse. Target matrices were presented as large pictures in the centre of the screen, and below the target matrix were always two rows of four pictures each to represent the eight options. Items could be skipped to be presented again at the end. Participants were given 20 minutes to complete the 36 items. The time was indicated in the top right corner of the screen. Individual performance was the total number of correct responses. Mean score was 14.87 (SD = 5.95), and scores ranged from 3 to 28. All individual differences measures were normally distributed (according to Kolmogorov-Smirnov testing for normality).

Procedure
Participants were tested individually and seated at a comfortable distance from the computer screen. Eye movements were recorded with an SR Research Eyelink 1000 Tower mount system sampling at 1000 Hz. The system was calibrated using the standard Eyelink setup. The positions of the pictures were randomised. The spoken instructions were presented via headphones at a fixed mean presentation level of 70 dB SPL. At the beginning of each trial, a central fixation dot appeared allowing for drift correction. Then the visual displays appeared. Auditory presentation of the spoken sentences was initiated 4000 ms after the pictures appeared on the screen. After the onset of the spoken instruction, the display remained on the screen for another 5000 ms followed by a 500 ms blank screen. Participants were told to listen carefully to the instructions of the speaker (e.g. "look at the displayed piano"). They were reminded not to take their eyes off the screen if possible. The whole experiment lasted around 20 minutes. After participants finished the eye-tracking experiment, the individual differences measures were administered.
Data coding procedure
Data for the eye-tracking experiment were coded as fixations, saccades, or blinks using the Eyelink algorithm. The timing of the fixations was established relative to the onset of the critical article ("de" or "het") in the spoken instructions. Gaze position was categorised by object quadrant. Fixations were coded as directed to the target objects, or the unrelated distractors.

Results
The data were analyzed using a magnitude estimation approach. Cumming (2014) points out that the field needs to change towards a cumulative quantitative science. Cumming argues (convincingly in our view) that to make progress researchers should strive to avoid invoking null-hypothesis testing and interpret results by using measures of effect sizes and confidence intervals. Indeed, Fidler and Loftus (2009) provide evidence that reporting confidence intervals leads to much better interpretation of the results than a research report based on null-hypothesis testing (see Cumming, 2012, 2014, for extensive discussion). Figure 2 shows a time-course graph of the fixation proportions to the target objects specified in the instruction and the averaged unrelated distractors plotted from the acoustic onset of the spoken article ("de" or "het") until 3000 ms post-article onset. By-participant 95% confidence intervals were computed at every sampling step for target and averaged distractor proportions. The area shaded in grey in the graph represents the upper and lower bounds of the 95% confidence intervals for target and distractor looks across time. The graph shows that participants anticipated the target object (i.e. looks to the target objects diverged from the looks to the unrelated distractors) well before noun onset. Figure 3 shows a time-course graph of log-transformed difference scores between target and distractor fixations. Fixation proportions were transformed logistically, and zeroes and ones were replaced by 0.01 and 0.99 (cf. Macmillan & Creelman, 1991). The dependent variable is the ratio of the transformed target and distractor looks as a function of time using the following log ratio (cf. Arai, Van Gompel, & Scheepers, 2007):
$$\log\left(\frac{P(T)}{P(D)}\right)$$
where P(T) refers to the probability of gazes on the target objects and P(D) refers to the probability of gazes on the averaged distractor objects. The measure is symmetrical around zero such that equal proportions of looks yield a score of zero, higher proportions on the targets result in a positive score, and higher proportions on the distractor result in a negative score. The area shaded in grey in the graph represents the upper and lower bounds of the 95% confidence intervals for the ratio scores. A Kolmogorov-Smirnov test showed that these ratio scores were normally distributed (Z = 0.86). Figure 4 shows the log-transformed difference ratios of anticipatory fixations over the course of the experiment. The figure suggests that there may have been some task familiarisation effect over the first few trials and perhaps some small learning effect across the experiment. However, note that there is no reason that prediction should not be sensitive to contingencies over time in the environment. Correlations among the individual differences measures, and between the individual differences measures and the measure of predictive looks to the target, are given in Table 1. Pearson correlation coefficients (with their confidence intervals, based on Fisher's z transform) are reported, rather than Spearman correlation coefficients, because all measures were normally distributed.
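The target-versus-distractor log ratio can be sketched in a few lines. The sketch reads the measure as log(P(T)/P(D)) with the natural logarithm, which is an assumption (the text does not state the base), and it applies the 0.01/0.99 replacement as a clamp. The function names are hypothetical.

```python
import math


def clamp_proportion(p, low=0.01, high=0.99):
    """Replace proportions of 0 and 1 (and anything beyond) by 0.01 and 0.99."""
    return min(max(p, low), high)


def log_ratio(p_target, p_distractor):
    """Log ratio of target to averaged-distractor fixation proportions.

    Zero when the proportions are equal, positive when the target is
    fixated more, negative when the distractors are fixated more.
    """
    return math.log(clamp_proportion(p_target) / clamp_proportion(p_distractor))


equal_looks = log_ratio(0.30, 0.30)
more_target = log_ratio(0.50, 0.25)
more_distractor = log_ratio(0.25, 0.50)
no_target_looks = log_ratio(0.0, 1.0)  # stays finite because of the clamp
```

Swapping the two proportions flips the sign of the score, which is the symmetry around zero described in the text.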
We used principal component analysis to derive one Working Memory construct underlying the three memory measures (spatial short-term memory, auditory short-term memory, and working memory), and one Processing Speed construct underlying the two speed measures (DSS and letter comparison). Factor loadings on the Working Memory construct (unrotated factor solution) were 0.71 for the visual working memory measure (i.e. Corsi block task), 0.81 for the digit span working memory measure, and 0.75 for the NWR working memory measure. Factor loadings on the Processing Speed construct (unrotated factor solution) were 0.90 for both speed measures. Individual scores on the extracted variables Working Memory and Processing Speed were saved as new predictors of prediction behaviour. Table 2 presents the correlations between the new construct variables Working Memory and Processing Speed and predictive looks. Table 2 shows that the Working Memory construct correlated positively and the Processing Speed construct negatively with anticipatory looks (note again that higher values on the Speed construct indicate slower processing).
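For a construct built from only two measures, the loadings on the first principal component follow directly from their correlation. The sketch below assumes the PCA was run on standardised scores (i.e. on the correlation matrix), which the text does not state explicitly; in that case the first eigenvalue is 1 + r and both loadings equal sqrt((1 + r) / 2). The function names are hypothetical.

```python
import math


def two_variable_loading(r):
    """Loading of each of two standardised variables on the first
    principal component, given their (positive) correlation r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("this sketch expects a correlation between 0 and 1")
    return math.sqrt((1.0 + r) / 2.0)


def implied_correlation(loading):
    """Invert the relation: the correlation implied by a common loading."""
    return 2.0 * loading ** 2 - 1.0


# Under the stated assumption, a loading of 0.90 on both speed measures
# would correspond to a correlation of about 0.62 between them.
r_speed = implied_correlation(0.90)
loading_check = two_variable_loading(r_speed)
```

This shortcut does not carry over to the three-measure Working Memory construct, where the loadings depend on all three pairwise correlations.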
Finally, we carried out multiple regression analyses to estimate the independent contribution of the variables and constructs to prediction performance in the eye-tracking task. The dependent variable was the log-transformed ratio of predictive looks in the critical time region (i.e. the difference of the log-transformed target and average distractor looks between article onset and noun onset). In the first multiple regression analysis, the Working Memory and Processing Speed constructs were entered into regression models in addition to age, hearing loss, and Raven's performance. All variables were entered at once. Given the missing data for several of these predictors for 27 participants in total (mainly for the hearing loss measure), this first analysis was run on a subsample of 78 participants. This model, with an R² of 0.32, showed the following independent contributions to predictive eye gaze: Working Memory (unstandardised β = 0.23, SE β = 0.08; standardised beta = 0.36), Processing Speed (unstandardised β = −0.28, SE β = 0.09; standardised beta = −0.42), age (unstandardised β = 0.02, SE β = 0.01; standardised beta = 0.35), Raven's (unstandardised β = 0.01, SE β = 0.02; standardised beta = 0.12), and hearing loss (unstandardised β = 0.002, SE β = 0.01; standardised beta = 0.02).
In a second regression analysis, we excluded hearing from the set of predictor variables to allow for inclusion of more participants from our sample. This second regression analysis, in which the Working Memory and Speed constructs were entered together with age and Raven's performance (all entered at once), was run on data from 102 participants. This model, with an R² of 0.21, showed the following contributions to prediction: Working Memory (unstandardised β = 0.23, SE β = 0.07, standardised beta = 0.36), Processing Speed (unstandardised β = −0.20, SE β = 0.08, standardised beta = −0.30), age (unstandardised β = 0.01, SE β = 0.01, standardised beta = 0.20), and Raven's (unstandardised β = −0.001, SE β = 0.01, standardised beta = −0.01). Note the minimal amount of unique variance of the Raven's measure beyond what was already accounted for by working memory and speed (the correlation between Raven's and the Working Memory construct being r = .60, with 95% confidence interval from .46 to .71; and between Raven's and the Speed construct: r = −.58, with 95% confidence interval from −.70 to −.43). We also compared explained variance in the criterion variable between a model containing both Working Memory and Speed (with an R² of 0.18, N = 102) and models containing either Working Memory or Speed only (with R² values of 0.15 for a Working Memory only model and 0.11 for a Speed-only model, based on the same N). Thus, the portion of explained variance (change in R²) uniquely attributable to Working Memory was 0.07 (0.18 − 0.11, i.e. the difference between the WM+Speed model and the Speed-only model), and that of Speed was 0.03 (0.18 − 0.15, i.e. the difference between the WM+Speed model and the Working Memory only model). Scatter plots depicting the relation between the criterion variable (ratio of predictive looks) and the Working Memory and Speed constructs are provided in Figure 5.
Table 1. Correlations between different measures (Pearson correlation coefficients are provided, plus their 95% confidence intervals in brackets).
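The confidence intervals reported for the correlations with Raven's can be illustrated with the Fisher z transform. The sketch below assumes N = 102 for these correlations (the sample size of the second regression; the text does not state the N per correlation) and a two-sided 95% interval. The function name is hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Confidence interval for a Pearson correlation via Fisher's z.

    r is transformed with atanh, the interval is built on the z scale
    with standard error 1 / sqrt(n - 3), and the bounds are transformed
    back with tanh.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Correlations of Raven's with the two constructs, as given in the text.
wm_low, wm_high = fisher_ci(0.60, 102)
speed_low, speed_high = fisher_ci(-0.58, 102)
```

Under these assumptions the intervals round to .46 to .71 and −.70 to −.43, in line with the values given in the text.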

General discussion
Participants heard instructions such as "Kijk naar de_COM afgebeelde piano_COM" (look at the displayed piano) while viewing four objects. The Dutch articles ("het" or "de") were gender-marked such that the article agreed in gender only with the target, allowing for gender information from the article to be used to predict the upcoming target object. Participants fixated the target well before noun onset, which strongly suggests that they anticipated the target objects. Multiple regression analyses revealed that working memory and processing speed independently accounted for the largest amount of variance in participants' language-mediated anticipatory eye movements in the present study. We will discuss these mediating factors in turn.
Our results are consistent with the notion that working memory serves as the nexus in which long-term visual as well as linguistic representations (i.e. types) are bound to specific locations (i.e. tokens or indices). As such, working memory capacity plays an important role in language-mediated anticipatory eye movements (Huettig, Olivers et al., 2011). Our data are compatible with the view of working memory as the capacity to hold and bind arbitrary pieces of information. How may such a working memory mediate anticipatory eye movements? In line with Huettig, Olivers et al. (2011), we suggest that the objects in the visual display in the present experiment are first encoded in a visuospatial type of working memory (cf. Cavanagh & Alvarez, 2005; Pylyshyn, 1989). The perception of the familiar objects triggers perceptual hypotheses in long-term memory. Activation of these visual representations leads to cascaded activation at "higher" levels of representation (e.g. activation of semantic and phonological representations) within a few hundred milliseconds (see Huettig & McQueen, 2007; McQueen & Huettig, 2014, for experimental evidence). This results in a representational conglomerate of object knowledge and associated linguistic knowledge (including representations of the gender of the object names), which is bound to the objects' locations in working memory. Hearing the article in the spoken instruction will then trigger a similar chain of events from the linguistic input, in which relevant representations (e.g. phonological, syntactic) will match up with those activated by the visual input. This activation is fed back to the object's location. This then increases the likelihood that a saccadic eye movement towards this location is triggered.

Table 2. Correlations between construct scores (Working Memory and Processing Speed) and anticipatory looking behaviour (Pearson correlation coefficients are provided, plus their 95% confidence intervals in brackets).
We assume that the strength of activation of a particular representation translates into the probability of attending towards whatever shares those representations. In short, we suggest that language-mediated anticipatory eye movements require substantial working memory capacities to ground language in space and time, allowing for short-term connections among objects and linking linguistic and visual-spatial representations. According to this account, better working memory abilities result in more anticipatory eye gaze.
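The assumption that activation strength translates into the probability of attending to an object can be illustrated with a toy calculation. The sketch below is not a model proposed in the text: the account leaves the functional form unspecified, so a simple Luce choice rule (activation divided by summed activation) is assumed here. The parameter values `baseline` and `match_boost`, the function name, and the example display are all hypothetical.

```python
def fixation_probabilities(objects, heard_gender, baseline=1.0, match_boost=2.0):
    """Toy mapping from activation to looking probability (Luce choice rule, assumed).

    objects: dict mapping object name -> grammatical gender of its name
    heard_gender: gender signalled by the article in the spoken instruction
    """
    # Every displayed object starts at a baseline activation; objects whose
    # name agrees in gender with the heard article receive extra activation,
    # standing in for activation fed back to the bound location.
    activation = {
        name: baseline + (match_boost if gender == heard_gender else 0.0)
        for name, gender in objects.items()
    }
    total = sum(activation.values())
    # Normalise so the probabilities over the display sum to 1.
    return {name: act / total for name, act in activation.items()}

# Hypothetical four-object display in which only the target takes "de".
display = {"piano": "de", "boek": "het", "huis": "het", "paard": "het"}
probs = fixation_probabilities(display, heard_gender="de")
for name, p in probs.items():
    print(f"{name}: {p:.2f}")
```

In this toy setting the only gender-matching object receives the highest looking probability before the noun is heard. How individual differences in working memory would enter such a mapping is left open here, as it is in the text.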
There is no doubt that many aspects of the exact nature of working memory require further exploration. According to some working memory models, for example, working memory contains or consists of activated long-term memory representations (e.g. Cowan, 2005; Ericsson & Kintsch, 1995; MacDonald & Christiansen, 2002). Working memory in these models is therefore the ability to activate long-term memory representations and keep them active for online processing. Working memory then indexes an individual's ability to activate and keep active multiple long-term memory representations in order to efficiently resolve competition processes. In other words, working memory relates either to the activation of a limited set of candidates, or to keeping this set active in such a way that the competition among them can be resolved successfully. Another important characteristic of these long-term memory-based models (Cowan, 2005; Ericsson & Kintsch, 1995; MacDonald & Christiansen, 2002) is that working memory capacity is thought to be domain-specific and mediated by an individual's expertise or experience. According to these accounts, working memory capacity does not just reflect processing resources available for any task at hand but is specifically operationalised as memory for (for example) verbal material (as reflected in verbal working memory measures such as digit span and NWR) and spatial information (as measured by spatial working memory tests). Our data are consistent with the view that both verbal working memory and spatial working memory influence language-mediated anticipatory eye movements (Table 1).
We stress that the influence of working memory abilities on anticipatory spoken language processing may be particularly strong in the situational context tested in the present study. As discussed in the introduction, the particular task situation tested here requires the building of a representational network linking visual objects to unfolding linguistic information, places, times, and each other. To what extent does the experimental set-up in the current study then reflect core, context-invariant predictive language processing? We believe there is no such thing as context-invariant predictive language processing. We conjecture that situational context determines, at least partly, which mechanisms of prediction are engaged and how strongly particular mediating factors influence prediction. Working memory abilities may be particularly important for anticipatory processing in situations when spoken language is used in relation to a co-present visual environment. Note, however, that this holds for many everyday situations when people give or receive instructions for action or talk about real-world events. Indeed, predictive language processing in our daily interactions is often very similar to choosing among a set of pre-activated referents. It is also conceivable that there are situations in which the influence of working memory is actually greater than in the present study. Knoeferle, Urbach, and Kutas (2011), for instance, found that working memory contributed 0.13 in explained variance (R²) to a verb-action congruence reaction time effect (i.e. considerably more unique variance explained by working memory than in the present study). In short, the influence of working memory on anticipatory spoken language processing in other situations remains to be explored. Future work could usefully explore, in particular, the influence of relevant visual environments on working memory effects.
We do believe, however, that the visual world experimental set-up is a particularly ecologically valid paradigm for the investigation of prediction when we talk about objects and events in our surroundings. We also believe that prediction and learning are often (though not always) closely linked (cf. Chang et al., 2006; Chang, Kidd, & Rowland, 2013; see Huettig & Mani, 2015, for further discussion). We maintain that even if participants in the present study had learned to some extent that the determiner provided a reliable cue for prediction, such an explanation would still enhance our understanding of prediction and its mediating factors (for instance that working memory mediates such predictive learning). The influence of working memory on anticipatory spoken language processing in other situations of course remains to be explored. Our point is simply that predictive language processing occurs in many different situations and that, in order to obtain a comprehensive understanding of prediction, we should look at multiple situations.
Our findings suggest that processing speed is a second cognitive ability mediating predictive processing. Our results indicate that speed of processing does not just relate to predictive processing through its association with working memory, but explains variance in language-mediated anticipatory eye movements beyond what is already explained by working memory. This suggests that individual differences in how quickly information is processed play an important role in predictive processing. It has been suggested that processing speed is related to the speed with which neural signals are conducted along axons. Speed of neural transmission has also been linked to the degree of myelination (Gutiérrez, Boison, Heinemann, & Stoffel, 1995). This could potentially provide a neural mechanism underlying processing speed, though this proposal requires further investigation. We do point out, however, that the regression analysis suggests that non-verbal intelligence (as measured by performance on Raven's progressive matrices) accounted for very little unique variance in anticipatory eye gaze in the present study. In other words, there is no evidence that a simple "g-factor" (a psychometric construct reflecting the observation that individuals' performance on any one type of cognitive task predicts their performance on other cognitive tasks) explains much variance in language-mediated anticipatory eye movements beyond what is already accounted for by working memory and speed.
The role of age in predictive processing was found to be very small, relative to working memory and speed, and only surfaced in the regression analysis. It is still noteworthy, however, that, if anything, advanced age seemed to relate to more/better predictive processing, which may appear surprising given previous findings that older adults show smaller and delayed effects of contextual constraint in anticipating upcoming words (Federmeier & Kutas, 2005; Huang et al., 2012; Janse & Jesse, 2014). Several differences between these studies and the present study may account for this. First, our sample covered quite a large age range (participants were between 32 and 77 years of age, mean age being 56), but "only" 29 of our 105 participants were over the age of 60, and only 10 were over 70. Hence, investigating age effects continuously in this age range may yield different results than when (very young) students are compared to older adults (all aged 60+). Second, the present study investigated anticipation of nouns on the basis of article gender, rather than on the basis of semantic context effects. Whereas the latter effects often build up over the course of the sentence, the cue for prediction in our study was a relatively local, or adjacent, one. It seems unlikely, however, that this difference in the cue for anticipation accounts fully for the differential results, as detrimental influences of age were also found on rapid integration of information in adjective-noun units (Huang et al., 2012). This suggests that influences of age do not just occur when information needs to be integrated over longer time windows. Rather, the reason why increasing age in our sample, if anything, seemed to play a neutral or positive role in predictive processing may be that we were able to disentangle age effects from the confounds of age-related effects on memory and speed.
Table 1 shows how older age is associated with poorer memory and, particularly, with slower processing speed. It is only in the regression analysis, after having accounted for age effects on memory and speed, that age turns out to be (marginally) positively related to predictive processing. Thus, older adults may be more advanced language users, or at least not disadvantaged, compared to younger ones because of their lifelong experience (cf. Ramscar, Hendrix, Shaoul, Milin, & Baayen, 2014), even though this advantage may often be overshadowed by age-related cognitive decline. If age-related differences in processing speed and working memory as well as in sensory processing are accounted for, which is by no means trivial, age may play a neutral to positive role in predictive processing because of older adults' increased lifelong experience. Future studies could further explore this possibility.
To conclude, we investigated whether individual differences in working memory and processing speed are important predictors of language-mediated anticipatory eye movements. We sought to find evidence for such a role using an individual differences approach. We observed that working memory abilities and processing speed explain most of the variance (of the mediating factors tested here) in language-mediated anticipatory eye movements. Enhanced working memory abilities support anticipatory spoken language processing whereas a decreased general processing speed has the opposite effect. These findings demonstrate that models of predictive language processing have to be revised to take mediating factors such as working memory and processing speed into account. More generally, our results are consistent with the notion that working memory enables us to link language to the here and now, or, when anticipating things, the there and then (Huettig, Olivers et al., 2011).