The Effect of Student-Related and Text-Related Characteristics on Student’s Reading Behaviour and Text Comprehension: An Eye Movement Study

ABSTRACT The focus of the present study was on the mediation and moderation effects of reading processes as evidenced from eye movements on the relation between cognitive and linguistic student characteristics (word decoding, vocabulary, comprehension skill, short-term memory, working memory, and nonverbal intelligence) and text comprehension. Forty 4th graders read 4 explanatory texts and afterward answered text comprehension questions. During their reading, eye movements of gaze, look back, and second pass duration were examined for the heading, first sentence, and final sentence. The results show differential effects of reader and text characteristics on skipping probability, driven by decoding and nonverbal intelligence. Regression probability and regression path duration are also influenced by decoding. In conclusion, this study shows reading behaviour to be related to both students’ skills and text comprehension measures.

In educational settings, new information is often acquired by reading expository text. To learn from text, readers need to build a mental model (Kintsch, 1994). The result is a text representation that can be stored in memory. Previous studies have demonstrated that both the process and product of mental model building are related to children's abilities, such as word decoding (Huestegge, Radach, Corbic, & Huestegge, 2009; Verhoeven & Perfetti, 2008), vocabulary knowledge (Calvo, Estevez, & Dowens, 2003), reading comprehension skills (Blythe & Joseph, 2011; Rayner, 1985; Reichle, Rayner, & Pollatsek, 2003), memory capacity (Daneman & Merikle, 1996; McNamara & O'Reilly, 2009; Swanson, Zheng, & Jerman, 2009), and nonverbal intelligence (Tiu, Thompson, & Lewis, 2003). Also, text-related characteristics have been found to influence reading comprehension processes, such as word length, word frequency (De Leeuw, Segers, & Verhoeven, 2015; Joseph, Nation, & Liversedge, 2013), and position of the word within a sentence (Hirotani, Frazier, & Rayner, 2006; Rayner, Kambe, & Duffy, 2000) or a paragraph (Hyönä, Lorch, & Kaakinen, 2002). However, it is still far from clear if reading processes influence the relation between student abilities and reading comprehension in children. Therefore, the present study examined the role of student-related and text-related characteristics, as well as eye movements (i.e., a reflection of the process of mental model building) on predicting reading comprehension (i.e., the product of mental model building).
A prerequisite for text comprehension is the construction of a coherent mental model (Kintsch, 2004). Coherent mental models are constructed during reading by means of constant updating of the current model (Kintsch & Van Dijk, 1978; Van den Broek, Young, Tzeng, & Linderholm, 1999), which results in a "network of propositions" (Kintsch, 1994, p. 295). Updating mental models is mainly done by creating links between the propositions with the help of inferences generated by either frequency, word position [sentence and paragraph]) and examined interrelations between student and text characteristics. Two research questions were addressed:

RQ1: In what way are student-related and text-related characteristics associated with eye movements?
RQ2: How are student-related characteristics, text-related characteristics, and eye movements associated with reading comprehension outcomes?
With respect to the first research question, it was hypothesized that both student- and text-related characteristics influence the reading process. We predicted large effects of word decoding efficiency on eye movements because word decoding is highly related to the speed of reading, as are eye movement durations. With respect to word position effects, we expected readers with higher skills to spend more time on text integration (i.e., sentence-final words) and more salient text regions (i.e., heading and first sentence of a paragraph). Further, working memory was expected to predict the occurrence of regression behavior and reading comprehension outcomes because a small memory span limits the amount of information available for maintaining coherence (e.g., by generating inferences). With respect to the second question, no clear hypotheses were formulated, as very little research has focused on the effect of eye movements on reading comprehension outcomes in children.

Participants
Students from two fourth-grade classes from two Dutch primary schools participated. Of the 48 participants, 5 were excluded because of unusable or missing fixation data and 3 because their score on the text comprehension questions was more than two standard deviations from the mean. In total, 40 students (13 girls, 27 boys; M age = 9;4 years; age range = 9;1-11;2) were included in the analyses. Participants had a normal IQ (M = 42.0, SD = 6.2, range = 26-52) compared to a norm group of their age, all scoring above the 10th percentile (Raven, 2006).

Apparatus
The experiment was conducted using a Tobii T120 eye tracker with a sampling rate of 120 Hz. Spatial accuracy of this eye tracker is 0.5° and spatial resolution is 0.2°. For this reason, careful calibrations were obtained, of which the quality was assessed by visual inspection. If the quality was poor, a recalibration procedure was conducted. All participants were sitting in a chair adjusted for their height. The eye tracker was placed on a monitor arm at a distance of 70 cm. The eye tracker was set at the proper height in accordance with the child's head position. A table with a button box and mouse was placed next to the participants.
Texts were presented on a 17-in. screen with a 1280 × 1024 resolution with a black background and white letters. Text margins were 200 pixels (px) from every side of the screen. Font was Arial, 20 px, and line height 3 in roman style. Headings were presented in a similar font, but the headings were printed in bold, with 30 px, line height 2, and subheadings in 20 px, line height 2.
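To give a sense of scale for the accuracy and resolution values reported above, the angular values can be converted to on-screen pixels. The sketch below is an illustration and not part of the original procedure. It assumes square pixels, a 17-in. screen diagonal, and that the 70 cm eye tracker distance equals the viewing distance; the function names are hypothetical.

```python
import math


def pixel_pitch_mm(diagonal_in: float, res_x: int, res_y: int) -> float:
    """Physical size of one pixel in mm, assuming square pixels."""
    diagonal_px = math.hypot(res_x, res_y)
    return diagonal_in * 25.4 / diagonal_px


def visual_angle_to_pixels(angle_deg: float, distance_mm: float, pitch_mm: float) -> float:
    """Number of pixels spanned by a visual angle centred on the line of sight."""
    size_mm = 2.0 * distance_mm * math.tan(math.radians(angle_deg) / 2.0)
    return size_mm / pitch_mm


# Values taken from the apparatus description; viewing distance is an assumption.
pitch = pixel_pitch_mm(diagonal_in=17.0, res_x=1280, res_y=1024)
accuracy_px = visual_angle_to_pixels(0.5, distance_mm=700.0, pitch_mm=pitch)
resolution_px = visual_angle_to_pixels(0.2, distance_mm=700.0, pitch_mm=pitch)
```

Under these assumptions, the 0.5° accuracy corresponds to a little over 20 px, which is of the same order as the 20 px font size and illustrates why calibration quality matters for word-level areas of interest.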

Materials
Student-related characteristics
Decoding efficiency. Decoding efficiency was measured using a word reading task (Jongen & Krom, 2009) that is administered at Dutch primary schools each year. On the card, 120 words are presented, divided over four columns. Children were instructed to read aloud as many words as possible within 1 min. Every correctly read word was awarded 1 point. The internal consistency of the test is rated as good (α = .94; Egberink, Janssen, & Vermeulen, 2009).
Vocabulary knowledge. Vocabulary knowledge was tested by a standardized passive vocabulary knowledge test (Leeswoordenschattaak; Verhoeven & Vermeer, 1999). This test consists of 50 multiple-choice items in which each word is presented in a short and uninformative context (e.g., "He sells vegetables"). The students were asked for the meaning of the underlined word. Four multiple-choice options were presented, including a synonym of the target word (e.g., "grass," "green soup," "salad," and "edible plants"). Two practice words were discussed prior to the test. Questions regarding the task were answered, though no hints to answers were given. Reported scores are the total number of correct answers with a maximum possible score of 50. Reliability of the test is good (α = .87; Verhoeven & Vermeer, 1996).

Memory.
A forward digit span memory task (Wechsler Intelligence Scale for Children-III NL; Kort et al., 2005) was administered in which the researcher read aloud a string of digits using a falling intonation and pausing 1 s between the digits. The students were instructed to remember the digits in the same order as presented. The strings started short (n = 2) with two attempts for each string length. Whenever children correctly remembered at least one of two strings, the researcher continued with a longer string, adding one digit until a maximum (n = 9) was reached. Each correctly remembered string was awarded 1 point with a maximum possible score of 16.
Second, a backward digit span memory task (Wechsler Intelligence Scale for Children-III NL; Kort et al., 2005) was administered. The researcher read aloud a string of digits using a falling intonation and pausing 1 s between the digits. The students were instructed to remember the digits in reversed order. The strings started short (n = 2) with two attempts for each string length. Whenever students correctly remembered at least one of two strings, the researcher continued with a longer string, adding one digit until a maximum (n = 8) was reached. Each correctly remembered string was awarded 1 point with a maximum of 14.
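The scoring rule shared by the forward and backward digit span tasks can be sketched as follows. This is a minimal illustration of the rule as described in the text, not the scoring software of the original test; the function name and the input format (one pair of booleans per string length) are assumptions.

```python
from typing import List, Tuple


def score_digit_span(attempts: List[Tuple[bool, bool]]) -> int:
    """Score a digit span task.

    attempts holds one (first_correct, second_correct) pair per string
    length, ordered from the shortest string to the longest. Each correct
    string earns 1 point; testing stops after a length at which both
    attempts failed.
    """
    score = 0
    for first, second in attempts:
        score += int(first) + int(second)
        if not (first or second):
            break  # neither string of this length was recalled: stop
    return score


# Forward span: lengths 2..9 give 8 lengths, so the maximum is 16.
forward_max = score_digit_span([(True, True)] * 8)
# Backward span: lengths 2..8 give 7 lengths, so the maximum is 14.
backward_max = score_digit_span([(True, True)] * 7)
# A child who fails both strings at the third length stops there.
partial = score_digit_span([(True, True), (True, False), (False, False), (True, True)])
```

The two maxima reproduce the values of 16 and 14 given in the task descriptions.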
Third, a word span task (Verhoeven, Keuning, Horsels, & Van Boxtel, 2013) was administered. In this task, the researcher read aloud a string of high-frequency CVC words with a 1-s pause between the words. Two strings of each length, using different words, were presented; thereafter the string was extended by a single word, up to a maximum of seven words. Whenever the child repeated four consecutive strings incorrectly, the test was terminated. Each correctly recalled string was awarded 1 point, with a maximum possible score of 12.
Finally, a sentence repetition task was administered, which measures the memory of syntactical information (Verhoeven et al., 2013). The task consisted of sentences increasing in length and syntactic complexity. The researcher read aloud one sentence at a time, and the student was instructed to repeat this sentence in the exact same order. In total, the test consisted of 12 sentences. An error-free answer accounted for 2 points. When one mistake was made, only 1 point was awarded; when two or more mistakes were made, no points were awarded. As soon as a student did not receive any points for four consecutive sentences, the test was terminated. Reported scores are the number of points on this test, with a maximum possible score of 24 points.
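The scoring and discontinuation rule of the sentence repetition task can be written out as a short sketch. It follows the rule as stated in the text; the function name and the input format (number of mistakes per sentence) are assumptions made for illustration.

```python
from typing import List


def score_sentence_repetition(mistakes_per_sentence: List[int]) -> int:
    """Score the sentence repetition task.

    0 mistakes earn 2 points, 1 mistake earns 1 point, 2 or more mistakes
    earn 0 points. The test is terminated after four consecutive sentences
    without points; later sentences are not scored.
    """
    score = 0
    zero_run = 0  # number of consecutive sentences scored 0
    for mistakes in mistakes_per_sentence:
        if mistakes == 0:
            points = 2
        elif mistakes == 1:
            points = 1
        else:
            points = 0
        score += points
        zero_run = zero_run + 1 if points == 0 else 0
        if zero_run == 4:
            break
    return score


perfect = score_sentence_repetition([0] * 12)
stopped_early = score_sentence_repetition([0, 1, 2, 3, 2, 5, 0, 0])
```

In the second example the test stops after the sixth sentence, so the two error-free sentences at the end do not contribute to the score.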
A principal component analysis with varimax rotation was run on all memory measures. To determine the number of factors, a parallel analysis was run (O'Connor, 2000). Two factors were found (eigenvalues = 1.365 and 1.085). The first factor showed high loadings on digit span forward (.706), word span (.841), and sentence span (.764), but not on digit span backward (.050). The second factor showed a high loading on digit span backward (.994) but not on digit span forward (.085), word span (-.072), and sentence span (.093). Given these results, it can be concluded that the memory measures load on two factors: short-term memory (storage of information) and working memory (storage and manipulation of information). The loadings were used to calculate a weighted factor score for short-term memory, which was included in the analysis.
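One simple reading of a "weighted factor score" is a loading-weighted sum of standardized task scores. The sketch below illustrates that reading using the first-factor loadings reported above. The exact weighting used in the study is not specified, so the formula, the function names, and the example scores are assumptions.

```python
import statistics
from typing import Dict, List

# First-factor (short-term memory) loadings as reported in the text.
STM_LOADINGS = {
    "digit_forward": 0.706,
    "word_span": 0.841,
    "sentence_span": 0.764,
    "digit_backward": 0.050,
}


def z_scores(values: List[float]) -> List[float]:
    """Standardize scores using the sample mean and standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def weighted_factor_scores(raw: Dict[str, List[float]],
                           loadings: Dict[str, float]) -> List[float]:
    """Loading-weighted sum of z-scores, one value per student."""
    n_students = len(next(iter(raw.values())))
    z = {name: z_scores(values) for name, values in raw.items()}
    return [sum(loadings[name] * z[name][i] for name in loadings)
            for i in range(n_students)]


# Hypothetical raw scores for three students (illustration only).
example_raw = {
    "digit_forward": [6, 8, 10],
    "word_span": [4, 5, 6],
    "sentence_span": [10, 14, 18],
    "digit_backward": [3, 5, 4],
}
stm_scores = weighted_factor_scores(example_raw, STM_LOADINGS)
```

Because digit span backward has a loading near zero on this factor, it contributes almost nothing to the short-term memory score.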
Comprehension skills. Comprehension skills were measured using a standardized test for Grade 4 (Feenstra, 2008). This test consisted of two parts. The first part contained five texts and 25 multiple-choice questions, and the second part consisted of six texts and 30 multiple-choice questions. Texts were both narrative and expository texts. A mixture of text-based and inference-based questions was included. Item response theory models were constructed based on a calibration experiment assessing the difficulty of each item. This enabled adaptive testing; the second part of the test was adapted to the reading level obtained in the first part. Therefore, poor readers received an easier version and the good readers received a more difficult version. Item response theory models were used to transform the results of the two tests into one score that was related to each student's reading experience (months of formal reading instruction), which enabled across-test comparisons. Reliability of the test is good; for the easy version, α = .84; for the difficult version, α = .85 (Egberink et al., 2009-2014).
Nonverbal intelligence. To assess nonverbal intelligence, the Standard Progressive Matrices (Raven, 1960) test was administered. This multiple-choice test consists of 60 items that increase in difficulty. For each item, the student is asked to identify the missing element that completes the pattern shown in a specific figure. Items are divided over five sets (A, B, C, D, and E) with 12 items each. In sets A and B, six answer options are presented, and in the other sets eight answer options are provided. Prior to testing, the first and second items were discussed as an example. Every correct item was awarded 1 point, and thus the maximum possible score was 60.

Experimental materials
Texts. Four texts were adapted from NieuwsbegripXL (CED-Groep, 2011), which is a Dutch reading comprehension course that provides newspaper articles for children on a weekly basis. Topics of the target texts in this study were obesity, child labor, animal testing, and souvenirs. Each text consisted of five paragraphs, each presented on a separate screen. All paragraphs were preceded by a heading, which is standard for texts in this reading course. The number of headings in these texts provided sufficient power for analysis.
Minor adjustments were made to ensure paragraph length was approximately similar. A summary of the characteristics for the reading material can be found in Table 1. In addition, one practice text was constructed and presented prior to the target texts. For each text, six subsequent multiple-choice questions on text likeability were administered in order to clear the students' working memory. An example is "How did you like this text?" For this item, students answered on a 5-point Likert scale ranging from 1 (e.g., not boring at all) to 5 (e.g., very boring). The labels of the scale differed for each question.
Text comprehension. To test text comprehension, six multiple-choice comprehension questions were constructed for each text. Four of these questions could be answered by reference to information explicitly stated in the text, for example, "In which area do we find child labor most frequently?" with answer options (a) agriculture, (b) industry, (c) stores, (d) healthcare. The other two questions required the generation of an inference using two or more sentences. An example is "Why do 60 children die each day? (a) They do not have enough money to eat, (b) They are being abused, (c) They do not go to school, (d) They breathe in dangerous dust." Reliability analysis showed one of the 24 questions to be unreliable, and the question was therefore deleted from further analysis. The overall reliability of the remaining comprehension questions is good (α = .799).
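The reliability coefficient reported above is Cronbach's alpha. As a reminder of how it is obtained from an item-by-student score matrix, a minimal sketch follows. The function name and the tiny example data are hypothetical and do not reproduce the study's data.

```python
import statistics
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per student and one
    column per item: alpha = k / (k - 1) * (1 - sum(item var) / total var).
    """
    n_items = len(scores[0])
    item_variances = [
        statistics.variance([row[i] for row in scores]) for i in range(n_items)
    ]
    total_variance = statistics.variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical correct (1) / incorrect (0) answers of four students on three items.
example_scores = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
alpha = cronbach_alpha(example_scores)
```

An item that lowers alpha when included, such as the question removed in this study, can be found by recomputing the coefficient with each item left out in turn.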

Procedure
In the first phase of the study, the following skills were measured: decoding ability, vocabulary, short-term memory (digit/word/sentence), working memory, reading comprehension, and nonverbal intelligence. The vocabulary and comprehension tasks were administered in the classroom within different sessions. The vocabulary test was administered in one session of about 15 min. The reading comprehension test was administered in two sessions. The first session lasted about 40 min and the second about 50 min. The nonverbal intelligence test lasted between 30 and 45 min. The decoding speed, short-term memory (digit/word/sentence), and working memory tasks were administered individually in one session of about 20 min. The second phase was one eye-tracking session. In a separate and quiet room, participants were positioned in front of the eye tracker, with their dominant hand on a button box. Participants were instructed to silently read the texts and answer questions afterward. All instructions were read aloud by the instructor, and the children read along. After instruction, the eyes were calibrated using nine red fixation dots on a black background. To get acquainted with the setup and navigation, an example text consisting of two pages was presented. Children were informed that they could navigate back and forth, though we must note that very few students actually navigated back. After reading, six likeability questions and two example multiple-choice text comprehension questions were presented on the screen. Each question was presented on a separate screen, and students were not allowed to navigate back to the text or to previous questions. After the instruction, the four target texts were read, starting with the calibration procedure prior to each text. After reading, the students were given six likeability questions and six multiple-choice text comprehension questions. The order of the texts was counterbalanced across participants. The entire eye-tracking session lasted approximately 45 min per participant.

Data analyses
Fixations were calculated with a minimum duration of 80 ms and a maximal dispersion of 1°. To analyze the eye movement data, every word within the text was considered as an area of interest (AOI). Several characteristics of the AOI were included in the analysis, such as length (z scores of the number of characters), word frequency (log transformed), the position in the sentence (dichotomous; 0 = nonfinal, 1 = final) and paragraph (0 = remainder of the text, 1 = heading, 2 = first, 3 = final). Word frequency scores for every word were derived from a Dutch child corpus (Tellings, Hulsbosch, Vermeer, & Van Den Bosch, 2014) containing 11.5 million words and 5 million unique words from reading material (42% textbooks and tests, 38% books and magazines, and 20% other media).
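A duration and dispersion criterion of this kind is commonly implemented with a dispersion-threshold (I-DT style) algorithm. The sketch below shows one such implementation using the 80 ms and 1° thresholds and the 120 Hz sampling rate from the text. Whether the study used this exact algorithm is not stated, so the algorithm choice, the dispersion definition (horizontal plus vertical range), and all names are assumptions.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float]  # gaze position (x, y) in degrees of visual angle


def dispersion(points: List[Sample]) -> float:
    """Horizontal range plus vertical range of a set of gaze samples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples: List[Sample], sample_rate_hz: float = 120.0,
                     min_duration_ms: float = 80.0,
                     max_dispersion_deg: float = 1.0) -> List[Tuple[int, int, float]]:
    """Return fixations as (start index, end index exclusive, duration in ms)."""
    # Smallest number of samples that spans the minimum duration.
    window = max(1, math.ceil(min_duration_ms * sample_rate_hz / 1000.0))
    fixations = []
    start = 0
    while start + window <= len(samples):
        end = start + window
        if dispersion(samples[start:end]) <= max_dispersion_deg:
            # Grow the window while the samples stay within the threshold.
            while end < len(samples) and \
                    dispersion(samples[start:end + 1]) <= max_dispersion_deg:
                end += 1
            duration_ms = (end - start) * 1000.0 / sample_rate_hz
            fixations.append((start, end, duration_ms))
            start = end
        else:
            start += 1  # slide past a sample that belongs to a saccade
    return fixations


# Two stable gaze positions separated by a single saccade sample.
example_samples = [(0.0, 0.0)] * 12 + [(2.5, 0.0)] + [(5.0, 0.0)] * 12
example_fixations = detect_fixations(example_samples)
```

At 120 Hz the 80 ms minimum corresponds to a window of 10 samples, so fixations shorter than about 83 ms cannot be detected with these settings.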
Fixations were deleted if they were associated with moving the eyes to the beginning of the text and whenever they were longer than 1,200 ms (0.72% of the data). Thereafter, four eye movement measures were calculated for each AOI (Juhasz & Pollatsek, 2011; Rayner, 1998): (a) Skipping probability (S%): the chance a reader skips a word (binomial: 0 = read, 1 = skipped); (b) Gaze duration (G): the sum of fixation durations in milliseconds on the first encounter; (c) Regression probability (R%): the chance that a reader regresses from the target word (binomial: 0 = no regression, 1 = regression); and (d) Regression path duration (R): the sum of all fixations in ms rereading previous words, sentences, or paragraphs (including rereading of previous screens and the target region), before progressing to the next word. No third or fourth passes were considered. All durational measures were log transformed, and scores that deviated more than 2.5 standard deviations from the mean were deleted.
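The four measures defined above can be derived from the ordered sequence of fixated words. The sketch below is a simplified illustration that treats a text as one linear sequence of word indices; it is not the authors' analysis script, and the function name and data format are assumptions. Unlike the analysis in the study, it returns a regression path duration for every first-pass word, not only for words that triggered a regression.

```python
from typing import Dict, List, Tuple

Fixation = Tuple[int, float]  # (word index in reading order, duration in ms)


def word_measures(fixations: List[Fixation], n_words: int) -> Dict[int, dict]:
    """Compute skipping, gaze duration, regression-out, and regression path
    duration per word from a chronologically ordered fixation sequence."""
    measures = {
        w: {"skipped": 1, "gaze": None, "regressed": None, "reg_path": None}
        for w in range(n_words)
    }
    rightmost = -1  # rightmost word fixated so far
    i = 0
    while i < len(fixations):
        word = fixations[i][0]
        if word <= rightmost:
            i += 1  # refixation of earlier text, not a first-pass entry
            continue
        # Gaze duration: consecutive fixations on the word at first encounter.
        j = i
        gaze = 0.0
        while j < len(fixations) and fixations[j][0] == word:
            gaze += fixations[j][1]
            j += 1
        # Regression: the first exit from the word goes to earlier text.
        regressed = int(j < len(fixations) and fixations[j][0] < word)
        # Regression path: everything until the eyes move past the word.
        k = i
        reg_path = 0.0
        while k < len(fixations) and fixations[k][0] <= word:
            reg_path += fixations[k][1]
            k += 1
        measures[word] = {"skipped": 0, "gaze": gaze,
                          "regressed": regressed, "reg_path": reg_path}
        rightmost = word
        i = j
    return measures


# Hypothetical sequence: word 1 triggers a look back to word 0; word 2 is skipped.
example_sequence = [(0, 200.0), (1, 150.0), (1, 100.0), (0, 120.0), (1, 80.0), (3, 250.0)]
example_measures = word_measures(example_sequence, n_words=4)
```

In this example, word 1 has a gaze duration of 250 ms but a regression path duration of 450 ms, because the look back to word 0 and the return to word 1 are both counted before the eyes progress.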
Separate linear mixed-effects regression models (Baayen, 2008) were run for skipping probability, gaze duration, regression probability, and regression path duration. The durational measures (gaze and regression path) were analyzed using a linear mixed effect model, and the probability measures (skipping and regression) were analyzed with a logit-linear mixed-effect model. For the effects of skills and eye movements on text comprehension, a logit-linear mixed-effect model was run.
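For readers less familiar with these models, their general form can be written out as follows. This is a generic textbook formulation under assumed notation (p indexes participants, w words, t texts; x are the student-related and text-related predictors and their interactions), not the exact specification of the fitted models, which also contained random slopes where warranted.

```latex
% Logit-linear mixed model for a binary outcome y (skipped, regressed, or answered correctly)
\begin{align}
\operatorname{logit} P(y_{pwt} = 1)
  &= \beta_0 + \sum_{k} \beta_k x_{k,pwt} + u_p + v_w + r_t, \\
% Linear mixed model for a log-transformed duration d (gaze or regression path)
\log d_{pwt}
  &= \beta_0 + \sum_{k} \beta_k x_{k,pwt} + u_p + v_w + r_t + \varepsilon_{pwt}, \\
u_p \sim \mathcal{N}(0, \sigma_u^2), \quad
v_w &\sim \mathcal{N}(0, \sigma_v^2), \quad
r_t \sim \mathcal{N}(0, \sigma_r^2), \quad
\varepsilon_{pwt} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

Here the random intercepts u, v, and r capture variation between participants, words, and texts, respectively.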
Analyses were run using the following procedure. First, a full model was created including all main fixed and random effects, as well as interactions among the fixed variables. A backward stepwise selection procedure was used, deleting all interaction effects that did not reach significance at the 5% level on the analysis of variance Wald test (car package). In a next step, all nonsignificant main effects were deleted. Finally, random slope effects were added for the fixed effects (main and interaction) in the model, to account for intraindividual, intraword, and intratextual effects. Random slope structures were calculated by comparing unreduced and reduced models, based on log-likelihood ratio tests. The fitted model was reexamined and nonsignificant fixed effects were deleted if necessary. Z values are reported for all logit-linear mixed-effect models, and t values are reported for linear mixed-effect models. Similar results were found in a forward elimination procedure in which the reduced and full models were compared based on log-likelihood ratio tests.

Descriptives
Table 2 depicts the correlations, means, SDs, and range of the student characteristics and text comprehension. Although some variables were moderately correlated, all VIFs were below 1.482, which indicates no problems with multicollinearity. Mean skipping probability, gaze duration, regression probability, and regression path duration for each region are presented in Table 3.

Effects of student-related and text-related characteristics on real-time reading behavior

Skipping probability
To determine the effect of student characteristics and text-related characteristics on eye movements, a logit-linear mixed-effects regression model analysis was run on the full data set, including 42,790 trials. The full model included random effects of participant, word, and text. Also, main fixed effects of student-related characteristics (word decoding, vocabulary knowledge, short-term memory, working memory, comprehension skill, and nonverbal intelligence) and text-related characteristics (length, frequency, word position [sentence and text]), as well as two-way interactions of student-related and text-related characteristics, were included. The results of the final model are presented in Table 4. First, a positive main effect was found for word frequency, indicating that higher frequency words were skipped more often. Furthermore, a main effect was found for word position within the paragraph: Wald test, χ²(3) = 40.544, p < .001. Exploration of this main effect showed that words within the headings were skipped less often (30.8%) than in the remainder of the paragraph (38.6%), whereas the likelihood of skipping words in the first (33.2%) and final (39.4%) sentences did not differ. Finally, a main effect was found for wrap-up effects, indicating that words in the sentence-final position were skipped less often (24.2%) than sentence-nonfinal words (46.9%).

Gaze duration
To determine the effect of student characteristics on eye movements, a mixed linear regression model analysis was run on the gaze duration of each word in the text. About 56.5% of all words were read, resulting in a data set of 24,201 trials. The full model was identical to the one described for the skipping rate. Results of the fitted model are presented in Table 5. Random main effects were found for participant, word, and text. Also, a random slope was found for decoding efficiency within words.
Main fixed effects were found for the student-related characteristics decoding and vocabulary, an indication that higher decoding and vocabulary skills are related to shorter gaze durations. Furthermore, main effects of the text characteristics length, frequency, and word position within the paragraph (Wald test, χ²(3) = 29.595, p < .001) were found. The effects showed that longer words had longer gaze durations, whereas more frequent words had shorter gaze durations. With respect to word position within the paragraph, the results showed that students spent additional time reading the words in the heading and final clause of the paragraph compared to the words in the remainder of the paragraph. However, the first clause did not show significant differences compared to the remainder of the paragraph.
Finally, an interaction of comprehension skill and word position within a paragraph was found: Wald test, χ²(3) = 10.007, p = .019. Further exploration of this interaction showed an interaction of skill and gaze duration for the heading and first region. This effect indicates that readers better at reading comprehension spent relatively more time on the heading and first sentence of a paragraph. The effect of skill was not found for the final region, indicating that both skilled and less skilled readers read this region in a similar way as they read the remainder of the paragraph.

Regression probability
To determine the effect of student characteristics and text-related characteristics on regression probability, a logit-linear mixed-effects regression model analysis was run on all words that were read, resulting in 24,201 trials. The full model included random effects of participant, word, and text. Also, main fixed effects of student-related characteristics (word decoding, vocabulary knowledge, short-term memory, working memory, comprehension skill, and nonverbal intelligence), text-related characteristics (position of the word within a sentence and paragraph), and two-way interactions of student-related and text-related characteristics were included. Note that main effects of the text characteristics of word length and word frequency effects were not examined, as regions that are related to regressive eye movements (looking back to previous text segments) can vary in length and frequency.
The results of the final model are presented in Table 6 and show random effects for participant, word, and text. A fixed effect was found for decoding, indicating that higher decoding efficiency was related to fewer regressions. With respect to text-related characteristics, a main effect was found for word position within a sentence; regressions were more often initiated for sentence-final words. Finally, two interaction effects were found: Both decoding and nonverbal intelligence were found to be related to sentence position effects. Wrap-up effects were larger for children with higher decoding efficiency and nonverbal intelligence. Hence, children with lower skill in decoding or nonverbal intelligence spent a similar amount of time on words within a sentence and on sentence-final words, whereas children with good decoding skills or higher nonverbal intelligence spent relatively more time on sentence-final words.

Regression path duration
A mixed linear regression model analysis was run on the reading time for all words that triggered look back behavior (15.5% of the read words). In total, 3,746 trials were included in the analysis. The full model was identical to the one of regression probability. Results of the final model are presented in Table 7. Random main effects were found for participant and word. Furthermore, main effects for decoding and word position were found. The main effect of decoding indicated that look back times were shorter when decoding efficiency was higher. The main effect of word position within the paragraph (Wald test, χ²(3) = 32.524, p < .001) showed longer reading times for the final region compared to the remainder of the paragraph. The heading and first sentence did not show significant effects. In addition, two interaction effects were found in relation to decoding. First, decoding was found to interact with word position within a sentence. The interaction showed that higher decoding efficiency was related to shorter regression path durations. Finally, the interaction of the region and decoding (Wald test, χ²(3) = 23.854, p < .001) showed that students with low decoding efficiency spent more time looking back to previous regions than students with high decoding efficiency, but this was true only for the paragraph-final sentence.

Effects of student-related and text-related characteristics on text comprehension
The second research question involved the relation of students' ability and their eye movements with text comprehension. A logit-linear mixed-effects regression model with text comprehension score (correct vs. incorrect) as a dependent variable was run on a data set of 203,728 trials, including random effects of participant, text, and question. Further, main fixed effects of student-related characteristics (decoding, vocabulary, short-term memory, working memory, reading comprehension, and nonverbal intelligence), text-related characteristics (word length, word frequency, word position [sentence and paragraph]), and eye movement measures (skipping probability, gaze duration, regression probability, and regression path duration) were considered, as well as interactions among these variables. The final model is presented in Table 8. Random effects were found for participant, question, and text, as well as a random slope effect for text and skipping probability. A fixed main effect was found for nonverbal intelligence, indicating that a higher nonverbal intelligence was associated with higher text comprehension scores. Furthermore, two interactions were found: an interaction of decoding and skipping probability and an interaction of nonverbal intelligence and skipping probability.
Further exploration of the interaction of decoding and skipping probability showed that skipping words negatively affected the results of the students with lower decoding skill, whereas no such effect for children with higher decoding skills was found (see Figure 1a). With respect to the interaction of nonverbal intelligence and skipping probability, a similar interaction was found; skipping words had a negative influence on text comprehension for students with lower nonverbal intelligence but not for their peers with higher nonverbal intelligence (see Figure 1b).
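The moderation pattern described here can be made explicit with the interaction term of a logit model. The notation below is an assumption introduced for illustration (D = decoding skill, S = skipping probability, with higher D meaning better decoding); the coefficient values themselves are those reported in Table 8 and are not reproduced here.

```latex
\begin{align}
\operatorname{logit} P(\text{correct})
  &= \beta_0 + \beta_D D + \beta_S S + \beta_{DS}\, D\, S + \dots \\
\frac{\partial \operatorname{logit} P(\text{correct})}{\partial S}
  &= \beta_S + \beta_{DS}\, D
\end{align}
```

The effect of skipping on comprehension is therefore not a single number but depends on D. The reported pattern, a negative effect of skipping for students with lower decoding skill and no effect for students with higher decoding skill, corresponds to a conditional slope that is negative at low D and close to zero at high D. The same reasoning applies to the interaction with nonverbal intelligence.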

Discussion
The aim of this study was to determine the role of student-related and text-related characteristics on real-time processes, on one hand, and their association with text comprehension, on the other hand. Regarding processing effects, this study showed decoding, vocabulary knowledge, and text-related characteristics to be related to eye movement measures. The effects on text comprehension showed that skipping probability moderated the effect of language and cognitive skills on comprehension and that text-related characteristics were not important in this respect.
With respect to the first research question, predictions regarding student-related effects on eye movement outcomes involved large effects of word decoding efficiency on eye movement measures, especially in early reading. Other literacy skills were expected to be of lesser importance. Interactions with text structure were expected, as experienced readers are more involved in strategic reading behavior (McNamara & O'Reilly, 2009). The results indeed showed strong effects for decoding efficiency on gaze, regression path duration, and regression probability. Vocabulary was found to be related to gaze durations but not to other eye movement measures. Other student-related skills were not found to be related to eye movements. These results are in line with previous studies showing faster reading times for developing skilled readers compared to less skilled readers (Blythe & Joseph, 2011;McMaster et al., 2014).
Furthermore, we hypothesized working memory to be related to regression measures, as a small memory span limits the amount of information available for bridging inferences and hence regressions are expected to be longer (Cain et al., 2001;Cain et al., 2004;Van Den Broek et al., 2001). Nevertheless, we did not find evidence that regressions are dependent on working memory (Swanson et al., 2009). This finding is in line with recent research on text reading, in which working memory effects for regression path durations were also absent in younger readers (De Leeuw et al., 2015). Nevertheless, we found effects of short-term memory in gaze and skipping duration, indicating that memory is related to reading processes but only within initial word processing and not within regression paths. Further, we found that several text-related characteristics influenced real-time processing. First, word length and word frequency effects for gaze duration were evident, which is in line with research showing longer and less frequent words to have longer reading times (Joseph et al., 2013). Second, clause wrap-up effects were found for regression measures but not for gaze duration or skipping rate. This is partly in line with the literature, as Kaakinen and Hyönä (2007) did find word position effects within sentences for gaze duration. Third, effects of word position within paragraph were found for Figure 1. Exploration of the interaction of decoding and skipping probability: (a) skipping words negatively affected the results of the students with lower decoding skill, whereas no such effect for children with higher decoding skills was found; (b) skipping words had a negative influence on text comprehension for students with lower nonverbal intelligence but not for their peers with higher nonverbal intelligence. gaze duration and regression path duration. For paragraph headings and final sentences, longer gaze durations were found compared to the remainder of the text. 
The heading effect is similar to effects found for adults (Hyönä et al., 2002) and children (Van der Schoot et al., 2008), although the effect reported is related to children's comprehension skills in general and no interaction with text comprehension scores was found. The lack of an effect on text comprehension could be due to the limited set of questions used in this study. Although the questions captured the overall level of text comprehension, they did not give detailed insight into the different aspects of comprehension (e.g., inferences, summarizing).
With respect to the second research question, effects of student-related characteristics, text-related characteristics, and eye movements on text comprehension scores, as well as interrelations among these variables, were explored. Memory capacity was expected to predict text comprehension, because a relationship between memory and inference making, as well as text comprehension, has been demonstrated in previous research (Cain et al., 2001; Cain et al., 2004; Van Den Broek et al., 2001). Our study indeed shows memory to be important, although we only found effects for short-term memory and not for working memory. One explanation for this result might lie in our memory measures. For short-term memory we measured both verbal and nonverbal components, whereas for working memory only one nonverbal component was included. Hence, it could be the case that the verbal component within the short-term memory measure loads high on comprehension, and that the lack of such a component in the working memory measure limits its predictive value.
Furthermore, several interactions of skipping probability with student skills (decoding and nonverbal intelligence) were found. Two conclusions can be drawn. First, children's reading comprehension processes are different from those of adults, as this study does not confirm the association between regression path durations and reading comprehension found in adults (Schotter et al., 2014). It seems that younger readers' comprehension is mainly regulated by initial processing and not by the monitoring behavior reflected in regressions. Second, eye movements (skipping probability) moderate the effect of students' ability on reading comprehension. The results suggest that some less skilled readers adjust their reading (i.e., spend more time), resulting in higher comprehension scores, whereas less skilled readers who fail to compensate for their lack of skill obtain lower comprehension scores.
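To make the notion of moderation concrete, the following minimal sketch fits a regression with an interaction term and then compares the slope of skipping probability at a lower and a higher skill level. It is an illustration under stated assumptions and not the analysis carried out in this study: the data are synthetic, and all names (`fit_moderation`, `simple_slope`, `skill`, `skipping`, `comprehension`) and all numeric values are hypothetical.

```python
import numpy as np


def fit_moderation(skill, skipping, comprehension):
    """Ordinary least squares with an interaction term.

    Both predictors are mean-centered so that each main effect is the
    slope of that predictor at the average level of the other one.
    """
    skill_c = skill - skill.mean()
    skip_c = skipping - skipping.mean()
    design = np.column_stack(
        [np.ones_like(skill_c), skill_c, skip_c, skill_c * skip_c]
    )
    coef, *_ = np.linalg.lstsq(design, comprehension, rcond=None)
    names = ["intercept", "skill", "skipping", "interaction"]
    return dict(zip(names, coef))


def simple_slope(coefs, skill_offset):
    """Slope of skipping on comprehension at a given centered skill value."""
    return coefs["skipping"] + coefs["interaction"] * skill_offset


# Synthetic data (an assumption for illustration only): skipping lowers
# comprehension for low-skill readers and has little effect for high-skill
# readers, mimicking the qualitative pattern described for Figure 1.
rng = np.random.default_rng(0)
n = 400
skill = rng.normal(0.0, 1.0, n)        # standardized skill score
skipping = rng.uniform(0.3, 0.7, n)    # proportion of words skipped
comprehension = (
    5.0
    + 0.5 * skill
    + (-2.0 + 2.0 * skill) * (skipping - 0.5)
    + rng.normal(0.0, 0.3, n)
)

coefs = fit_moderation(skill, skipping, comprehension)
slope_low = simple_slope(coefs, -1.0)   # one SD below the mean skill
slope_high = simple_slope(coefs, 1.0)   # one SD above the mean skill

print("interaction coefficient:", coefs["interaction"])
print("slope of skipping at low skill:", slope_low)
print("slope of skipping at high skill:", slope_high)
```

A positive interaction coefficient together with a clearly negative simple slope at low skill corresponds to the pattern in which skipping is associated with lower comprehension only for less skilled readers. With repeated measures per child, a mixed-effects model would be the more appropriate choice; plain least squares is used here only to keep the sketch short.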
Several limitations of this study should be addressed at this point. First, this study aimed for an ecologically valid assessment and thus tested reading in a natural reading environment with a portable eye tracker. As a result, the temporal and spatial resolution of the data is limited. Following Andersson, Nyström, and Holmqvist's (2010) calculations, we are confident that the temporal sampling error is reduced to a level similar to that of a 1000 Hz eye tracker, taking into account the large number of data points that were included in the analyses. Nevertheless, problems related to the spatial resolution cannot be resolved. A low spatial resolution might result in suboptimal fixation detection, which in turn could result in fewer fixation points and hence higher skipping rates. This is especially important with respect to faster readers, as their number of data points is limited. This could explain why our study reports skipping rates of approximately 50% compared to the 40% skipping rates in previous research (Blythe, Liversedge, Joseph, White, & Rayner, 2009). As we found relatively large skipping probabilities for less skilled readers, the length of the texts may also have contributed to the large amount of skipping. Because our texts are long compared to those in other studies (e.g., Blythe et al., 2009), this might have caused reader fatigue or mind wandering (Nguyen, Binder, Nemier, & Ardoin, 2014), causing readers to skip more words. For these reasons, the results reported in this study should be taken with caution and need replication using more sensitive eye tracking equipment, especially as the effects reported on comprehension rely on skipping probability measures.
A second limitation concerns the text comprehension questions. The questions were limited in both number and diversity, and hence provide only limited insight into the product of mental model building. Further research should aim to disentangle effects of different product-related mental model measures, such as summary writing, recall tasks, or differences between implicit and explicit questions. As Lorch and Lorch (1996) pointed out, headings might affect free recall tasks and not summary writing. Third, the results of this study do not help us understand why skipping probabilities moderate the effect of student-related characteristics on text comprehension. Skipping could be a reflection of the skills of the readers, but could also be caused by reader fatigue or mind wandering. It is also possible that a strategy is more or less useful depending on the reader's skill. Future research should therefore focus particularly on the eye movements of poor readers to determine cause-and-effect relations in reading longer expository texts.
In summary, this study shows in what way student- and text-related characteristics are associated with eye movements and how these factors influence text comprehension scores. The most important implication of this result is that less skilled readers should not train reading speed alone when they want to become better comprehenders. Increasing reading speed for this group could also lead to poor comprehension scores, and therefore these students should learn how to compensate for their lack of skill. In conclusion, this study adds to the understanding of the process and product of reading in fourth graders in relation to student characteristics and provides suggestions for further work to understand the primary relations and direction of causation.