Learning and Instruction

For children who may face reading difficulties, early intervention is a societal priority. However, early intervention requires early detection. While much research has approached the issue of identification through measuring component skills at single timepoints, an alternative is the utilisation of dynamic assessment. To this point, few initiatives have explored the potential for identification through progress data from play in digital literacy games. This study explored how well growth curves from progress data in a digital intervention can predict reading performance after gameplay compared to measuring component skills at a single timepoint (school entry). 137 six-year-old students played the digital Graphogame for 25 weeks. Latent growth curve analyses showed that variation in trajectories explained variation in literacy performance to a greater extent than risk status at school entry. Findings point to a potential for non-intrusive reading assessment in the application of a serious digital game in first grade.


Rationale
Learning to read is one of the most important skills children will acquire in the early years of school and difficulties in acquiring this skill can have adverse educational outcomes (McLaughlin, Speirs, & Shenassa, 2014), vocational outcomes (McLaughlin et al., 2014;OECD, 2013) as well as a negative impact on both physical and mental health (DeWalt, Berkman, Sheridan, Lohr, & Pignone, 2004). Thus, for children who may face reading difficulties, early intervention is a societal priority. However, early intervention requires early detection. The importance of acting early is indicated by research showing the positive effects of early literacy interventions (Catts, Nielsen, Bridges, Liu, & Bontempo, 2015;Dion, Brodeur, Gosselin, Campeau, & Fuchs, 2010;Solheim, Fritjers, Lundetrae, & Uppstad, 2018).
Despite ever-increasing knowledge in the field of reading assessment, early detection of difficulty remains an error-prone process. Successful reading is dependent on the integrity of a number of different perceptual, cognitive and linguistic skills (Pennington et al., 2012) and so, typically, assessments that aim to identify the risk, or overt manifestation, of a reading difficulty need to measure a number of component skills, including phonological awareness, letter knowledge or word decoding, verbal short-term memory, rapid automatised naming and oral language (Pennington & Lefly, 2001; Thompson et al., 2015). While interdependent, each of these skills will have a specific developmental timeline, potentially with uneven rates of change. Letter knowledge, for example, is an assessment measure that may only be fully sensitive to individual differences during the first few months of a child's literacy instruction. However, within that optimal time window, measurements of letter knowledge may allow for strong prediction of word reading ability in subsequent school years (Georgiou, Torppa, Manolitsis, Lyytinen, & Parrila, 2012; Puolakanaho et al., 2008). This example also points to the fact that the relative predictive ability of certain measures in relation to others may change over time (Solheim, Torppa, Uppstad, & Lerkkanen, 2020). For these reasons, while our ability to predict children's risk of reading failure is arguably stronger than it has ever been, relying on measures recorded at a single point in time to characterize a dynamic and constantly changing skill can still result in over- or under-identification of risk (see e.g. Speece, 2005).
Partially in response to this challenge of accurate detection, identification of specific reading disabilities in schools has moved towards a model in which detection of a difficulty is defined not in terms of assessments carried out at a single time point, but rather in terms of an individual's response to intervention, or "RTI" (Gersten, 2009; IDEA, 2004). Considering a specific reading disability such as developmental dyslexia, a disability of neurobiological origin characterised by impairments in decoding, word reading accuracy and fluency (Lyon, Shaywitz, & Shaywitz, 2003), the definition provided by the DSM-5 (American Psychiatric Association, 2013) reiterates the need to consider a response to intervention in identification, stating that a diagnosis of dyslexia can only be made if difficulties have persisted for at least 6 months despite the provision of extra help or targeted instruction. This approach to identification of difficulties thus takes into account dynamic assessment data from multiple time points.
https://doi.org/10.1016/j.learninstruc.2020.101348 Received 1 November 2019; Received in revised form 28 April 2020; Accepted 30 April 2020
While research that has evaluated the effectiveness of this approach is generally supportive (Gersten et al., 2009;Gersten, Newman-Gonchar, Haymond, & Dimino, 2017) it is very dependent upon both the nature of the intervention used as well as reliable assessment measures that have the ability to sensitively and specifically capture the development in reading abilities brought about (Gersten et al., 2017). One way to increase the degree of alignment between intervention and assessment is to collect and analyse progress data from within the intervention itself in order to more directly observe response to intervention. Digital interventions arguably make within-activity progress data easier to automatically capture and researchers have started to successfully exploit this approach within education (Shute, Leighton, Jang, & Chu, 2016;Shute, Wang, Greiff, Zhao, & Moore, 2016); however, this approach has only to a small extent been implemented within literacy instruction. In this study we capitalised upon the progress data generated by a digital reading intervention called Graphogame to explore this methodology further. The novelty of the current study is validated by a recent review of the literature on this specific game showing that no existing study had taken advantage of the available progress data (McTigue, Solheim, Zimmer, & Uppstad, 2020).
GraphoGame is a play-like, internet-based learning platform that provides children with training in phoneme awareness, letter-sound knowledge and early word decoding. It was originally devised by researchers at the University of Jyväskylä in Finland with the aim of free delivery to the end user (Lyytinen, Erskine, Kujala, Ojanen, & Richardson, 2009; Lyytinen, Ronimus, Alanko, Poikkeus, & Taanila, 2007). Since its inception in Finland and promising initial findings, the game has been adapted for at least 10 alphabetic languages of varying orthographic depth, across more than 20 countries in four continents (Africa, Europe, North America, South America). The flexibility of the web-based platform means that while the basic game content remains constant across languages, researchers from countries adapting the platform can work with the Finnish developers to determine the educational/linguistic progression through letters, syllables, and words, as well as the level of challenge and adaptation.
The content adapts to the individual player according to actual performance in identifying letters, syllables or words matching auditory stimuli played through headphones. The adaptation algorithm of the game ensures a consistent balance in trials between challenge and mastery, based on the individual player's previous performance. At a certain proficiency level the algorithm provides timed target items and distractors, pushing the player to faster identification. Thus, during the course of game play, a child has the opportunity to progress to more difficult items, if and when, they demonstrate mastery of more foundational content. Graphogame is one of the minority of computerised reading interventions that has an emerging evidence-base exploring its efficacy (McTigue et al., 2020).

Study objectives
The intent of this study was to take advantage of the extensive progress data a digital intervention can provide and look at the utility of process data to predict future reading performance.
This study was carried out as part of the larger 'On Track' study (n = 1199), which investigated the effects of early intervention for children at risk for reading difficulty (Lundetrae, Solheim, Schwippert, & Uppstad, 2017). The study was located in Norway, where children start school when they are six years old and assessment for reading difficulties typically occurs at the end of the first year of schooling. In order to try and reduce the incidence of reading difficulties, the aims of the 'On Track' project were to develop screening tools to detect reading difficulty risk at an earlier stage -at school entry -as well as measure the effects of reading interventions carried out in the first year of schooling. Details of the screening measures, which included traditional predictors of reading risk such as letter knowledge, rapid automatized naming and phonological awareness, are provided in the Methods section below. Children's performance on the screening test was used to create an overall risk index, which was used within the current study as a variable with which to compare to game-play progress.
Game-play progress itself was captured from digital log data, obtained for a mixed-ability group of 137 six-year-old children playing Graphogame regularly over a 25 week period. The children played the game during their regular classroom literacy time at a similar level of frequency and intensity for all children in the class, thus it could be seen as equivalent to a Tier 1 intervention. It was predicted that initial risk status could explain variation in children's game progress trajectories. This study sought first, to validate this prediction, but then secondly and more crucially, to explore whether initial risk status or game progress data was the more accurate predictor of literacy skills measured at the end of the first year of schooling.
As part of the validation, the 'On Track' sample also allowed us to look at the explanatory power of other learner characteristics that could influence game progress trajectories: Firstly, the sample included a proportion of children who had parents who did not speak a Scandinavian language at home (Norwegian, Swedish or Danish), for whom Norwegian would be a second language (L2), as opposed to a first language (L1). Across both consistent and inconsistent alphabetic orthographies (Verhoeven, 2000) research suggests that children learning to read in an L2 typically have equivalent decoding skills to their L1 peers (August & Shanahan, 2017;Lesaux, Rupp, & Siegel, 2007), though often poorer oral language (Lervåg & Aukrust, 2010;Proctor, Carlo, August, & Snow, 2005). Given that Graphogame has a primary focus upon decoding skills, we predicted that the growth curves of game progression for children with different home language backgrounds would not differ. Secondly, wider research into engagement with computer games has demonstrated gender preference factors which can negatively impact levels of engagement and motivation to play for both female serious game players (Alserri, Zin, & Wook, 2018) as well as non-serious game players (Chou & Tsai, 2007). While many of the existing studies focus upon young adults, it was felt important in this study to explore whether student gender was a significant factor in explaining variation in game progression.
Accordingly, this study asked:
1. To what extent do a) children's risk status, as measured at school entry, b) second language status (SLS) and c) gender explain variation in growth curves of game progression?
2. In comparison to risk status, as measured at school entry, how well does variation in growth curves of game progression predict literacy performance measured after gameplay?

Participants
The children in this study were all enrolled as part of the larger 'On Track' study (n = 1199), which investigated the effects of early intervention for children at risk for reading difficulty (Lundetrae, Solheim, Schwippert, & Uppstad, 2017). Children were on average 6.2 years old when the study began at school entry (range 5.5-6.7). The schools were a convenience sample within close traveling distance of the region, and were recruited during the spring of 2014. Their scores on the national reading tests had been close to the national mean (1.5 ± 0.1 on a scale from 1 to 3) in two of the three previous years. The On Track sample included 19 schools, whereof 17 were included in a randomized controlled trial. The two remaining schools were included in the present study: the "On Track GraphoGame Extension". All first grade students from the two schools (7 classrooms) were invited to participate in the study. Parental consent was given for 97.7% of the students. Children with reported hearing difficulties, as identified by parent report, were excluded from the sample. The final sample included 137 students.

Procedure
At the beginning of the study, the reading readiness skills of all children were screened during the first four weeks after school entry in Grade 1. Parents answered a questionnaire relating to demographics, home literacy environment, familial risk of reading difficulties (RD), the student's language background, and child health. Regarding language background, information was obtained regarding the languages each parent spoke at home and whether these were Scandinavian or non-Scandinavian. For the analyses reported here, given the small overall number of children exposed to a language other than Scandinavian at home, this variable was dichotomized: children for whom no parent or only one parent spoke a non-Scandinavian language at home (n = 119; 86.9%) versus children for whom both parents spoke a non-Scandinavian language at home (n = 18; 13.1%).
Risk status for reading difficulties was determined by combining screening scores on four individually-administered, tablet-based measures of pre-literacy skills, to generate a student risk index:

Letter-sound knowledge
Using a 15-item multiple choice format, pre-recorded letter sounds were presented and the student was to identify the corresponding upper-case letter. The student responded by pressing one of four letters appearing on the screen. Reliability in the overall 'On Track' sample as measured by Cronbach's alpha was .85.

Rapid automatized naming (RAN)
Children were required to name familiar objects presented simultaneously on a white background in random order. The stimuli were illustrations of the monosyllabic Norwegian words for 'sun', 'car', 'plane', 'house', 'fish' and 'ball'. Twenty stimuli were presented in a 4 × 5 matrix, with a unique matrix presented for each of two trials. The student was asked to name each stimulus as quickly and accurately as possible, working from left to right and top to bottom. A practice session ensured that the student could name all the objects and understood the task. For each trial, both the completion time (in 1/100ths of a second) and naming errors were recorded.

Phonemic awareness (PA)
PA was measured by means of eight phoneme-isolation and eight phoneme-blending items. Both tasks were ordered by difficulty (easiest first) and were automatically discontinued after two consecutive errors. Phoneme-blending required the student to blend a sequence of phonemes into a word. Pre-recorded stimuli were presented at a rate of one phoneme per second: "Here you see pictures of /ri/, /rips/, /ris/, and /ring/ [English: 'ride', 'redcurrant', 'rice', 'ring']. Listen carefully and press the picture that goes with: /r/ /i/ /s/". The tester pointed at the objects shown in the pictures as they were named. Students responded by pressing one of the four pictures. Reliability was α = 0.87. The phoneme-isolation task required the student to isolate and pronounce the initial phoneme in words. Students responded orally, and the tester scored the response on the tablet. Reliability in the 'On Track' sample (Cronbach's α) was 0.92.
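The discontinue rule described above can be sketched in a few lines. This is a minimal illustration, assuming each task yields an ordered list of correct/incorrect responses; the function name `score_with_discontinue` and the convention that the score equals the number of correct responses given before discontinuation are assumptions, not details of the study's tablet software.

```python
def score_with_discontinue(responses, max_consecutive_errors=2):
    """Count correct responses until the task is discontinued.

    responses: list of booleans in presentation order (easiest item first).
    Items after the discontinue point are treated as not administered.
    """
    score = 0
    consecutive_errors = 0
    for correct in responses:
        if correct:
            score += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors >= max_consecutive_errors:
                break  # task stops after two errors in a row
    return score


# Hypothetical response pattern for one eight-item task
example = [True, True, False, True, False, False, True, True]
print(score_with_discontinue(example))
```

A single error followed by a correct response resets the error counter, so only an unbroken run of two errors ends the task.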

Determination of risk status for reading difficulties
Children who scored below the 30th percentile in any of these tests accumulated one risk point. The children also got an additional risk point if at least two close relatives reported having reading difficulties, resulting in a risk score of 0-5. Because the study sample was selected to be representative of the typical range of ability observed within primary school classrooms, over half the sample (51.8%) exhibited no risk behaviours, with increasingly smaller groups of children exhibiting cumulative risk scores, and no child receiving a score of 5. Given the small number of children scoring four risk points (n = 5), this group was combined with those scoring three risk points, to avoid having analysis subgroups with very small sample sizes. Table 1 documents the background variables for the study participants, in terms of gender, language background and risk status at school entry.
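The scoring rule for the risk index can be illustrated as follows. This is a hedged sketch: the function names, the dictionary keys and the use of percentile ranks in which a higher value always means better performance (including for RAN) are assumptions made for illustration only.

```python
def risk_index(percentile_ranks, relatives_with_rd, cutoff=30):
    """Return a cumulative risk score between 0 and 5.

    percentile_ranks: dict mapping each of the four screening measures to a
        percentile rank (0-100), where higher means better performance.
    relatives_with_rd: number of close relatives reporting reading difficulties.
    """
    # One point for every screening measure below the 30th percentile
    points = sum(1 for rank in percentile_ranks.values() if rank < cutoff)
    # One additional point for familial risk (at least two close relatives)
    if relatives_with_rd >= 2:
        points += 1
    return points


def analysis_group(points):
    """Merge scores of four (or more) into the three-point group for analysis."""
    return min(points, 3)


# Hypothetical child, not taken from the study data
child = {
    "letter_sound": 12,
    "ran": 45,
    "phoneme_isolation": 25,
    "phoneme_blending": 60,
}
print(risk_index(child, relatives_with_rd=2))
```

The `analysis_group` helper mirrors the decision to combine children with four risk points with those scoring three.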

Graphogame intervention
Starting within the same school term, all children commenced upon a schedule of playing the early literacy serious game, Graphogame, 10 min a day, four times a week, over a 25 week period. Schools were provided with tablets, loaded with the Graphogame software by the research team. Teachers were advised to include children's Graphogame within regular classroom literacy time, and all game play was automatically logged. The version reported here is the Norwegian version of Graphogame, adapted by the researchers from both the University of Jyväskylä and the Norwegian Reading Centre, University of Stavanger. The Norwegian version of GraphoGame consists of nine mini-games with immediate feedback and a motivational reward system (each minigame presents the same content but in different play scenarios to maintain engagement). The reward system is managed via a personal avatar, created at the very start of the game. Further details about the technical specifications of the Norwegian Graphogame are reported in Njå (2019).
To operationalize progress through the game we first segmented the 25 weeks of game play into five measuring periods of five weeks (excluding holidays). While progress could be measured at even finer gradations, e.g. daily, five-weekly intervals provided enough time points to enable growth curve modelling while keeping the data extraction manageable for those aspects where manual input was needed. Given the individualized and adaptive nature of play within each of these periods, children's progression through subsets of content would vary.
Game data from five evenly spaced time intervals, 5 weeks apart, was extracted for the purposes of this study. Firstly, basic data on the amount of game play was exported from the website Grapholearn.com; for each child this included the number of days played, number of trials played and time spent playing trials in the game. The second set of data was a manual extraction from the database server, where additional aggregated data was available. This data extraction included children's progress in terms of items known by the last play session within each time period. Within the game, subsets of items (letters, syllables or words) are incrementally added to game play in a consistent order. These subsets largely increase in difficulty over the course of game play and introduction of a new subset is contingent upon performance mastery of existing subsets. The letter content is organized in three subsets of eight letters each (total letters = 24), the syllable content is grouped into 22 subsets (total syllables = 272) whilst the word content is grouped into 90 subsets (total words = 434). For each child, items known at the end of each time period were indexed in terms of content type (letter, syllable or word) as well as subset number. For the purposes of looking at game progression the three content types are kept separate. This is due to a) the significantly different number of subsets for each content type and b) differential probabilities of receiving letter, syllable or word content due to parameters set into the original game design. For example, until a player demonstrates mastery of at least 40 percent of the letter content, they will not be exposed to syllable content. Once exposed to syllable content, the level of mastery here will influence the probability of receiving either letter content or (if doing well) word content.
J.M. Thomson, et al. Learning and Instruction 68 (2020) 101348
In this article we report the analyses that focus upon progression with word level content. The majority of children moved very quickly through the letter level of the game and so this data was deemed less informative for measuring change over time. All the analyses reported here were subsequently carried out at both syllable and word levels, yielding the same pattern of findings for both. Given that the word level of content has the largest item pool and allows us to observe the most advanced level of progress children make within the game as a whole, it was decided to focus the current analysis on word level progress. Specifically, progress is operationalized as the number of 'words known' by the end of each 5 week measurement period. This variable is determined by the proportion of correct responses accrued for specific word targets presented in the game sequence, representing within-game skill mastery. To reduce skewness and kurtosis in word level scores, the raw scores were rescaled into thirteen levels of within-game reading proficiency, ranging from level 1 to level 13. We refer to these reading variables as $W_1$ to $W_5$, where $W_i$ refers to the reading level at the $i$th wave of measurement. In addition to reading level as measured by $W_1, W_2, \ldots, W_5$, we also include in Table 2 descriptive statistics for the number of hours spent playing GraphoGame (GG) in each of the five periods. We refer to these variables as $T_1, T_2, \ldots, T_5$. Table 2 indicates, as expected, that reading scores improve over time. In contrast, the time spent playing GG does not increase or decrease over time, which was also expected. The median/mean total time spent playing GG was approximately 8.5 h, while the first and third quartiles were 7.5 and 9.5 h, respectively. The children therefore spent on average 1.5 h less time playing GG (10 min four times a week for 25 weeks totals 10 h of playing time) than expected from the instructions.
Table 2 also contains statistics for the post game literacy (PGL) measure, further described in the next subsection. The excess kurtosis and skewness values reported in Table 2 suggest that the distribution of each of $W_1$ to $W_5$, $T_1$ to $T_5$ and PGL may be considered to approximately follow a normal distribution. We also tested the multivariate vector comprised of these variables with Mardia's test for multivariate kurtosis, and found no evidence for a departure from multivariate normality ($z = -0.97$, $p$-value = 0.33). Pearson correlations among the longitudinal variables, in addition to At-risk, are given in Table 3. Note that time spent playing GG is overall weakly positively associated with increasing reading scores, which suggests that we should ultimately control for time spent playing GG when considering the longitudinal development of reading levels.

Reading assessment at the end of grade 1 (post game literacy -PGL)
At the end of grade 1, the children's word reading was assessed using a subtest from the Norwegian National assessment test. The subtest consisted of 14 items, with a time limit of 2 min. Each item consisted of a picture followed by four visually similar words, of which one corresponded to the picture. Following a practice item, the child was asked to read the words as fast as possible and to check the word that matched the picture. For example, a picture of a fish ('fisk' in Norwegian) was followed by 'fiske', 'fikse', 'fiks' and 'fisk'. The correct stimulus was presented in a random position. The number of correct words was measured (maximum = 14).

Analysis
The central concern of longitudinal research is the description of change over time and the identification of determinants of this change. That is, we want to understand interindividual differences in intraindividual change, and latent growth curve modeling (Bollen & Curran, 2006) is well-suited for this. This approach uses latent variables (e.g., $\alpha$ and $\beta_1$) to account for variation within and between individuals. Time is accounted for by fixing certain parameters in the model. Variation in individual starting point is accounted for by $\alpha$, whose effect on the observed scores is fixed to 1 across time. Variation in growth between individuals is accounted for by latent variables $\beta_1$ (linear growth) and $\beta_2$ (quadratic growth). The effects of $\beta_1$ and $\beta_2$ on the observed scores are fixed to values that reflect linear and quadratic growth, respectively. To answer our research questions, we therefore fit a series of latent growth models using the package lavaan (Rosseel, 2012) in the R software environment.
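In equation form, the quadratic growth specification described above can be written as follows. This is a sketch under stated assumptions: the child index $j$, the time codes $\lambda_i = i - 1$ (chosen so that $\alpha$ represents the level at the first wave) and the residual term $\varepsilon_{ij}$ are illustrative notation, since the exact fixed loadings are not reported in the text.

```latex
W_{ij} = \alpha_j + \lambda_i\,\beta_{1j} + \lambda_i^{2}\,\beta_{2j} + \varepsilon_{ij},
\qquad \lambda_i = i - 1, \quad i = 1, \ldots, 5
```

Setting $\beta_{2j} = 0$ for every child reduces this expression to the linear growth model.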
Given the lack of evidence for multivariate non-normality reported in the previous section, we employed normal-theory maximum likelihood estimation for all our models, while fit statistics were based on the normal-theory based chi-square statistic.
First, in order to establish whether a linear growth trajectory is sufficient to describe the data, we fit the unconditional linear growth model, referred to as $M_1$, depicted in Fig. 1a. In this model, the trajectories are assumed to follow a linear trend, entirely explained by the random coefficients, namely the latent intercept ($\alpha$) and slope ($\beta_1$) variables. The second model is the unconditional quadratic growth model, referred to as $M_2$ and depicted in Fig. 1b. This model is an extension of the base model $M_1$, where an additional random coefficient $\beta_2$ is included to allow the trajectories to follow a quadratic curve. The relative model fit of $M_1$ and $M_2$ may be compared statistically with a nested chi-square test in order to establish whether $M_2$ is a significant improvement over $M_1$.
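To make the difference between the linear and quadratic specifications concrete, the fixed loadings and the resulting expected trajectory can be sketched as below. The time codes 0 to 4 and the coefficient values are assumptions chosen for illustration; they are not estimates from the study, and the function names are hypothetical.

```python
WAVES = 5


def loadings(quadratic=True, waves=WAVES):
    """Fixed loadings of the growth factors on the five observed scores.

    Each row is [intercept loading, linear loading, (quadratic loading)],
    using assumed time codes 0, 1, 2, 3, 4.
    """
    rows = []
    for code in range(waves):  # code 0 corresponds to the first wave
        row = [1, code]
        if quadratic:
            row.append(code ** 2)
        rows.append(row)
    return rows


def implied_trajectory(coefficients, quadratic=True):
    """Expected score at each wave for given intercept/slope(/quadratic) values."""
    return [
        sum(loading * coef for loading, coef in zip(row, coefficients))
        for row in loadings(quadratic)
    ]


# Hypothetical mean coefficients: intercept 2.0, slope 1.5, quadratic -0.1
print(implied_trajectory([2.0, 1.5, -0.1]))
```

A negative quadratic coefficient, as in this made-up example, produces growth that decelerates across waves, which is the kind of curvature a purely linear model cannot represent.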
After establishing whether a linear or a quadratic form is most suitable for the observed trajectories, we fit a conditional model in which the random coefficients are predicted by the time-constant variables At-risk, gender and SLS (see Fig. 2a). Cubic trajectories were also estimated, but did not lead to improved model fit compared to the quadratic model. We refer to the conditional model as $M_3$. Finally, to take into account the variation in time spent playing GG, time-varying covariates $T_1, \ldots, T_5$ were embedded into model $M_3$, and we refer to the resulting model as $M_4$ (see Fig. 2b). Whether $M_4$ fits the data substantively better than $M_3$ may then be decided with a nested chi-square test. In models $M_3$ and $M_4$, of primary interest are the effects of At-risk, gender and SLS on the random coefficients $\alpha$ and $\beta$, since these relate directly to our first research question.
To answer our second research question, we added a measurement of post-game literacy performance to the best fitting model of $M_3$ and $M_4$, referred to as $M_5$. In this model the growth curve coefficients ($\alpha$ and $\beta$) are specified as predictors of post-game literacy performance, and of primary interest are the effects that these have on post-game literacy. A simplified path diagram of $M_5$ is presented in Fig. 3.
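Written out, the conditional models described above take roughly the following form. This is a sketch in assumed notation: the child index $j$, the time codes $\lambda_i = i - 1$, the intercept terms $\mu$ and $\nu$, the regression weights $\pi$ and $\theta$, the playing-time coefficient $\gamma$ and the residuals $\zeta$ and $\varepsilon$ are illustrative symbols, and the path diagrams in Figs. 2 and 3 remain the authoritative specification.

```latex
% M3: each random coefficient regressed on the time-constant predictors
% (shown for \alpha_j; analogous equations hold for \beta_{1j} and \beta_{2j})
\alpha_j = \mu_{\alpha} + \pi_{\alpha,1}\,\mathrm{AtRisk}_j
         + \pi_{\alpha,2}\,\mathrm{Gender}_j + \pi_{\alpha,3}\,\mathrm{SLS}_j + \zeta_{\alpha j}

% M4: playing time added as a time-varying covariate
W_{ij} = \alpha_j + \lambda_i\,\beta_{1j} + \lambda_i^{2}\,\beta_{2j}
       + \gamma\,T_{ij} + \varepsilon_{ij}, \qquad i = 1, \ldots, 5

% M5: post-game literacy regressed on the growth coefficients
\mathrm{PGL}_j = \nu + \theta_0\,\alpha_j + \theta_1\,\beta_{1j} + \theta_2\,\beta_{2j} + \varepsilon^{\mathrm{PGL}}_j
```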
To assess the goodness-of-fit of the sequence of models $M_1$ to $M_5$ in order to choose the best-fitting model, we rely on comparing fit indices such as RMSEA, CFI and SRMR across models. In addition, we also took note of Akaike's Information Criterion (AIC). Formal tests of the equality constraints imposed when moving from one model to another were conducted using the chi-square test of nested models.

Results
Our first step was to compare the fit of the linear unconditional model $M_1$ to the quadratic unconditional model $M_2$. Fit measures for all the estimated models are presented in Table 4. Clearly, the quadratic model $M_2$ fits the data substantially better than $M_1$. Also, the chi-square difference test of nested models rejected the linear model relative to the quadratic model ($\Delta\chi^2(4) = 174.44$, $p$-value < 0.001). We therefore decided to proceed to the conditional models using a quadratic trajectory model. Next, we estimated the conditional quadratic model $M_3$; see Table 5 for model estimation results. Neither SLS nor gender predicts any of the random coefficients $\alpha$, $\beta_1$ and $\beta_2$ at the 5% level of significance. In contrast, At-risk is a significant predictor of all three coefficients. In the following models, we therefore exclude SLS and gender as predictors, and retain At-risk, for parsimony. A formal chi-square test of nested models confirmed that this removal did not diminish model fit ($\Delta\chi^2(6) = 6.03$, $p$-value = 0.42). The resulting model-implied growth trajectories, one for each level of At-risk, are plotted in Fig. 4. The figure suggests that the more at risk a student is for reading difficulties, the lower the expected trajectory starts out, and the slower the progress is expected to be.
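The two p-values reported for the nested model comparisons can be checked from the chi-square differences and their degrees of freedom alone. The sketch below is not the lavaan procedure used in the study; it relies on the closed-form upper-tail probability of the chi-square distribution that is available when the degrees of freedom are even, and the function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms


# Linear versus quadratic unconditional model: chi-square difference 174.44, df 4
p_linear_vs_quadratic = chi2_sf_even_df(174.44, 4)
# Dropping SLS and gender as predictors: chi-square difference 6.03, df 6
p_drop_sls_gender = chi2_sf_even_df(6.03, 6)

print(f"{p_linear_vs_quadratic:.2e}", f"{p_drop_sls_gender:.2f}")
```

The degrees of freedom are consistent with the model descriptions: the quadratic factor adds one mean, one variance and two covariances (4 parameters), and removing two predictors of three growth coefficients frees 6 constraints.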
In order to take into account the amount of game playing, we next fitted model $M_4$. The fit statistics in Table 4 suggest that $M_4$ has a fit similar to that of $M_3$, since two of the statistics, RMSEA and AIC, favour $M_4$, while the other two statistics, CFI and SRMR, favour $M_3$. Model estimation results for $M_4$ are given in Table 6. The regression coefficient $\gamma$ relating $T_i$ to $W_i$ for $i = 1, \ldots, 5$ is highly significant and implies that an increase in playing time of 1 h during the interval between any two measurements will on average be associated with an increase in reading score of 0.65 units. Note that taking into account the effect of time spent playing GG, the effect of At-risk on the random growth coefficients does not substantively change compared to $M_3$.
To shed light on our second research question, we estimated model $M_5$. This model assesses the effect of growth trajectory on post-game literacy. The model estimates are presented in Table 7. The form of the trajectory is significantly related to post-game literacy. For instance, an increase in initial GG score of one unit will on average be associated with an increase in post-game literacy score of 1.19. Likewise, the slope and quadratic coefficients are significant predictors of post-game literacy. Importantly, the $R^2$ of the dependent variable PGL in model $M_5$ was 0.56. That is, the model accounts for more than half of the variation in post-game literacy. We also estimated a linear regression model with post-game literacy score as dependent variable and at-risk status as independent variable. In this model $R^2$ was 0.20. That is, while at-risk status can explain 20% of the variation in post-game literacy scores, the GG growth trajectory accounts for 56% of the variation in post-game literacy scores.
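For readers less familiar with the comparison, the $R^2$ of the single-predictor benchmark regression is simply the share of outcome variance reproduced by the fitted line. The sketch below shows the calculation on made-up numbers; the toy data and the function name are hypothetical and do not come from the study.

```python
def r_squared(x, y):
    """R-squared of an ordinary least squares regression of y on one predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up illustration: risk scores (0-3) and post-game literacy scores (0-14)
toy_risk = [0, 0, 1, 1, 2, 3]
toy_pgl = [12, 10, 9, 11, 6, 5]
print(round(r_squared(toy_risk, toy_pgl), 2))
```

In the study, the corresponding value for at-risk status alone was 0.20, against 0.56 for post-game literacy in the growth model.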

Discussion
This study set out to explore the extent to which variation in growth curves of Graphogame progression could predict literacy performance measured post-gameplay. Through the application of growth curve modelling, it was found that variation in trajectories predicted literacy performance post game-play to a much greater extent than did risk status as measured at school entry. Furthermore, as part of an initial validation of the relationship between risk status and game play progress, it was found that while neither second language status, nor gender explained significant variation in the growth curve parameters, children's risk status was a significant predictor of all three growth coefficients: the more at risk a student is for reading difficulties, as measured at school entry, the lower the expected trajectory starts out after an initial five weeks of play, and the lower the subsequent progress is expected to be.
To our knowledge, this is the first study to apply growth modelling to digital literacy instruction data, in order to better understand the trajectories of children's progress through the game, and factors that may influence this. Digital reading programmes are increasingly being used internationally to support early reading instruction for children both with and without risk for reading difficulties. Findings from the present study point to a potential for an additional use of gameplay, i.e. utilizing data on children's progress from within a game for assessment purposes. A non-intrusive assessment like this could reduce the time currently spent on assessing students and leave more time for their learning. However, more research will be needed to fully explore this possibility.
Digital learning programmes themselves are often treated like a "black box" (Latour, 1987), with attention paid to the outcomes of the play (Boyle et al., 2016; Connolly, Boyle, MacArthur, Hainey, & Boyle, 2012), rather than how the games work and interact with users (Gaydos, 2015; Lämsä, Hämäläinen, Aro, Koskimaa, & Äyrämö, 2018; Njå, 2019). The data presented here offer an initial glimpse into the black box of a specific game. Graphogame has been designed as a preventative tool and in its original inception was designed for children at risk of dyslexia (Lyytinen et al., 2009). In the present study the children designated as most at risk of reading difficulties made the least progress within a 25-week period. Such a finding raises many questions and prompts us to actively consider what a successful response to intervention is for struggling readers. It is firstly important to acknowledge that without more fine-grained investigation, the direct relationship between game progress and generalisable literacy learning is not fully quantifiable. However, this slower progress nonetheless provides noteworthy information, for which alternative interpretations are available. One possibility is that this is an encouraging finding: a group of children with demonstrated difficulties in reading-related skills have been able to progress through the game and reach its word level, which requires a certain level of mastery of both letters and syllables. Alternatively, we can ask whether game parameters, e.g. the challenge level, rate of lexical progression and type of feedback, could be further optimised for this group. We hope that the analysis here can act as a catalyst for subsequent interrogation of the game data at a microlevel in order to yield further answers.
This is an explicit and novel example of using the 'big data' that serious games yield to clearly document children's response to intervention, and again, goes one step beyond approaches that rely more exclusively on isolated measures of pre- and post-intervention performance. The trajectories used in the analysis spanned 25 weeks of playing time, yet scrutiny of the growth curves (see Fig. 4) suggests that group differences could potentially be determined within a shorter interval of play. A future combination of screening for literacy skills at school entry, alongside a focused period of serious game play, could provide new sensitivity and specificity to the identification of reading difficulty risk, as well as providing reinforcement and practice of essential early literacy skills.
We turn now to the variables that did not predict children's growth trajectories. Regarding second language status, previous research using single time-point assessment data supports the notion that children with L2 and no other risk factors typically have equivalent decoding skills to their L1 peers in the face of potential vulnerabilities in reading comprehension (e.g. August & Shanahan, 2017). This dataset goes one step further and suggests that the trajectory of progress for L2 children as a group, through an instructional game, is also not distinguishable from that of L1 children. Most crucially, this observation is in contrast to findings for L1 or L2 children with distinct risk factors, as measured by school-entry assessments of letter-sound knowledge, phonological awareness and rapid naming, as well as indicators of familial risk; as Fig. 4 shows, cumulative risk status significantly and deleteriously impacts a child's progress trajectory (in the sample reported here, 66.67% of the L2 group were in risk groups 0-1, 16.67% were in risk group 2 and 16.67% were in risk group 3).
The study further found that while reading risk status had a significant impact on children's progress through the game, their gender did not. This is taken as a positive finding in terms of equality of learning outcomes. No equivalent data are available on young children's progress through a literacy serious game, though studies of older youth have reported that boys can be more motivated to play computer games (Chou & Tsai, 2007) and that gender differences are present in the type of games that appeal to boys versus girls, with boys tending to prefer action/fighting games and girls more drawn to social games and virtual worlds (Alserri et al., 2018). Potentially Graphogame is well-placed between these extremes and so does not necessarily play explicitly towards the playing preferences of one gender over another. Alternatively, the playing context here, where there was no choice of an alternative game and periods of play were managed overall by the lesson time allocated, may have meant that any gender preferences present within the children did not have an opportunity to manifest in the data collected. It is also important to note that within this sample, there were no significant gender differences in post-play literacy performance, measured at the end of grade 1.

Limitations
One limitation of the current study is the relatively small sample size (n = 137), and in addition the small proportion of children for whom both parents spoke a non-Scandinavian language (n = 18). The ability of children's at-risk status to significantly predict all three growth coefficients within the current sample size potentially points to the robustness of the effect; however, a further study with a larger population is warranted. Regarding the second language status of the sample, the current sample was a convenience sample, with the proportion of children living in homes where both parents spoke a non-Scandinavian language equivalent to local norms. However, in order to more systematically validate the findings reported here, it would be important to actively recruit more children exposed to non-Scandinavian languages at home, to allow comparison of more equally sized groups. It would also be valuable to look more specifically at the role of oral language ability in game progress, across the ability spectrum.
In addition, as noted above, a challenge for any intervention research is the issue of inferring consolidated, generalisable learning from the successful completion of game activities. The variable of game progress used in this study was the number of 'words known' by the end of each 5-week measurement period, determined by game algorithms from the proportion of correct responses accrued for specific word targets presented in the game sequence. A further step in this work would be to see how reading performance for the same words outside of the game was impacted by within-game progress.
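To make the notion of a 'words known' measure concrete, the sketch below shows one way such a count could be derived from logged responses. It is purely illustrative and is not the Graphogame algorithm, whose details are not given here: the mastery rule, the two thresholds, the log format and the example words are all hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical mastery rule (assumption, not taken from the game):
# a word counts as 'known' once it has been attempted at least MIN_ATTEMPTS
# times and the cumulative proportion of correct responses is high enough.
MIN_ATTEMPTS = 3
MIN_PROPORTION_CORRECT = 0.8
PERIOD_WEEKS = 5


def words_known_by_period(responses, n_periods=5):
    """Count 'known' words at the end of each 5-week period.

    responses: iterable of (week, word, is_correct) tuples, with week in 1..25.
    Returns a list with one cumulative count per measurement period.
    """
    attempts = defaultdict(int)
    correct = defaultdict(int)
    ordered = sorted(responses, key=lambda r: r[0])
    counts = []
    i = 0
    for period in range(1, n_periods + 1):
        end_week = period * PERIOD_WEEKS
        # Accumulate every response logged up to the end of this period.
        while i < len(ordered) and ordered[i][0] <= end_week:
            _, word, is_correct = ordered[i]
            attempts[word] += 1
            correct[word] += int(is_correct)
            i += 1
        known = sum(
            1
            for word in attempts
            if attempts[word] >= MIN_ATTEMPTS
            and correct[word] / attempts[word] >= MIN_PROPORTION_CORRECT
        )
        counts.append(known)
    return counts


# Made-up response log for one child.
example_log = [
    (1, "sol", True), (2, "sol", True), (3, "sol", True),
    (4, "bil", False), (6, "bil", True), (7, "bil", True),
    (8, "bil", True), (9, "bil", True), (10, "bil", True),
    (12, "hus", True),
]
print(words_known_by_period(example_log))
```

Under a rule of this kind, the five per-period counts for each child form the repeated measures to which a growth curve can be fitted.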

Conclusion
This study provides a first attempt to use the extensive progress data that a digital intervention can provide to predict future reading performance. The progress data reported here yielded critical new insights into the impact of reading risk status on progress through a digital literacy intervention in Grade 1. It also confirmed the predictive role of response to intervention in understanding trajectories of learning to read.