Computer-supported early literacy intervention effects in preschool and kindergarten: A meta-analysis

Abstract A meta-analysis was conducted on the effects of computer-supported early literacy interventions (strict phonological awareness training, combined phonological awareness and letter training, and use of e-books) on phonological awareness (syllabic awareness, word blending, rhyme, phoneme awareness) and reading-related skills (concept about print, letter knowledge, decoding, spelling) across different languages in preschool and kindergarten since 1995. A total of 59 studies were identified, yielding 339 effect sizes and involving 6786 preschool and kindergarten students. A multilevel approach was followed to estimate the average effect size and to examine the moderation of effects. On average, a small positive effect size of 0.28, 95% CI (0.21, 0.35), was evidenced across treatments and across outcome measures. Large variation in effect sizes was observed between studies and especially between comparisons within studies. Effects were larger when the intervention was part of an integrated learning system in the classroom. The present analysis also underscores the importance of methodological rigor in study design: effect sizes were higher in the absence of randomization and in comparisons with regular classroom teaching rather than an active control group.

Although implicit sensitivity to language starts early in life, there is a spurt in the development of phonological awareness in preschool and kindergarten (Torgesen & Mathes, 2000). This is often promoted by playing games that help children to focus on the phonological constituents of language. Given the intrinsic relation between phonological awareness and literacy, it is assumed that a combined training of letters and sounds can be even more effective (Bryant & Bradley, 1985). It has also been argued that early literacy training should be offered to children in a meaningful context. Research has indeed shown that interactive activities during storybook reading may help children to gain insight into the functions and structure of written language and to uncover the written code (see Mol, Bus, & de Jong, 2009).
In the past two decades, there has been an enormous influx of computer-based programs to facilitate children's early literacy. Review studies have shown that such programs are promising in that a tailored approach can be offered in response to the needs of the individual child (Bus & van IJzendoorn, 1999; Mol et al., 2009). Although technology-enhanced learning has become mainstream in many schools, it is by no means clear under what conditions software programs are most effective. It is also not clear whether effects of early-literacy treatments are sensitive to the language involved. Older studies focused mainly on literacy learning in English, which can be considered an opaque orthography, whereas more recent studies have also addressed technology-enhanced early literacy learning in more transparent languages. Therefore, in this meta-analysis an attempt will be made to evaluate the differential outcomes of a broad variety of computer-based training programs fostering children's early literacy across different languages in preschool and kindergarten.

Development of early literacy
When children are engaged in an environment with literates, they have the opportunity to learn that print carries meaning, that written texts may have various forms and functions, and that ideas can be expressed with conventional writing. In the case of alphabetic languages, children learn that words consist of phonemes which can be represented by letters. There is general agreement that in the case of alphabetic writing systems the acquisition of literacy involves the rediscovering of the principles of phonological recoding, with phonological awareness as a key component (Castles, Rastle, & Nation, 2018; Ehri, 2005, 2014). Phonological awareness refers to the access to and the understanding of the sound structure of oral language (e.g., Caravolas, Lervåg, Defior, Seidlová Málková, & Hulme, 2013). Phonological awareness requires children to reflect consciously on the phonological segments of spoken words and to manipulate them in a systematic way. As such, phonological awareness depends on the capacity to focus attention on the perceptual representations of speech (Tallal, 2004). It can be assessed by tasks measuring segmentation, blending, and manipulation of speech sounds (Hogan, Catts, & Little, 2005; Vloedgraven & Verhoeven, 2007).
Research has shown that across languages children follow more or less the same sequence of phonological awareness development, from access to larger phonological units to smaller units of sound (Anthony & Francis, 2005; Ziegler & Goswami, 2005). Indeed, it has been evidenced that the development of phonological awareness progresses from the syllable level and the onset-rime level to the phoneme level (Shankweiler & Liberman, 1989; Vloedgraven & Verhoeven, 2009). Awareness at the phoneme level, i.e., phonemic awareness, concerns the awareness of phonemes, the speech sounds that are used to build spoken words and to distinguish meanings (Goswami, 2000; Nagy & Scott, 2000). Phonemic awareness is rather difficult for children because phonemes are acoustically evanescent and thus hard to detect (Goswami, 2001). Letters may facilitate the perception of phonemes and are considered to be essential for the transfer to word reading and word spelling (NRP, 2000; Scarborough, 1998; Share, 2004). Therefore, it is important to consider the effects of training phonological awareness with and without the inclusion of letters.
Previous research has shown that the awareness of sounds (syllables, onsets, rimes) in a word is associated with children's literacy development (Melby-Lervåg, Lyster, & Hulme, 2012; Nagy & Scott, 2000). Numerous studies have indeed evidenced a relation between measures of phonological awareness administered in preschool or kindergarten and tests of word recognition among the same children in first or second grade (Blachman, 2000; Swanson, Trainin, Necoechea, & Hammill, 2003). However, the relation can also be reversed in that literacy experiences facilitate phonological awareness (Lerner & Lonigan, 2015; Perfetti, Beck, Bell, & Hughes, 1987). There is also support for the claim that a lack of phonological skills can cause difficulties with the acquisition of reading and writing: poor readers have been found to be less precise in phoneme discrimination and phoneme segmentation (Elbro & Scarborough, 2003a, 2003b; Høien & Lundberg, 2000; Snowling, 2000). To conclude, research has abundantly shown that phonological awareness is related to later literacy outcomes and that later literacy problems are associated with persistent problems in phonological awareness. The predictive power of phonological awareness for later literacy has prompted curriculum specialists to develop phonological awareness interventions in preschool and kindergarten.

Intervention studies
Early interventions for phonological awareness have focused on explicit instruction in the perception of sound constituents in words, complemented by drill and practice. In some cases, phonological awareness was taught without making any connection to literacy; in other cases it was taught in connection with letters. Bus and van IJzendoorn (1999) conducted a meta-analysis on experimental training studies on phonological awareness in relation to reading gains, with all studies following a randomized or matched design. The overall effect sizes for phonological awareness and reading in these experimental training studies were 0.73 and 0.70, respectively. In long-term studies on the influence of phonological awareness training on reading, the combined effect size was much smaller. There was also research evidence that phonemic awareness can be seen as a critical component in understanding the alphabetic principle. Programs that combined phonological and letter training obtained larger effects than purely phonological programs.
Another meta-analysis evaluating the effects of phonemic awareness instruction on learning to read and spell was conducted by the National Reading Panel (Ehri et al., 2001). There were 52 studies published in peer-reviewed journals, and these contributed 96 cases comparing the outcomes of treatment and control groups. Analysis of effect sizes revealed that the impact of instruction on helping children acquire phonological awareness was large and statistically significant (d = 0.86) and also exerted a moderate, statistically significant impact on later reading (d = 0.53) and spelling (d = 0.59). It helped various types of children: normally developing readers as well as at-risk and disabled readers; preschoolers, kindergartners, and first graders; low socioeconomic status children as well as mid-high SES. Importantly, phonological awareness instruction was more effective when it was taught in combination with letters and written words than with speech sounds and spoken words only.
More recently, the National Early Literacy Panel (NELP) examined the effectiveness of preschool and kindergarten interventions aimed at early literacy and conventional literacy in three separate meta-analyses. To begin with, Lonigan, Schatschneider, and Westberg (2008) evaluated the effectiveness of 78 code-based instructional practices to enhance early literacy skills. They found moderate to large effects for code-focused interventions designed to teach the alphabetic principle, usually in combination with phonological awareness. In addition, Lonigan, Shanahan, and Cunningham (2008) investigated the effects of 19 shared book reading studies focused on early literacy outcomes. They found moderate effects on print knowledge and oral language. Finally, Molfese and Westberg (2008) did a meta-analysis on 33 early literacy intervention studies. Overall, the interventions yielded substantial effects on reading readiness measures and small effects on spelling. Twelve studies also showed effects on oral language and nine studies on reading. The overall conclusion from NELP was that early literacy interventions are especially good in enhancing children's reading readiness and thus in preparing children for school entry and learning to read.
Aside from strict phonological awareness and/or letter training, storybook reading can be seen as an effective means to enhance young children's early literacy development. In an eye-tracking study, Evans, Williamson, and Pursoo (2008) aimed to show how pointing to words during storybook reading by the teacher may facilitate early literacy. They evidenced that teachers' pointing to the words increased print-looking time and print target recognition in four-year-olds. Bus, van IJzendoorn, and Pellegrini (1995) conducted a review of 29 studies on the effectiveness of shared reading in preschool or kindergarten. They concluded that individual differences in preschool exposure to shared reading explained approximately 8 percent of the variance in later formal reading outcomes studied. In a more recent meta-analysis, Mol et al. (2009) reviewed 31 experiments (n = 2,049 children) in which educators were trained to encourage children to be actively involved before, during, and after joint book reading. Although teaching print-related skills was not part of interactive reading programs, a substantial portion of the variance in kindergarten children's alphabetic knowledge could be attributed to the intervention.

Computer-supported early literacy interventions
Although there is a widespread belief that computers can support the early literacy development of children, the number of meta-analyses on the effects of computer-supported early literacy interventions is small. The meta-analysis by Blok, Oostdam, Otter, and Overmaat (2002) was the first with a focus on computer-assisted instruction in support of literacy throughout primary education. In that analysis, an overall positive effect over 42 different experimental comparisons was evidenced. Effect sizes were higher when the experimental group displayed an advantage at pretest and when the language of instruction was English. More recently, the effects of interactive storybooks and other contextualized methods were examined in a systematic way. In a meta-analysis on the effects of interactive storybook reading studies, Mol et al. (2009) evidenced a moderate effect size for oral language skills, along with an effect on kindergarten children's alphabetic knowledge, but the role of technology-enhanced storybook reading was not emphasized in this analysis. It was concluded that interaction with electronic storybooks may not only help children to learn that print carries meaning, but may also foster phonological awareness and literacy skills (see De Jong & Bus, 2002; Korat, 2009).
It is important to note that the methodological rigor of early literacy intervention studies in the literature has been challenged. In intervention studies, methodological flaws can be observed related to nonrandom assignment of participants to conditions, failure to control for Hawthorne effects by providing alternate interventions to control groups, insufficient or nonexistent assurance of fidelity of treatment, poor measurement sensitivity, and inadequately described samples (see Troia, 1999). In a meta-analysis on the effectiveness of information and communication technology on the learning of written English, starting at age five (Andrews, Freeman, McGuinn, Robinson, & Zhu, 2007), only nine studies with rigorous designs were selected, showing mixed results. As regards the role of technology in interventions for young children, it was concluded that there is so much variation in the design of the interventions that no conclusions about outcomes could be drawn. Examining the effectiveness of technology use in classrooms from previous meta-analyses, Archer et al. (2014) concluded that only a small number of studies met most of the required criteria for adequate study design, that program focus and program outcome measures had not been clearly specified, and that most studies largely neglected fidelity of implementation.
Taken together, there is not yet clear research evidence on the effects of computer-based early literacy interventions in preschool and kindergarten. In the studies conducted so far, the scope of technology-enhanced interventions and their hypothesized effects on different outcome measures across different languages and writing systems have not always been clear. As a consequence, it is by no means clear to what extent interventions focusing on phonological awareness, letter knowledge and storybook reading lead to differential learning outcomes on phonological awareness and reading-related measures. It is also not clear what the impact is of study characteristics, related to the intervention or the child, or to the orthographic depth of the language being involved. Therefore, the present meta-analysis aimed to investigate the effects of computer-based early literacy training taking into account the focus of treatment and outcome measures and the role of study characteristics.

The present meta-analysis
The focus of the present meta-analysis was on the effects of computer-supported early literacy interventions in preschool and kindergarten on children's phonological awareness and reading-related skills in alphabetic languages, published during and after 1995. The year 1995 was chosen as a cutoff because it was by that time that computers with higher-quality software became available to a large proportion of schools (Zawacki-Richter & Latchem, 2018). Only studies were selected that reported empirical research with comparisons between experimental and control conditions (both randomized trials and quasi-experimental studies), in which each condition had at least nine participants, and that included a pretest and a posttest on the same outcome measure for both conditions. Three types of treatments (strict phonological awareness training, combined phonological awareness and letter training, and storybook reading using e-books) and two types of outcome measures, i.e., phonological awareness (syllabic awareness, word blending, rhyme, phoneme awareness) and reading-related skills (concept about print, letter knowledge, decoding, spelling), were distinguished. Although in some early literacy models language is seen as a constituent part, specific treatments targeting language abilities (comprehension, vocabulary) were left out of our analysis for reasons of parsimony.
In addition to the focus of treatment, a number of relevant study characteristics were taken into account. To begin with, we were interested in the effect of the treatment being part of an integrated learning system (ILS). In an ILS, the educational technology materials are fully integrated into the reading curriculum, which makes the use of educational technology consistent with the reading curriculum of the kindergarten classroom (Cassady & Smith, 2003) and adaptive to the educational needs of the children (Becker & Hativa, 1994). Next to the classification of a study as ILS, the effects of total treatment duration and treatment intensity were examined. Two participant characteristics were also considered: age and at-risk status. Next, the role of orthographic transparency was highlighted by making a distinction between languages with high and low orthographic transparency. Previous research has shown that the addition of letters to phonological awareness training is more facilitative in transparent orthographies (Landerl et al., 2019). Furthermore, the effect of research design characteristics was investigated as regards the randomization of samples, the presence of an active control group, and the use of follow-up (retention) measures. Finally, the role of criterion variables was further explored by taking into account whether the measure required letter knowledge, whether speed was involved, whether it involved transfer, and whether there was a match between the criterion and target variable.
In the present study, an attempt was made to answer the following questions:
1. How large were the effects of computer-assisted early literacy treatments in preschool and kindergarten across different types of treatments (strict phonological awareness training, combined phonological awareness and letter training, and storybook reading) and different types of outcome measures (phonological-awareness and reading-related skills)?
2. To what extent do study characteristics (type of treatment, integrated learning system or not, treatment duration, participants, orthographic transparency, research design, criterion variables) explain the differences in effect sizes?
To answer these questions two sets of multilevel analyses were performed. The first set included an overall analysis using all available data (all treatments and all outcomes) plus analyses for each of the three types of treatments, across all outcomes; the second set included separate analyses for the two types of outcomes measures, across all treatment types.

Selection of studies
Studies published in the years between 1995 and 2017 were selected in three steps. In the first step, a keyword search was conducted in the PsycINFO and ERIC databases, using the following keyword combination: (multimedia OR technology OR computer) AND (reading OR literacy OR spelling OR dyslexia OR "specific reading disorder") AND (instruction OR intervention OR treatment). Only empirical studies published in the English language in a peer-reviewed journal were eligible for the meta-analysis.
To be included, studies should focus on the effectiveness of computer-supported early literacy interventions (phonological awareness, letter knowledge, storybook reading) at the preschool and kindergarten level. Effectiveness of interventions was assessed in terms of phonological awareness and reading-related measures (e.g. letter knowledge and word decoding, not reading comprehension). The use of modern media technology in classrooms started to rise in the early nineties. Therefore, we started our meta-analysis with studies published in 1995. The restriction to articles in peer-reviewed journals was based on the idea that such articles satisfy minimum standards of methodological quality.
In the second step, some of the most important education-related peer-reviewed journals in the field of literacy research were browsed in order to ensure that all major publications in the field would be included; ten journals were covered in this step. In the third step, the references of identified articles and references in previously published meta-analyses (Blok et al., 2002; Bus & van IJzendoorn, 1999; National Reading Panel, 2000; Ehri et al., 2001; Mol et al., 2009; Troia, 1999) were used to check our selection and to further identify relevant studies.
L. Verhoeven, et al., Educational Research Review 30 (2020) 100325

In addition to this stepwise selection procedure, the following criteria were used to include studies in our database (Table 1 presents the characteristics of the studies, organized by type of treatment):
• The intervention should be a program developed for the purpose of enhancing children's early literacy and should not consist solely of the use of computers in general.
• The study must use both a pretest and a control group; the control group either received no treatment or was an active control group receiving a traditional early literacy treatment without computers or a non-literacy computer treatment. At pretest the control group should be approximately equivalent to the experimental group (i.e., no major differences in age or socio-economic background).
• Case studies and studies with very small samples (fewer than 9 students per condition) were excluded.
• Studies had to be written in English but could deal with languages other than English.
• Studies specifically dealing with bilingual readers or with second-language learners were excluded.
• Studies specifically dealing with students with specific language impairments or with other disabilities (e.g., autism, ADHD) were excluded; but studies on students at-risk for dyslexia were included.
• An electronic or paper copy of the article had to be available.
• Means and standard deviations, or enough other information to calculate effect sizes, had to be available.
These criteria were checked on the basis of the abstracts or, if necessary, by reading the whole article. In total, 59 studies from 57 articles published in 32 scientific journals were identified for inclusion in the current meta-analysis, resulting in 339 effect sizes. In two articles, two studies each were identified for inclusion and treated as separate studies. Table 1 lists the included studies with their main characteristics.
Multiple effect sizes per study resulted from the use of multiple criterion variables and multiple contrasts between experimental conditions within a study. The contrasts used were contrasts between a treatment condition and a control condition; a study could have more than one treatment condition or more than one control condition. The 59 studies involved a total of 6786 students. The mean age of the students varied per study from 49.0 to 75.8 months, with an average of 65.03 months. The students were at preschool or kindergarten level; in most studies (n = 36) the children were on average between five and six years old, and in 12 studies between four and five; in six studies grade-1 students were also present. Sixteen studies dealt with students at risk for reading problems, two studies involved both regular students and students at risk, and 41 studies involved only regular students. Student characteristics are shown for each study in Table 1. The included studies mainly concerned students with English as their mother tongue (25 of the 59 studies). The other languages were Dutch (n = 17), Hebrew (n = 13), Turkish (n = 2), Arabic (n = 1), and Finnish (n = 1). The language is shown for each study in Table 1.
The interventions studied in the articles could be classified into three categories: treatments focusing on phonology (11 studies with 104 effect sizes), on letter knowledge and phonology (28 studies with 103 effect sizes), and storybooks (20 studies with 132 effect sizes). Table 1 has been organized by these three types of treatment and holds short descriptions of the treatments.

Coding of studies
All effects were individually coded on program, outcome variable and design factors. The coded program variables included type of intervention (phonological awareness, letter knowledge and phonological awareness, storybook reading), use of integrated learning system (ILS) or not, treatment duration (program length in number of weeks, intensity of treatment in minutes per week). The coded outcome variables were characterized by dummy variables (letter knowledge required or not, speeded or non-speeded test, test of trained or untrained items (transfer), does the criterion variable match the skills taught in the treatment or not). Finally, the coded study design factors involved participant characteristics (average age of participants in months, students at risk for reading problems or not), orthographic transparency (transparent or not), and research design characteristics (random allocation to conditions or not, active control condition or not, follow-up measurement or immediate posttest).
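The coding scheme described above can be thought of as one record per contrast. The sketch below illustrates this with a minimal data structure; all field names and example values are our own illustration, not the authors' actual coding sheet.

```python
from dataclasses import dataclass

# Illustrative record for one coded contrast; the field names are hypothetical
# and merely mirror the program, outcome, and design variables described in the text.
@dataclass
class CodedContrast:
    # program variables
    treatment_type: str            # "phonology", "phonology+letters", or "storybook"
    is_ils: bool                   # part of an integrated learning system?
    duration_weeks: float          # program length in weeks
    minutes_per_week: float        # intensity of treatment
    # criterion (outcome) variables
    requires_letter_knowledge: bool
    speeded: bool
    transfer: bool                 # test of untrained items?
    matches_treatment_target: bool
    # participant and design variables
    mean_age_months: float
    at_risk: bool
    transparent_orthography: bool
    randomized: bool
    active_control: bool
    follow_up: bool

# An invented example contrast
example = CodedContrast(
    treatment_type="phonology+letters", is_ils=False,
    duration_weeks=10.0, minutes_per_week=50.0,
    requires_letter_knowledge=True, speeded=False,
    transfer=True, matches_treatment_target=True,
    mean_age_months=65.0, at_risk=False,
    transparent_orthography=True, randomized=True,
    active_control=False, follow_up=False,
)
```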
ILS was considered to be an important program characteristic. A dummy variable was constructed to indicate whether or not a program was considered to be an ILS. Non-ILS interventions included interventions specifically directed at phonological skills and e-book reading. In an ILS, the educational technology materials are fully integrated into the reading curriculum, with instructional activities for each student based on the student's recorded achievements on previously assigned activities. In only seven of the 59 studies was the treatment identified as an ILS, for instance the Waterford Early Reading Program (Cassady & Smith, 2003; Hecht & Close, 2002). The ABRA web-based literacy system (Savage et al., 2013) was also treated as an ILS, although with ABRA the teacher, not the computer program, selects activities appropriate for the student.
The treatments lasted on average 10.3 weeks, but there was broad variation in treatment duration, from 1 to 35 weeks. The intensity of treatment was on average 50.9 min per week, varying from 3.5 to 160 min with one outlying value of 300 min. Treatment duration is shown for each study in Table 1 as the average hours of treatment. In most studies, the quality and quantity of treatment were described only as intended, not as actually delivered; we found only limited attention to fidelity of implementation (exceptions include Davidson, Fields, & Yang, 2009; Anthony, 2016).
The languages were categorized according to orthographic depth. One dummy variable was constructed: relatively transparent orthographies (33 studies, mainly Dutch and Hebrew) versus relatively opaque orthographies (26 studies, English).
Research design characteristics varied not only between studies but also between comparisons within studies. Random assignment to conditions was present in 47 studies (269 contrasts), so in almost 80% of the cases some form of randomization was used. We classified as random both random assignment of individual students and random assignment of whole classrooms to experimental conditions. In about 20% of the cases with random assignment, the conditions were assigned at the classroom level. In only three studies matching was used, which was counted as non-random. In about 50% of the contrasts the control condition received no special treatment; in these cases, regular classroom teaching served as the control. In the 169 comparisons with an active control condition, the alternative treatment for the controls was a reading treatment in 59% of the cases, and in 52% of the cases the active control involved the use of a computer. For type of control condition, we used only one dummy variable in the models: active control versus regular classroom teaching as control. Table 1 summarizes the contrasts used in each study. As a final research design variable, the post-treatment measures were divided into immediate posttests and follow-up tests, and a dummy variable was created to indicate whether a post-test was a follow-up test or an immediate posttest. Only 29 follow-up tests were identified, which amounts to only 8.6% of the contrasts. Moreover, follow-up measurements occurred in only three studies (Chera & Wood, 2003; Kartal, Babür, & Erçetin, 2016; Van de Sande, Segers, & Verhoeven, 2016). In 14 of the 29 cases the follow-up took place five months after the first posttest; in the other cases it took place after one month or after only two weeks. It seems that little attention has been given to the long-term effects of computer-assisted instruction.
The criterion variables (see brief descriptions in Table 1) were classified into nine types of outcomes (see Table 3). These variables fell into two broad categories: phonological-awareness measures and reading-related measures. The first set of variables included syllabic or sub-syllabic awareness, word blending, rhyme, phoneme awareness, and global phonological awareness; this last category included tests concerning various phonological skills. The phonological variables applied to 184 of the 339 contrasts. The reading-related variables included concept about print, tests of early literacy and letter knowledge, word decoding, and spelling accuracy. These applied to 155 contrasts.
The criterion variables were further characterized using several dummy variables. One dummy variable indicated whether or not letter knowledge was required for the test (yes: 133, no: 206). This distinction is almost identical to that between phonological measures and reading-related measures; the only difference concerns the tests of concept of print, which do not require letter knowledge but were considered reading-related tests. The next dummy variable indicated whether the test was speeded or not; in very few cases (n = 10) was the test speeded. Furthermore, a distinction was made between tests with trained and untrained items. The latter was interpreted as transfer, which occurred in 71.7% of the contrasts. The last dummy variable indicated whether or not the criterion variable matched the target level of the treatment; in 83.2% of the contrasts a match was observed. For instance, when the treatment was targeted at phonemic awareness and the criterion variable measured phonemic awareness, the corresponding contrast was coded as a match, but when in such a case the criterion variable was word reading, there was no match. Some studies reported total scores of tests as well as subtest scores; in such cases only the subtest scores were included in the analyses, not the total score.
All studies were coded by one of the authors. Inter-coder reliability was initially assessed for two coders on a sample of 10 studies with 65 contrasts between a treatment condition and a control condition on a single criterion variable. For all categorical variables used in the analyses the proportion agreement between coders was between 80 and 100%. For the treatment duration variables, the correlations between coders were 0.86 and 0.91. Disagreements between coders were discussed and occasionally the coding scheme was adapted.
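The two reliability indices used here, proportion agreement for the categorical codes and a correlation for the continuous duration codes, can be sketched as follows; the coder data below are invented for illustration.

```python
# Minimal sketch of the two inter-coder reliability indices; all data are invented.
def proportion_agreement(codes_a, codes_b):
    """Share of contrasts on which two coders assigned the same category."""
    assert len(codes_a) == len(codes_b)
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)

def pearson_r(x, y):
    """Pearson correlation for continuous codes such as treatment duration."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Two hypothetical coders rating five contrasts on ILS status (one disagreement)
coder1 = ["ILS", "non-ILS", "ILS", "non-ILS", "non-ILS"]
coder2 = ["ILS", "non-ILS", "non-ILS", "non-ILS", "non-ILS"]
print(proportion_agreement(coder1, coder2))  # 0.8
```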

Calculation of effect sizes
In order to make the results from different studies comparable, a standardized measure of effect size was calculated using the following formula:

(1) g = (Gain_exp - Gain_control) / pooled standard deviation of the pretest scores
Because the effect size g has a small upward bias, it was transformed into the unbiased effect size d using formula 2 (Hedges, 1982):

(2) d = (1 − 3/(4N − 9)) g

where N is the total sample size of the study. The distribution of the effect sizes is summarized per study in Table 1. To be included, studies needed to have a pretest as well as a posttest. Nevertheless, we included some contrasts with a missing pretest; this happened when the pretest was thought too difficult given the age of the children. In such cases the pretest was fixed at a value of zero in computing the effect size, and other pretest measures were available in those studies to check the equivalence of the experimental conditions.
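As an illustrative sketch (not the authors' own code; the function names are ours), formulas 1 and 2 can be written as:

```python
def hedges_g(gain_exp, gain_control, sd_pooled_pretest):
    # Formula 1: gain of the experimental group minus gain of the
    # control group, scaled by the pooled pretest standard deviation.
    return (gain_exp - gain_control) / sd_pooled_pretest

def unbiased_d(g, n_total):
    # Formula 2 (Hedges, 1982): small-sample correction that shrinks
    # g slightly; n_total is the total sample size of the study.
    return (1.0 - 3.0 / (4.0 * n_total - 9.0)) * g
```

For example, with gains of 5 and 3 points and a pooled pretest SD of 4, g = 0.50; with N = 40 the corrected d is slightly smaller, about 0.49.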

Multilevel analysis
A multilevel approach was used to analyse the data, distinguishing variance between studies and variance within studies (Borenstein, Hedges, Higgins, & Rothstein, 2009; Hox, 2010). This approach allows for the inclusion of multiple effect sizes from the same study. In the present dataset, effect sizes varied within a single study because of the use of multiple outcome measures, multiple experimental groups, and multiple control groups. In the multilevel approach the data form a hierarchical structure with the studies at the top level (level 3) and the effect sizes within studies at level 2. In addition, at level 1 the standard error of the effect sizes was included as a predictor in the random part with a coefficient fixed equal to 1 (Hox, 2010; Marsh, Bornmann, Metz, Daniel, & O'Mara, 2009). The sampling variance of d was assumed to be known from statistical theory and was calculated using the following formula (see Borenstein et al., 2009):

(3) Sampling variance = (n_exp + n_control)/(n_exp × n_control) + d²/(2(n_exp + n_control))

The standard error of the effect size was calculated by taking the square root of the sampling variance. The variance at level 1 was constrained; variance was estimated only at levels 2 and 3. In this way, the variance of the effect sizes could be estimated at two levels, between and within studies, and the effect of explanatory variables could be estimated separately for each level to explain differences between studies and differences between comparisons within studies. The explanatory variables included variables defined at the study level (characteristics of the studies) and variables that characterized the contrasts within studies.
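Formula 3 and the resulting level-1 standard error can be sketched as follows (our own illustration; the actual estimation was done in MLwiN):

```python
import math

def sampling_variance(d, n_exp, n_control):
    # Formula 3 (Borenstein et al., 2009): known sampling variance
    # of the unbiased effect size d.
    return ((n_exp + n_control) / (n_exp * n_control)
            + d ** 2 / (2.0 * (n_exp + n_control)))

def standard_error(d, n_exp, n_control):
    # Square root of the sampling variance; entered at level 1
    # with its coefficient fixed equal to 1.
    return math.sqrt(sampling_variance(d, n_exp, n_control))
```

For d = 0.5 with 30 children per condition, the sampling variance is 60/900 + 0.25/120 ≈ 0.069 and the standard error is about 0.26.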
MLwiN 2.36 (Rasbash, Charlton, Browne, Healy, & Cameron, 2016) was used, applying Restricted Maximum Likelihood (RML) to estimate the variances in the model. RML rather than full ML was used because the sample sizes were relatively small (Hox, 2010, p. 215).

Descriptives of effect sizes
First, the descriptive statistics, means and standard deviations of the effect sizes are presented, as well as schematic graphs of the distributions. Second, the results of the multilevel analyses are shown to answer the research questions. Table 2 shows the descriptive statistics of the effect sizes for code-related skills per type of treatment. The average effect size was 0.29. Treatments focused on phonological awareness had on average a somewhat higher effect size and storybook treatment a somewhat lower effect size, but mean differences were small and not significant. Standard deviations were all relatively high. In Fig. 1, a boxplot is given of the distribution of effect sizes for the three types of treatment. It is shown that there were some outliers in effect sizes for each type of treatment, especially for storybook treatments. Six extreme outliers, identified by graphical inspection of the effect sizes, have been winsorized so that the largest effect size became equal to 2.10. For phonology treatments the outliers were all at the positive side. The large negative effect sizes were mainly associated with comparisons of experimental treatments with active controls (specific reading treatments). After disregarding such comparisons, the average effect size increased from 0.29 to 0.34 (phonology 0.42, letter knowledge and phonology 0.36, storybook reading 0.28) for in total 240 comparisons.
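The winsorizing step described above can be illustrated with a small sketch (our own assumptions: the cap of 2.10 is the value reported in the text, and the outliers were in fact identified by graphical inspection rather than by a fixed rule):

```python
import numpy as np

def winsorize_at_cap(effect_sizes, cap=2.10):
    # Replace effect sizes above the cap by the cap itself, so the
    # largest retained effect size equals 2.10; values at or below
    # the cap (including negative ones) are left unchanged.
    return np.minimum(np.asarray(effect_sizes, dtype=float), cap)
```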
In Table 3, the effect sizes for code-related skills per type of outcome are presented and Fig. 2 gives a boxplot of the distributions of effect sizes for the types of outcome. The mean effect sizes were moderately positive for all types of outcome measures, but for word decoding a rather low average effect size appeared.
In Table 4, the average effect sizes are presented as a function of type of treatment on the one hand and type of outcome on the other. Note that the n's per cell may be very low here; for instance, the high value of 0.88 for global phonological awareness was based on only one comparison. For phonological awareness treatments, concept about print, word decoding, and spelling accuracy were also measured in only a very few cases. Phonological awareness treatments resulted on average in moderately positive effect sizes on all types of criterion variables. For treatments focusing on both letters and phonology the effect sizes were also moderately positive, except for rhyme and concept about print. The highest averages appeared for global phonological awareness, spelling accuracy, and early literacy and letter knowledge. Finally, for storybook reading the effects were generally moderate, except for word decoding, which showed only a low average. The clearest positive effect of storybook reading appeared on concept about print. In general, moderately positive effect sizes were observed with broad variability, ranging from negative to highly positive values. On average, differences between the three types of treatment appeared rather small and not significant. The same was true for the different types of outcome variables, except for word decoding because of its low average effect size. In studies on phonological awareness treatments, the outcome variables were mostly phonology-based, but reading-related variables were also used (in 17% of the cases). The average difference in effect size between these two types of outcomes appeared small (0.33 for phonology and 0.30 for reading-related). In studies on treatments with a combined focus on phonology and letter knowledge, more reading-related outcomes were assessed (in 65% of the cases).
Average effect sizes for these two types of outcome were both equal to 0.29. Finally, in studies on electronic storybooks the two types of outcome appeared about equally frequent, with an average effect size for phonology outcomes equal to 0.27 and for reading-related outcomes equal to 0.23.

Association between effect size differences and study characteristics
Multilevel analyses were performed to answer the two research questions for different types of treatments and different types of outcome measures. The associations of effect sizes with various subsets of predictors were tested. Using all predictors simultaneously would have resulted in a too large number of predictors given the number of studies (m = 59). The analyses were done for the whole set of effect sizes (n = 339) as well as separately for the three types of treatments and the two types of outcome variables.
First, the overall analysis and the analyses by treatment type are presented. Tables 5a and 5b show the results of the multilevel analyses for seven subsets of predictors: focus of the criterion variable (the two types of criterion variables), ILS, treatment duration, participant characteristics, language (transparent or not), characteristics of the research design, and characteristics of the criterion variables. Table 5a holds the fixed-effects parameter estimates (the regression coefficients) and their standard errors (within parentheses). Statistically significant coefficients (p < .05) are in boldface type. Table 5b holds the random part of the models, the between- and within-studies variance components. In the column Total the results are given for the analyses across the total set of effect sizes.
The tables start with the results for the empty model, the model without any predictors. The empty model provides answers to the first research question, overall in the column Total and by treatment type in the other columns. In this model the average effect size across studies (intercept) is estimated. Overall, across all treatments and all comparisons, the average effect size was a small positive number, 0.28 with 95% CI (0.21, 0.35). In all cases the average effect size was statistically significant (compare the estimate with its standard error). The largest average effect size was obtained for phonology awareness treatments and the smallest for the storybook treatments, but the average differences between the three types of treatment were not statistically significant (see Model 1 in Table 6a). The variance of effect sizes was clearly larger within studies than between studies (see the empty model in Table 5b). The overall intraclass correlation was 0.20; thus, for the whole set of effect sizes, 20% of the variance was between studies. Table 5b shows that the distribution of the variances for phonology treatments differs from that of the other two types of treatments.
For phonology treatments the intraclass correlation was high, namely 0.41. The within-studies effect sizes seem clearly more homogeneous for phonology than for the other two types of treatments. This may be due to less diversity in outcome measures used. In Table 5b (and also 6b) results of significance tests are reported for the between-studies variances. Between-studies variances significantly (at the 5%-level) larger than 0 have been printed in boldface type. This is the test for random intercepts performed by comparing the deviance of the random intercept model with the deviance of the fixed intercept model (Snijders & Bosker, 2012, pp. 97-98). The empty models provide answers to the first research question on the effects of computer-assisted early literacy intervention across types of treatments and outcome measures. All other models in the tables deal with aspects of the second research question to which we turn now, examining treatments effects as a function of study characteristics.
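The intraclass correlations reported above follow directly from the estimated variance components; as a sketch (symbol names ours):

```python
def intraclass_correlation(var_between, var_within):
    # Share of the total effect-size variance that lies between
    # studies (level 3), as opposed to within studies (level 2).
    return var_between / (var_between + var_within)
```

For example, an ICC of 0.20 means that one fifth of the total variance is located between studies and four fifths within studies.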
Model 1 tests the average difference between the two main categories of outcome variables, phonological awareness measures versus reading-related measures as distinguished in Table 3. One dummy variable was used to make this distinction. Table 5a shows only minimal and non-significant differences overall as well as for the three types of treatments separately. The estimated intercepts (Table 5a) and variance components (Table 5b) of Model 1 are highly similar to the estimates in the empty model. So the average effects of the treatments appeared nearly the same, regardless the focus of the outcome.
Model 2 concerns the nature of the treatment: an Integrated Learning System (ILS) or not. The ILS occurred only when the focus of the treatment involved both letter knowledge and phonological awareness. The ILS treatments achieved a somewhat higher effect size but the difference was not statistically significant (see Table 5a).
Model 3 is the model for treatment duration. Two variables were identified to represent treatment duration: program length and intensity of treatment. Program length was expressed as the number of weeks a program lasted. And intensity of treatment was the number of minutes the treatment took per week. Both variables were centered at their grand mean. None of them appeared in a statistically significant way related with the effect sizes (see Table 5a). Nevertheless, the between-studies variance reduced by 22% in the overall analysis (from 0.027 to 0.021, see Table 5b), but was still statistically significant.
In Model 4 participant characteristics were controlled for. The average age of the participants in a study, centered around the mean age, appeared not associated with the average effect size obtained in the study. In 18 of the 59 studies participants were identified as being at risk for reading problems. This difference, however, was not significantly associated with the average effect size of the study. Also, the estimates of average effect size (intercepts) changed only marginally after controlling for these participant characteristics (see Table 5a). The estimated variance components for this model stayed about the same as those in the empty model (see Table 5b), except for a clear reduction in the case of phonology treatments, where the between-studies variance reduced to non-significance.

Model 5 concerns the role of orthographic depth of the language involved, which in our data was highly correlated with the difference between English and other languages (mainly Dutch and Hebrew). Only one language predictor in the form of a dummy variable was used (transparent or not). Table 5a shows that this predictor was hardly related with the effect sizes, neither overall nor for any of the types of treatment. Consequently, the estimated variance components for the language model hardly differed from those in the empty model (see Table 5b).
The previous models, except Model 1, concerned predictors at the study level. These predictors, of course, could not explain variation between effect sizes within studies. In the next two models, predictors at the level of effects within studies are involved. Model 6 in Table 5a concerns characteristics of the research design. The average effect size of a study appeared lower in studies with random assignment of participants to conditions (b = −0.184, p = .025). This effect was not statistically significant in the separate analyses for types of treatment, mainly due to the smaller numbers of studies per type of treatment. A statistically significant effect was also observed for the type of control condition in comparisons within studies (b = −0.184, p = .024). When the experimental condition was compared with an active control condition, the effect size tended to be lower than for comparisons with education as usual. But for the subsets of electronic storybooks and phonology treatments this effect was not statistically significant. In a few studies, follow-up tests were used in addition to immediate posttests. Effect sizes at follow-up appeared not significantly lower than at the immediate posttest. The average effect sizes appeared almost twice as large in cases of nonrandom assignment and comparisons with non-active control conditions (compare in Table 5a the intercepts of Model 6 with those of the empty model). This model hardly reduced the within-studies variance (see Table 5b). The between-studies variance reduced by 30% in the overall analysis; this reduction was 53% for letters treatments and 33% for phonology treatments, but there was no reduction for storybook treatments.
Finally, Model 7 in Table 5a shows the results when characteristics of the criterion variables are used as (within-study) predictors. Four dummy variables were created to characterize the criterion variables: whether or not letter knowledge was required, whether or not the test had a speed character, whether or not the test materials differed from the training materials (transfer), and whether or not the dependent variable matched the goals of the treatment. Only the speediness of the test made a significant difference (b = −0.378, p = .012). On speeded tests lower effect sizes were obtained than on non-speeded tests. This effect, however, was only found for treatments focusing on phonology. In studies on storybooks speeded tests did not occur, and for combined letters and phonology treatments this effect was not statistically significant. As shown in Table 5b, the variance components in the overall analysis were hardly affected by the predictors in the criterion model.

In addition, a second set of analyses was performed, distinguishing types of outcome across all treatments. Two types of outcomes were distinguished: phonological-awareness variables versus reading-related variables. The results of these analyses are presented in Tables 6a and 6b. Note that the column Total in these tables amounts to the same as the column Total in Tables 5a and 5b. Only Model 1 differs between the two sets of analyses: Model 1 in Table 6a tests the average differences between the three types of treatment, while in Table 5a it concerns the two types of outcome variables.
The empty model deals with the first research question. The empty model shows that the average effect sizes are practically the same for both types of outcomes (Model 0 in Table 6a, for the corresponding significance test see Model 1 in Table 5a). The variance components, however, differed (see Model 0 in Table 6b). For phonological-awareness outcomes a much higher intraclass correlation was obtained than for reading-related outcomes, namely 0.53 versus 0.21. This result parallels the estimates for phonology treatments versus the other two treatments. The phonological-awareness measures were more homogeneous within studies than the reading-related measures.
Next, we present the results for the second research question, separately for types of criterion variables. Model 1 in Tables 6a and 6b deals with the focus of the treatment, namely the three types of treatment distinguished before (see Tables 5a and 5b). As already noted, no statistically significant differences between the three types of treatment were found, and this appeared true for either type of criterion variable.
Model 2 again concerns the nature of the treatment: an ILS or not. ILS treatments showed higher average effect sizes, especially on phonological awareness, where the difference was statistically significant (z = 2.05, p = .040). For phonological awareness the average effect size of ILS treatments, namely 0.254 + 0.270 = 0.524, was more than twice as large as for other treatments (see Model 2 in Table 6a). The between-studies variance in Model 2 was reduced for phonological awareness by 9.8% compared with the empty model (see Table 6b).
Treatment duration again showed no significant relationship with the average effect sizes, for both types of criterion variables (see Model 3 in Table 6a). The between-studies variance, however, reduced by 9.8% for phonological awareness and by 25% for reading-related variables compared with the empty model (see Model 3 in Table 6b).
Participant characteristics had no impact on average effect sizes (Model 4 in Table 6a) or on their variances (Table 6b). Effect sizes appeared on average largely the same whether the children were at risk for reading problems or not and were not associated with the children's age.
The results for the language model (Model 5) were the same as in the previous analyses. The average effect sizes seemed not affected by the orthographic depth of the language and the results were the same for the two types of criterion variables.
Model 6 concerns characteristics of the research design. As in the first set of analyses, the average effect size appeared lower in studies with random assignment of participants to conditions. This was the same for both types of criterion variables but not statistically significant for either of them, because of the increased estimates of the standard errors. Also, when the experimental condition was compared with an active control condition, the effect size tended to be lower than for comparisons with education as usual. This effect was statistically significant overall and in the case of phonological awareness (z = 2.99, p = .003), but not in the case of reading-related measures. Effect sizes at follow-up were not significantly lower than at the immediate posttest, for both types of criterion variables alike. The average effect sizes appeared considerably larger in cases of nonrandom assignment and in cases of comparisons with non-active control conditions, around 0.46 for reading-related measures and 0.51 for phonological-awareness measures.
Finally, Model 7 in Table 6a shows the results for characteristics of the criterion variables. The effect of speediness of the test appeared statistically significant only for the reading-related measures.

Discussion
The focus of the present meta-analysis was on the effects of computer-supported early literacy interventions in preschool and kindergarten in alphabetic languages, published during or after 1995. The first research question was about the size of the effect of computer-assisted early literacy intervention on phonological awareness and reading-related skills. This question was answered by multilevel modeling of 339 effect sizes from 59 studies. In this model the average overall effect size was estimated to be 0.28, a small positive value that was statistically significant (z = 8.21, p < .001). A similar (significant) average effect size was also evidenced separately for strict phonological training, combined phonological awareness and letter training, and electronic storybook intervention. The average effects of these interventions on phonological-awareness and reading-related variables were about the same. The latter outcome corresponds to the finding in the literature that the development of phonological awareness and early reading is bidirectionally related (Snowling, 2000). With respect to characteristics of the criterion variables, only one statistically significant effect was found: using speeded tests resulted on average in a lower effect size than using non-speeded tests. This outcome can be explained by the fact that early literacy interventions by and large focus on accuracy, not speed (see Ehri et al., 2001).
The average effect size in the present meta-analysis is small when compared to findings in earlier meta-analyses on phonological awareness interventions by Bus and van IJzendoorn (1999), Ehri et al. (2001), and Troia (1999). However, it is important to note that these interventions were mostly not computer-based but carried out by teachers. Interestingly, in our technology-enhanced treatment studies we found similar (moderate) outcomes for the three types of treatment, whereas in the non-technology studies higher main effects were found for specific phonological awareness training than for storybook reading, and a beneficial effect of combined phonological awareness and letter training was also evidenced. It can tentatively be concluded that teachers outperform electronic devices in facilitating young children's phonological awareness and insight into the alphabetic principle. This result is fully consistent with previous findings on a broad variety of children's learning processes (cf. Zawacki-Richter & Latchem, 2018). It is also interesting to note that the present average effect size is about the same as in the meta-analysis of computer-assisted interventions by Blok et al. (2002). In contrast to Ehri's meta-analysis, we found no advantage for the combined phonological awareness and letter training as compared with strict phonological awareness training. Neither did we find a difference between these two conditions and electronic storybook reading. This can tentatively be explained by the fact that over the past two decades an early literacy curriculum has become standard throughout preschool and kindergarten, outweighing possible effects of strict phonological awareness and/or letter training (Neuman & Dickinson, 2011).
Next to the average effect size, we also looked at the variability of the effect sizes. Variability appeared very high; the effect sizes ranged from −0.92 to 2.10, and this was true for most subcategories of effect sizes as well. The lowest amount of variability was found for effects on spelling accuracy and on global phonological awareness. In the multilevel models, the variances were decomposed into between-studies and within-studies variance. Overall, twenty percent of the variance was between studies, which shows that there is high variability not only between studies but especially within individual studies. Partly, this can be explained by the fact that different criterion variables and varying experimental conditions were introduced within studies, which may have led to differential outcomes.
The second research question focused on possible moderators of the treatment effects. First, the impact of treatment characteristics and study design factors on the intervention effects was examined by estimating multilevel models for separate treatment characteristics and design factors, in order to keep the number of predictors per model manageable. Focus of treatment (phonology, letters, storybooks) showed no evidence for differential intervention effects. Using an integrated learning system (ILS) appeared beneficial, especially for the children's phonological awareness. This is an interesting finding: it shows that technology-enhanced early literacy interventions obtain better outcomes when they are better integrated into the school curriculum. This insight has previously been articulated by Grabe and Grabe (2003), examining the general effectiveness of technology use in actual classrooms. However, our present finding should be evaluated with great caution since an ILS was involved in only 12 percent of the studies. More research is needed to uncover the effects of the integration of computer support in the standard early literacy curriculum.
Furthermore, no clear evidence was found for effects of treatment duration or participant characteristics (age of the students and being at risk for reading problems). However, the size of the intervention effects was found to be significantly associated with characteristics of the research design. In studies with random assignment to conditions, on average a lower effect size was found. The same was true for comparisons of the intervention condition with an active control condition (including a special reading treatment) rather than education as usual. Importantly, it should be acknowledged that in the vast majority of cases technology-enhanced intervention studies lack data on treatment fidelity and implementation, as also concluded by Archer et al. (2014). It should also be mentioned that no effects of follow-up measurements on the treatments were evidenced in our analysis. This can primarily be seen as a shortcoming of the studies conducted, since retention measures were involved in less than eight percent of the effects reported.
To examine the impact of the orthographic depth of the language on the intervention effects, both the mean effect sizes and their subdivisions across categories of programs and outcome measures were related to the distinction between studies focusing on English, on the one hand, and studies focusing on more transparent orthographies, on the other hand. The results showed no evidence for any effect of the transparency of the orthography on the average effect size, nor for the subdivisions of effect sizes. This result shows that notwithstanding the orthographic depth of the language under consideration, substantial effects of technology-enhanced early literacy interventions can be attained. It should be noted that the primary focus of early literacy programs is on the development of phonological awareness which turns out to be highly universal across languages (Ziegler & Goswami, 2005).
The present meta-analysis yields some important guidelines for teachers in early childhood education. Overall, the research in this meta-analysis lends support to the finding that computers can have a beneficial effect on young children's learning in the domain of early literacy. Substantial gains in the domains of phonological awareness, letter knowledge, and early reading and spelling can be evidenced via computer-supported early literacy interventions. However, it should be noted that the intervention outcomes lag behind those of classroom-based early literacy interventions. Importantly, the present research indicated that the use of educational technology can be more effective when the materials are better integrated and consistent with the curriculum and provide ongoing scaffolding for each learner. Teacher training can be seen as highly relevant to help teachers integrate computer-based learning devices in early childhood education (cf. Cassady & Smith, 2003; Neuman & Dickinson, 2011).
To conclude, the present meta-analysis shows that technology-enhanced early literacy interventions on average yield small positive effects across program types, outcome measures, and languages differing in orthographic depth. If the intervention is part of an integrated learning system in the classroom, the effects are even better. The present analysis also underscores the importance of the methodological rigor of the study designs used, in that effect sizes were higher in cases of absence of randomization and of comparisons with classroom teaching as usual, which is in line with previous research (cf. Andrews et al., 2007; Troia, 1999).

Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.edurev.2020.100325.