The potential of digital tools to enhance mathematics and science learning in secondary schools: A context-specific meta-analysis

Introduction
There has been a growing body of research on the use of digital tools in school settings recently (e.g., Ma, Adesope, Nesbit, & Liu, 2014; Steenbergen-Hu & Cooper, 2013). Numerous studies have examined the effect of using digital tools on students' achievement. However, the media debate (Clark, 1994; Kozma, 1994) strengthened the argument that it might not be the mere medium that has an effect on learning outcomes. In consequence, contextual factors, such as teachers' views of digital media for teaching and learning or specific instructional design features of digital tools, came into focus. In the International Computer and Information Literacy Study (ICILS), for instance, teachers of 8th-grade students across 12 countries were asked about their attitudes toward digital tools (Fraillon, Ainley, Schulz, Friedman, & Duckworth, 2019, p. 183). The ICILS study shows that 87% of teachers across the participating countries think that information and communications technology (ICT) helps students to work at a level appropriate to their learning needs, and 78% state that ICT enables students to collaborate more effectively. Additionally, 91% of teachers agreed with the statement that ICT helps students develop greater interest in learning (Fraillon et al., 2019, p. 184). However, 23% of the participating teachers also agreed that ICT impedes concept formation by students, and 37% stated that ICT distracts students from learning (Fraillon et al., 2019, p. 185).
This ambiguity is also apparent when we take a closer look at studies investigating if and how digital tools impact teaching and learning. Despite the wide range of research studies and meta-analyses published in recent years (e.g., Al-Balushi, Al-Musawi, Ambusaidi, & Al-Hajri, 2017; Bayraktar, 2001/2002; Özyurt, Özyurt, Güven, & Baki, 2014; Perry & Steck, 2015; Van der Kleij, Feskens, & Eggen, 2015), the impact of using digital tools is not yet fully clear, as numerous studies differ in their findings. However, research shows that the use of digital tools can especially enhance learning in the context of technology-related subject matters within mathematics and science courses (e.g., Lesh, Post, & Behr, 1987; Ozdemir, Sahin, Arcagok, & Demir, 2018; Steenbergen-Hu & Cooper, 2014; Sung, Yang, & Lee, 2017; Van der Kleij et al., 2015). Today, mathematical literacy, as well as science literacy, can be considered fundamental for social participation and "necessary for finding solutions to complex (real-world) problems" (OECD, 2016a, p. 6). However, international large-scale studies such as PISA show that a substantial number of students worldwide struggle with learning mathematics (e.g., OECD, 2016b, p. 192; OECD, 2019a, p. 106) and natural sciences (e.g., OECD, 2016b, p. 71; OECD, 2019a, p. 114). Here, the use of digital media yields high potential for the teaching and learning of mathematics (e.g., Gunbas, 2015) and science (e.g., Buckley et al., 2004; Chang, Chen, Lin, & Sung, 2008; Frailich, Kesner, & Hofstein, 2009; see Section 2.2.2). Therefore, this meta-analysis examines if and how the use of digital tools can enhance learning in secondary school mathematics, biology, chemistry, and physics. To this end, we analyzed existing published primary studies investigating the impact of using digital tools compared to instruction methods without the use of digital tools.

Theoretical rationale for using digital tools in teaching and learning
In Section 2, we first describe theoretical rationales for learning with interactive digital tools (Section 2.1), and second, how different types of interactive digital tools-and their corresponding characteristics-may enhance student learning (Section 2.2). In addition, we give an overview of prior research on digital tool use within mathematics and science learning as well as the influence of contextual factors on computer-supported student learning (Section 2.3).

Learning with interactive digital tools
With the three assumptions underlying the cognitive theory of multimedia learning, Mayer (2014) described why learning with digital tools can be beneficial: According to the dual-channel assumption, learners organize information in two different cognitive structures, namely the visual and the auditory channel. The second assumption is the limited capacity of information processing in each channel. Therefore, it is favorable if learning environments stimulate the activation of both the visual and the auditory channel in order to prevent cognitive overload. This is possible, for example, by presenting sounds or spoken texts in combination with written texts or visual images. The third assumption is that learners need to engage actively with learning content in order to comprehend new information (Mayer, 2014). This is possible through the use of interactive learning environments, in which learners can actively and directly influence their own learning processes. In other words, "the defining feature of interactivity is responsiveness to the learner's action during learning" (Moreno & Mayer, 2007, p. 310).
Such interactivity can be further categorized into dialoguing, controlling, and manipulating: Dialoguing means that the learner receives additional information on demand or feedback on his or her entered solutions. Interactivity by controlling occurs when the learner determines his or her individual learning pace or the preferred order of presentation. Finally, the learner can interact with learning environments by manipulating the presented information. This means that he or she "can control aspects of the presentation, such as setting parameters before a simulation runs, zooming in or out, or moving objects around the screen" (Moreno & Mayer, 2007, p. 311). Thus, in contrast to other instruction methods without these interactive features, where the learner passively receives information, an interactive learning environment enables learners to act as sense-makers constructing their own knowledge. Because "deep learning depends on cognitive activity" (Moreno & Mayer, 2007, p. 312), interactive tools are supposed to support student learning by offering specific characteristics such as the previously described dialoguing, controlling, or manipulating.

Different types of interactive digital tools
In the following section, we describe different types of interactive digital tools with their corresponding characteristics (Section 2.2.1) and outline in greater detail how interactive features can enhance student learning of mathematics and science (Section 2.2.2).
Different types of interactive digital tools vary with regard to the instructional design features they provide and can hence be expected to differ in their impact on student learning. Therefore, research on the effectiveness of using digital tools in teaching and learning should focus more sharply on different types of tools (e.g., Higgins, Huscroft-D'Angelo, & Crawford, 2019). We follow the characterization by Nattland and Kerres (2009) and divide digital tools into five categories, commonly used in educational settings, that bear different instructional design features: drill and practice programs, tutoring systems, intelligent tutoring systems, simulations, and hypermedia systems. Common to all of them is the availability of interactive features; however, interactivity occurs in different ways: Whereas drill and practice programs serve to strengthen previously acquired content knowledge by giving learners the opportunity to practice at their own pace and to repeat certain exercises as often as needed, tutoring systems can provide new knowledge content in small units and additionally offer opportunities for learners to practice. The common characteristic of these two types is that learners get immediate feedback (Hattie & Timperley, 2007) on the correctness of the submitted solution. Intelligent tutoring systems additionally have adaptive features: they can present new content in consideration of the learners' prior knowledge, and they allow individual adaptation of task difficulty or of the pace of presenting new content to learner needs. Moreover, they provide differentiated feedback or hints in order to support optimal learning processes (Nattland & Kerres, 2009). Especially tools with adaptive features, like intelligent tutoring systems, can be expected to have a positive impact on student performance (Ma et al., 2014).
Hypermedia systems are not designed to teach learning units in a structured way but serve as non-linear hypertext systems that are mostly used as encyclopedias within educational settings, providing information connected through hyperlinks. Simulations represent complex real-world situations that can be changed by manipulating different parameters. In general, simulations can be used to apply or expand knowledge by explorative learning. Prominent examples of such simulation programs are virtual laboratories. In addition, computer algebra systems-such as GeoGebra (see, for instance, Hohenwarter, Jarvis, & Lavicza, 2009)-also enable learning in a comparably explorative way and can be considered one kind of simulation (Hegedus, Dalton, & Tapper, 2015; Lichti & Roth, 2018). However, we use a separate category for these tools within our meta-analysis, because "a genre of software development called 'Dynamic Mathematics' has created a suite of tools to construct and interact with mathematical objects and configurations" (Hegedus et al., 2015, p. 205). For the purpose of this study, we divided the simulation category into virtual reality and dynamic mathematical tools. The latter stands for simulation programs that allow the manipulation of mathematical expressions.

Instructional design features within interactive learning environments
In the following section, we describe in greater detail how three instructional design features (i.e., feedback, pacing, and guided activity; see Moreno & Mayer, 2007) can enhance learning with interactive tools: Feedback can be implemented into digital tools (e.g., Van der Kleij et al., 2015). There is empirical evidence that feedback can have a positive effect on students' learning; yet, explanatory feedback is more beneficial than corrective feedback alone (Hattie & Timperley, 2007). Corrective feedback provides the correct answer only, whereas explanatory feedback provides information about why students' answers are correct or incorrect and can therefore help overcome existing misconceptions (Moreno & Mayer, 2007). Regarding digital tools, Van der Kleij et al. (2015) investigated the impact of different methods of providing feedback in computer-based learning environments, such as providing the correct answer, providing the correctness of the answer, or providing an explanation why the answer is correct or incorrect. They reported positive effects across all types of feedback; however, elaborated feedback, which, for example, provided explanations, showed the largest effect sizes. Feedback providing the correct answer yielded larger effect sizes than feedback that only showed the correctness of the answer. In line with these findings, studies suggest that using intelligent tutoring systems-which give explanatory feedback-can have a positive impact on learning outcomes (Belland, Walker, Kim, & Lefler, 2017; Ma et al., 2014; Van der Kleij et al., 2015), whereas negative effects were found for drill and practice programs, which only give corrective feedback (Bayraktar, 2001/2002). There is some evidence that pacing as well as guided activity can be beneficial features for student learning.
Pacing, as an example of controlling interactivity, can be favorable for students because they can take control over their own learning speed: Moreno (2006) found in her study that learning with the opportunity for pacing resulted in lower ratings of difficulty compared to learning without pacing. In a recent meta-analysis, Belland et al. (2017) analyzed the impact of scaffolding, which is one type of guided activity. They reported a positive overall effect of computer-based scaffolding on learning outcomes. More generally, "guided activity enables students to interact with a pedagogical agent who guides their cognitive processing during learning" (Moreno & Mayer, 2007, p. 315).
Whereas all mentioned types of digital tools do offer control of speed by the learner, drill and practice programs as well as hypermedia systems do not offer guided activity. Intelligent tutoring systems combine all mentioned features (feedback, pacing and guided activity). Thus, they can be expected to yield high potential for improving learning.
Empirically, several meta-analyses focused on intelligent tutoring systems or on the impact of specific features of this type of tool, such as adaptivity, scaffolding, or feedback. Ma et al. (2014) as well as Steenbergen-Hu and Cooper (2014) found positive overall effects of using intelligent tutoring systems (g = 0.41 and g = 0.37, respectively).

How the use of digital tools can enhance mathematics and science learning
Several studies show that the use of digital tools can especially enhance learning and teaching within technology-related subjects such as mathematics (e.g., Gunbas, 2015), physics (e.g., Chang et al., 2008), biology (e.g., Buckley et al., 2004), or chemistry (e.g., Frailich et al., 2009). We illustrate the potential of digital tools for teaching and learning in these subjects with an exemplary focus on mathematics, bearing in mind that there is a certain degree of distinctiveness in each of these domains regarding learning as well as teaching processes.
The use of digital tools can support skills and strategies that are highly relevant in the scientific and mathematical content area, such as real-world problem solving (Greefrath, Hertleif, & Siller, 2018; Huppert, Lomask, & Lazarowitz, 2002) or visualizing complex relationships (e.g., Koklu & Topcu, 2012). It can support learning through interactive and scaffolded activities (e.g., Reinhold et al., 2020). In addition, manipulating representations in computer simulations can support model-based learning, as students may understand mathematics and science concepts more elaborately because they observe direct consequences of the changes they make (Buckley et al., 2004). Furthermore, it can help students overcome "cognitive constraints originating from various misconceptions" (Jimoyiannis & Komis, 2001, p. 184). In mathematics, for instance, dynamic tools such as GeoGebra (Greefrath et al., 2018; Hohenwarter et al., 2009) enable students to learn abstract subjects such as geometry, algebra, and calculus in an interactive and explorative manner (Bhagat & Chang, 2015; Lichti & Roth, 2018; Shadaan & Leong, 2013). Such dynamic mathematical tools-as well as computer algebra systems, which are still used remarkably rarely in schools (OECD, 2015, p. 56)-can support mathematical learning, such as "understanding algebraic reasoning, finding patterns, and reflecting on the solution process" (Kramarski & Hirsch, 2003, p. 250). For instance, Bhagat and Chang (2015) found that the use of GeoGebra enhanced students' reasoning and visualization skills.
Regarding more general features, adaptive digital tools allow students to receive content according to their individual learning style (Section 2.2.1), which can be especially fruitful when students learn new and abstract mathematical concepts (Reinhold, Hoch, Werner, Richter-Gebert, & Reiss, 2020; Özyurt et al., 2014). In addition, digital tools can also provide opportunities for students to practice previously acquired content knowledge, which is important, for example, for fostering mathematical principles at a more basic level (Soliman & Hilal, 2016; Tienken & Wilson, 2007). By providing immediate individual feedback to the learner, specific tools aim to prevent the development of typical misconceptions (Reinhold et al., 2020), which are often a problem in learning mathematics (e.g., Lortie-Forgues, Tian, & Siegler, 2015; Obersteiner, Van Hoof, Verschaffel, & Van Dooren, 2016) and science (Jimoyiannis & Komis, 2001).
With a focus not merely on cognitive but also on affective learning outcomes, there is evidence that the use of digital tools in teaching and learning mathematics can increase student motivation (e.g., Özyurt et al., 2014; Turk & Akyuz, 2016). One commonly used argument for this positive effect is derived from self-determination theory (Deci & Ryan, 2010, pp. 1-2): the opportunity to make one's own choices during the learning process (i.e., autonomy) and experiencing tasks as challenging but not overly complicated (i.e., competence) can be achieved via implementing educational features-such as feedback, pacing, and guided activity (Section 2.2.1)-into digital tools (Wouters, van Nimwegen, van Oostendorp, & van der Spek, 2013). Here, Özyurt et al. (2014) showed that students were more satisfied when they learned mathematics with the use of an intelligent tutoring system, and they were of the opinion that it facilitated understanding mathematics. Along the same lines, other researchers found a similar result when they asked students about their attitudes toward learning with the dynamic geometry system GeoGebra: "drawing in paper pencil environment could require drawing the shapes again and again. However, in computer it is easier and enjoyable" (Turk & Akyuz, 2016, p. 100).

Prior meta-analyses on learning with digital tools in mathematics and science subjects
In the following, we first illustrate the potential of meta-analysis in the field of educational research (Section 2.3.1). Second, we describe general findings of prior meta-analyses investigating the effects of digital tool use in secondary school mathematics and science subjects (Section 2.3.2). Third, we specifically describe the results of prior meta-analyses regarding contextual factors of computer-supported learning (Section 2.3.3). To interpret the effect sizes reported in prior studies, we apply a rule of thumb that is frequently used in other meta-analyses within the field (e.g., Bayraktar, 2001/2002; Belland et al., 2017; Sung et al., 2017): values for Cohen's d and Hedges's g between 0.20 and 0.50 can be considered small effects, values between 0.50 and 0.80 medium, and values over 0.80 large effects (Cohen, 1988). Although many different rules of thumb for effect-size interpretation exist, we restrict ourselves to this one in order to warrant better readability and comparability. Regarding the derivation of practical implications, categorizing differently sized effects should be done with caution (Lipsey et al., 2012). Thus, we always report the exact values of effect sizes, too.
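As a concrete illustration of the metrics behind this rule of thumb, the standardized mean difference (Cohen's d) and its small-sample-corrected variant (Hedges's g) can be computed as follows. This is a minimal sketch; the function names and example values are ours and are not taken from any of the cited studies.

```python
import math

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference between treatment and control,
    using the pooled standard deviation (Cohen's d)."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Cohen's d with the small-sample bias correction factor
    J = 1 - 3 / (4 * (n_t + n_c) - 9), yielding Hedges's g."""
    d = cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c)
    return (1 - 3 / (4 * (n_t + n_c) - 9)) * d

def label(es):
    """Classify an effect size by the rule of thumb applied in
    this paper (Cohen, 1988)."""
    es = abs(es)
    if es < 0.20:
        return "negligible"
    if es < 0.50:
        return "small"
    if es < 0.80:
        return "medium"
    return "large"
```

For two groups of 30 students each with means 105 and 100 and standard deviations of 10, `cohens_d` returns 0.5, and `hedges_g` returns a slightly smaller value because of the bias correction, which matters mostly for the small samples common in classroom studies.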

The potential of meta-analysis
Due to a large number of studies and great interest in the impact of using digital tools on teaching and learning, as well as in the efficacy of the mentioned contextual factors, different researchers have conducted meta-analyses on the effects of computer-based mathematics and science learning. Research syntheses offer the opportunity to describe the status quo of a certain research field, which additionally allows for detecting research gaps (Borenstein, Hedges, Higgins, & Rothstein, 2009). Furthermore, meta-analyses can improve the possibility of generalizing the results, since different studies focus on different samples or were conducted in diverse settings at different times (Borenstein et al., 2009). In addition to contributing to developing and verifying theories, e.g., by testing for potential moderating effects, meta-analyses can also have an impact on political and practical decision processes by disseminating relevant research findings (Cumming, 2012).
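The statistical core of such a synthesis, weighting each study's effect size by the inverse of its variance so that more precise studies contribute more, can be sketched as follows. This is a simplified fixed-effect illustration under assumed inputs; the random-effects models described by Borenstein et al. (2009), which are standard in this field, additionally estimate between-study heterogeneity.

```python
def pool_fixed_effect(effects, variances):
    """Inverse-variance weighted mean of study effect sizes
    (fixed-effect model). Returns the pooled effect and its
    variance; see Borenstein et al. (2009) for the full theory."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_variance = 1 / sum(weights)
    return pooled, pooled_variance
```

For two hypothetical studies with effect sizes 0.4 and 0.2 and variances 0.01 and 0.04, this yields a pooled effect of 0.36: the estimate is pulled toward the more precise study, which is exactly the mechanism that lets a meta-analysis generalize beyond any single sample.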

Overall effects of using digital tools within mathematics and science learning
A comprehensive search for prior meta-analyses in the field of mathematics and science learning shows that they either investigate the general effects of digital tool use versus instruction methods without digital tool use or focus on the impact of specific characteristics of computerized learning. Prior meta-analyses that compared the outcomes of technology-supported learning to those of other instruction methods found overall effects of different sizes. Bayraktar (2001/2002) investigated the effect of computer-assisted instruction on student achievement in science subjects such as biology, chemistry, or physics, and found a small positive overall effect of d = 0.27. An earlier meta-analysis by Kulik and Kulik (1991) showed the advantage of computer-based science and mathematics instruction compared to other methods as having a moderate but significant effect of d = 0.30. Sung, Chang, and Liu (2016) focused on the effect of integrating mobile devices in teaching and learning in comparison to instruction methods without digital tools and also found a moderate but slightly larger effect of g = 0.52.
Two other recent meta-analyses investigated the potential of computer use to support collaborative learning. Sung et al. (2017) found an effect of g = 0.47 for computer-supported collaborative learning compared to instruction methods without computer use. A significant positive effect of g = 0.45 for computer use in collaborative learning environments on knowledge gain was also reported by Chen, Wang, Kirschner, and Tsai (2018).

Contextual factors of learning with digital tools in science and mathematics
Such positive overall effects of digital tool use in mathematics and science classrooms may be one reason for the effort to improve computer equipment at schools today. However, evidence shows that "accessibility does not always imply an improved learning environment" (Higgins et al., 2019, p. 285; see also the media debate: Clark, 1994; Kozma, 1994). Thus, several contextual factors of learning with digital tools should be considered as well, because they can influence the effectiveness of digital tool use on student achievement (e.g., Bayraktar, 2001/2002; Steenbergen-Hu & Cooper, 2013; Sung et al., 2017). Here, prior meta-analyses considered the student-to-computer-ratio, compared different grade levels and school subjects, or examined replacing versus supplemental use of digital tools.
A beneficial impact of the student-to-computer-ratio on student achievement-which can be understood as an indicator of collaborative learning within computerized learning environments-was reported by Chen et al. (2018) as well as Sung et al. (2017). Frailich et al. (2009) stated that the use of digital tools in groups resulted in more productive collaborative work among students. However, Bayraktar (2001/2002) found that the effect was larger when students used digital tools on their own compared to pairwise use or use in larger groups. Regarding different grade levels, prior meta-analyses yield diverse results: in one prior meta-analysis, different effect sizes between grade levels were reported (Sung et al., 2017), whereas other meta-analyses found no differences in the effect of digital tools on student learning between grade levels (Bayraktar, 2001/2002; Steenbergen-Hu & Cooper, 2013).
Comparing the effects of computer use in different school subjects, the effects for mathematics (g = 0.89) and science (g = 0.77) were found to be largest (Sung et al., 2017). However, Cheung and Slavin (2013) only detected a small effect of the use of digital tools on student achievement in mathematics, d = 0.16, and Steenbergen-Hu and Cooper (2013) an overall effect of g = 0.09, which can be considered negligible.
Regarding replacing versus supplemental use, there is evidence that a supplemental use of digital tools in the regular classroom produced greater learning outcomes than the complete replacement of other instruction methods (Bayraktar, 2001/2002).

The present study
Previous meta-analyses investigating the effects of computer-supported learning either focused on one type of digital tool, such as intelligent tutoring systems (Ma et al., 2014; Steenbergen-Hu & Cooper, 2014), were limited to only one school subject, or focused on studies from one specific country (Bayraktar, 2001/2002). Moreover, there is a rapidly developing market of educational software and appropriate hardware. Thus, the consistently increasing number of studies investigating the impact of learning with digital tools makes it difficult to maintain a current overview of this topic. Here, it is important for researchers and practitioners to understand "what the comparative effects of different approaches are" (Chen et al., 2018, p. 804). In addition, the diverse results for contextual factors of learning with digital tools in science and mathematics (Section 2.3.3) suggest that there is need for further clarification regarding the impact of the student-to-computer-ratio, grade levels, school subjects, or replacing versus supplemental use of digital tools. Moreover, other contextual factors-such as the duration of using digital tools (Jeno, Vandvik, Eliassen, & Grytnes, 2019) or the level of guidance by teacher or peers during learning with digital tools (see Fraillon et al., 2014, p. 22)-are of equal importance but seem underrepresented in recent meta-analyses.
Hence, there is a need for an updated comprehensive meta-analysis on the effects of using digital tools particularly in mathematics, chemistry, physics and biology on secondary school students' academic performance that is not limited to certain countries or particular types of tools and that considers a broad variety of contextual factors.

Potential moderators considered
The current study does not only analyze the overall effect of the use of digital tools in mathematics and science classrooms, but also takes into account several specific conditions, which are expected to be more or less favorable for student learning. These contextual factors are treated as potential moderators affecting the described overall effect.
First, in order to analyze the impact of tool-specific features, the type of digital tool is treated as a potential moderator. As portrayed in Section 2.2, the scheme used in this study is built upon the classification scheme of Nattland and Kerres (2009), discriminating the types of digital tools most commonly used in schools. This allows an overview and direct comparison of the effects regarding frequently used digital tools, which were divided into (a) drill and practice programs, (b) tutoring systems, (c) intelligent tutoring systems, (d) dynamic mathematical tools, (e) virtual reality, and (f) hypermedia systems. Dynamic mathematical tools include computer algebra systems or dynamic geometry software to manipulate geometric constructions. The category virtual reality covers, for example, virtual chemistry laboratories representing complex real-world situations, which can also be changed by manipulating parameters (Nattland & Kerres, 2009). The six types of digital tools are not always exclusive, because they partially overlap due to similar features (Nattland & Kerres, 2009). Thus, they were classified according to their dominant function. Game-based learning was not considered in this study, since educational games can be very different from each other in terms of their concept or features and there is a lack of well-defined boundaries within the educational research field (Gros, 2007). Therefore, an exact categorization into our scheme would be difficult, because the features always overlap with at least one other type of digital tool. Yet, recent meta-analyses cover the topic of gamification of learning (e.g., Sailer & Homner, 2019).
As mentioned earlier, contextual factors of learning environments are considered as potential moderator variables within the current study.
The first contextual factor is the student-to-computer-ratio. We investigate whether the impact of learning with digital tools is more beneficial if students use tools on their own, pairwise or in groups.
The second contextual moderator variable is school subject. To gain more insight into the divergent results between school subjects (mathematics, physics, chemistry, and biology) within prior studies, we also consider this aspect in the analyses.
Another potential moderator is the school level at which digital tools are used. To clarify the empirical inconsistency within prior studies, grade level is considered within the current analyses, too. Since we focus on the advanced primary and the secondary level, studies were grouped into even categories of grade levels 5 to 7, 8 to 10, and 11 to 13.
A further interest lies in the importance of support for students when using digital tools in class. As technology constantly develops and computer-based instruction in schools is still rather unfamiliar for many teachers (Fraillon, Ainley, Schulz, Friedman, & Gebhardt, 2014, p. 207) as well as for students (Fraillon et al., 2014, p. 22), it follows that specific support for using media is pivotal. Thus, we explore the impact of support for students provided by teachers and/or peers versus no support as another potential moderator variable.
In order to investigate whether a supplemental use of digital tools in the regular classroom has greater impact on learning outcomes than if they completely replace methods without digital tools, the type of computer use is considered within moderator analyses in this study.
The teacher's competence in using digital tools in class is a central determinant of successful student learning. However, teacher self-assessments show that a considerable proportion of teachers do not feel sufficiently educated in this area (Fraillon et al., 2014, p. 207). For this reason, providing teacher training as a potential factor for positive student learning outcomes is also investigated here. Providing specific teacher training is recognized as a study quality issue and means a larger expense for the researchers. It is assumed that if teacher training was provided, the authors would also report it. Thus, if a study does not report the presence of a specific training, it is coded as no teacher training.
Furthermore, there are some methodological features that were already identified by several prior meta-analyses as potential moderators (e.g., Chen et al., 2018). These study characteristics are duration of the interventions, sample size, study design and the instructor effect.
The duration of the interventions moderated the effect in prior studies, as shorter interventions resulted in higher learning outcomes (Bayraktar, 2001/2002; Sung et al., 2017). In the present study, duration is categorized into interventions lasting one day to six days, one week to four weeks, four weeks to six months, and more than six months, because prior studies found significant differences between these categories (Chen et al., 2018; Sung et al., 2017).
Empirically, studies with smaller sample sizes produce larger effects (Chen et al., 2018; Slavin & Smith, 2009). Therefore, in order to explore differences between sample sizes within the current study, we categorized the studies into sample sizes of 100 or fewer, between 101 and 500, and more than 500. For further analyses, all studies were additionally categorized into only two categories of more or fewer than 100 participants.
In prior meta-analyses considering study design, such as in Belland et al. (2017) and Chen et al. (2018), the type of randomization was reported as a significant moderator. They found larger effects for quasi-experimental studies than for randomized studies, which is thus considered within our study as well.
The last potential moderator analyzed in our study is the so-called instructor effect, focusing on whether the interventions in the experimental and control groups were conducted by the same or by different persons. Prior studies found larger effect sizes when the treatment and the control group were instructed by different people rather than by the same teacher or researcher (Bayraktar, 2001/2002; Kulik & Kulik, 1991).
Some studies additionally reported findings on student outcomes such as emotional and motivational orientations, which were considered in an additional explorative analysis within the current study. The term attitude was used to summarize these student outcomes since it generally describes "different psychological concepts with diverse theoretical backgrounds, such as emotional and motivational orientations, self-related cognitions, strategies, and value beliefs" (Schiepe-Tiska, Roczen, Müller, Prenzel, & Osborne, 2017, p. 306). Motivational student outcomes such as interest in science and mathematics or instrumental forms of motivation can influence student career choices later in life (Dowker, Sarkar, & Looi, 2016;Schiepe-Tiska et al., 2017). Because of the low student interest in sciences in many OECD countries found in PISA 2015 (OECD, 2016b, p. 125), and the current need for qualified employees in the field of science and engineering (OECD, 2016b, p. 110), it seems important to investigate the potential of digital tool use to have a positive effect on students' attitude in these fields.

Research questions
The purpose of this meta-analysis is to clarify the impact of learning mathematics and science with digital tools on the performance (and attitudes toward the taught subject) of secondary school students. It addresses the following research questions:
- Do secondary school students learning with digital tools in mathematics and science classes have different learning outcomes (and attitudes) compared to students learning without the use of digital tools?
- Which conditions of learning with digital tools in mathematics and science classes are favorable with regard to student learning outcomes?

Method
In the following section we provide a detailed overview of the inclusion criteria (Section 4.1) and describe the method of the literature search (Section 4.2), the process of coding studies (Section 4.3), including measures for interrater reliability (Section 4.4), and the data analysis (Section 4.5). For a detailed introduction to meta-analysis see, for example, Borenstein et al. (2009).

Inclusion criteria
All studies had to meet the following pre-defined inclusion criteria in order to ensure maximum quality standards:
- Primary data were reported in the study.
- Digital tools (computer, tablet, smartboard, mobile phone, notebook, or CAS computer) were used during mathematics, physics, chemistry, biology, or general science class (and were not additionally used at home).
- The sample consisted of secondary-school students (grade levels 5 to 13).
- The sample did not consist only of students with special educational needs (e.g., only gifted or only disabled students).
- The dependent variable was student performance, optionally complemented by student attitudes.
- The study had a pre-post control-group design.
- The study did not investigate the effects of computer games.
- Effect sizes, or the data necessary to calculate them, were reported in the study.
- The control group consisted of students taught with instruction methods not using digital tools.
- The study was published between 2000 and October 5, 2018.
- The study was published in a peer-reviewed journal and a full text was available in English or German.

Literature search
The literature search was conducted in three relevant major databases, specifically, Web of Science, Scopus and ERIC. The databases were chosen to cover different disciplines, with Web of Science providing studies in the field of social sciences, arts, and humanities, Scopus providing studies in the field of natural sciences and technology, and ERIC providing studies in the field of education research.
We searched for studies published between 2000 and October 5, 2018. This timeframe was chosen because the databases used in this study show the first significant increase in the number of publications on the effectiveness of learning with digital tools between 2000 (e.g., 5 results in Scopus) and 2005 (e.g., 43 results in Scopus). In the Scopus database, for example, the literature search for studies published between 1975 and 2000 yielded one to a maximum of three results per year.
With the following syntax, k = 6572 studies were found across the three databases: "study" OR "empiric*" OR "research" AND "digital media" OR "tablet" OR "computer" OR "whiteboard" OR "smartboard" OR "ipad" OR "pc" OR "cas" OR "ict" OR "netbook" OR "software" AND "stem" OR "math*" OR "mint" OR "physic*" OR "chemistry" OR "biology" OR "science" AND "secondary school" OR "high school" OR "secondary education" OR "middle school" NOT "computer science" NOT "informatics" NOT "engineering." The terms were searched within titles, abstracts, and keywords. The 6572 studies were then limited to articles published in peer-reviewed journals, since peer review is "a generally accepted criterion to ensure scientific quality" (Van der Kleij et al., 2015, p. 502). Three filters were used to refine the results, namely document type (limited to articles), publication years (between 2000 and October 5, 2018), and research areas, such as psychology, education, mathematics, chemistry, biology, and physics.
After initially screening titles and abstracts and removing duplicates, 474 articles remained that met the inclusion criteria. During the detailed coding process using the full texts, additional studies were excluded because they did not fit the inclusion criteria stated above. Therefore, the final dataset consisted of 92 studies from 91 articles (one article reported two independent studies) with a total of N = 14,910 students. The entire study selection process is depicted in a flowchart in Fig. 1, following the guidelines of the PRISMA Group (Moher, Liberati, Tetzlaff, & Altman, 2009).
Most studies were conducted in Turkey (k = 22), followed by the US (k = 21) and Taiwan (k = 10). Two articles did not report where the study was conducted.

Coding strategy
The coding form included the following items: type of digital tool, grade level (5-7, 8-10 or 11-13), subject (mathematics, biology, chemistry, physics or science in general), student-to-computer-ratio, sample size, duration of the study, whether digital tools were used supplemental to other existing methods in class or if they replaced them, student support by teacher and/or peers, provision of teacher training, randomization (quasi-experimental or experimental design), and whether the interventions in the experimental and control group were conducted by different persons or by the same person. All 92 studies were coded independently by at least two raters.

Interrater reliability
The detailed coding form developed for this synthesis was piloted by coding six studies that were excluded from the final data set. As a measure of interrater reliability, Cohen's Kappa (Cohen, 1968) was calculated for each variable separately, because it was expected that some of the variables would yield lower values than others due to the heterogeneous underlying theoretical concepts used in the analyzed studies. Indeed, values for Cohen's Kappa ranged from κ = 0.24 (type of digital tool), which is considered fair agreement, to κ = 0.87 (subject), which represents almost perfect agreement (Landis & Koch, 1977). The mean of all values for Cohen's Kappa was κ = 0.43 (SD = 0.23). The agreement for variables without any scope for interpretation, such as data for effect size calculation, was almost perfect. Each disagreement was discussed in regular team meetings, in which three of the authors took part, until a consensus was reached. For this purpose, the relevant full texts were consulted.
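The Kappa computation itself is simple to reproduce. The following sketch (a minimal illustration, not the coding tool used in the study) computes Cohen's Kappa for one nominal variable from two raters' category assignments; the example categories and codes are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters' nominal codes of the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical example: two raters coding the subject of six pilot studies
a = ["math", "math", "physics", "biology", "math", "chemistry"]
b = ["math", "physics", "physics", "biology", "math", "chemistry"]
print(round(cohens_kappa(a, b), 2))
```

Because Kappa corrects observed agreement for chance agreement, it is more conservative than simple percent agreement, which is why it is the standard reliability index for nominal coding schemes like the one used here.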

Effect size calculation and data analysis
After the data set was completed, the data were imported into the Comprehensive Meta-Analysis (CMA) program for further analyses (Borenstein et al., 2009). In order to calculate the overall effects, all available effect sizes were transformed to the effect size Hedges's g (Hedges, 1981). If studies did not report any effect size, the relevant data were used for calculations. In addition to the calculation of the weighted mean effect size Hedges's g with its standard errors and 95% confidence intervals around each mean, CMA was also used to test for homogeneity by calculating Q, p, and I². The Q statistic was inspected to examine whether all analyzed studies share a common population effect size. Q is approximately chi-square distributed with k - 1 degrees of freedom. If Q exceeds the critical value of the distribution, this indicates that the effect sizes vary significantly between the studies (Shadish & Haddock, 2009). In addition, I² was used as a descriptive statistic that represents the amount of real variance within the observed variance of the effect sizes. An I² of over 75% can be considered high. In the latter case, subgroup analyses are appropriate for finding reasons for the variance of the effect sizes (Borenstein et al., 2009).
Hedges's g is comparable to Cohen's d and likewise represents the standardized mean difference, which is calculated by dividing the difference between the experimental and control group means by the pooled standard deviation (Hedges, 1981). Although the two effect sizes are very similar, the difference between them is that "Cohen's d uses N for the denominator of the estimated variance to obtain the standard deviation, whereas Hedges's g uses N - 1, that is, the pooled within-sample unbiased estimate of the population variance to obtain the standard deviation" (Rosnow & Rosenthal, 2003, p. 223). Thus, because Cohen's d tends to overestimate the standardized mean difference in smaller samples (Borenstein et al., 2009), and we only have access to the standard deviation of the sample and not to the population standard deviation, it is more appropriate to use Hedges's g within the current analyses. The interpretation of Hedges's g is equivalent to that of Cohen's d (see Section 2.3).
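To make this computation concrete, the following sketch (a minimal illustration, not the CMA implementation used in the study) derives Hedges's g from group summary statistics via the standard small-sample correction factor; all input values are hypothetical:

```python
import math

def hedges_g(m1, m2, sd1, sd2, n1, n2):
    """Standardized mean difference with Hedges's small-sample correction."""
    # Pooled standard deviation from the unbiased (n - 1) variance estimates
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    # Correction factor J shrinks d slightly toward zero in small samples
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    return j * d

# Hypothetical post-test means/SDs for an experimental and a control group
print(round(hedges_g(78.0, 72.0, 10.0, 10.0, 30, 30), 2))
```

Note how the correction factor approaches 1 as the combined sample grows, so g and d converge for large studies.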
Some studies compared, for example, two experimental groups with a single control group, so that the data were not independent. In the case of multiple dependent comparisons within a study, effect sizes were combined by calculating their mean. Moreover, the correlations between the subgroups were computed based on the exact group sizes and were taken into account when calculating the variance of the composite effect size (see Borenstein et al., 2009).
The second case of dependent data was the report of multiple outcomes for one and the same group within a study. In this case, the correlations between the outcomes were set to one (r = 1). This approach tends to overestimate the standard error and underestimate precision and is therefore a more conservative method for combining dependent outcomes than assuming completely independent outcomes (see Borenstein et al., 2009).
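The combination rule for dependent effect sizes can be sketched as follows (an illustration of the Borenstein et al. composite formula under the r = 1 assumption described above, not the CMA implementation; the effect sizes and variances are hypothetical):

```python
import math

def combined_effect(gs, vs, r=1.0):
    """Mean of m dependent effect sizes and the variance of that mean.
    Setting r = 1 is the conservative choice described in the text."""
    m = len(gs)
    g_bar = sum(gs) / m
    var = sum(vs)
    # Add the covariance terms r * sqrt(Vi * Vj) for every i != j pair
    for i in range(m):
        for j in range(m):
            if i != j:
                var += r * math.sqrt(vs[i] * vs[j])
    return g_bar, var / m**2

# Two hypothetical outcomes measured on the same group
g, v = combined_effect([0.50, 0.70], [0.04, 0.04], r=1.0)
print(round(g, 2), round(v, 3))
```

With r = 1 and equal variances, the composite variance equals the single-outcome variance, i.e., combining the outcomes yields no artificial gain in precision, which is exactly the conservative behavior intended.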
For all analyses, a random-effects model was used. Under a random-effects model, differences in effect sizes among studies are attributed not only to sampling error but also to systematic variance between the studies (Borenstein et al., 2009). Because the 92 studies were conducted by independent researchers with different ways of implementing interventions and using digital tools, one cannot assume that heterogeneity among studies is unaffected by these different circumstances. Moreover, different variables and study designs were used in the analyzed studies. Therefore, a random-effects model is more appropriate than a fixed-effect model (Borenstein et al., 2009; Cooper, 2010; Lipsey & Wilson, 2001). Furthermore, the results of this meta-analysis should be generalizable to students of various grade levels and in various school subjects, which also makes a random-effects model the more obvious choice.
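Random-effects pooling can be illustrated with the DerSimonian-Laird estimator, one common way to estimate the between-study variance (a sketch only; CMA's exact implementation may differ, and the input values are hypothetical). It also shows how the pooled g, Q, and I² relate:

```python
def random_effects_summary(gs, vs):
    """Pooled effect under a DerSimonian-Laird random-effects model,
    plus the Q statistic and I^2 used to judge heterogeneity."""
    w = [1 / v for v in vs]                       # fixed-effect weights
    g_fe = sum(wi * gi for wi, gi in zip(w, gs)) / sum(w)
    q = sum(wi * (gi - g_fe) ** 2 for wi, gi in zip(w, gs))
    df = len(gs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_re = [1 / (v + tau2) for v in vs]           # random-effects weights
    g_re = sum(wi * gi for wi, gi in zip(w_re, gs)) / sum(w_re)
    i2 = max(0.0, (q - df) / q * 100) if q > 0 else 0.0
    return g_re, q, i2

# Hypothetical study-level effect sizes and their variances
g_re, q, i2 = random_effects_summary([0.2, 0.5, 0.9, 1.2],
                                     [0.04, 0.05, 0.04, 0.06])
```

The key design point is visible in `w_re`: adding tau² to every study's variance flattens the weights, so large studies dominate less than under a fixed-effect model, which matches the generalization goal stated above.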
Moderator analyses were conducted to investigate whether different study features (e.g., type of digital tool or randomization) have a differently sized impact on the overall effect. To this end, each study was classified into one of the corresponding categories of every potential moderator variable. If no information was available, the category not reported was used. The difference between the respective subgroups was tested with Q_B, which quantifies the heterogeneity between the groups and is equivalent to the F-value within an ANOVA. Thus, a significant Q_B means that there is statistically significant heterogeneity among the subgroups and that the moderator variable can partially explain the heterogeneity among the effect sizes (Hedges & Pigott, 2004). For all analyses, the significance level was set at α = 0.05.
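The between-groups test can be sketched as a weighted sum of squared deviations of the subgroup means from the grand mean (illustrative only; the subgroup means and variances below are hypothetical, and the chi-square critical value for df = 1 at α = .05 is hard-coded):

```python
def q_between(group_means, group_variances):
    """Q_B: heterogeneity between subgroup mean effects (df = groups - 1)."""
    w = [1 / v for v in group_variances]
    grand = sum(wi * gi for wi, gi in zip(w, group_means)) / sum(w)
    return sum(wi * (gi - grand) ** 2 for wi, gi in zip(w, group_means))

# Hypothetical subgroup summaries: with vs. without teacher training
qb = q_between([0.84, 0.56], [0.010, 0.006])
print(qb > 3.84)  # 3.84 = chi-square critical value for df = 1, alpha = .05
```

Under the null hypothesis of no moderator effect, Q_B follows a chi-square distribution with (number of subgroups - 1) degrees of freedom, so exceeding the critical value signals a significant moderator.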
To analyze whether publication bias could have influenced the results of the study, the fail-safe N (Rosenthal, 1979) was computed. Furthermore, we applied a rank correlation test, which analyzes the correlation between the standard error and the effect size. This inverse correlation is expected because studies with smaller sample sizes are more often included in a meta-analysis if they show large treatment effects (Begg & Mazumdar, 1994). The funnel-plot-based trim-and-fill method was used as a third approach to assess the threat of potential publication bias. With this method, unmatched values in the distribution are trimmed to obtain a more symmetric funnel plot, and missing values are imputed. The overall effect can then be recalculated by taking the additional values into consideration (Duval & Tweedie, 2000).
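Rosenthal's fail-safe N can be illustrated with a short sketch (the z-values below are hypothetical; the study itself used CMA for this computation). It asks how many averaged-null studies would be needed to push the Stouffer-combined one-tailed p above α:

```python
import math

def fail_safe_n(z_values, z_alpha=1.645):
    """Rosenthal's fail-safe N via the Stouffer sum-of-z method:
    number of unpublished null studies needed to nullify the effect."""
    z_sum = sum(z_values)
    return max(0, math.floor((z_sum / z_alpha) ** 2 - len(z_values)))

# Ten hypothetical studies, each with z = 2.0
n_fs = fail_safe_n([2.0] * 10)
print(n_fs)  # compare against Rosenthal's 5k + 10 tolerance limit (here 60)
```

If the computed N comfortably exceeds Rosenthal's 5k + 10 tolerance level, as it does in this toy example, the pooled result is considered robust against the file-drawer problem.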

Descriptive statistics
In total, the 92 studies yielded 117 effect sizes regarding the student learning outcomes of N = 14,910 students. All studies with their characteristics and effect sizes are presented in Table 1. We combined multiple outcome measures per study as described above and therefore used one effect size per study as the independent unit of analysis. These 92 effect sizes range from g = -0.33 to g = 2.46. Eighty of the effect sizes (87%) are positive, whereas twelve studies (13%) yielded negative effect sizes, indicating that digital tools adversely affected student learning outcomes in those cases. Analyses regarding the effects of digital tool use on student attitudes revealed a total of 16 effect sizes ranging from g = -2.24 to g = 1.59. The minimum effect size of g = -2.24 is the only negative one; the other 15 effect sizes indicate a positive impact of digital tool use on student attitudes toward the subject. Fig. 2 shows the forest plot of the 92 effect sizes regarding student learning outcomes with the standardized mean difference for each study and the corresponding 95% confidence intervals.

Outlier analysis
Based on the one-study-removed-approach (Borenstein et al., 2009), the outlier analysis in CMA shows that all 92 studies have effect sizes related to student learning outcomes that fall within the 95% confidence interval of the average effect size. Thus, no study was identified as an outlier. The analysis of the 16 effect sizes related to student attitudes also revealed no obvious outlier.
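The one-study-removed approach can be sketched as a simple leave-one-out loop over inverse-variance weighted means (an illustration with hypothetical values; CMA performs the equivalent analysis under the random-effects model):

```python
def one_study_removed(gs, vs):
    """Recompute the inverse-variance weighted mean with each study left out.
    A study is an outlier candidate if its removal shifts the mean markedly."""
    means = []
    for leave_out in range(len(gs)):
        pairs = [(g, v) for i, (g, v) in enumerate(zip(gs, vs))
                 if i != leave_out]
        w_sum = sum(1 / v for _, v in pairs)
        means.append(sum(g / v for g, v in pairs) / w_sum)
    return means

# Hypothetical effect sizes; the last one is deliberately extreme
loo = one_study_removed([0.4, 0.5, 0.6, 2.4], [0.05, 0.05, 0.05, 0.05])
```

In this toy example, dropping the extreme fourth study pulls the pooled mean down sharply, which is exactly the pattern an outlier analysis is designed to flag; in the actual data set, no removal moved the mean outside the 95% confidence interval.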

Overall effect on student learning outcomes
The first research question focused on the benefits of using digital tools in science or mathematics classes on student learning outcomes in comparison to classes learning without digital tools. The overall effect shows that the use of digital tools had a medium positive and statistically significant effect on student learning, g = 0.65, 95% CI [0.54, 0.75], p < .001. Hence, secondary school students who learned with the use of digital tools in science or mathematics classes had significantly greater learning outcomes than students who were taught without the use of digital tools.
The test for heterogeneity showed that the effect sizes varied significantly between the studies, Q(91) = 757.92, p < .001, with I² = 87.99, indicating large heterogeneity.

Publication bias
To test for potential publication bias, the fail-safe N was computed (Rosenthal, 1979). The fail-safe N indicates how many non-significant studies would have to be missing from the analyses to nullify the positive effect found: 8891 non-significant studies would be needed to invalidate the observed overall effect. The limit of 5k + 10 studies suggested by Rosenthal (1979), which is 470 in the current study, was therefore far exceeded. Additionally, we used the rank correlation test as another method for analyzing the threat of potential publication bias, analyzing the rank correlation between the study size and the effect size (Begg & Mazumdar, 1994). The result of the test, Kendall's τ = 0.18, p < .01, indicated a significant correlation between the treatment effect and the standard error, and thus suggested that the data basis of this meta-analysis might be skewed by publication bias. Furthermore, we applied the trim-and-fill method (Duval & Tweedie, 2000) as a third approach to assess the possible effect of publication bias. First, a funnel plot was used to visualize the distribution of the effect sizes around the mean, which is presented in Fig. 3. The analysis indicated that 28 studies were missing on the left side of the distribution to make it symmetrical. Next, the overall effect size was computed again considering the 28 additional values, and an adjusted effect size of g = 0.35, p < .05 was found. The funnel plot with the imputed studies is presented in Fig. 4. Although the adjusted effect size is smaller than the observed effect, there is no evidence that the positive overall effect size found in this study was substantially affected by publication bias. In other words, the observed effect size in this meta-analysis is robust, because the adjusted effect is still significantly positive.

Conditions of learning with digital tools
The second research question focused on the conditions of learning with digital tools in mathematics or science classes and which of them are more or less favorable with regard to student learning outcomes. Because the test for heterogeneity was significant, moderator analyses were conducted to find out whether potential moderating variables were responsible for the significant variance among studies and which conditions are more favorable for student learning outcomes than others.
Most of the studies analyzed the effects of using digital tools supplementary to non-digital material (k = 53), in mathematics classes (k = 33), and in grade levels 8 to 10 (k = 37). Simulations (dynamic mathematical tools and virtual reality) were the most frequently used type of tool (k = 42). In most of the analyzed studies, students used digital tools on their own (k = 29) and were supported by their teachers when learning with digital tools (k = 28). All coded variables (including the number of studies k in each category) and the results of the moderator analyses are presented in Table 2. The table shows the results under a random-effects model, including the number of students (N), the number of studies (k), the weighted mean effect sizes (g), standard errors (SE), 95% confidence intervals, and the tests for heterogeneity (Q_B). Studies that did not report on the respective variable were excluded from the moderator analyses.
We tested whether the learning outcomes of students were greater if their teachers had received specific training before using digital tools in their class. The variable teacher training produced statistically significant between-levels variance, Q_B(1) = 5.53, p < .05. Hence, the positive effect on student learning outcomes is significantly larger if teachers received specific training for using digital tools before the intervention was conducted (g = 0.84, p < .05) than for interventions without specific teacher training (g = 0.56, p < .05).
Moderator analyses were conducted with all coded variables and revealed that teacher training was the only variable that significantly moderated the overall effect in the current meta-analysis under a random-effects model. Although the between-levels variance was not statistically significant for the remaining variables, the following results show tendencies of more or less beneficial learning conditions and methodological study features.
Regarding the use of different types of digital tools, we expected a positive effect for tools with adaptive or feedback features. The analyses show that dynamic mathematical tools (g = 1.02, p < .05) and intelligent tutoring systems (g = 0.89, p < .05) produced the largest and statistically significant effect sizes. Smaller but still medium effects resulted for the use of virtual reality (g = 0.63, p < .05) and the use of tutoring systems (g = 0.55, p < .05). The use of drill and practice programs also produced a medium effect size (g = 0.58, p < .05). The smallest effect size was found for studies that investigated the use of hypermedia learning systems (g = 0.40, p < .05). Pairwise comparative analyses showed significantly greater learning outcomes for the use of dynamic mathematical tools than for hypermedia systems (p < .05) and also significantly greater outcomes for intelligent tutoring systems than for hypermedia systems (p < .05). Hence, students had the greatest learning gains when they used simulation programs, such as dynamic geometry software, or adaptive tools, and the effects are considered large when compared to the learning gains of students taught without the use of digital tools. Digital tools were used either supplementary to existing instruction methods or as a complete replacement for them. Supplementary use (g = 0.64, p < .05) was associated with a larger effect size than complete substitution (g = 0.51, p < .05); however, the difference is not significant.
Digital tools were used by students individually, in pairs, or in groups. The effect size is largest if digital tools were used in pairs (g = 0.72, p < .05), but no significant differences were found. All effect sizes are presented in Table 2.
In most of the studies, students were supported by the teacher and/or their peers while they learned with digital tools. Learning with support by peers (g = 0.63, p < .05), by teachers (g = 0.61, p < .05), or by teachers and peers (g = 0.54, p < .05) yielded larger effect sizes than learning without any support by peers or teachers (g = 0.37), but the differences are not significant.
Secondary school students in grades 5 to 13 were considered in the current meta-analysis. The use of digital tools shows a statistically significant positive effect across all grade levels: 5 to 7 (g = 0.61, p < .05), 8 to 10 (g = 0.55, p < .05), and 11 to 13 (g = 0.73, p < .05). Although the effect is larger for the highest grade band than for the lower ones, there is no significant difference between them.
The following variables relate to methodological features of the studies. The first methodological feature analyzed in the current study is the duration of the intervention. The analyses show that the shortest interventions, that is, those lasting one day (there was no study with a duration of two to six days), produced the largest effect size (g = 0.86, p < .05), whereas the longest interventions, with digital tool use for over six months, produced the smallest effect size (g = 0.47, p < .05). Effect sizes are significant across all other durations, as shown in Table 2; however, there is no significant difference among the categories.
The type of assignment of students to the experimental or control group is the second methodological feature analyzed in the current meta-analysis. Most of the studies used a quasi-experimental design (k = 83), whereas only five studies used an experimental design with randomized assignment to the groups. The effect size for experimental design studies (g = 0.77, p < .05) was larger than for quasi-experimental design studies (g = 0.65, p < .05), but not significantly different.
Effect sizes were slightly larger if the interventions in the experimental and control groups were conducted by the same researcher (g = 0.79, p < .05) than if they were conducted by different external researchers (g = 0.70), the latter effect being non-significant. Conversely, the effect of learning with digital tools was slightly larger if the interventions were conducted by different teachers (g = 0.64, p < .05) than by the same teacher (g = 0.60, p < .05). There was no significant difference between the levels of this potential moderator.
The last methodological feature analyzed in this study is the sample size. Studies with sample sizes of more than 500 (g = 0.28) showed smaller, non-significant effect sizes compared to studies with sample sizes of 100 or fewer (g = 0.72, p < .05). The between-levels variance was not significant; however, there is a significant difference between large sample sizes of more than 500 and small sample sizes of 100 or fewer (p < .05).

Overall effect on student attitudes
An additional explorative analysis focused on the benefits of using digital tools in science or mathematics classes on student attitudes toward the subject taught, in comparison to classes learning without digital tools. Sixteen of the 92 studies provided effect sizes regarding student attitudes (N = 1639); these 16 studies investigated attitudes in addition to student learning outcomes. The overall effect shows that the use of digital tools had a small, statistically significant positive effect on student attitudes toward the subject taught, g = 0.45, p < .05. Hence, secondary school students who were taught using digital tools in science or mathematics classes had significantly more positive attitudes toward the subject taught than students who learned without the use of digital tools. The small number of studies that investigated student attitudes (k = 16) did not allow for further moderator analyses.

Summary of results
This meta-analysis examined how the use of digital tools influences student learning outcomes in mathematics and science classrooms, as well as student attitudes toward the subject. The overall effects show that the use of digital tools had a medium, significantly positive effect on student learning outcomes and a small, significantly positive effect on student attitudes. Moderator analyses were conducted for student learning outcomes. The variable teacher training moderated the overall effect significantly: Interventions that provided teacher training in the digital tool used in class produced significantly larger effects than studies that did not provide specific training. Looking at the different types of digital tools, the meta-analysis shows that dynamic mathematical tools and intelligent tutoring systems tended to yield larger effect sizes than drill and practice programs and were significantly more effective than hypermedia learning systems. Although sample size did not significantly moderate the overall effect within the current study, pairwise comparative analyses showed a significantly larger effect for small samples than for large samples.
All further analyses yielded no significant differences; however, they show that the effects were significantly positive across all analyzed subjects and grade levels, although the benefit seemed greatest for physics classes and for grade levels 11 to 13. Students tended to profit most if digital tools were used in pairs and in addition to other instruction methods. Moreover, the effect was larger for students who received support from their teachers or peers when learning with digital tools, as compared to those who did not receive support.
The small number of studies in most subgroups and the use of a random-effects model resulted in reduced statistical power, which may partly explain non-significant effects and between-levels variance in the moderator analyses (Jackson & Turner, 2017). Apart from that, reporting effect sizes is important for interpreting research findings with regard to practical relevance (Sink & Stroh, 2006), since "statistical significance does not necessarily imply that the result is important in practice" (Ellis & Steyn, 2003, p. 51). In general, the overall effect of learning with digital tools found in the current meta-analysis is in line with prior studies that have also found advantages in favor of learning with digital tools (Ma et al., 2014; Steenbergen-Hu & Cooper, 2014). The somewhat larger overall effect size in comparison to prior meta-analyses such as Bayraktar (2001/2002), Ma et al. (2014) with g = 0.41, Steenbergen-Hu and Cooper (2014) with g = 0.37, Steenbergen-Hu and Cooper (2013) with g = 0.09, and Cheung and Slavin (2013) with d = 0.16 could result from the further development of digital tools and learning programs in recent years.

Duration of digital tool use
One reason for the relatively small impact on student learning outcomes for longer interventions could be that researchers are usually more involved in shorter interventions than in long-term use of digital tools in schools. Therefore, "the degree of implementation might have impacted the effectiveness" (Steenbergen-Hu & Cooper, 2013, p. 984). Another explanation could be the occurrence of a novelty effect (Clark & Sugrue, 1990). Among other things, the fact that students were taught with the use of digital tools, and therefore with new technology and different methods than usual, could be responsible for short-term increases in motivation or interest and, as a consequence, indirectly lead to better performance. The results are in line with the findings of Bayraktar (2001/2002) and, except for interventions lasting only one week or less, with those of Sung et al. (2017). Conversely, the low overall effect found by Cheung and Slavin (2013) could be explained by the fact that they only analyzed studies with intervention durations greater than 12 weeks. The assumption of a novelty effect is not in line with the result of Ma et al. (2014), who found that interventions with durations of less than four weeks yielded less positive student learning outcomes. However, this opposite result could be linked to the fact that their analysis was limited to intelligent tutoring systems, which are intended to systematically build up new knowledge and therefore may not produce positive learning outcomes within a relatively short period of less than four weeks.

Grade level
On a descriptive level, the current study shows that the learning outcomes for students using digital tools are slightly greater for students in grade levels 11 to 13 than for students in grade levels 5 to 10. Hence, although the difference is not significant, this result is in line with the assumption of Steenbergen-Hu and Cooper (2014): They argued in their discussion that the use of intelligent tutoring systems might be more beneficial for older students who have greater self-regulation skills and computer literacy as well as greater prior knowledge and learning motivation. However, the results of Ma et al. (2014) did not comply with this assumption. Further research is needed to find answers to this open question.

Type of digital tool
With regard to different types of digital tools, the study shows the largest effect sizes for simulations, such as dynamic mathematical tools, and for intelligent tutoring systems, which conforms to the results of Bayraktar (2001/2002), who also found that simulations and tutorials were more beneficial than drill-and-practice programs. The small effect size regarding hypermedia systems found in the current study might be due to the less structured way of learning with such systems and is therefore in line with the result that learning without guidance is also less beneficial. Interactive features such as feedback, activation of relevant knowledge, and adaptation of learning content to prior student knowledge, which are part of intelligent tutoring systems, could be responsible for the stronger effect of intelligent tutoring systems. This is in line with the findings of studies by Belland et al. (2017) and Van der Kleij et al. (2015), and also with the guided discovery principle within the cognitive theory of multimedia learning, which emphasizes the beneficial effect of feedback and hints provided by digital learning environments (Mayer, 2014). Simulation programs such as dynamic geometry software also show large effect sizes, and simulations like virtual laboratories show medium positive effect sizes. This result accords with both the modality principle and the multimedia principle within the cognitive theory of multimedia learning, which imply that presenting information through sounds and words, or words and pictures, leads to better learning results than using words alone (Mayer, 2014). Furthermore, the greater extent of learner control and learning through discovery and exploration might also contribute to the benefits of learning with simulation programs (Karich, Burns, & Maki, 2014).
One reason for the smaller effect size of drill-and-practice programs could be that such programs do not impart new knowledge but seek to strengthen already learned content. This could, for instance, result in redundant information, especially for learners with above-average prior knowledge, and lead to less efficient learning according to the redundancy principle of the cognitive theory of multimedia learning (Mayer, 2014). Drill-and-practice programs do not adapt to prior knowledge, which could be one reason for the smaller effect sizes and could explain why Bayraktar (2001/2002) even found negative effects for these programs. Unlike the results of Bayraktar (2001/2002), the current study shows on a descriptive level that pairwise use of digital tools by students yields larger effect sizes than individual use. The greater learning gains when students worked together in pairs might be a consequence of more interactive and communicative learning. Frailich et al. (2009) reported in their study that students were more cooperative and discussed difficulties more often when they used digital tools in small groups. However, the results of the current meta-analysis did not yield greater learning outcomes when digital tools were used in groups compared to individual use, unlike the results of Chen et al. (2018). Yet the latter did not differentiate between two or more students in their analyses. One reason for our result could also be that computer-based collaborative learning is only effective if there are tasks specifically designed for more than one student (Chen et al., 2018). Further research is needed to examine which student-to-computer ratio works best, and especially whether this ratio is correlated with certain types of digital tools.

Teacher training
Studies that provided specific teacher training before the interventions were conducted produced significantly larger effect sizes than studies that did not provide such training. Although this result seems obvious, it emphasizes the importance of specific training for teachers to successfully use digital tools. Given that a large number of teachers do not feel competent in the area of digital tool use (Fraillon et al., 2014, p. 207), this finding might be a consequence of a lack of educational and pedagogical content in teacher training at university and should be considered in further research and teacher education (Mishra & Koehler, 2006).

Missing information in primary studies
A large number of studies were not considered in the current meta-analysis because the necessary statistical data were not reported. Out of 474 potentially relevant journal articles found in the three databases, only 92 studies could ultimately be used for the analyses. Studies were excluded not only because they lacked statistical data, but also because of other missing information that is important for meta-analyses, and because they did not meet sufficient quality criteria. As already stated by Ma et al. (2014), this result emphasizes the need for greater transparency in reporting standards, which is especially important in an interdisciplinary research field such as technology-supported learning. Moreover, the low values of Cohen's Kappa for some coded variables show how important it is not only to use consistent reporting standards but also to use coherent definitions of equivalent theoretical concepts and to describe study features in as much detail as possible.
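The role of Cohen's Kappa mentioned here can be illustrated with a minimal sketch: given two coders' labels for the same set of studies on one variable, Kappa corrects the observed agreement for the agreement expected by chance alone. The codings and category names below are invented for illustration, not taken from the actual coding scheme.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected inter-rater agreement for two coders."""
    n = len(coder_a)
    # Observed agreement: share of items both coders labeled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codings of ten primary studies on "type of digital tool":
a = ["sim", "its", "sim", "drill", "sim", "its", "drill", "sim", "its", "sim"]
b = ["sim", "its", "drill", "drill", "sim", "sim", "drill", "sim", "its", "sim"]
print(round(cohens_kappa(a, b), 2))  # → 0.68
```

Even 80% raw agreement here yields a Kappa of only 0.68, which is why vague category definitions in primary studies quickly depress the statistic.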

Threat of publication bias
This study focused on peer-reviewed publications, because they ensure high quality standards. Apart from that, there is "no way of objectively retrieving (unpublished) work" (Van der Kleij et al., 2015, p. 502). Therefore, the threat of publication bias was taken into account by calculating and reporting fail-safe N, the rank correlation test, and the trim-and-fill method, which partly indicated that the exclusion of unpublished studies could have resulted in a biased overall effect of learning with digital tools. One limitation of the rank correlation method is that a significant correlation can indicate the existence of publication bias but does not say anything about its direct impact on the study results. Therefore, we also applied the trim-and-fill method, which showed that the adjusted effect size was still significantly positive. Moreover, there is empirical evidence that the threat of publication bias is frequently overestimated (Hunter & Schmidt, 2004). The fact that "most studies examine multiple hypotheses, and, hence, there are multiple significance tests … reduces (and may eliminate) the possibility of publication bias based on statistical significance, because the probability that all such tests would be significant is quite low" (Hunter & Schmidt, 2004, p. 497). Indeed, 31 studies within the current meta-analysis examined more than one hypothesis and yielded non-significant or negative results for at least one research question.
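The first of these checks can be sketched briefly: Rosenthal's fail-safe N estimates how many unpublished null-result studies would have to exist to push the combined one-tailed significance test above the alpha level. This is a generic sketch of the standard formula, not a reproduction of the reported analysis, and the z scores below are invented.

```python
from statistics import NormalDist

def fail_safe_n(z_scores, alpha=0.05):
    """Rosenthal's fail-safe N: number of unpublished null-result studies
    needed to raise the combined one-tailed p-value above alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # ~1.645 for alpha = .05
    return sum(z_scores) ** 2 / z_crit ** 2 - len(z_scores)

# Hypothetical z scores of five primary studies:
z = [2.1, 1.8, 2.5, 1.2, 2.9]
print(round(fail_safe_n(z)))  # → 36
```

A common rule of thumb treats the result as robust when fail-safe N exceeds 5k + 10 (here, 35 for k = 5 studies), which is why the statistic is usually reported alongside the other bias diagnostics rather than alone.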

Impact of digital tool use on student attitudes
The low number of studies that investigated student attitudes did not allow for moderator analyses to find out more about beneficial learning conditions with regard to attitudes. For the current meta-analysis, only studies that investigated student attitudes in addition to student performance were considered. The promising explorative findings on student attitudes reported in this meta-analysis, however, call for future research on favorable aspects of using digital tools with regard to student attitudes.

Heterogeneity among analyzed primary studies
The heterogeneity among studies could not be completely explained by the results of the moderator analyses. Although the analyses revealed several interesting tendencies regarding more or less favorable conditions of using digital tools, only one variable, teacher training, significantly influenced the overall effect due to differences between content-related categories. The use of a random-effects model can increase statistical power regarding the overall effect, but can also lead to power loss within moderator analyses, especially if the heterogeneity among studies is large (Jackson & Turner, 2017). Moreover, additional factors that could not be considered within the current study could have been responsible for the differences in effectiveness between studies. Such potential variables are different types of learners, different kinds of learning content (procedural, conceptual, or declarative knowledge), or the quality of the assessment of student performance. As already stated by Van der Kleij et al. (2015), studies often do not report on the reliability of test scores. The features mentioned could not be examined within the current study because of insufficient information in the primary studies. Therefore, the results of the study should be interpreted in consideration of these limitations. However, this finding is important for future research because it calls for greater consideration of the potential moderator variables mentioned.
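The random-effects logic referred to above can be sketched as follows: the widely used DerSimonian-Laird estimator adds an estimated between-study variance τ² to each study's sampling variance before pooling, so that heterogeneous studies receive more similar weights. This is a minimal sketch of the standard estimator, not necessarily the exact procedure of the present study, and the effect sizes and variances below are invented.

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(effects)
    w = [1 / v for v in variances]                     # fixed-effect weights
    mean_fe = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity beyond sampling error.
    q = sum(wi * (y - mean_fe) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                 # between-study variance
    w_re = [1 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return pooled, tau2

# Hypothetical effect sizes (Hedges' g) and sampling variances of four studies:
g = [0.8, 0.3, 1.1, 0.5]
v = [0.05, 0.02, 0.06, 0.03]
pooled, tau2 = dersimonian_laird(g, v)
print(round(pooled, 2), round(tau2, 3))  # → 0.63 0.077
```

A nonzero τ², as in this toy example, flattens the weights relative to a fixed-effect model, which is precisely the trade-off noted above: more robust overall estimates, but less power to detect moderators.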

Characteristics of analyzed primary studies
Taking a closer look at the countries in which the analyzed studies were conducted, we have to note that some countries are overrepresented, such as Turkey (22 studies) or Taiwan (10 studies), whereas European countries as well as Canada are quite rare, and Latin-American countries are not represented at all. This may be linked to the fact that in some countries this specific type of comparative educational research is funded more frequently (see OECD, 2019b) and is more highly valued by officials in charge of education than in other countries. This could have resulted in a biased data basis and should be taken into account when interpreting the findings of the present study.
We focused on quantitative primary studies with experimental research designs. Numerous studies used the same experimental design: they compared the learning outcomes of a group using digital tools to those of a control group taught without digital tools, mostly within regular school lessons. However, to investigate whether the didactic processing of the material per se leads to the better learning outcomes of students using digital tools, it is important to include a second experimental group using exactly the same material as the first experimental group but without computer support (see Reinhold et al., 2020). We recommend considering this aspect in further research studies.
Furthermore, it seems noteworthy that most of the studies considered within this meta-analysis were published in interdisciplinary educational and psychological journals. Given the subject-specific focus of our study, we expected to integrate more studies published in subject-specific journals. The inclusion criteria of our meta-analysis, which required a very specific type of research design, might be responsible for this: Subject-specific journals often have a stronger focus on didactic aspects, and comparative research designs may not be as common there. It should be further examined whether other types of research designs published in subject-specific journals bring more insight regarding aspects that could not be considered within this meta-analysis (e.g., different types of learners, see Section 6.4.4).
Thus, the results of this meta-analysis are limited to a certain segment of the whole research area investigating the impact of learning and teaching with digital tools. To complete the picture, further research syntheses that also consider qualitative research designs would be highly rewarding.

Practical implications
According to Hattie (2012), the effectiveness of educational interventions in school settings should be interpreted in relation to the hinge point of d = 0.40, which he found to be the average effect size in his second-order meta-analysis. Slavin, Cheung, Holmes, Madden, and Chamberlain (2013) recommended considering interventions with effect sizes of at least d = 0.20 as meaningful for practical implications. Although the overall effect of learning with digital tools in the current meta-analysis is greater than the suggested hinge points of d = 0.40 (Hattie, 2012, p. 3) and d = 0.20 (Slavin et al., 2013), the results should not lead to premature decisions in school practice or to completely replacing other existing teaching methods. In fact, digital tools show the largest positive effects on student learning outcomes if they are used in addition to non-digital material. Despite the potential of using digital tools in mathematics and science classes, teachers should always assess their additional benefit with regard to the context in which they want to use them, and learning content should still take center stage (e.g., Clark, 1994; Kozma, 1994). Nevertheless, the results of the current study can serve as an orientation guide and for informational purposes for teachers or developers of learning environments (see Hillmayr, Reinhold, Ziernwald, & Reiss, 2017). Politicians as well as other decision-makers should be aware that different contextual factors within a computer learning environment can have differently sized impacts on student learning outcomes. Practitioners and software developers should consider time aspects of digital tool use, the level of guidance during computer-supported learning, and the respective subject content.
Considering the current efforts toward evidence-based practice in the educational field, teachers can use this research review to reflect on their own work routine and to refine the way and extent of computer use in teaching and learning.
Because "digital technologies are ever-changing, not always predictable, and can take on many forms" (Hamilton, Rosenberg, & Akcaoglu, 2016, p. 2), continuous teacher training is important. Along the same lines, the current meta-analysis shows how important specific teacher training is in supporting student learning with digital tools. However, the international average of teacher participation in training on digital tool use is only 68% for internal school training and 39% for external school training (Fraillon et al., 2014, p. 187). For that reason, acquiring media competency during teacher training at university should be addressed more comprehensively. Furthermore, school principals are urged to offer appropriate training for teachers, and the latter are encouraged to take part proactively.
All in all, regarding the current state of published research, there is a need for future studies that examine in greater detail how digital tools can specifically enhance different aspects of learning mathematics and science subjects. Because the use of intelligent tutoring systems or simulation programs was more effective than drill-and-practice programs, the added value of using digital tools in these domains might be related to learning complex ideas or abstract concepts rather than solely repeating and strengthening already learned subject content.

Conclusion
Despite the heterogeneous research situation on the use of digital tools in educational settings, the current study brings insight into the potential of technology-supported teaching and learning. After a comprehensive literature search, 92 primary studies that investigated the effects of using digital tools in secondary school science and mathematics classes were analyzed. The resulting overall effect of digital tool use on student learning outcomes and attitudes toward the taught subject was significantly positive. This shows the potential of learning with digital tools, especially because students often struggle to understand mathematical (OECD, 2016b, p. 192) or scientific subject matter in school (OECD, 2016b, p. 71). Moreover, the meta-analysis indicated the importance of teacher training before using digital tools in class, and, on a descriptive level, that complementary use of digital tools is more beneficial than completely replacing other instruction methods. Although these results are an essential contribution to the current state of research on the effectiveness of using digital tools in mathematics and science subjects, further studies are needed to gain additional insights into more or less beneficial learning conditions. In particular, future studies should directly compare certain features of interactive digital tools, such as different forms of feedback; to this end, greater consideration of the learning material used by control groups is recommended. Because analyses within the current study indicated that the data might have been biased by missing unpublished studies, there is a need for a further meta-analysis that also searches for gray literature, such as dissertations. Apart from that, future studies should be published more often, regardless of whether they report statistically significant results, since this could counteract the potential problem of publication bias.
Overall, it is undoubted that digital learning environments will be used increasingly often in educational settings. Consequently, researchers, teacher educators, and politicians need to be constantly informed so that they can provide the best conditions for maximizing the potential of learning with digital tools in school.

Funding
This research was supported by grants from the Stiftung Mercator, Essen, Germany.

Declaration of competing interest
None.