The impact of Covid-19 on student achievement: Evidence from a recent meta-analysis


This work attempts to synthesize existing research about the impact of Covid-19 school closures on student achievement. It extends previous systematic reviews and meta-analyses by (a) using a more balanced sample in terms of country composition, (b) considering new moderators (type of data and research design), and (c) including studies on tertiary education students in addition to primary and secondary education students. Our meta-analysis findings show that the pandemic had, on average, a detrimental effect on learning. The magnitude of this learning deficit (about 0.19 standard deviations of student achievement) appears to be roughly comparable to that suffered by students whose schooling was significantly disrupted by a major natural disaster (e.g., Hurricane Katrina). Students are also found to have lost more ground in math/science than in other subjects. Additionally, one year or more after the first lockdown, students seem to have been unable to catch up on the learning they missed during the pandemic. This result suggests that more effort should be made to help students recover their missed learning in order to avoid negative long-term consequences for them and society.



Introduction
The Covid-19 pandemic caused a major disruption in the schooling system around the world. In most countries, educational institutions had to close for several weeks or months in an attempt to reduce the spread of the virus (UNESCO, 2020). Students had to continue their schooling from home using different learning tools such as video conferencing, radio and TV.
However, the outbreak of Covid-19 was so sudden that there was little or no time for many schools to design and implement learning programs specifically intended to support children's learning while at home. A significant proportion of teachers were unprepared for online learning as they lacked appropriate pedagogical and digital skills (School Education Gateway, 2020). Similarly, many students struggled to adjust to the new format of learning. In addition to problems in accessing appropriate technology (computers, reliable internet connection, etc.), not all students had a home environment free of disturbances and distractions, and hence conducive to learning (Pokhrel & Chhetri, 2021). A large number of parents had serious difficulties in combining their work responsibilities (if not joblessness) with looking after and educating their children (Soland et al., 2020). Moreover, there is evidence showing that Covid-19 and the related containment measures have had a detrimental effect on children's wellbeing (Xie et al., 2020). Longer periods of social isolation might have adversely affected students' mental health (e.g., anxiety and depression) and physical activity (Vaillancourt et al., 2021). This is also likely to have contributed to negatively impacting their academic performance, given the close association between mental and physical health and educational outcomes (Joe et al., 2009).
While there is already relatively broad consensus in the literature that student learning suffered a setback due to Covid-19, as pointed out by several researchers (e.g., Donnelly & Patrinos, 2021; Patrinos et al., 2022), more research in this area is still needed.
Findings from new studies are important given that, as stated in a recent article published by the World Economic Forum, the full scale of the impact of the pandemic on the education of children is "only just starting to emerge" (Broom, 2022). Not only is a better understanding of the educational impact of Covid-19 needed, but special attention should be paid to investigating the legacy effects of the pandemic. As argued in several papers (e.g., Psacharopoulos et al., 2021; Hanushek & Woessmann, 2020), there is the risk that the disruption in learning caused by Covid-19 may persist over time, having long-term consequences for students' knowledge and skills as well as for their labour market prospects.
It is therefore very important to determine if and to what extent those children whose schooling was disrupted by Covid-19 subsequently got back on track and reduced their learning deficits 1. Similarly, it is relevant to gain a more solid understanding of how the educational impact of Covid-19 varies across students and circumstances. This would help educators and policymakers identify those groups of students who may need extra support to recover from the learning deficit caused by the pandemic. This paper uses meta-analysis in an attempt to synthesize and harmonize evidence about the effect of Covid-19 school closures on student learning outcomes. Meta-analysis, which is widely employed in education as well as in other fields, combines the findings of multiple studies in order to provide a more precise estimate of the relevant effect size and to explain the heterogeneity of the results found in individual studies. A total of 239 separate estimates from 39 studies are considered. We extend previous systematic reviews and meta-analyses 2 in four main ways. First, compared to previous meta-analyses, this study covers a larger number of countries (i.e., 19). Not only are several new countries considered in the analysis (e.g., Slovenia, Egypt), but US and UK studies do not dominate the collected empirical evidence. For instance, while in Betthäuser et al. (2023) about 71.1% of the effect sizes are derived from such studies, in our paper the corresponding figure is approximately 32.2% 3. This makes our results of more general relevance 4. Second, the current meta-analysis adds to previous meta-analyses by also including studies looking at the impact of Covid-19 on tertiary education students in addition to primary and secondary education students. This is important because, as individuals progress through the education system, academic challenges increase and so does the pressure to perform well.
Several studies from various countries (e.g., Bratti et al., 2004; Dabalen et al., 2001; Koda & Yuki, 2013) show that the final grade awarded to students successfully completing university is an important predictor of their labour market prospects. Third, while some relevant moderator variables have already been noted (e.g., subject, level of education, geographical area), the present meta-analysis adds several new ones, including type of data and research design. The relevance of these factors in explaining the heterogeneity of results across studies is well-known in the meta-analysis literature. For instance, it has been suggested that researchers conducting meta-regression analysis in economics should take data types into account.
2 Previous meta-analyses include König and Frey (2022), who extracted 109 effect sizes nested in 18 studies; Storey and Zhang (2021a), who synthesized 79 effect sizes from 10 studies; and Betthäuser et al. (2023), who considered 291 effect sizes from 42 studies. The reviews by Patrinos et al. (2022), Moscoviz and Evans (2022), Donnelly and Patrinos (2021), Hammerstein et al. (2021) and Zierer (2021)

Similarly, Stanley and Jarrell (1989) suggest that variables capturing differences in methodology need to be included among the moderators in meta-regression models. More generally, moderators are situational variables as well as characteristics of studies that might influence the effect estimate (Judd, 2015). Fourth, in contrast to previous similar meta-analyses (e.g., König & Frey, 2022), we look closely at the issue of the specification of the meta-regression model. As observed by Stanley and Doucouliagos (2012), this is a more relevant problem in meta-analysis than in primary econometric studies, given the higher risk of exhausting degrees of freedom in the former. Following recent literature (e.g., Di Pietro, 2022), we employ different methods to select the moderator variables to be included in the meta-regression model.
The remainder of the paper is organized as follows. Section 2 describes the process of selecting studies and collecting data. It also discusses the empirical approach and the possibility of publication bias. Section 3 reports and discusses the empirical results. Section 4 concludes.

Method
To perform this meta-analysis, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009).

Inclusion criteria
With the purpose of this study in mind, a set of inclusion criteria was defined to guide the selection of the studies included in this meta-analysis. Specifically, the following four inclusion criteria were used:
• the study should quantitatively examine the effect of Covid-19 on student achievement in primary, secondary or tertiary education. This means that the data used in the study were collected before and during the pandemic (or only during the pandemic if, when schools were closed, some students were still receiving in-person teaching, thereby simulating pre-pandemic conditions), therefore clearly distinguishing between a control and a treated group.
• the study should use objective indicators (e.g., test scores) to measure student achievement.
• the study should be based on real data.
• the study should report data on an effect size (or sufficient information to compute it) and its standard error (or t-statistic, p-value, or sufficient information to calculate it).

Search and screening procedures
To identify the relevant studies, we searched six different electronic databases (i.e., Google Scholar 5, EconLit, ScienceDirect, Education Resources Information Center, JSTOR and Emerald). The following keywords were used: "Covid-19 (OR coronavirus OR pandemic OR Cov) AND student (OR academic OR scholastic) performance (OR achievement OR learning OR outcome) OR test score". This search, which ended on 15 July 2022, delivered 6,075 hits. 717 duplicates were removed. We kept updated or published versions of any working paper we found. Next, the titles and the abstracts of the remaining 5,358 records were assessed. Following this, 5,205 studies were excluded as they use qualitative approaches (e.g., interviews), report teachers'/parents' views about the educational impact of Covid-19 (e.g., Kim et al., 2022; Lupas et al., 2021), or provide a theoretical discussion about how the pandemic is likely to affect education (e.g., Di Pietro et al., 2020). Similarly, studies containing predictions and/or projections were also removed (e.g., Kuhfeld et al., 2020a). After this initial screening, the content of the remaining 153 studies was carefully examined, and only those fulfilling all the inclusion criteria were considered. In this phase, we excluded studies that, although attempting to understand how the pandemic impacted student learning, employ a different outcome measure (e.g., dropout rate) than the one considered in this meta-analysis (e.g., Tsolou et al., 2021). In the same vein, we removed studies using student self-reported outcome measures as well as those examining the educational impact of Covid-19 on specific subgroups of students (e.g., Agostinelli et al., 2022). Finally, in order to ensure that key sources were not missed, we also screened the references included in previous meta-analyses and systematic reviews. Two more relevant articles were identified through this search. A total of 39 studies was included in this meta-analysis. Figure 1 summarizes the literature search and the screening procedure.

5 For Google Scholar, in line with the approach of Romanelli et al. (2021), only the first 100 relevant references at each search were retrieved, as results beyond the first 100 entries were largely irrelevant given the purpose of this study.
While all the titles and abstracts were screened only by the author, the next stages of the study selection process were carried out by the author and by another researcher, who independently classified the studies as relevant or irrelevant based on the predefined inclusion criteria. The inter-rater agreement was very high (i.e., 97%); studies on which there was disagreement were discussed in depth until consensus was reached.
Insert Figure 1 about here

Study coding
All the studies included in this meta-analysis were read in-depth, and relevant information and findings were extracted. Study coding was performed following the same procedure used for the final stages of the study selection process. The inter-rater agreement was again high (i.e., 93%).
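The percent-agreement figures reported above can be computed directly from the two coders' decisions. The sketch below is illustrative only — the labels are made up, and the paper does not report using Cohen's kappa — but it shows both raw agreement and the chance-corrected kappa that is often reported alongside it:

```python
# Illustrative sketch (not the authors' code): percent agreement and
# Cohen's kappa for two raters classifying studies.

def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters give the same label."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    p_obs = percent_agreement(rater_a, rater_b)
    # Expected agreement if the two raters labelled independently
    p_exp = sum(
        (rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels
    )
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical coding decisions ("inc"/"exc") for 10 studies
a = ["inc", "inc", "exc", "inc", "exc", "inc", "inc", "exc", "inc", "inc"]
b = ["inc", "inc", "exc", "inc", "exc", "inc", "exc", "exc", "inc", "inc"]
print(percent_agreement(a, b))          # 0.9
print(round(cohens_kappa(a, b), 3))     # chance-corrected agreement
```

Raw agreement can overstate reliability when one category dominates, which is why kappa is frequently reported as a complement.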
In line with the current best practice in meta-analysis (Polák, 2019), we use all relevant estimates included in the selected studies. As argued by Cheung (2019), not doing so results in missed opportunities to take advantage of all the available data to answer the research question/s under investigation. However, a fundamental issue with this approach lies in the dependence between multiple estimates from the same study given that effect sizes are assumed to be independent in meta-analysis (Cheung & Vijayakumar, 2016). As discussed later in the paper, several methods are used to account for within-study dependence.

Effect size calculation
In order to aggregate the various impact estimates reported in the selected studies, one needs to convert them into a common metric. Consistent with previous relevant systematic reviews and meta-analyses, we use Cohen's d as a scale-free effect size measure. Cohen's d refers to the standardised mean difference and is calculated by dividing the mean difference in student performance between the pre-Covid and Covid periods by the pooled standard deviation. While in some cases Cohen's d was retrieved directly from the studies, in others it was calculated using information available in them. Where the latter was not possible, the studies' author/s was/were contacted to obtain the relevant data. If not reported, the standard error of Cohen's d was computed using the formula given in Cooper and Hedges (1994). Where no information on sample sizes was available from the studies but exact p-values were reported, the formula provided by Higgins and Green (2011) was employed to obtain standard errors. In some instances, we also used information on effect sizes contained in the electronic supplement of the meta-analysis article by König and Frey (2022). For instance, this was the case when a study does not report Cohen's d but this information had already been collected by König and Frey, who had contacted the relevant author/s.
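These conversion steps can be sketched as follows — an illustrative implementation of the standard pooled-SD formulas with made-up cohort statistics, not the authors' code:

```python
import math

def cohens_d(mean_pre, sd_pre, n_pre, mean_cov, sd_cov, n_cov):
    """Standardised mean difference between the Covid and pre-Covid
    cohorts, using the pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n_pre - 1) * sd_pre**2 + (n_cov - 1) * sd_cov**2)
        / (n_pre + n_cov - 2)
    )
    return (mean_cov - mean_pre) / pooled_sd

def cohens_d_se(d, n_pre, n_cov):
    """Approximate standard error of Cohen's d (as in Cooper & Hedges, 1994)."""
    return math.sqrt(
        (n_pre + n_cov) / (n_pre * n_cov) + d**2 / (2 * (n_pre + n_cov))
    )

# Hypothetical cohorts: the Covid cohort scores 0.19 SD below the pre-Covid one
d = cohens_d(500.0, 100.0, 1000, 481.0, 100.0, 1000)
se = cohens_d_se(d, 1000, 1000)
print(round(d, 3), round(se, 4))  # -0.19 0.0448
```

A negative d here indicates lower achievement during the pandemic, matching the sign convention used throughout the paper.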

Moderator variables
For each effect size, we code several moderator variables, that is, factors potentially influencing the size of the effect of Covid-19 on student achievement. These moderator variables can be divided into two categories: 1) context and 2) characteristics. Regarding the former, we consider: a) The level of education. Several arguments suggest that remote schooling is more challenging for younger students than for their older counterparts. To start with, younger learners are less likely to have access to, and be able to independently use, digital devices. They may be unable to sign into an online class without assistance, may need help or supervision to perform an online task, and may more easily get distracted. Parental engagement therefore plays a crucial role in the success of younger pupils in an online learning environment.
However, even though critical, the supervision required for online schooling while younger children are at home may turn out to be unsustainable for many parents who are at the same time engaged with remote working (Lucas et al., 2020). There is also evidence showing that younger students are less likely to have a quiet space to work at home than their older peers.
For instance, Andrew et al. (2020) found that in the UK during the first Covid-19 lockdown, while the proportion of primary school students reporting not to have a designated space to study at home was about 20%, the corresponding figure for secondary school students was approximately 10%. Furthermore, children in early grades may especially miss in-person teaching as they depend on situational learning (Storey & Zhang, 2021b). A great emphasis is placed on relationships and interactions with others in order to acquire knowledge. Younger learners are also more likely to need movement and exploration, and these are things that one cannot do while sitting at home and looking at a screen (Hinton, 2020). Finally, some studies (Domínguez-Álvarez et al., 2020; Gómez-Becerra et al., 2020) showed that during Covid-19 younger children presented more emotional problems than older children. Tomasik et al. (2020) argued that the former group is more likely to have difficulties in coping with the socio-emotional stressors associated with the pandemic. Perhaps also as a result of this, there was greater attention to pastoral care than curriculum coverage among primary school students, as opposed to secondary school students (Julius & Sims, 2020).
In an attempt to investigate how the educational impact of the pandemic varies across student age groups, we distinguish between primary, secondary, and tertiary education students.
b) Subject. It is often claimed that the effect of the pandemic on student achievement varies depending on the subject being assessed. Specifically, three main arguments have been advanced to suggest that the pandemic has made students lose more ground in math than in other subjects.
First, while the Covid-19 lockdown has called for increased parental involvement in children's learning, parents often feel they have difficulties in assisting their children in math. Panaoura (2020) looked at parents' perception of how they helped their children in math learning during the pandemic in Cyprus. She found that parents' lack of confidence and low self-efficacy beliefs were heightened during this period; more teacher guidance and training would have been needed. Using data on Chinese primary school students during Covid-19, Wang et al. (2022) concluded that parental involvement had a positive impact on children's achievement in Chinese and English, but not in math. While parents are likely to be knowledgeable about the learning content of Chinese and English lessons, this may not be the case for math lessons. In daily life, language practice is more common than math practice. Furthermore, parents may be familiar with math methods different from the ones used by teachers (Shanley, 2016).
Second, teaching math in a fully online context is very challenging. Using data from a survey addressed to math lecturers between May and June 2020, Ní Fhloinn and Fitzmaurice (2021) found that most of the respondents agreed that it is harder to teach math remotely. This is partly due to the idiosyncratic nature of the discipline. It is especially difficult for math instructors to adapt their teaching style to online learning conditions. While many of them used to handwrite the material in real time during their lectures, only a small proportion have the technology to continue doing so online. Students, too, may have problems in communicating math online. Not only do students need to learn and accustom themselves to using technology in order to write mathematical symbols, but this is not always possible in online platforms such as chats (Mullen et al., 2021). Online engagement in math is particularly difficult: involving students in online discussions around an exact science like math may turn out to be very challenging.
Third, the economic and health problems caused by Covid-19, coupled with the sudden shift to online learning, are likely to have increased math anxiety among students. This can be defined as a negative emotional reaction that interferes with the solving of math problems (Blazer, 2011). Math anxiety prevents students from learning math because it leads to low self-esteem, frustration, and anger (Fennema & Sherman, 1976). Mamolo (2022) found that students' math motivation and self-efficacy decreased during the pandemic.
Similarly, Mendoza et al. (2021) and Arnal-Palacián et al. (2022) provided evidence of higher levels of math anxiety experienced by university and primary school students, respectively, during Covid-19.
In light of the above, subjects have been grouped into three broad categories: math/science, humanities, and a mixed category.
c) Timing of student assessment during Covid-19. As stated earlier, an important question is the extent to which the pandemic has long-lasting effects on learning outcomes.
Several arguments suggest that the negative effect of Covid-19 on student achievement may decline as we move to a later stage of the pandemic. To start with, a number of provisions are likely to have been taken in order to help students catch up after the first lockdown and following the re-opening of schools (at least temporarily). A UNESCO, UNICEF, World Bank and OECD report (2021) showed that in the third quarter of 2020 many countries around the world were planning to adopt support programs with the aim of reducing the learning deficit suffered by students earlier in the year. These programs include increased in-person class time, remedial programs, and accelerated learning schemes. Additionally, one would expect students and their parents to have become more used to remote learning during successive school closures and periods of online classes. Finally, many teachers and schools have probably learned important lessons from the first lockdown. These lessons might have helped them design and implement more effective remote learning measures in the subsequent phases of the pandemic.
However, despite the aforementioned considerations, it is possible that it will take some time before students are able to recover from the learning deficit caused by Covid-19.
Students may experience problems in re-engaging with education activities following the re-opening of schools. There is evidence showing that, after several months of remote schooling, students have become more passive and feel disengaged from their learning (Toth, 2021). The stress and anxiety stemming from the pandemic are likely to have caused a fall in student motivation and morale. The uncertainty of the learning environment under Covid-19 could also have contributed to reducing students' educational aspirations (OECD, 2020).
Additionally, during the academic year 2021-2022, as a result of successive waves and different variants of Covid-19, schools had to face several problems including significant staff shortages, high rates of absenteeism and sickness, and rolling school closures (Kuhfeld & Lewis, 2022). Evidence from the US shows that the pandemic has aggravated the problem of teacher shortage (Schmitt & deCourcy, 2022). Following school re-opening, teachers faced new requirements (e.g., hybrid teaching, more administrative tasks) that added to their already full workloads prior to Covid-19 (Pressley, 2022). This increased their stress levels, which made them more likely to leave their job. While many teachers have quit their job during the pandemic, this reduction in staff has not been fully offset by new hires.
In an attempt to look at how the educational impact of Covid-19 changes over time, we distinguish whether the student learning outcome was assessed in 2020 or 2021.
d) The geographical area where the study takes place. We make a distinction between Europe (i.e., Belgium, Czech Republic, Denmark, Germany, Italy, Netherlands, Norway, Spain, Sweden, Slovenia, Switzerland and the UK) and non-Europe (i.e., Australia, Brazil, China, Egypt, Mexico, South Africa and the US).
Coming to 2) characteristics, we code: e) the type of data. We distinguish between cross-sectional and longitudinal data. As noted by Werner and Woessmann (2021), cross-sectional data do not allow one to separate the Covid-19 effect from cohort effects. Using this type of data, the performance of a cohort of students who were affected by Covid-19 school closures is typically compared to the performance of a previous cohort of students who took the same test in a pre-Covid-19 period. However, this approach does not take into account the possibility that other factors influencing student achievement (e.g., changes in education policies) might have changed at the same time as Covid-19. Student-level longitudinal (panel) data help to address this cohort-effects bias. They allow one to look at changes in student performance before and after the lockdown and compare them with the progress made by similar students over the same period in previous years.
f) the type of research design. A number of different methodologies have been used in an attempt to identify the effect of Covid-19 school closures on academic achievement. In this study, we code the type of research design into the following three categories: descriptive, correlational, and quasi-experimental/experimental (Locke et al., 2010). Studies using a descriptive research design provide information about the average gap in test scores between the Covid-19 and non-Covid-19 cohorts without accounting for differences between these two cohorts (for example in terms of individual characteristics such as gender and socio-economic background) that could affect academic performance 6. On the other hand, studies employing a correlational research design (e.g., Ludewig et al., 2022) attempt to isolate the effect of Covid-19 from that associated with other factors that could influence student achievement, but their results cannot be given a causal interpretation.
Finally, studies using a quasi-experimental or experimental design (e.g., Engzell et al., 2021) move closer to a causal interpretation of the relationship between Covid-19 and student performance.

6 These studies typically report in a table the mean test scores of the Covid-19 and non-Covid-19 cohorts, together with their corresponding standard deviations and information about the respective sample sizes of the two cohorts. The mean test scores (M1, M2) and their standard deviations (SD1, SD2) can be used to compute Cohen's d (i.e., d = (M1 − M2) / SDpooled). Next, the standard error of Cohen's d can be computed using the formula given in Cooper and Hedges (1994), which uses the sample sizes of the two cohorts and the estimated Cohen's d.
g) the publication year. This study characteristic is a typical moderator variable in meta-analyses, as it controls for time-trend effects (Schütt, 2021). In line with the approach followed by several recent meta-analyses (see, for instance, Di Pietro, 2022), we consider the year of the first appearance of a draft of the study in Google Scholar. This measure is preferred to publication year on the grounds that journals significantly differ with respect to the time between the online availability date of an article and the date when the article is given a volume and issue number 7 (Al & Soydal, 2017). Additionally, in our dataset, there are two journal articles that are only available online, and it is unclear in which issue of the journal they will be published. The publication years considered are 2020, 2021, and 2022.
h) the type of publication. This moderator variable is considered in an attempt to control for the quality of the studies included in our sample. We distinguish between journal articles and other publication formats. Articles published in journals are expected to be of higher scientific rigour since they are more likely to have gone through a review process.
Additionally, non-journal articles are more likely to contain typos in their regression tables (Cazachevici et al., 2020).

8 … is the year when they are assigned a volume and issue number) rather than the year of the first appearance of a draft of the study in Google Scholar.
9 All the extracted effect sizes and their standard errors can be found in the supplementary Appendix.

Insert Table 1 about here

Table 2 shows the descriptive statistics of the moderator variables used in the meta-regressions. While Column 1 displays simple averages (and standard deviations), Column 2 reports averages (and standard deviations) weighted by the inverse of the number of estimates reported in each study. Column 3 reports the number of effect sizes for each moderator variable.
Insert Table 2 about here

Each study was independently evaluated by the author and another researcher, and any disagreements were resolved through discussion to reach a consensus. Studies were scored on six different domains: confounding, participant selection, classification of interventions, missing data, measurement of outcomes, and reporting bias 11. Table 3 shows the risk of bias ratings for each domain (as well as an overall judgement) for the 38 studies 10. The lack of appropriate methods to control for confounders, sample selection problems and missing data appear to be the most common sources of bias. As noted by Hammerstein et al. (2021), (2022) relies on a sample where schools participating in the 2021 survey have a more advantaged student population in terms of neighbourhood of residence and mother's education, and have a smaller fraction of students that are considered to be slow learners. Similarly, in the longitudinal data used by Ardington et al. (2021), attrition is significantly higher for the Covid-19 group, and attrition is associated with poorer pre-pandemic reading proficiency levels. In Kuhfeld et al. (2022), between fall 2019 and fall 2021, the number of students testing in a grade dropped significantly more in high-poverty schools than in their low-poverty counterparts. In other studies, which use non-representative samples including convenience samples, the direction of the bias is unclear. One exception is the paper by Meeter (2021): in his sample, schools with a more disadvantaged student population appear to be slightly oversampled compared to all schools in the Netherlands, thus potentially biasing upwards the estimated impact of the pandemic on educational achievement. Finally, the question of how the use of inappropriate methods to control for confounders might affect the estimated relationship between Covid-19 and student performance is addressed later when we discuss the results from the meta-regression analysis. As stated earlier, type of research design is one of our moderator variables.

10 One of the studies included in our sample (i.e., Kofoed et al., 2021) does use a randomised design.
11 Following Betthäuser et al. (2023), the domain "deviation from intended interventions" was not considered.
Insert Table 3 about here

In an attempt to investigate the factors driving heterogeneity among effect sizes, the following meta-regression model is estimated:

d_i = β_0 + β_1 X_1i + … + β_n X_ni + ε_i

where d_i denotes the estimated Cohen's d effect size, X_1i, …, X_ni are the moderator variables, and ε_i is the meta-regression disturbance term. The subscript i indexes the effect sizes included in the sample and the subscript n denotes the number of moderator variables. In order to deal with the issue of heteroskedasticity in meta-regression analysis, we use Weighted Least Squares (WLS) with weights equal to the inverse of each estimate's standard error. This method is considered to be superior to the widely employed RE estimators (Stanley & Doucouliagos, 2013).
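A minimal sketch of this estimation strategy is shown below. The data are simulated, the single moderator is a placeholder, and the precision weighting follows the description above; it is not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated effect sizes: true average effect -0.19, with math/science
# estimates (a placeholder moderator) a further 0.05 SD lower
k = 200
math_sci = rng.integers(0, 2, size=k)      # 0/1 moderator dummy
se = rng.uniform(0.02, 0.3, size=k)        # standard errors of the d's
d = -0.19 - 0.05 * math_sci + rng.normal(0, se)

# WLS with weights equal to the inverse of each estimate's standard
# error, as described in the text: multiply each row by sqrt(weight)
w = 1.0 / se
X = np.column_stack([np.ones(k), math_sci])
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * d, rcond=None)
print(beta)  # [intercept, moderator coefficient], close to [-0.19, -0.05]
```

Down-weighting imprecise estimates in this way addresses the heteroskedasticity that is inherent in meta-regression, where effect sizes come with very different sampling variances.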

Publication bias
Publication bias has long been identified as a major problem in meta-analysis (Dwan et al., 2008). Such an issue occurs because editors and scholars tend to prefer publishing papers with statistically significant or non-controversial results. This may lead to distorted conclusions as published findings may end up overstating the true effect. Evidence of publication bias has been found in meta-analyses covering different fields (see, for instance, Begg and Berlin (1988) in the case of medical studies).

Journal Pre-proof
In line with previous studies (e.g., Di Pietro, 2022), we use the Doi plot to graphically evaluate publication bias. Not only does the Doi plot enhance visualization of asymmetry (in the absence of publication bias there is no asymmetry), but it also allows the asymmetry to be measured through the Luis Furuya-Kanamori (LFK) index. LFK index values within ±1 suggest no asymmetry, values exceeding ±1 but within ±2 indicate minor asymmetry, and values exceeding ±2 denote major asymmetry (Furuya-Kanamori et al., 2018). As shown in Figure 2, the Doi plot displays no asymmetry (LFK index = 0), indicating that no publication bias is detected.
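As a small aid to interpretation, the LFK thresholds just described can be encoded directly. This is a trivial Python helper for readers, not part of the paper's Stata toolchain:

```python
def lfk_asymmetry(lfk: float) -> str:
    """Classify Doi-plot asymmetry from the LFK index, using the
    thresholds of Furuya-Kanamori et al. (2018)."""
    a = abs(lfk)
    if a <= 1:
        return "no asymmetry"        # e.g. this paper's LFK index of 0
    if a <= 2:
        return "minor asymmetry"
    return "major asymmetry"
```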

Insert Figure 2 about here
To further examine the risk of publication bias, we employ Egger's test (Egger et al., 1997), in which the effect size is regressed against its precision (indexed by the standard error). The results indicate that we cannot reject the null hypothesis of no publication bias (p-value = 0.380).
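The logic of the test can be illustrated with a minimal Python sketch using one common formulation of Egger's regression, in which the standardized effect (d/se) is regressed on precision (1/se) and the intercept captures small-study asymmetry. The data and function name are made up for illustration; the paper's own computation is done in Stata.

```python
import numpy as np

def egger_test(d, se):
    """Egger's regression test: regress the standardized effect (d/se)
    on precision (1/se). A non-zero intercept signals funnel-plot
    asymmetry, i.e. potential publication bias."""
    y = d / se
    X = np.column_stack([np.ones_like(se), 1.0 / se])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    k = len(d)
    s2 = resid @ resid / (k - 2)                 # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)            # OLS covariance matrix
    t_intercept = beta[0] / np.sqrt(cov[0, 0])   # test H0: intercept = 0
    return beta[0], t_intercept
```

A two-sided p-value would be obtained by referring the t-statistic to a Student-t distribution with k − 2 degrees of freedom; values near zero, as in our data, leave the null of no publication bias unrejected.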
Our findings are consistent with those in previous relevant meta-analyses. König and Frey (2022) as well as Betthäuser et al. (2023) conclude that the presence of publication bias is unlikely.

Results and discussion
This section is divided into three parts: first, we estimate a summary effect size (Section 3.1); second, we investigate potential sources of heterogeneity (Section 3.2); and third, we discuss the main results (Section 3.3).
Next, we compute the I² statistic to assess the heterogeneity of the results across studies (Higgins et al., 2003). The appropriateness of the RE model is confirmed, as I² has a value of 100%¹³. This suggests that all the variability in the effect-size estimates is due to heterogeneity as opposed to sampling error. Additionally, we also look at τ² (the between-study variance)¹⁴, which denotes the variability in the underlying true effects. Its large value of 1.74 further corroborates the hypothesis of substantial heterogeneity of the effect sizes (Takase & Yoshida, 2021).
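For concreteness, the quantities just reported can be computed from the effect sizes and their standard errors with the standard Cochran's Q and DerSimonian–Laird formulas. This Python sketch uses invented inputs and is not the paper's actual Stata code:

```python
import numpy as np

def heterogeneity_stats(d, se):
    """Cochran's Q, the I-squared statistic, and the DerSimonian-Laird
    estimate of tau-squared (between-study variance)."""
    w = 1.0 / se**2                      # inverse-variance weights
    d_fixed = np.sum(w * d) / np.sum(w)  # fixed-effect pooled estimate
    Q = np.sum(w * (d - d_fixed) ** 2)   # Cochran's Q
    k = len(d)
    # share of total variability due to heterogeneity rather than sampling error
    I2 = max(0.0, (Q - (k - 1)) / Q) * 100.0 if Q > 0 else 0.0
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)   # DL between-study variance
    return Q, I2, tau2
```

An I² close to 100%, as found here, means that nearly all of the observed variation in effect sizes reflects genuine differences across studies.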
One should observe that our findings from the RVE analysis are broadly consistent with those from previous meta-analyses.¹² Storey and Zhang (2021a) concluded that due to

¹² The robumeta command in Stata is employed. An intercept-only model is run, in which the estimate of the meta-regression constant is equal to the unconditional mean effect size across studies. With this command, it is possible to specify a value for rho, the expected correlation among dependent effects. Following Tanner-Smith and Tipton (2013), we use different values of rho ranging from 0 to 1 in intervals of 0.1 in an attempt to check the consistency of the results. All models yield the same outcome regardless of the specified value of rho.
¹³ A value of I² greater than 75% is considered large heterogeneity (Higgins et al., 2003).

(2021) found average delays of 0.13 standard deviations, Zierer (2021) estimated average losses at 0.14 standard deviations, and Hammerstein et al. (2021) reported average deficits of 0.10 standard deviations.¹⁶

¹⁶ They found that the loss of one third of a school year of learning is equivalent to approximately 11% of a standard deviation of lost test results. This finding is broadly consistent with that obtained by Hill et al. (2008), who conclude that a value of Cohen's d of 0.4 (with a margin of error of ±0.06) corresponds to the average annual reading achievement gains in fourth grade.

likely to have had access to a computer, an internet connection, and a space conducive to learning (Di Pietro et al., 2020; Blaskó et al., 2022). Moreover, as argued by Ariyo et al. (2022), one would expect children of less educated parents to have received less parental support while learning at home than children of more educated parents. Greenlee and Reid (2020) provide evidence on this, showing that in Canada during the pandemic the frequency of children's participation in academic activities increased with parental educational levels.

Table 4 shows the results of regressing our standardised measure of student achievement against the moderator variables described above. Column (1) of Table 4 presents estimates from a regression where all potential explanatory variables are included. However, including all 13 variables (in addition to the constant term) in the regression may inflate standard errors and lead to inefficient estimates, given that some of the variables may turn out to be redundant. Therefore, the "general-to-specific" approach is employed in an attempt to identify the influential factors. Following this strategy, as shown in Column (2) of Table 4, six independent variables (in addition to the constant term) are included in the model. To account for the potential dependence of multiple estimates reported by a given study, in Column (3) of is on average found to be 0.17 standard deviations smaller than in humanities/subject mix.

Heterogeneity
Our findings also indicate that the negative effect of Covid-19 on student achievement appears to be more pronounced when experimental/quasi-experimental techniques are used than when descriptive or correlational research designs are employed. Additionally, studies employing cross-sectional data, as well as those focusing on non-European countries, tend to suggest greater learning deficits.
Insert Table 4 about here
As a robustness test, the model depicted in Column (3) of Table 4 is re-estimated but this time each effect size is weighted by its inverse variance. As shown in Column (4) of Table 4, with the exception of the estimate on longitudinal data, the sign and the magnitude of the other coefficients are broadly in line with those depicted in Column (3).
Next, the BMA approach is employed as an alternative way to address the problem of uncertainty in the specification of the meta-regression model¹⁷. In BMA, following the rule of thumb proposed by Kass and Raftery (1995), the evidence for an explanatory factor is considered not to be weak if its posterior inclusion probability (PIP) is larger than 0.5. The results, which are reported in Table 5, show that all the variables consistently identified by the BMA methodology as relevant (i.e., Math/Science, Europe and Journal article) are also included in the specification whose estimates are reported in Columns (2), (3) and (4).

Insert Table 5 about here

¹⁷ We treat all moderator variables as auxiliary covariates, while the constant is treated as a focus regressor. Each effect size is weighted by its inverse standard error.
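The mechanics of computing PIPs can be illustrated with a deliberately simplified Python sketch: every subset of moderators is enumerated, each model receives a weight based on the BIC approximation to its marginal likelihood, and a moderator's PIP is the summed weight of the models containing it. This is an illustrative toy, not the BMA implementation used in the paper; the function name and data are ours.

```python
import itertools
import numpy as np

def bma_pips(d, se, X, names):
    """Toy Bayesian model averaging over all moderator subsets.

    Model weights come from the BIC approximation to the marginal
    likelihood; the posterior inclusion probability (PIP) of a moderator
    is the summed weight of the models containing it. The constant is
    kept in every model (the 'focus regressor') and observations are
    weighted by the inverse standard error, as in the paper.
    """
    k, p = X.shape
    sw = np.sqrt(1.0 / se)                 # inverse-standard-error weighting
    yw = d * sw
    log_w, subsets = [], []
    for r in range(p + 1):
        for sub in itertools.combinations(range(p), r):
            Z = np.column_stack([np.ones(k)] + [X[:, j] for j in sub])
            Zw = Z * sw[:, None]
            beta, *_ = np.linalg.lstsq(Zw, yw, rcond=None)
            rss = np.sum((yw - Zw @ beta) ** 2)
            bic = k * np.log(rss / k) + (len(sub) + 1) * np.log(k)
            log_w.append(-0.5 * bic)       # BIC-based model weight (log scale)
            subsets.append(sub)
    w = np.exp(np.array(log_w) - max(log_w))
    w /= w.sum()
    return {names[j]: sum(wi for wi, s in zip(w, subsets) if j in s)
            for j in range(p)}
```

In this setup a moderator that genuinely drives the effect sizes receives a PIP near 1, while an irrelevant one is penalized by the BIC term and typically falls below the 0.5 threshold.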

Discussion of the main results
Our meta-analysis delivers six main results.
First, we find that, on average, the pandemic depressed student achievement by around 0.19 standard deviations. While this result is in line with the conclusions of earlier meta-analyses and systematic reviews, it should be taken into account that we use a more balanced sample in terms of country composition. This suggests that our finding is more generalizable than those of previous studies.

(2020) found that in Norway, during the peak of the Covid-19 lockdown period, the proportion of parents/carers who reported having gained more information about their children's learning was higher in lower grades than in higher grades. Besides learners' age considerations, one should also observe that the shift towards online learning could have had a detrimental impact on the knowledge and skills of those students, mainly at secondary and tertiary levels, whose curriculum includes experiential learning experiences (e.g., field trips, hands-on activities) that cannot take place virtually (Tang, 2022). However, at the same time, given that our analysis was not conducted at grade level, one cannot rule out the possibility that the pandemic has disproportionately affected the achievement of very young pupils (e.g., grade 1). In other words, there could be heterogeneity within primary school children.
Fourth, our results indicate that in 2021 students were not able to recover from the learning deficits caused by Covid-19 school closures in 2020. There is no statistically significant difference in student performance between assessments that have taken place several months or more than one year after the outbreak of the coronavirus and those that have occurred in the early stages of the pandemic. A similar finding has been obtained by Betthäuser et al. (2023). It is important to note that, if not addressed, the learning deficits suffered by students may result in significant long-term consequences. Without remedial education upon school re-opening, not only may students who have been disproportionately affected by the pandemic continue to fall behind, but their learning achievements may also suffer a further setback as time goes on (Angrist et al., 2021). Kaffenberger (2021) estimates that if learning in grade 3 is reduced by one-third, the equivalent of about a three-month school closure, learning levels in grade 10 would be a full year lower. Özdemir et al. (2022) forecast that the pandemic could erase decades-long gains in adult skills for affected cohorts unless interventions to alleviate learning deficits are quickly implemented. Additionally, several papers show that there is a relationship between test scores and labour market performance. For instance, Chetty et al. (2014) find that raising student achievement by 0.2 standard deviations is expected, on average, to increase annual lifetime earnings by 2.6%.
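To give a sense of the stakes, the Chetty et al. (2014) figure can be combined with our estimated deficit in a back-of-the-envelope calculation. This assumes a purely linear scaling between achievement and earnings, which is our simplification, not a result from Chetty et al.:

```python
# Linear extrapolation (an assumption made for illustration only):
# if +0.2 SD of achievement maps to +2.6% of annual lifetime earnings,
# a 0.19 SD pandemic learning deficit maps to a proportional loss.
CHETTY_GAIN_SD = 0.2      # achievement gain studied by Chetty et al. (2014)
CHETTY_GAIN_PCT = 2.6     # associated earnings gain, in percent
DEFICIT_SD = 0.19         # average pandemic learning deficit (this paper)

earnings_loss_pct = DEFICIT_SD / CHETTY_GAIN_SD * CHETTY_GAIN_PCT
print(round(earnings_loss_pct, 2))  # 2.47
```

Under this crude assumption, an unremediated deficit of 0.19 standard deviations would correspond to a loss of roughly 2.5% of annual lifetime earnings.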

Fifth, the extent of the learning deficit appears to be smaller among students in Europe relative to their peers in the rest of the world. Although the reasons behind this result are unclear, several factors might contribute. First, one should note that the European countries considered in this study have, on average, a higher gross domestic product per capita than most of the non-European countries included in the analysis (although this is not true for the US and Australia). As suggested by Donnelly and Patrinos (2020) (Marinoni et al., 2020).
Sixth, our findings seem to suggest that studies using non-causal methods tend to underestimate the negative effect exerted by Covid-19 on student performance. The study by Betthäuser et al. (2023) also hints at the same conclusion, but their meta-analysis does not provide any evidence on this. As pointed out by Engzell et al. (2021), non-causal methods fail to account for trends in student progress prior to the outbreak of Covid-19 and, hence, by assuming a counterfactual where achievement has stayed flat, they generate estimates of learning deficits that are biased downwards. The underestimation of pandemic-related learning delays may have important policy implications as it could result in under-provision of remedial support to students who are falling behind due to Covid-19.

Conclusions
We have assembled and studied a new sample of estimates of the impact of Covid-19 on student achievement. The sample includes 239 estimates from 39 studies covering 19 countries. One of the key findings emerging from our study is that the detrimental effects of Covid-19 school closures on student learning appear to be long-lasting. This calls for more efforts to help students recover from missed learning during the pandemic. As initiatives and programs aimed at learning recovery can be quite costly, several researchers (e.g., Patrinos, 2022) stress the importance of protecting the education budget whilst considering the competing financial needs of other sectors such as, for instance, health and social welfare (UNESCO, 2020). Therefore, given the current policy climate, where public resources are in high demand by various sectors, it is more important than ever to identify and adopt cost-effective measures.
While there seems to be a relatively broad consensus in the literature that small-group tutoring programs are a cost-effective way to mitigate the learning deficits caused by the pandemic (see, for instance, Burgess, 2020; Gortazar et al., 2022), less attention has been paid to a number of time- and cost-effective pedagogical practices (Carrasco et al., 2021).
Promoting the development of metacognitive skills is, for instance, a powerful way to enhance student learning and performance (Stanton et al., 2021). Metacognition allows students to think about their own learning, and this may increase their self-confidence and motivation. Similarly, increased collaboration and dialogue between students can support learning. Peers may help students clarify study materials and develop critical thinking.
Overall, a better understanding is needed about the different types of educational interventions available and their cost-effectiveness. It would be desirable if governments at national, regional and local levels could exchange their experiences in this field and learn from each other.

Funding details
This work has not been supported by any grants.

Data availability statement
The data and the Stata code used to produce the empirical results reported in this article are available from the corresponding author.

Disclosure statement
No potential conflict of interest was reported by the author.

Appendix A. Supplementary data
Supplementary data to this article can be found at

Notes to Table 4: In square brackets we report score wild cluster bootstrap p-values (Kline & Santos, 2012), generated using the boottest command in Stata with 999 replications (Roodman, 2016). In Columns (1), (2), and (3) the regressions are estimated by weighted least squares, where each effect size estimate is weighted by its inverse standard error. In Column (4), the regression is estimated by weighted least squares, where each effect size estimate is weighted by its inverse variance. *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively.