Educational Systems and Gender Differences in Reading: A Comparative Multilevel Analysis

Girls have a substantial advantage over boys in reading performance in all OECD countries. This paper investigates whether the structure of a country's educational system is related to this gender inequality in reading performance. We assess whether standardization of educational curricula and the age at which students are selected into educational tracks affect boys’ and girls’ reading performance differently. To test our hypotheses, we employ data from all six Programme for International Student Assessment (PISA) waves enriched with contextual information on countries’ educational systems (N = 1,425,356). Results show that in country-years with more standardized curricula, overall reading performance is lower, and the association between standardization and reading performance is more negative for boys than for girls. In countries with educational systems in which students are selected into educational tracks at later ages, gender differences in reading are larger because girls benefit more from late selection. These results indicate that educational policies at the country level are related not only to the reading performance of all students, but also to the underperformance of boys in reading.


Introduction
Today, women obtain considerably more education than men in the vast majority of industrialized countries (DiPrete and Buchmann, 2013; Organisation for Economic Co-operation and Development (hereafter OECD), 2015; Van Hek, Kraaykamp and Wolbers, 2016). Educational researchers and policymakers have therefore become increasingly interested in understanding the lagging educational performance of boys (Buchmann, DiPrete and McDaniel, 2008; Garner, 2014; Van Hek, Kraaykamp and Pelzer, 2018). Insight into the factors related to boys' poorer reading performance may help to understand boys' lower relative educational performance more generally, as reading is a fundamental skill for achieving educational success (Cheung and Andersen, 2003; Kraaykamp and Notten, 2016). After all, 'reading proficiency is the foundation upon which all other learning is built; when boys don't read well, their performance in other school subjects suffers too' (OECD, 2015: p. 13).
Recent studies have found a sizable reading score gap favouring girls in all OECD countries (Stoet and Geary, 2013; OECD, 2015). Figure 1 depicts this gender inequality in reading scores for the 37 countries we include in our analyses (average scores across the 2000-2015 Programme for International Student Assessment [PISA] waves), and it illustrates two important points. First, girls have higher average reading scores than boys in every country and, second, there is substantial variation across countries in the size of this gender gap. As Penner (2008: p. 140) stated, 'there is no reason to believe that genetic factors involved in determining gender will vary across countries'. Instead, the cross-national variation in gender inequality in reading scores shown in Figure 1 is likely attributable to social or institutional factors that differ between countries.
Previous research has focused primarily on country-level measures related to gender inequality, such as female labour force participation or the prevalence of gender egalitarian attitudes, to explain cross-national variation in gender differences in educational performance (Penner, 2008; Else-Quest, Hyde and Linn, 2010; McDaniel, 2010; Stoet and Geary, 2013), but the results of these studies are inconclusive and contradictory. In contrast to much prior research, the current study focuses on differences in national educational systems as a possible explanation for the cross-national variation in reading scores between girls and boys (Ayalon and Livneh, 2013). The structure of educational systems is strongly related to students' overall educational outcomes, but also to inequality therein (Hanushek and Wössmann, 2006; van de Werfhorst and Mijs, 2010). More specifically, educational systems structure students' educational careers and their entry into the labour market (Kerckhoff, 2001), and by doing so, educational systems produce unequal opportunities for certain types of students (Hanushek and Wössmann, 2006; Montt, 2011; Bol et al., 2014). Following prior work, van de Werfhorst and Mijs (2010) empirically classified educational systems along the features of standardization and differentiation. Although previous studies have often linked aspects of standardization and differentiation to educational inequality related to students' family background, very few studies have investigated whether these structural features of educational systems are related to gender differences in educational performance across countries (Ayalon and Livneh, 2013; Scheeren, van de Werfhorst and Bol, 2018). Yet a decade ago, in a review article on gender inequality in education, Buchmann, DiPrete and McDaniel (2008) called for research on how structures, institutions, and practices of education affect gender inequality in educational outcomes.
We heed this call and ask: to what extent are the levels of standardization and differentiation of a country's educational system related to girls' and boys' reading performance?
We improve upon prior research in several ways. Our focus on reading performance enables us to investigate a domain with large and direct consequences for students' educational careers. We examine how the structure of a country's educational system relates to the reading performance of students generally, as well as to the lower performance of boys, which is a growing concern in industrialized societies. More specifically, we investigate whether the degree of standardization and differentiation of a country's educational system is related to girls' and boys' reading performance. To do so, we pool all six waves of the PISA, conducted by OECD every 3 years between 2000 and 2015 and employ advanced four-level regression models to analyse information on 1,425,356 students from 59,001 schools in 37 countries. While our primary focus is on reading performance, we conduct parallel analyses for math performance and discuss these findings to provide broader and more robust insights into the relationships between dimensions of standardization and differentiation of educational systems and student performance.

Features of Educational Systems: Standardization and Differentiation
Prior studies commonly distinguished between two features of educational systems: standardization and differentiation (Buchmann and Park, 2009; van de Werfhorst and Mijs, 2010; Montt, 2011). First, educational systems vary in their level of standardization (Bol and van de Werfhorst, 2013), which can be defined as either standardization of output or standardization of input. Standardization of output requires that all students possess a similar level of knowledge by the end of an educational programme, which is usually measured by central examinations. Standardization of input exists when governments control the organization of education by setting regulations for school policies and practices, for example, by prescribing the curricula schools should offer, such that individual schools and teachers have little room to deviate from these regulations (Montt, 2011). Herein, we examine the level of standardization of educational curricula because we believe this aspect best captures processes of student learning. It specifically refers to the degree to which school leaders and teachers have the freedom to modify course offerings, course content, and textbooks; a similar definition was used by Montt (2011). As noted by Stevenson and Baker (1991), in the absence of state control over school curricula, teachers tend to modify learning processes according to the needs of their students, thereby forging a link between student characteristics and their exposure to knowledge. As we explain below, teachers' inability to modify learning processes in highly standardized educational systems may be particularly relevant to boys' reading performance.
The level of differentiation within an educational system is indicated by the age at which students are selected for different educational tracks in secondary education in a country (Bol and Van de Werfhorst, 2013). This usually refers to the age at which students finish primary or lower secondary education and are allocated to specific educational tracks in (higher) secondary education, on the basis of their prior performance and/or teacher evaluations (Buchmann and Park, 2009). Such tracks in secondary education prepare students for post-secondary pathways including employment, vocational training, or university. Generally, tracks that prepare students for university enrolment are more prestigious and demanding than tracks that lead to lower levels of post-secondary education or direct entry into the labour market (Kerckhoff, 2001; van de Werfhorst and Mijs, 2010). Different educational tracks are often located in different secondary schools (or school buildings), and mobility between tracks is often difficult and thus rare.
In highly differentiated educational systems, students are selected into different tracks and/or schools at an early age. Conversely, in undifferentiated systems there is no or late tracking; secondary school students with different ability levels are then situated within the same school and experience rather similar educational trajectories; not until tertiary education do students attend distinct, more or less demanding, educational programmes. 1

Standardization of Educational Curricula and Gender Differences in Reading Performance
Standardization of educational curricula is an important feature of a country's educational system, as general regulations implemented by central or regional governments serve to constrain schools' and teachers' freedom in choosing course offerings, course content and textbooks (Bol and Van de Werfhorst, 2013). Typically, these rules are implemented to exercise some quality control over learning processes in secondary schools. Research on the level of control that schools and teachers have over their educational curriculum is limited, however, as prior studies tended to focus more on school autonomy in terms of finances and teacher selection (Fuchs and Wössmann, 2004). It is therefore an open question how the degree of standardization in an educational system is related to reading achievement for boys and girls.
Generally, it is expected that standardization of learning environments reduces social inequalities (van de Werfhorst and Mijs, 2010; Bol and Van de Werfhorst, 2013). Previous research has found that the differences in performance scores between students from lower and higher social class backgrounds are smaller in more standardized educational systems (Bol et al., 2014; van de Werfhorst and Mijs, 2010). This could be due to the standardization of educational input between schools with lower and higher socio-economic student compositions (Montt, 2011). Looking specifically at standardization of educational curricula, Montt (2011) found no association with socio-economic achievement inequality. There are, however, good reasons to expect that far-reaching standardization of educational curricula and textbooks has different consequences for gender inequality in reading performance. In the only study linking standardization to gender gaps in education to date, Ayalon and Livneh (2013) showed that boys' math performance is harmed more than girls' math performance by a high level of standardization, as measured by between-teacher instructional variation. We argue that standardization is also more detrimental for boys' reading performance than girls', resulting in a larger female advantage in reading performance in countries with highly standardized educational systems.
As a starting point to understand why standardization may have negative consequences for boys' reading performance, it is well established that students' attitudes, motivations, and interests play an important role in their reading performance (Meece, Glienke and Brug, 2006). Prior research reports that boys experience less reading enjoyment and are less frequent readers in their free time than girls (Christin, 2012; Clark and Trafford, 1995). For example, the OECD (2015) found that boys read less often for enjoyment than girls in all but one OECD country (Korea). Additionally, research suggests that the relationship between reading interest and reading performance is stronger for boys than for girls. For instance, Logan and Medford (2011) showed that intrinsic reading motivation is more strongly associated with reading performance for boys than for girls. Also, Oakhill and Petrides (2007) found that boys have better reading comprehension when they find texts interesting, while this relationship is weaker for girls. As a result, boys' reading performance in particular may be harmed in highly standardized educational systems (Ayalon and Livneh, 2013), because mandatory course content (texts) may not suit their interests, and teachers do not have the flexibility to select reading materials that are tailored to students' interests.
Additionally, Wössmann (2003) argued that local schools and teachers are generally better able to assess students' needs than governmental institutions, as teachers and schools have personal knowledge about their students (see also Stevenson and Baker, 1991). As Montt (2011: p. 52) stated, teacher control over curricula may allow schools to 'meet the particular needs of low-achieving students in their local context, potentially reducing dispersion in achievement.' Within standardized curricula, however, teachers are required to use uniform course materials, methods, and textbooks. This uniformity may be least problematic for motivated students, who adapt well to standard educational methods, many of whom are girls. Conversely, the use of uniform teaching methods and courses is likely more detrimental to students who are less interested and experience less enjoyment in reading, many of whom are boys. To the degree that boys' reading ability is more dependent on motivation (Oakhill and Petrides, 2007), it may be extremely difficult for teachers within standardized educational systems to individualize boys' reading instruction in order to meet their needs. For these reasons, we hypothesize that boys' reading performance is more negatively affected by standardized educational curricula than girls' reading performance (Hypothesis 1).

Differentiation in Educational Tracks and Gender Differences in Reading Performance
Research has yet to reach consensus about how the level of differentiation within a country's educational system affects students' educational performance. In an early meta-analysis, Slavin (1990) concluded that early selection into tracks had no effect on students' educational achievement. Using a difference-in-difference design, Hanushek and Wössmann (2006), however, found that reading scores of students were somewhat lower in countries with highly differentiated systems. Some previous studies have linked the age at which students are selected into tracks to gender inequality in educational performance and attainment. Hadjar and Buchmann (2016) showed that the female-favourable gender gap in educational attainment was larger in educational systems with later tracking than in those with tracking at earlier ages. Jürges and Schneider (2011) and Pekkarinen (2008) claimed that selection into tracks at an early age disadvantages boys. Both studies derived insights from research that demonstrated that younger children in a class (i.e., children whose birthday is late in a school year) have a smaller chance of being allocated to academic tracks (Schneeweis and Zweimüller, 2009). Crawford, Dearden and Meghir (2007) argued that this is due to their lower educational experience and lower maturity. If so, this may also have consequences for the inequality in reading performance of girls and boys. The finding that boys mature later than girls is well established (Tanner, 1978), and although social and cultural contexts may play a role in how developmental trajectories manifest themselves, biological causes of this phenomenon suggest that this male-female maturity gap is universal (Lim et al., 2015). As a result, the maturity gap favouring girls may lead to more girls in the higher tracks of secondary education, and boys being over-represented in lower tracks, in educational systems that track students at an early age.
Indeed, studies from several European countries found that girls are more likely to be allocated to higher tracks in secondary education than boys (Ayalon and Shavit, 2004; Pekkarinen, 2008).
Gender inequality in track placement may reinforce gender inequality in reading performance, first because classroom homogeneity has been shown to enhance the achievement of students in higher school tracks and hinder achievement in lower tracks. Huang (2009) explained this divergent effect by pointing to lower quality instruction, less qualified and experienced teachers and a slower learning pace in lower tracks of highly differentiated educational systems. Moreover, in highly differentiated countries lower performing students have fewer opportunities to interact with high-performing students. So, if boys are more likely to be placed in a lower educational track, due to their developmental lag relative to girls, such placement may hinder their reading performance; conversely, if girls are more likely to be placed in higher tracks, their reading performance may benefit.
Another reason that differentiation at an early age may be detrimental to boys' reading performance is related to how norms regarding masculinity differ between school tracks. In early differentiating countries, students in different educational tracks are often situated in different schools; because student background correlates with track placement, these schools also differ in their socio-economic composition. Legewie and DiPrete (2012) showed that in Germany, boys' disadvantage in reading is larger in schools with a large proportion of students from low socio-economic status backgrounds. They explicitly linked this phenomenon to the prevailing norms of masculinity, arguing that, in these schools, boys gain status through sports, high-risk behaviours, and opposing authority, so non-academic norms are more common among boys. In contrast, in schools with a high socio-economic composition, school norms for boys are more directed at academic performance. For girls, femininity norms tend to align more with academic performance, so variation between different school tracks is likely smaller for them. Boys' lower track placement, as a consequence of early tracking, may therefore hamper their reading performance because they are more likely to be exposed to non-academic masculine norms in lower educational tracks (van de Werfhorst and Mijs, 2010).
In sum, we expect that in countries with highly differentiated educational systems, where students are selected into tracks at a relatively young age, boys are more often placed in lower level tracks where general performance levels are lower and non-academic masculine norms are more prevalent. As gender differences in cognitive and motivational development are believed to weaken after the age of 15 (Halpern, 2013), we expect that the developmental penalty for boys is lower in countries with less differentiated educational systems where tracking occurs later (or not at all). This leads us to hypothesize that boys' reading performance is more negatively affected by early tracking than girls' reading performance (Hypothesis 2).

Data
We employ data from all waves of PISA (2000, 2003, 2006, 2009, 2012 and 2015) containing information from 15-year-old students from 37 countries. 2 PISA uses a two-stage selection procedure: in every country, schools are sampled first, after which 15-year-olds in those schools are randomly selected. PISA data are generally considered to be of very high quality (Else-Quest, Hyde and Linn, 2010). The pooled PISA data set provides us with information on 1,525,604 individual students across six waves and 37 countries.

Measurements
Students' score on the PISA reading test indicates their reading performance (PISA, 2009). PISA provides measures of students' reading performance using a method based on Item Response Theory (Mislevy and Sheehan, 1987). Instead of a single measure, five 'plausible values' for a student's reading ability are provided. We estimate our models for each of the five plausible values of students' reading performance separately. Next, we merge the results to arrive at correct estimates and standard errors (see OECD [2009] for details of this procedure).
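The merging step can be illustrated with a minimal Python sketch. It assumes the standard multiple-imputation pooling rule for plausible values (average the estimates; add the average sampling variance to the between-estimate variance inflated by 1 + 1/M); the function name and the input numbers are hypothetical, and replicate weights are not handled here.

```python
from math import sqrt
from statistics import mean, variance


def pool_plausible_values(estimates, standard_errors):
    """Pool one coefficient that was estimated once per plausible value.

    estimates: the coefficient from each plausible-value model
    standard_errors: the matching standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    # Average sampling variance across the plausible-value models.
    within = mean([se ** 2 for se in standard_errors])
    # Variance between the estimates (sample variance, divides by m - 1).
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical inputs, for illustration only (not taken from Table 2).
estimate, se = pool_plausible_values(
    [-28.1, -28.6, -28.3, -28.5, -28.4],
    [0.50, 0.52, 0.49, 0.51, 0.50],
)
print(round(estimate, 3), round(se, 3))
```

The pooled standard error is never smaller than the average single-model standard error, because the disagreement between plausible values is added as extra uncertainty.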
Male is coded 1 for boys and 0 for girls. We further include four individual-level control variables. First, parental educational level (mean centred) is measured in years of education associated with the ISCED level of the highest educated parent. 3 Second, parental cultural resources are indicated by the number of books present in the student's family home. Response options were 0-10 books (0), 11-100 books (1), 101-250 books (PISA 2000) or 101-200 books (other waves) (2), and more than 250 books (PISA 2000) or more than 200 books (other waves) (3). Third, we include a control for students' age (mean centred), indicated by year and month of birth (ranging from 15.167 to 16.420 years), as some studies showed that older students perform slightly better in school (Schneeweis and Zweimüller, 2009). Finally, since immigrant students tend to have a lower reading performance in most industrialized countries (OECD, 2012), we control for students' immigrant status, measured with three dummy variables specifying native (born in the country to native-born parents), first-generation immigrant (born outside the country to foreign-born parents), and second-generation immigrant (native born to immigrant parents). The 100,248 students with missing values on one or more of the individual variables were removed from the data set. The final data set consists of 1,425,356 students nested in 204 country-year combinations 4 and 37 countries.
At the country-year level, the level of standardization of the educational system is indicated by the degree to which school curricula are nationally or regionally standardized. We constructed this measure using the PISA school questionnaire. In all waves PISA asked school principals: 'Regarding your school, who has considerable responsibility for the following tasks': (i) choosing which textbooks are used, (ii) determining which courses are offered, and (iii) determining course content. For PISA 2000 and 2003, we determined for each country the percentage of school principals who reported that these matters 'were not a school responsibility'. For PISA 2006 to 2015, we took the percentage of school principals who reported that local/regional or national education authorities were responsible. The average score of these three measurements indicates the level of standardization in a country; a higher score refers to a higher level of standardization. Note that this measure of standardization varies over waves and over countries.
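The construction of this measure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the task keys, the boolean coding of principals' answers, the unweighted averaging, and the 0-1 scale of the result are all hypothetical choices.

```python
# Hypothetical keys for the three questionnaire tasks.
TASKS = ("textbooks", "courses_offered", "course_content")


def standardization_score(principal_reports):
    """Average, over the three tasks, of the share of principals who report
    that the task lies outside the school's own responsibility.

    principal_reports: one dict per principal in a country-year, mapping each
    task to True (outside the school's responsibility), False, or None (missing).
    Returns a proportion between 0 and 1; higher means more standardized.
    """
    shares = []
    for task in TASKS:
        # Skip principals with a missing answer for this task.
        answers = [r[task] for r in principal_reports if r.get(task) is not None]
        shares.append(sum(answers) / len(answers))
    return sum(shares) / len(shares)


# Four hypothetical principals in one country-year.
reports = [
    {"textbooks": True, "courses_offered": True, "course_content": True},
    {"textbooks": False, "courses_offered": True, "course_content": True},
    {"textbooks": False, "courses_offered": False, "course_content": True},
    {"textbooks": False, "courses_offered": False, "course_content": None},
]
score = standardization_score(reports)
print(round(score, 3))
```

Computing the share per task first and averaging afterwards means that a missing answer on one task does not discard the principal's answers on the other two.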
The level of differentiation of an educational system is derived from Bol and Van de Werfhorst (2013) and is measured at the country level; the younger students' age of selection into educational tracks in a country, the higher the level of differentiation. The countries with the highest level of differentiation are Austria and Germany, which select students into educational tracks from age 10. Countries with no differentiation are assigned the age at which students leave secondary education, typically age 16. We subtract the minimum (10) from this variable so that it ranges between 0 and 6 (see Appendix A).
Following prior research, we control for level of prosperity and level of gender equality at the country-year level. Because country-level factors may also be related to gender gaps in reading scores (Ayalon and Livneh, 2013), we add interaction terms for both aspects with male. We used the Human Development Index (HDI) from the United Nations Development Programme (UNDP) as an indicator of a country's level of prosperity. This HDI measure refers to general human development in three dimensions: health, knowledge, and standard of living (Malik, 2013); original values are multiplied by 10 for ease of interpretation. We used data from the World Values Survey (WVS) and European Social Survey (ESS) to determine a country's level of gender equality. 5 In both WVS and ESS, respondents reported whether they agreed with the statement 'Men should have more right to a job than women when jobs are scarce'; answer categories were: agree (0), neither agree nor disagree (1), or disagree (2). The aggregated country-year average indicates the level of gender equality, with a higher score reflecting more gender equality. Both control variables are mean-centred at the country level. Descriptive statistics for individual and contextual variables are presented in Table 1. Descriptive statistics of the contextual variables (not centred) per country are available in Appendix A.

Analytical Approach
We employ multilevel regression models in R to test our hypotheses. Based on Schmidt-Catran and Fairbrother (2016), we estimate four-level models in which students (level-1 units) are nested in schools (level-2 units) that are nested in country-year combinations (level-3 units) that are nested in countries (level-4 units), and allow the effect of male to vary over all these levels. 6 Although we test hypotheses at the individual and (year-)country level, the structure of PISA data requires that we control for students being nested in schools; accounting for school-level variation leads to more accurate estimates, as effects of standardization and differentiation are possibly affected by processes in schools (Bol et al., 2014). As PISA prescribes, we use the student weight provided by PISA in our models.
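In equation form, the four-level structure can be written roughly as below. The notation is an assumption introduced for illustration: i indexes students, j schools, t country-years and c countries; x collects the individual-level controls, z the country-year variables (standardization, HDI, gender equality), and w the country-level age of selection. Reading "all these levels" as random slopes for male at the school, country-year and country levels is likewise an interpretation, not a confirmed specification.

```latex
\begin{aligned}
y_{ijtc} ={}& \beta_{0} + \beta_{1}\,\mathrm{male}_{ijtc}
  + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ijtc}
  + \boldsymbol{\delta}^{\top}\mathbf{z}_{tc}
  + \lambda\, w_{c} \\
& + \bigl(u_{0jtc} + u_{1jtc}\,\mathrm{male}_{ijtc}\bigr)
  + \bigl(v_{0tc} + v_{1tc}\,\mathrm{male}_{ijtc}\bigr)
  + \bigl(r_{0c} + r_{1c}\,\mathrm{male}_{ijtc}\bigr)
  + e_{ijtc}
\end{aligned}
```

Under this notation, a model with cross-level interactions adds terms of the form male × z and male × w, whose coefficients give the difference between boys and girls in the association of each system characteristic with reading scores.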
In Table 2, we first estimate a null-model that shows how much of the variation in students' reading scores is due to their nesting in schools, country-year combinations and countries. Model 1 shows the uncontrolled effect of male; the mean difference between girls' and boys' reading scores. Model 2 includes all individual and contextual variables. The main effects of the contextual characteristics indicate how they affect reading performance of all students. In model 3, we interact the characteristics of educational systems with male; these estimates show to what degree standardization and age of selection affect girls' and boys' reading scores differently.

Results
In Table 2, the null-model shows an intraclass correlation of 12.3 per cent for the country variance parameter, 4.7 per cent for country-year, and 22.5 per cent for the school variance parameter. Students' average reading scores are thus dependent on the year and country in which they live and the school they attend; this justifies multilevel modelling.
In model 1, we observe that boys overall have significantly lower reading scores than girls (b = -28.399). The variances in the slope of male (σ² = 5.288; σ² = 7.905) illustrate the variation in the effect of male on reading performance between country-year combinations and countries; this confirms that the difference between girls' and boys' average reading performance varies across these two levels. In model 2, all individual and contextual characteristics are included as main effects. Individual variables behave as one would expect: students with highly educated parents and parents with cultural resources, as well as older and native students, have higher reading performance scores than their counterparts. On the country-year level, standardization of educational curricula has a negative relationship with pupils' overall reading performance (b = -27.527). In contrast, the age at which countries select students into tracks is positively related to reading scores (b = 3.126), meaning that late (or no) differentiation is beneficial to students' overall reading performance. In magnitude, the overall effect of standardization is somewhat larger than that of differentiation (given their ranges). In more prosperous countries, as measured by HDI, students have higher reading performance (b = 25.854). A country's level of gender equality is negatively related to students' reading scores (b = -18.614), but additional robustness checks show that this effect loses statistical significance when Japan, Turkey, Chile, Mexico, and Bulgaria are excluded from the analyses.
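The intraclass correlations reported for the null-model can be read as each level's share of the total variance. The Python sketch below illustrates that reading; the function name is hypothetical, and because the raw variance components are not reproduced here, the inputs are the reported percentages themselves, with the student-level remainder (60.5 per cent) implied by subtraction.

```python
def variance_shares(components):
    """Return each level's share of the total variance.

    components: dict mapping a level name to its variance component
    (any common scale works, since only the ratios matter).
    """
    total = sum(components.values())
    return {level: value / total for level, value in components.items()}


# Reported null-model shares, in per cent.
reported = {"country": 12.3, "country_year": 4.7, "school": 22.5}
# Remainder attributed to differences between students within schools.
student_share = round(100 - sum(reported.values()), 1)

shares = variance_shares({**reported, "student": student_share})
for level, share in shares.items():
    print(level, round(100 * share, 1))
```

Taken together, roughly two-fifths of the variance in reading scores lies between schools, country-years and countries, which is the basis for modelling these levels explicitly.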
In model 3, we include cross-level interactions that together reduce the variance in the slope of male by 3.7 per cent on the country-year level and by 12.3 per cent on the country level. 7 Recall that standardization is measured on the country-year level and differentiation is measured on the country level. The cross-level interactions represent the difference in the effect of standardization and differentiation of a country's educational system between girls and boys; main effects apply to girls (coded 0 on male). Figures 2 and 3 visualize these gendered effects, and Appendix B shows the associations between standardization and differentiation and the gender gap in reading performance. First, as hypothesized, in model 3 we see that a highly standardized curriculum in a country-year affects boys' reading scores (b = [-21.784 - 8.128] = -29.912) more negatively than girls' (b = -21.784). Also in Figure 2, we observe a steeper negative slope for boys' reading scores than for girls' with rising standardization of curricula. This implies that the female-favourable gender gap in reading performance increases when educational curricula are more standardized and teachers have less freedom to individualize reading instruction in order to meet students' needs. Gender differences in reading performance range from a 16.931 advantage for girls in countries with the least standardized educational systems to 25.002 in countries with highly standardized educational curricula; this difference of more than eight points approximates the gap in PISA reading scores between Latvia and Switzerland. So, in line with Hypothesis 1, boys are harmed more than girls by a lack of autonomy among teachers and schools.
Second, we hypothesized that boys would be more negatively affected by early differentiation than girls. Our results in Table 2 show the opposite: early differentiation is actually more detrimental for girls' reading scores than boys'. Model 3 and Figure 3 show that girls' reading scores increase (b = 4.191) more than boys' reading scores (b = [4.191 - 1.983] = 2.208) as the age of selection in a country increases. In fact, additional analyses show that the main effect of a country's age of selection is not statistically significant for boys. So, in countries that select students at a younger age (more differentiated countries), the gender gap in reading is smaller due to girls' lower reading performance. We therefore reject Hypothesis 2. Finally, living in a prosperous country-year affects boys' reading scores more positively than girls' (b = 4.079), and a country's level of gender equality does not significantly affect the difference between girls' and boys' reading scores.
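The gender-specific slopes quoted above follow directly from the main effect and the cross-level interaction, given that male is coded 0 for girls and 1 for boys. A small Python sketch of this arithmetic, using only coefficients reported in the text (the function name is hypothetical):

```python
def simple_slopes(main_effect, interaction):
    """Return (slope_for_girls, slope_for_boys) when male is coded 0/1.

    The main effect applies to girls; boys get main effect + interaction.
    """
    return main_effect, main_effect + interaction


# Standardization of curricula (model 3 coefficients as reported).
girls, boys = simple_slopes(-21.784, -8.128)
print(round(girls, 3), round(boys, 3))  # -21.784 -29.912

# Difference between the reported gaps at the highest and lowest
# observed levels of standardization.
gap_change = 25.002 - 16.931
print(round(gap_change, 3))  # 8.071
```

The same function reproduces the age-of-selection slopes: a main effect of 4.191 and an interaction of -1.983 give 2.208 for boys.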

Robustness Analyses
We performed several robustness analyses to broaden the scope of our paper and shed light on some of the underlying theoretical mechanisms for male-female differences in reading performance. In this section, we discuss additional analyses we did for math scores, presented in Appendix C, and results when students' grades are accounted for, presented in Supplementary Appendix D, where we also present and discuss the results of six other robustness analyses.
We first test whether our results are specific to reading, or also apply to students' math performance. Models 2 and 3 in Table C1 of Appendix C show that standardization of educational curricula is related to overall lower math performance scores. Standardization, however, does not affect the gender gap in math performance; the interaction is not statistically significant (b ¼ 0.924). This is also visualized in Figure C1. So, whereas we find negative effects of standardization on both students' overall reading and math performance, we find that standardization enlarges the female favourable gender gap in reading, but is unrelated to the male favourable gender gap in math. It could well be that standardization asserts a less negative effect on boys' math performance because math is considered a more masculine field in which boys (are socially allowed to) have more interest. This difference in findings more generally supports the idea that different mechanisms underlie gender gaps in reading and gender gaps in math. For example, whereas social norms about gender atypical behaviour are more often brought forward to explain boys' lack of interest in reading, insecurity about math abilities has mainly been related to girls' disadvantage in math (DiPrete and Buchmann, 2013).
Second, model 3 of Table C1 shows no significant main effect of the age of selection in a country on students' overall math performance. This result contradicts earlier research on this topic (Montt, 2011). Since we employed all six available PISA waves, this result reflects the most comprehensive evidence currently available on this issue. In concordance with the results for reading performance, we find that girls' math performance profits somewhat more from a later age of selection (i.e., less differentiation) (b = 3.646, P = 0.103) than boys' (b = 3.646 − 1.572 = 2.074). Early differentiation in a country's educational system is thus more negatively related to girls' reading and math performance than to boys' reading and math performance.
Lastly, in Table D3 in Supplementary Appendix D controls are included for whether students are below, at or above the national modal grade (ranging from −3 to 3). Model G shows that the effect of male is partly explained by students' grade position, which is consistent with the fact that boys more often repeat a grade and girls more often skip a grade. As shown in model H, the main effects of standardization and differentiation, as well as their interactions with male, are in the same direction and remain (marginally) significant when accounting for students' deviation from the national modal grade. This seems plausible since students' grade position likely reflects, to a large extent, earlier educational performance.

Conclusion and Discussion
Employing information on girls' and boys' reading performance from all six waves of PISA, this study is the first to assess how two important features of a country's educational system, standardization and differentiation, are related to gender differences in students' reading performance scores. In light of the fact that men lag behind women in educational achievement and attainment in large parts of the world today, this explicit focus on boys' disadvantage in reading, and how countries' educational policies are related to this lower reading performance, is of great importance.
First, it is important to acknowledge that standardization of educational curricula and early differentiation are related to lower overall reading performance scores in OECD countries. Our results clearly underscore the general importance of educational structures in the schooling of adolescents across the world. In countries where curricula and textbooks are more tailored to individual students, students earn higher reading scores than in countries where a one-size-fits-all ideology is prevalent. Additionally, our results support the idea that the moment of selection into different educational tracks should not be too early in a student's life; early tracking generally leads to lower reading (and math) scores.
Second, our study indicates that differences between girls and boys in reading performance in secondary school are substantially related to standardization and differentiation of a country's educational system. We find that standardization of educational curricula is more negatively related to boys' reading performance than to girls' reading performance. In countries where governmental regulations largely determine school curricula and learning materials, gender gaps in reading scores are even more to the advantage of girls. These results are robust in that they hold when we control for standardization of output (central examinations) in a country (see Supplementary Appendix D1). They are consistent with theoretical notions that imply that restrictions placed upon schools and teachers to act upon students' individual needs are especially detrimental for boys. This may be because 15-year-old boys are often poorer readers who need more personalized attention, or because boys are less motivated readers who do not develop their reading competencies even when they are obliged to spend time reading in class. More in-depth research might be directed at the further implications of standardization for girls' and boys' motivation and learning opportunities in reading (and math), to fully probe the mechanisms behind this gender gap in learning.
We also find that in countries in which students are selected into educational tracks at later ages, gender differences in reading are larger because girls benefit more from late selection. While this finding does not confirm our hypothesis, it does align with other research that finds a larger female advantage in educational attainment in later tracking relative to earlier tracking educational systems (Hadjar and Buchmann, 2016; Scheeren, van de Werfhorst and Bol, 2018). Perhaps it is the case that in early differentiating countries, selection on performance restricts students' exposure to, and inspiration from, high-ability students (Slavin, 1990), and this may be especially detrimental for studious girls (Jackson and Dempster, 2009). Future investigations of why girls and boys are not equally affected by early selection into educational tracks would be valuable and may dig deeper into aspects of class composition and school-related norms in various tracks. Because of the cross-sectional nature of the PISA data, we were not able to test the claim that low reading performance could be both a source and an outcome of low track placement. Longitudinal investigations that assess girls' and boys' achievement during their school career in different educational systems would provide a more direct test of how girls' and boys' reading achievement differs as a result of track placement. Ideally, such investigations would also include information on students' primary schools, as the skills of students entering (different tracks of) secondary education may already be shaped by structures in primary education. This, however, will be difficult in a cross-national design that allows for variation in educational system characteristics.
More generally, features of educational systems seem to affect not only inequality between students from different social backgrounds, but also inequality between girls and boys. How various types of students are affected by the structures of educational systems should therefore be a central focus for future educational research. We find that in countries with more standardized educational curricula, gender inequalities are larger, whereas others found that standardization is associated with lower inequality between students from lower and higher socio-economic backgrounds (Montt, 2011; Bol et al., 2014). In addition, we conclude that early tracking is linked to less gender inequality in reading performance, whereas earlier studies found that in differentiated school systems differences between students from low and high socio-economic backgrounds are larger (Hanushek and Wössmann, 2006). Taken together, these findings suggest that structural features of educational systems do not simply enlarge or reduce inequality between student subgroups. A promising direction for future research therefore could be to address the reading performance of other vulnerable groups, such as children from immigrant backgrounds or single-parent families.
Dealing with information from a wide variety of countries is challenging, and like all cross-national research, this study has limitations. First, our measurement of differentiation of a country's educational system refers to data from 2009 (van de Werfhorst and Mijs, 2010; Bol and van de Werfhorst, 2013). Although early/late tracking is likely a relatively stable characteristic of a country's educational system, it is possible that changes within countries occurred over time. Additionally, in constructing a differentiation measure, researchers were understandably limited in dealing with all countries of PISA (Bol and van de Werfhorst, 2013). So, in our analyses we could not include several less economically developed OECD-partner countries from the PISA samples. We recommend that future studies update the information on countries' educational differentiation and extend this classification to more non-OECD countries. Moreover, working with large-scale comparable international information clearly poses limitations with respect to a thorough investigation of mechanisms. While we made an effort to consider mechanisms in several robustness analyses, our ability to do so was limited. At any rate, this study is one of the first that links features of educational systems to gender inequality in reading performance, and by doing so, it provides future research with a relevant foundation from which to further develop theory on this issue and establish more precisely the processes through which girls and boys may be differently affected by standardization of educational curricula and differentiation in secondary education.
Our study comprehensively assessed the relationship between the structure of a country's educational system and girls' and boys' reading performance. As girls' advantage in reading performance and women's lead in educational attainment continue (DiPrete and Buchmann, 2013; Van Hek, Kraaykamp and Wolbers, 2016), it is important to gain insight into whether and how countries' educational institutional arrangements contribute to opportunities and outcomes for both girls and boys. Our conclusion that girls and boys are differentially affected by features of educational systems implies that some countries do a better job in providing environments conducive to learning by all students. The central finding that standardization of course offerings, curricula, and reading materials is detrimental to the average reading scores of all students, but especially those of boys, is a meaningful starting point for future research.

1999, 2005 and ESS 2004. If countries were present in both surveys, we preferred WVS. The years that were available differed per country; we inter- and extrapolated missing years. For Luxembourg, Latvia, Iceland, and Italy, we only had one value and assigned this to every year. For our set of countries and survey years, only one item was available to indicate countries' egalitarian gender norms. We prefer using this attitudinal item for gender equality since we consider it a direct indicator of gender norms. Other measures of country-level gender equality, such as women's labour market participation, are possibly affected by other country variables (such as the economic necessity for women to work).
6 A replication package can be found in the Supplementary Material.
7 The results are not driven by outliers.

Gerbert Kraaykamp is a full Professor of Empirical Sociology in the Department of Sociology and director of the Radboud Institute of Social Cultural Research at Radboud University Nijmegen. His main research interests lie with intergenerational transmission of inequality, parental socialization, educational careers, and cultural capital effects.