The Intraclass Correlation Coefficient as a Measure of Educational Inequality: An Empirical Study with Data from PISA 2018

doi:10.21203/rs.3.rs-4066275/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

In this study, we used data from the Programme for International Student Assessment (PISA) to investigate differences in schools’ average mathematics scores, treating the intraclass correlation coefficient (ICC) as a measure of educational inequality. We computed ICCs with both unconditional and conditional multilevel models for 79 countries participating in PISA 2018 and meta-analyzed the results. The study has two main purposes: first, to quantify inequality after controlling for PISA variables that are common across countries and to explain the remaining heterogeneity at the country level; second, to investigate school differences in Türkiye, chosen because of its large ICCs and the authors’ familiarity with its educational system. The results showed that the PISA variables common across countries reduced the unconditional ICC by 5% to 73%; furthermore, at least 50% of the heterogeneity in conditional ICCs was explained by country-level non-PISA variables. Tracking age, cultural differences, and Gross Domestic Product were particularly important in explaining differences in conditional ICCs. The Türkiye-specific model, consistent with the meta-analysis results, indicated that school type is a substantial predictor of the ICC. Overall, we briefly describe the variables that may help researchers, administrators, and policy makers investigate the ICC when studying educational inequality from a quantitative perspective.

Educational Psychology

Mathematics Achievement

PISA 2018

ICC

split analyze meta-analyze

Educational Inequality

Equity in education is a concept that can be used to guide the process of reaching all learners in society and strengthening the capacity of an education system. It should be seen as a general principle that guides all educational policies and practices, starting from the belief that education is a fundamental human right and the foundation of a more just society. Inclusion and justice have, therefore, become central themes for the overall development of education systems (Ainscow, 2016). Research focusing on the education system has produced important findings on the role schools play for students. In particular, quantitative studies of school effects have shown that education is a key determinant of status attainment, that family background is a strong predictor of success in school, and that school effects are relatively small when controlling for family background (Schneider, 2018, pp. 493-494). Large-scale assessments have been eminently helpful for educational researchers and sociologists in comparing education quality across regions, states, and countries. Although some studies have argued that comparisons based on large-scale assessments might have negative effects (Emler et al., 2019), such comparisons provide essential insights for educational researchers and policymakers evaluating an education system and students' academic progress over time.

A cross-country comparative approach using large-scale assessment data has advantages over national studies. These advantages include explaining institutional differences and revealing whether differences are country-specific or present across a broader set of countries, which is often overlooked in single-country studies (Hanushek & Woessmann, 2011). The international and comparative education literature has produced inconclusive results on the role of schools in student achievement. For example, examining this debate in 25 countries that participated in the 2003 TIMSS fourth-grade administration, Chudgar and Luschei (2009) argued that family background is often more important than schools in understanding differences in student performance, that schools are nevertheless an essential source of variation in student performance, especially in poor and unequal countries, and that in some cases schools can close the achievement gap between children of high and low socioeconomic status. However, they argue that the ability of schools to do so is not systematically related to a country's economic status or inequality.

Meeting the needs of a diverse student body and narrowing gaps in student performance represent formidable challenges for all countries. OECD (2004) reported that countries' approaches to these demands vary. Even in comprehensive school systems, there can be significant variation in performance levels between schools due to socioeconomic and cultural characteristics or geographical differences (i.e., differences between regions, provinces, or states in federal systems, or between rural and urban areas). There may also be differences between schools that are more difficult to measure or define, and some of these differences may be due to differences in the quality or effectiveness of the education provided by schools. As a result, the OECD report emphasizes that even in comprehensive systems, students' performance levels can vary across schools.

Student performance differs widely across OECD countries. There is also considerable variability within countries in how students in different schools perform. While some differences between schools can be attributed to students' socioeconomic background, others reflect the structural characteristics of schools or education systems or the policies and practices of school administrators and teachers. Thus, attending a particular school may improve academic achievement. Differences between schools, such as some schools having more resources and facilities than others or some schools providing more support than others to students from diverse cultural, linguistic, and socio-economic backgrounds, can therefore have a significant impact on sustainability, inclusion, and equity (OECD, 2006).

Multilevel regression models can be used to investigate the impact of school differences on student achievement by summarizing within-school and between-school variability. The total variance is divided into within-group and between-group components, and a separate model can be specified at each level (Snijders & Bosker, 2012). The intraclass correlation coefficient (ICC) quantifies how much of the total variance is due to schools (West et al., 2014, p. 47), and it can be treated as a measure of educational inequality (Parker et al., 2018). Countries with high ICC values might benefit from investigating the reasons for large achievement differences between schools.
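
The variance decomposition described here can be written compactly. What follows is the standard textbook form of a two-level null (unconditional) model, not notation taken from the study itself: the symbols $y_{ij}$ (score of student $i$ in school $j$), $\gamma_{00}$ (grand mean), $\tau_{00}$ (between-school variance), and $\sigma^2$ (within-school variance) are assumptions introduced for illustration.

```latex
\begin{align}
y_{ij} &= \gamma_{00} + u_{0j} + e_{ij},
  \qquad u_{0j} \sim N(0, \tau_{00}), \quad e_{ij} \sim N(0, \sigma^{2}), \\
\mathrm{ICC} &= \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}.
\end{align}
```

Under this formulation, an ICC of 0.40 would mean that 40% of the variance in mathematics scores lies between schools and 60% lies between students within the same school (a hypothetical value chosen only to illustrate the reading).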

OECD (2018a) reported large differences across countries in the achievement gap between advantaged and disadvantaged students. Moreover, the size of this gap varies throughout students' lives as they complete transitions from childhood (i.e., primary school) to adolescence (i.e., secondary school) to young adulthood (i.e., higher education and the labor market). In many countries participating in PISA, levels of equity in student achievement have changed over the last 15 years. In other words, while the influence of socio-economic background on student achievement is strong, inequality of educational opportunities is also essential. Across OECD countries, students attending more socio-economically diverse schools score lower in math and science than students attending more socio-economically homogeneous schools. The opposite was reported only for Türkiye and Vietnam; however, student achievement data is not available for Vietnam. Thus, investigating achievement differences between schools within each PISA-participating country and comparing the results with a Türkiye specific model constitutes the main motivation of this study.

Overview of large-scale assessment

It can be argued that research on the impact of school resources on student achievement began with the publication of the Coleman Report (Coleman, 1966) by the US Department of Education. In this report on equality of educational opportunity, Coleman (1966) found that, when the socioeconomic background of students is taken into account, schools are similar in how they relate to their students' achievement, and that socioeconomic factors have a strong relationship with academic achievement. When these factors are statistically controlled, the differences between schools account for only a small part of the variability in student achievement. The report also found that differences in school buildings and curricula led to relatively small differences in student achievement as measured by standardized tests. These claims raised doubts about the suitability of schools as institutions that could reduce social inequality (Coleman, 1966, pp. 21-22). Attention then turned both to understanding the analysis of the Coleman Report and to providing additional evidence on the effects of resources. Hanushek (1997), in a survey of nearly 400 studies on student achievement, showed that after accounting for differences in family inputs, there was no strong and consistent relationship between student performance and school resources. Today, studies based on data from large-scale tests constitute an important body of knowledge on the factors affecting student achievement.

Measurement of student achievement with international large-scale tests has a history of more than half a century. These measures provide evidence of international differences in student achievement and how they have evolved over time. The International Association for the Evaluation of Educational Achievement (IEA) has played an important role in the development and maintenance of international large-scale tests. In 1964, the achievement of 13-year-olds in mathematics, reading comprehension, geography, science and non-verbal ability was assessed in 12 countries. The success of this study led to the First International Mathematics Study (abbreviated as FIMS). These studies continued in 1970 and 1980. Since 1995, the Trends in International Mathematics and Science Study (TIMSS) has measured fourth- and eighth-graders' mathematics and science achievement every four years. The Progress in International Reading Literacy Study (PIRLS) was conducted at five-year intervals in 2001, 2006, 2011 and 2016. PIRLS investigates changes over time in children's reading achievement in the fourth grade (IEA, 2024). Until the late 1990s, the OECD’s comparisons of educational outcomes were based on unreliable measures of time spent in education that did not capture what students know or can do. PISA changed this approach in the 2000s by introducing the idea of directly testing students' knowledge and skills through an internationally accepted measure. PISA measures the ability of 15-year-old students to use their knowledge in reading, mathematics, and science to cope with real-life challenges. It uses data from students, teachers, schools, and systems to understand differences in performance. The aim of PISA is not to create a top-down layer of accountability but to make information available so that schools and policymakers can make more informed decisions within the education system (Schleicher, 2018).

While TIMSS focuses on student achievement in mathematics and science, PISA measures overall student achievement in science, reading, and mathematics, as well as problem-solving and collaboration skills, across a broad range of subjects. PISA is administered to 15-year-olds because this is the age at which most young people are approaching the end of compulsory education. In addition, the selection of schools and students is very inclusive, so the sample of students comes from a wide range of backgrounds and abilities (OECD, 2023). One way to make international comparisons would be to assess students at the same grade level. However, differences between countries in pre-primary education and its coverage, in the age of entry into formal education, and in the institutional structure of education systems do not allow for the definition of internationally comparable grade levels (OECD, 2019). Overall, we used PISA data in our study because of its high representativeness. We focused on mathematics achievement for two reasons: first, large-scale assessment studies conducted in Türkiye have examined reading and science performance more often than mathematics; second, the authors of this study were mathematics teachers before moving to the field of educational assessment.

School and student level variables in large scale assessments

Inequalities among students can exist in many aspects, such as class, race, gender, sexual orientation, disability status, family economic status, parental education level, age at school entry, and immigration background (Aslanargun et al., 2016; Close & Shiel, 2009; Gürsakal, 2012; Karaman & Atar, 2019; Schneider, 2018; Yetişir & Batı, 2021). In addition, variables such as mathematics self-efficacy, anxiety, and attitude toward mathematics have been observed to have significant effects on mathematics achievement (Ayebale et al., 2020; Kalaycioglu, 2015; Ötken, 2021). Studies have shown that economic, social and cultural status (abbreviated as ESCS by PISA) is one of the most important variables at both the student and school level in determining student performance (Alacacı & Erbaş, 2010; Aydın et al., 2012; Demir et al., 2010; Ersan & Rodriguez, 2020; Kalaycioglu, 2015; Karaman & Atar, 2019; Laukaityte & Rolfsman, 2020; OECD, 2004, 2006; Perry & McConney, 2010; Suna & Özer, 2021; Thien, 2016; Yetişir & Batı, 2021). Variables such as teacher attitude, male student ratio, school region, classroom environment, school size, and school type play an important role in explaining the variance in student performance (Berberoğlu & Kalender, 2005; Laukaityte & Rolfsman, 2020; Lee, 2000; Topçu et al., 2015; Yetişir & Batı, 2021). Lavy (2015) investigated the effect of additional instructional time on students' academic achievement, using PISA 2006 data on 15-year-old students from more than 50 countries, and showed that additional instructional time has a positive and significant effect on student achievement. Several studies have focused on how the achievement of children varies with migration background. For example, Dustmann et al. (2012) showed that the mean difference in test scores between immigrant and non-immigrant children varied widely across countries and that this difference was strongly associated with parental background. They found that when parental background was controlled for, the disadvantages of immigrant children were reduced or even disappeared in some countries, and that the foreign language spoken at home was the most important factor associated with the achievement gap. In addition, Cobb-Clark et al. (2012), using PISA 2009 data, found that variables such as starting school at an early age and ability tracking positively affected the migrant-native achievement gap in mathematics, science and reading, whereas variables such as not having the test language as a mother tongue and education expenditures negatively affected it. However, no matter how important student-level variables are, school-level differences in math and science achievement are found to be significant even after controlling for student and family background variables (Chudgar & Luschei, 2009). Despite socio-economic disadvantage, some students achieve high levels of academic proficiency. On average across OECD countries, one in ten disadvantaged students scored in the top quartile of reading performance in their country. In Argentina, Bulgaria, Colombia, Czech Republic, Hungary, Israel, Luxembourg, Peru, Romania, Slovak Republic, United Arab Emirates and Switzerland, the typical disadvantaged student has less than a one-in-eight chance of attending the same school as a high-achieving student. In contrast, in Baku (Azerbaijan), Canada, Denmark, Estonia, Finland, Iceland, Ireland, Kosovo, Macau (China), Norway, Portugal, Spain and Sweden, the chance is one in five (OECD, 2019).

In Türkiye, achievement variation among schools is examined through international achievement monitoring studies, using either the data obtained from the questionnaires of these assessments or the results of the central exams used at the national level. In these studies, PISA and TIMSS data are used to examine achievement differences in science, mathematics or reading skills between schools (Akyüz, 2014; Alacacı & Erbaş, 2010; Aslanargun et al., 2016; M. Aydın, 2015; Berberoğlu & Kalender, 2005; Demir et al., 2009, 2010; Dinçer & Uysal Kolaşin, 2009; Erdoğan & Acar Güvendir, 2019; Ersan & Rodriguez, 2020; Gelbal, 2010; Kalaycioglu, 2015; Karaman & Atar, 2019; Kılıç et al., 2012; Ötken, 2021; Özdemir, 2015; Sevgi, 2009; Suna, Tanberkan, & Özer, 2020; Suna & Özer, 2021; Topçu et al., 2015; Yavuz et al., 2017; Yetişir & Batı, 2021). For example, while PISA 2018 results showed remarkable improvements in student performance for Albania, the Republic of Moldova, Peru, and Qatar in particular, Türkiye's progress between 2003 and 2018 was less impressive (OECD, 2019). In the 2003 cycle, differences in performance between schools were particularly large in Hungary and Türkiye, with variation between schools about twice the OECD average (OECD, 2004). One of the factors that creates large variation and disadvantage among students is tracking (Van de Werfhorst & Mijs, 2010). The practice of assigning students to instructional groups based on their abilities is known as tracking (Hallinan, 1994). A negative consequence is that in some cases it segregates students by socioeconomic status. Because academic achievement is associated with students' background, minority and low-income students are often disproportionately assigned to disadvantaged groups (Domina et al., 2017; OECD, 2012).

Research on tracking age

Tracking age varies among countries. Some countries direct students of different abilities to different schools from the age of 10 or 11 (e.g., Türkiye, Austria, Germany, Hungary, Czech Republic, Slovakia). Other countries, such as Australia, Canada, Denmark, Estonia, Iceland, Malta, New Zealand, Norway, Poland, Sweden, the United Kingdom and the United States, offer only one academic program to 15-year-old students and direct them to different schools from the age of 16 onwards. In education systems that use tracking, some students choose or are selected into more academically challenging programs that focus on the general skills needed for postsecondary education, while other students choose or are selected into vocational or technical programs that focus on general and practical skills needed for the labor market. The age at which students first enter the tracking system and the number of different instructional programs offered to students are associated with students' learning outcomes (OECD, 2020). Both parents and politicians want to know whether tracking a country's students into different types of schools, structured hierarchically according to performance, has consequences for the equity and efficiency of educational outcomes (Hanushek & Wössmann, 2006). Woessmann (2016) presented evidence that variation in student achievement across countries is associated with differences in the organization and management of school systems. For example, students in many high-performing countries such as Korea and Finland, as well as in some provinces of Canada, face external exit exams at the end of high school. While most schools in Hong Kong and the United Kingdom have some autonomy in deciding which subjects to offer and which teachers to hire, almost no schools in Greece have this autonomy.
While more than half of students in the Netherlands, Belgium, Ireland and Korea attend private schools, almost no students attend private schools in Norway and Poland. While students in Austria and Germany are directed to different schools according to their abilities at the age of 10, two-thirds of OECD countries have inclusive school systems until at least the age of 15. Studies conducted within or between schools in a particular country may suffer from selection bias, because students from certain backgrounds are more likely to attend certain schools or programs; examining differences at the national level can eliminate some of this bias. Hanushek and Wössmann (2006) investigated the effect of early tracking on the level and distribution of student performance. Their results showed that early tracking increased inequality in achievement. However, this advantage of the cross-country comparative approach also brings some limitations. Identifying predictors raises particular challenges in the international setting. Countries may differ in ways that are difficult to observe in a variety of areas, such as cultural characteristics, evaluations of success, or preferences associated with institutional choices and levels of success. Such unobserved country heterogeneity leads to omitted variable bias in cross-country analyses (Woessmann, 2016).

Contini and Cugnata (2020) evaluated the impact of early tracking on learning inequalities using large-scale assessments. They found that, for a given level of inequality in primary school, inequalities in secondary school are significantly larger in early tracking countries than in late tracking countries. Early tracking has been questioned as a factor that increases inequality, especially in countries where strong inequalities already exist in primary school. Overall, the evidence is that early tracking increases achievement inequalities, particularly by widening the gap between children from different social backgrounds.

Piopiunik (2014) evaluated the school reform that shifted the timing of tracking from the sixth grade to the fourth grade in the German state of Bavaria, using data from three PISA cycles to eliminate state-specific and school type-specific differences. The results showed that the reform reduced the performance of 15-year-old students. Strello et al. (2021) stated that many studies conducted to date on the effects of tracking on inequalities in achievement and performance have been inconclusive, as different studies use different data, focus on different areas, and use different inequality measurements. To address this problem, they examined data from PISA, PIRLS and TIMSS, the three largest international assessments administered in more than 70 countries and regions over the past 20 years, and combined data from a total of 21 primary and secondary school assessment cycles to estimate difference-in-differences models for different outcome measures. The authors studied the effects using a meta-analytical approach and examined the impact of tracking age on educational, social and performance outcomes. They found no evidence that tracking increased performance levels, but they found that an extra year of exposure to the tracking system reduced the variation in achievement scores.

Van De Werfhorst (2018) used data from TIMSS, an international student assessment administered to eighth graders, to examine educational inequalities based on socioeconomic background in nine countries. The results showed that socioeconomic inequalities decreased more strongly in systems that moved from tracked to comprehensive education than in systems without this reform. Socioeconomic inequality among top-performing students decreased after the comprehensive education reform. The study showed that inequality of educational opportunity, as measured by the relationship between parental socioeconomic status and student performance on standardized mathematics tests, is greater in tracked education systems than in comprehensive systems. It also showed that the reform in England and Wales in particular was effective in reducing observed inequalities in mathematics performance. Overall, it is noteworthy that the few studies mentioned above examined school differences with a limited number of predictor variables. Additionally, studies conducted in Türkiye have not properly addressed the nested data structure or, more generally, the complex structure of the data sets (e.g., plausible values, weights).

Purpose of the study

This study aims to investigate the school- and student-level variables that cause variation in mathematics performance in the PISA 2018 dataset. Due to the complex structure of the PISA data, before starting the analysis, we computed unconditional ICC estimates and compared them with the values in the PISA technical report (OECD, 2019) to ensure a similar methodological approach.

There are two main purposes of this study. The first purpose is to quantify the achievement differences due to schools after controlling for PISA variables that are common across countries and to explain the remaining differences at the country level. For this purpose, student- and school-level variables that are available in all countries were included in the multilevel model, and conditional ICC values were calculated. The conditional ICC values showed that the variables added to the model reduced the variation in mathematics performance to a certain extent but did not eliminate it, so the effect of non-PISA variables was also investigated. In a meta-analysis, we therefore tried to explain the heterogeneity with country-level moderators identified from the literature.

The second purpose is to investigate school differences in Türkiye, given that it has particularly large differences between schools (OECD, 2018b) and that the authors are familiar with its education system. Türkiye-specific PISA variables at the school and student level that may be related to the ICC were determined based on a literature review.

We conclude the study by comparing the results obtained from the meta-analysis and Türkiye-specific model. The variables that explained school differences were identified in order to provide insights for school administrators, researchers and policy makers who are interested in eliminating educational inequalities arising from the differences at the school level. Therefore, for the PISA 2018 mathematics achievement scores, our research questions were set as:

Research Question 1: After controlling for common PISA-variables across countries, to what extent can observed heterogeneity in school differences be attributed to country-level characteristics?

Research Question 2: What are the important student- and school-level variables that can explain the differences between schools in Türkiye?

Data source

In this study, data sets from 79 of the 80 countries participating in the PISA 2018 cycle were used; Vietnam could not be included because its student achievement data are unavailable. PISA collects educational data from students, teachers, school administrators and parents through surveys. The data obtained with these surveys and scales are available in the OECD database as student- and school-level files and contain many variables at both levels.

Countries, regions or economies can participate in PISA whether or not they are OECD members. PISA uses a two-stage sampling design to ensure national representativeness. In the first stage, schools are selected randomly; in the second stage, the aim is to randomly select 35 students from among the 15-year-old students in each selected school. Excluding Vietnam, a total of 606,627 students from all countries participated in PISA 2018. The total number of schools from the countries participating in the study is 21,752. In some countries, data on some variables were not collected. For example, there are no data on the number of boys and girls in schools for Austria, New Zealand, and Canada, or on school ownership type (i.e., SCHLTYPE) for Belgium and Ireland. Another exceptional situation is that the variance of some independent variables is zero in some country data sets. For example, the school type is the same for all schools in Israel. In both cases, variables with all observations missing or with zero variance could not be included in the analysis for the relevant countries (see section 1 in supplementary for additional details).

Analytical approach

For the first research question, conditional ICC values were obtained with multilevel models and then examined within a meta-analytic framework. These steps are known as split, analyze, meta-analyze (Cheung & Jak, 2016) and are further classified as a two-stage approach by Scherer, Siddiq and Nilsen (2024). For the second research question, a multilevel model was built specifically for Türkiye. The following paragraphs explain these models in turn.

After the unconditional ICC comparison, the conditional ICC was calculated by adding the predictors to the model. The conditional ICC can be expressed as the correlation between the dependent variable values of two students in the same school when the predictor variables are controlled. Predictor variables added at both levels are expected to explain some of the variability in the dependent variable. A conditional ICC that does not decrease relative to the unconditional ICC therefore indicates that the added predictors do not explain the share of the total variance in the dependent variable that lies at the school level. To answer the first research question, the conditional ICC was calculated by including those predictors in the PISA data set that were identified as important at both levels in the literature review and that could be included in the model in the same way for all countries.
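
The conditional ICC can be expressed in the same generic notation as a standard two-level random-intercept model. The symbols below are assumptions introduced for illustration: $x_{pij}$ denotes student-level predictors, $w_{qj}$ school-level predictors, and $\tau_{00\mid x,w}$ and $\sigma^{2}_{\mid x,w}$ the residual school- and student-level variances. The relative reduction $\Delta_{\text{rel}}$ is one plausible way to read a statement such as "predictors reduced the unconditional ICC by 5% to 73%"; whether the study reports exactly this quantity is not confirmed by the text.

```latex
\begin{align}
y_{ij} &= \gamma_{00} + \sum_{p} \beta_{p}\, x_{pij} + \sum_{q} \gamma_{0q}\, w_{qj} + u_{0j} + e_{ij}, \\
\mathrm{ICC}_{\text{cond}} &= \frac{\tau_{00 \mid x,w}}{\tau_{00 \mid x,w} + \sigma^{2}_{\mid x,w}}, \\
\Delta_{\text{rel}} &= \frac{\mathrm{ICC}_{\text{uncond}} - \mathrm{ICC}_{\text{cond}}}{\mathrm{ICC}_{\text{uncond}}}.
\end{align}
```

As a purely hypothetical illustration, an unconditional ICC of 0.40 and a conditional ICC of 0.30 would give $\Delta_{\text{rel}} = (0.40 - 0.30)/0.40 = 0.25$, that is, a 25% reduction.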

Computation of ICC

Before calculating the ICCs for all countries, the data were read into R (R Core Team, 2021). The student- and school-level data sets were downloaded separately from the OECD database; they contain the dependent variables (i.e., PV1MATH-PV10MATH), the independent variables explained below, and the student- and school-level weights (W_FSTUWT and W_FSCHWT). The separate data sets were combined at the student level using the intsvy package (Caro & Biecek, 2017).

The data set was divided into 79 data sets on a country/region basis. Each of these was then divided further by plausible value, yielding 79 × 10 = 790 data sets in total, each containing one plausible value. The two-level Mplus (Muthen & Muthen, 2017) syntax used to compute the ICC was automated across these data sets with the help of the MplusAutomation package (Hallquist & Wiley, 2018). The robust estimator (i.e., MLR) was chosen (see section 2 in supplementary).
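
The splitting step can be sketched in a few lines. The example below is an illustrative Python sketch using only the standard library, not the authors' R/Mplus workflow. The column names PV1MATH-PV10MATH and W_FSTUWT come from the text; CNT and CNTSCHID are assumed to be the country and school identifiers, and the output column name MATH and the toy records are hypothetical.

```python
from collections import defaultdict

N_PV = 10  # ten plausible values per student, PV1MATH..PV10MATH


def split_by_country_and_pv(rows, country_key="CNT", n_pv=N_PV):
    """Return {(country, pv_index): [records]}, one plausible value per record.

    `rows` is a list of dicts (one per student). Each output record keeps all
    non-PV columns and stores the selected plausible value under "MATH"
    (a hypothetical column name used only in this sketch).
    """
    pv_cols = [f"PV{k}MATH" for k in range(1, n_pv + 1)]
    subsets = defaultdict(list)
    for row in rows:
        # Columns shared by every split (identifiers, weights, predictors).
        shared = {k: v for k, v in row.items() if k not in pv_cols}
        for k, col in enumerate(pv_cols, start=1):
            record = dict(shared)
            record["MATH"] = row[col]
            subsets[(row[country_key], k)].append(record)
    return dict(subsets)


# Toy input: two students from two countries (values are made up).
toy_rows = [
    {"CNT": "TUR", "CNTSCHID": 1, "W_FSTUWT": 1.0,
     **{f"PV{k}MATH": 450.0 + k for k in range(1, N_PV + 1)}},
    {"CNT": "FIN", "CNTSCHID": 2, "W_FSTUWT": 1.5,
     **{f"PV{k}MATH": 500.0 + k for k in range(1, N_PV + 1)}},
]
toy_subsets = split_by_country_and_pv(toy_rows)
print(len(toy_subsets))  # number of country-by-plausible-value data sets
```

With 79 countries and ten plausible values, the same function would return 790 subsets, each of which could then be written to disk as the input for one model run.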

The multiple imputation method (as recommended by OECD, 2009) was used to combine the results from the analysis of each plausible value to obtain the unconditional and conditional ICCs. The aim here is to combine the results obtained for different plausible values. A single input file was prepared for the null model, and plausible values and student and school level sample weights were used in the analysis. With the help of an R script (see section 3 in supplementary), the student and school level residual variances were extracted from the Mplus output files.
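
The combination step can be illustrated with Rubin's rules, the usual multiple-imputation pooling formulas. This is a minimal Python sketch under stated assumptions, not the authors' R script: it assumes the ICC is computed per plausible value from the extracted variance components and then averaged, and all numeric values are made up. Averaging the variance components first and then forming the ICC is an equally plausible reading of the text.

```python
from statistics import mean, variance


def icc(between_var, within_var):
    """Share of total variance that lies between schools."""
    return between_var / (between_var + within_var)


def pool_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results with Rubin's rules.

    Returns (pooled estimate, total variance), where the total variance is
    the mean sampling variance plus (1 + 1/M) times the variance of the
    estimates across the M plausible values.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(sampling_variances)
    between = variance(estimates)  # sample variance, divisor M - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical (school-level, student-level) variances for three plausible values.
components = [(3000.0, 5000.0), (3100.0, 5000.0), (2900.0, 5100.0)]
iccs = [icc(tau, sigma2) for tau, sigma2 in components]
pooled_icc = mean(iccs)
print(round(pooled_icc, 3))
```

The second function applies to any estimate that comes with a sampling variance (for example, a regression coefficient and its squared standard error), while the pooled ICC itself is simply the mean across plausible values in this sketch.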

In the next stage, the conditional ICC was calculated by adding student- and school-level predictors to the model. Nominal or ordinal categorical variables were included in the analysis by creating dummy variables. The conditional model analyses were carried out in two phases because, for some countries, some predictors had all observations missing or had zero variance. Since neither situation was observed in the predictor variables of 62 countries, these data sets were analyzed with a single input file. For each of the remaining 17 countries, a separate input file was prepared (variables with zero variance and/or with all observations missing were excluded) and the analyses were carried out. To handle missing data, full information maximum likelihood (FIML) was used, as suggested by Hox et al. (2015). Mplus assumes a normal distribution when applying FIML to categorical variables (Muthen & Muthen, 2017). The same applies to the dummy variables used in the conditional model. Muthen (2015) stated that when dummy predictors are added to the model and missing data are handled with FIML, normality is assumed for these variables, and that in simulation studies accepting this assumption did not significantly change the results. In the first stage, with the help of the R script written for the 62 countries, the student- and school-level residual variances were extracted from the Mplus outputs and the conditional ICC was calculated for these countries. Conditional ICCs for the remaining 17 countries were calculated one by one.

Meta-analysis

The data set for the meta-analyses included the conditional ICCs, the school sample size and 12 moderator variables for the 79 PISA regions, as explained in the measures section. To meta-analyze the ICCs, we followed the steps outlined by Martinez-Romero et al. (2020) and used the metafor package (Viechtbauer, 2010) to estimate a random-effects model, with the mice package (van Buuren et al., 2015) to address missing data. Specifically, we used the escalc function to transform the ICC values to Fisher’s z, treating the number of participating PISA schools in each country as the sample size. Before moderators were added to the model, heterogeneity was assessed using the Q statistic and the I² index. To assess moderator effects, we used the rma function and ran the model with 12 moderators for each of the 100 imputed data sets. As suggested by Viechtbauer (2022), the sampling variances (i.e., vi) were excluded from the predictor matrix, and to combine the results adequately the coefficients table was created from the output of the pool function. Across the 100 model results, we also report the minimum, maximum and median values of I², R², the residual heterogeneity statistic (Q_E, with 66 degrees of freedom) and the Q statistic for the moderators (Q_M) (see section 4 in supplementary).
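The transformation and heterogeneity steps can be illustrated with a small stand-alone sketch. It applies Fisher's z with sampling variance 1/(n - 3), as is done for correlations, and then computes Q, I² and a pooled estimate. Two assumptions should be noted: the pooling below uses the closed-form DerSimonian-Laird estimator rather than the estimator used by the authors' R code, and all school counts except Türkiye's 186 are hypothetical. The function names are illustrative.

```python
import math


def fisher_z(icc, n_schools):
    """Fisher's z of an ICC and its sampling variance, treating n_schools as n."""
    return math.atanh(icc), 1.0 / (n_schools - 3)


def random_effects(yi, vi):
    """DerSimonian-Laird random-effects summary with Q and I² (in percent)."""
    w = [1.0 / v for v in vi]
    fixed = sum(wi * y for wi, y in zip(w, yi)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, yi))
    df = len(yi) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)              # between-country variance
    w_star = [1.0 / (v + tau2) for v in vi]    # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, yi)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled_z": pooled, "pooled_icc": math.tanh(pooled),
            "tau2": tau2, "Q": q, "I2": i2}


# Conditional ICCs taken from Table 2; school counts are hypothetical
# except for Türkiye (186 schools).
iccs = {"NLD": 0.50, "TUR": 0.43, "FIN": 0.06, "ISL": 0.05}
n_schools = {"NLD": 150, "TUR": 186, "FIN": 200, "ISL": 140}

pairs = [fisher_z(iccs[k], n_schools[k]) for k in iccs]
summary = random_effects([z for z, _ in pairs], [v for _, v in pairs])
print({k: round(v, 3) for k, v in summary.items()})
```

The pooled value is back-transformed with tanh so that it can be read on the ICC scale.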

Computation of ICC for Türkiye

In the previous stages, the unconditional ICC for Türkiye was found to be 0.57 and the conditional ICC 0.43. In calculating the conditional ICC, predictors were added at the student level and the school level. At this stage, school-level variables with the potential to explain the differences between schools were identified and a model specific to Türkiye was built. No additions were made to the student-level predictors. The PISA 2018 data set was searched for school-level variables that may have a direct impact on mathematics achievement. The variables included in the model are given under the Türkiye specific model heading in the measures section.

The analysis was first carried out with the student- and school-level variables used in the previous analyses plus the additional school-level variables, referred to as the full model for Türkiye. We also used a stepwise backward selection method based on statistical significance: the non-significant predictor with the highest p value was removed and the analysis was repeated. This process continued until all variables in the model were significant, giving what is referred to as the final model for Türkiye. The ICC was calculated for the full and final models. Data processing was done in R (R Core Team, 2021), using the dplyr package (Wickham et al., 2023) for renaming variables, the car package (Fox & Weisberg, 2019) for recoding variables and the fastDummies package (Kaplan, 2023) for creating dummy variables. Models were analyzed with Mplus (see section 5 in supplementary).
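The backward selection loop described above can be sketched generically. This is an illustration under stated assumptions, not the authors' Mplus workflow: the `fit` argument is a hypothetical caller-supplied function that refits the model and returns a p value per predictor, the 0.05 threshold is assumed, and the toy p values are invented for the demonstration.

```python
def backward_eliminate(predictors, fit, alpha=0.05):
    """Drop the least significant predictor until all p values are below alpha.

    fit: hypothetical function that refits the model on the given predictors
         and returns a dict {predictor: p_value}.
    Returns (kept predictors, removed predictors in order of removal).
    """
    current = list(predictors)
    removed = []
    while current:
        p_values = fit(current)
        worst = max(current, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break                      # every remaining predictor is significant
        current.remove(worst)
        removed.append(worst)
    return current, removed


def toy_fit(names):
    # Invented, fixed p values; in a real refit they change at every step.
    table = {"ESCS": 0.001, "STAFFSHORT": 0.87, "SCHSIZE": 0.40, "STRATIO": 0.03}
    return {name: table[name] for name in names}


kept, dropped = backward_eliminate(
    ["ESCS", "STAFFSHORT", "SCHSIZE", "STRATIO"], toy_fit)
print(kept, dropped)
```

Only one predictor is removed per iteration because p values of the remaining predictors can change after each refit.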

Measures

Variables in the multilevel model built for all countries

Mathematics Achievement (PV1MATH-PV10MATH): The dependent variable of the study. In PISA studies, students take only a part of the relevant achievement test, not the entire test, hence creating the basis for plausible values (Laukaityte & Wiberg, 2017; OECD, 2009; Rubin, 1987; Rutkowski et al., 2010). PISA determined 10 plausible values for each competency area per student in 2018. These values had a mean of 500 and a standard deviation of 100 across all participating countries.

Student level predictors

ESCS: Derived from three variables related to family background: parents' highest level of education, employment status and household possessions, including books. ESCS averages of the countries included in the analysis vary between -1.91 and 0.54. The average of all countries was found to be -0.28 and the median was -0.18.

Number of Classes Skipped in the Last Two Weeks (ST062Q02TA): A variable with four categories in which the student is asked how many classes they missed in the last two weeks. Considering all students who answered this item in the PISA 2018 data set, 67% reported that they had not missed any lessons in the last two weeks, 23% had missed one or two lessons, 6% had missed three or four lessons, and 4% had missed five or more lessons.

Primary Education Beginning Age (ST126Q01TA): Seven categories were presented to students for the age of starting primary education, from category 1, "3 or earlier", to category 7, "9 or later". Categories 1 to 7 indicate ages 3 to 9 years. Based on the studies of Norman (2010), Robitzsch (2020) and Sullivan and Artino (2013), the variable "primary education beginning age" was included in the analysis as a continuous variable because it has seven categories, its marginal distribution is not skewed (Rhemtulla et al., 2012) and the sample sizes are large enough. At the country level, the minimum average category value of the age for starting primary education is 2.75, the maximum average category value is 5.02, the average category value of all countries is 4.22 and the median is 4.18.

Immigration Status (IMMIG): An immigration history index derived from items (ST019) that ask about the countries of birth of the student and their parents. It consists of three categories: native, second generation and first generation. Considering all students in the PISA 2018 data set, 88% were native, 6% were second generation and 6% were first generation.

School level predictors

School Ownership (SCHLTYPE): Schools are defined in three different types of ownership: public school, state-supported private school, and private school. However, since in some countries there are no state-supported private schools or private schools, or the number of such schools is very small, these two categories were combined and included in the analysis. Of all the schools in the data set, 17% were private and 88% were public schools.

Educational Material Shortage (EDUSHORT): An index of the lack of educational resources, derived from school principals' responses to four items regarding educational material (e.g. textbooks, IT equipment, library, or laboratory material) and physical infrastructure (e.g. building, floors, heating/cooling, lighting and acoustic systems). At the country level, the minimum average was -1.07, the maximum average was 1.19, and the average of all countries was 0.13 with a median of 0.11.

Staff Shortage (STAFFSHORT): Derived from the items related to faculty shortage, insufficient or low-qualified faculty, shortage of auxiliary staff, and insufficient or low-qualified auxiliary staff. At the country level, the minimum value was -1.00, the maximum value was 0.94, the average was -0.03 with a median of 0.01.

Student Behavior Hindering Learning (STUBEHA): Derived from six items regarding the school principal's perceptions of the school climate, particularly student behavior that may affect teaching at the school. The country level minimum average was –1.21 and maximum average was 1.11. The average was 0.03 with a median of 0.06.

Teacher Behavior Hindering Learning (TEACHBEHA): Derived from five items regarding school principals' perceptions of the school climate, particularly teacher behaviors that may affect teaching in the school. The country level minimum average was –1.01 and the maximum average was 1.11. The average was 0.10 with a median of 0.10.

Community in Which School Located (SC001Q01TA): It includes five categories: village or rural area, small town, town, city and metropolitan. This variable was recoded into three categories: small, middle-size and large communities. The distribution of schools participating in the PISA 2018 study across these categories was 35%, 27% and 39% respectively.

Ratio of Male Students in School (BOY): Because the literature reports that it has an impact on mathematics achievement, the variable BOY, the proportion of male students enrolled in the school, was computed from the number of male students (SC002Q01TA) and female students (SC002Q02TA) in the school. At the country level, the minimum BOY value was 0.48, the maximum was 0.55, and the average was 0.51 with a median of 0.51.

Variables used in meta-analysis

Tracking/First Segregation Age: The age at which students are first selected into schools in systems that implement tracking. Across the countries and economies participating in PISA 2018, students in 31 education systems are first selected for different programs when they turn 15. In 19 education systems, the tracking age was 16; in 8 education systems it was 14; and in 15 systems it was 13 years. The countries that select students at the youngest age, 10, are Austria, Germany and Hungary. The Czech Republic, Slovak Republic and Türkiye track at 11 years (OECD, 2020). In the analyses, segregation age was recoded as 16 minus the age, resulting in 0 when segregation was at 16 and 6 when segregation was at 10.

The Number of School Types: The OECD (2020) report gives the number of school types for each country; it ranges from 1 to 6, with a median of 3. In the analyses, this variable is centered at 3.

Human Development Index (HDI): The Human Development Index is a metric compiled by the United Nations Development Program and used to measure a country's average achievement in the three key dimensions of human development: a long and healthy life, knowledge and a decent standard of living. HDI was first measured in 1990 and has been published every year since then, with the exceptions of 2012 and 2020/21. Most developed countries have an HDI score of 0.8 or above, placing them in the very high human development tier. These countries have stable governments, widespread and affordable education and healthcare, high life expectancies and quality of life, and growing, strong economies. In contrast, the "low human development" category includes the world's least developed countries with HDI scores below 0.55. Less developed countries face unstable governments, widespread poverty, lack of access to healthcare, and inadequate education. Additionally, these countries have low income, low life expectancy and high birth rates. HDI scores for countries were provided by the World Population Review and data from 2018 were used in this study (World Population Review, 2018). In the meta-analysis this variable is standardized to have a mean of 0 and standard deviation of 1.

Gini coefficient: Also called the Gini index or Gini ratio, is the most widely used measure of income distribution. The higher the Gini, the greater the difference between the incomes of a country's richest and poorest people. Gini coefficient helps identify high levels of income inequality, which can have many undesirable political and economic effects. It varies between 0 and 1 but is usually given as a percentage. For example, if a nation had absolute income equality and every individual earned the same amount, its Gini score would be 0 (0%). On the other hand, if one person earns all the income in a country and the rest earn no income, the Gini coefficient will be 1 (100%). While the Gini coefficient is a useful tool for analyzing the distribution of wealth or income in a country, it does not indicate the overall wealth or income of that country. Some of the world's poorest countries, such as the Central African Republic, have the highest Gini coefficients (61.3). A high-income country and a low-income country may have the same Gini coefficients. Gini coefficients for countries were provided by the World Population Review and data from 2018 were used in this study (World Population Review, 2018). In the meta-analysis this variable is standardized to have a mean of 0 and standard deviation of 1.

Gross Domestic Product (GDP): The total monetary value of all goods and services produced by a country. GDP indicates how well the economy is doing; it is primarily used to measure the strength of a country's economy and is an indicator of how fast a country is growing (World Bank Open Data, 2018). The data were provided by the World Bank, and data from 2018 were used in this study. In the analysis this variable is standardized to have a mean of 0 and standard deviation of 1.

Government Expenditure on Education (GEX): It refers to the expenditures made from the general government budget on education and is expressed as a percentage of GDP. The data was provided by the World Bank and data from 2018 was used in this study. In the meta-analysis this variable is standardized to have a mean of 0 and standard deviation of 1.

Hofstede's Cultural Dimensions: Hofstede et al. (2010) explained the impact of the culture established in a society on the values of its members and the connection of those values with behavior. They defined culture as “the collective programming of the mind that distinguishes members of one group or category of people from others.” They proposed six basic dimensions along which societies organize themselves and named these the dimensions of culture: individualism, masculinity, uncertainty avoidance, long-term orientation, power distance and indulgence. Each dimension is expressed on a scale ranging from 0 to 100, but in the meta-analysis each is standardized to have a mean of 0 and standard deviation of 1. These dimensions were obtained by comparing many, if not all, countries in the world.

Individualism: The extent to which individuals feel they belong to the society they live in, and the impact of this on cultural differences. For example, if individuals think about their own interests and feel independent from society, the society is classified as individualist; if individuals make their decisions according to the value judgments of society and do not seek an interest independent of the interests of society, it is classified as collectivist.

Masculinity: Whether a society makes its decisions within a logical or an emotional framework determines which type of society it belongs to. If a society makes its decisions within the framework of logic, it is classified as a masculine society.

Uncertainty avoidance: A judgment about societies has been reached by examining the reactions of individuals in society to the uncertain situations they experience. It has been observed that societies that avoid uncertainty in events with uncertain and unpredictable outcomes are more tense, stressed and therefore less productive. It has been stated that societies that do not avoid uncertainty and think that uncertainty is a phenomenon that must exist, act more comfortably in situations where the outcome is unclear and unpredictable.

Long-term orientation: In a long-term oriented culture, the basic idea about the world is that it is in a state of change and there is always a need to prepare for the future. In a short-term oriented culture, the world is essentially as it was created, so the past provides a moral compass and it is morally good to hold on to it. This dimension concerns life philosophies, religiosity, and predictions of educational success.

Power distance: Power Distance is the degree to which less powerful members of institutions and organizations (such as the family) accept and expect unequal distribution of power. This dimension explains how behaviors are shaped in proportion to the importance societies attach to hierarchy. It can be observed that in societies where power distance is low, subordinates are more comfortable in their relations with their superiors and have more say in the decisions made and can act critically, while in cases where this distance is large, subordinates do not question their situation and act more accepting.

Indulgence: This dimension emphasizes the freedom of individuals' behavior. In societies classified as indulgent, individuals act according to their own enjoyment and impulses; in societies classified as restrained, individuals limit their actions according to the morality and value judgments of the society.
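The recoding and scaling described for the meta-analysis moderators can be summarized in a short sketch. The helper names and the example values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) for standardization is an assumption, since the text does not state which denominator was used.

```python
from statistics import mean, stdev


def recode_tracking_age(age):
    """16 minus the tracking age: 0 for tracking at 16, 6 for tracking at 10."""
    return 16 - age


def center(values, at):
    """Shift values so that `at` becomes zero (school types are centered at 3)."""
    return [v - at for v in values]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (sample SD assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical moderator values for five education systems.
tracking_age = [10, 11, 13, 15, 16]
school_types = [4, 5, 3, 2, 1]
hdi = [0.92, 0.81, 0.77, 0.90, 0.94]

print([recode_tracking_age(a) for a in tracking_age])
print(center(school_types, 3))
print([round(z, 2) for z in standardize(hdi)])
```

With this coding, the intercept of the moderator model refers to a system that tracks at 16, has three school types and sits at the mean of every standardized moderator.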

Variables in the Türkiye specific multilevel model

In addition to the variables included in the model created to calculate ICC for all countries, the following school-level variables were included in the Türkiye specific model.

Stratum as School's Academic Type (STRATUM): The school sample for Türkiye in PISA 2018 was determined by the stratified random sampling method (MEB, 2019). 186 schools and 6890 students representing Türkiye participated in the PISA 2018 application. This variable was recoded to obtain the school types in the STRATUM variable. Recoded STRATUM is a nominal variable with eight categories, and the numbers of high schools (HSs) and students in each category are given in Table 1 below.

Table 1. Number of Schools and Students in Different School Types in Türkiye

School Type	Category	Schools	Students
Anatolian High Schools	2	78	3013
Vocational and Technical Anatolian High Schools	3	56	2143
Anatolian Imam and Preacher High Schools	4	25	943
Multi-Program Anatolian High Schools	4	8	273
Lower Secondary Schools	4	6	22
Science High Schools	1	6	226
Social Sciences High Schools	1	6	228
Fine Arts High Schools	4	1	42

Since mathematics achievement levels were close to each other and the number of schools and students sampled was small, some school types were combined into a single category. Science HSs and Social Sciences HSs, the high-achieving schools, were grouped into a single category, which was the reference category in the analysis. Anatolian HSs and Vocational and Technical HSs were used as they are. The remaining four school types were combined into the “Others” category. After these adjustments, the STRATUM variable is a nominal variable with 4 categories.

Career Guidance at School (SC162Q01SA): A variable with two response categories (“planned interviews are held for all students” or “students conduct interviews voluntarily”) describing how the school's guidance service provides career guidance. 109 schools provide career guidance on a planned basis (reference category), and 51 schools provide it on a voluntary basis.

Student Placement by Ability into Different Classes (SC042Q01TA): A nominal variable with three categories (all students, some students, no students) indicating whether students are taught in different classes according to their ability/achievement. All students are grouped into ability-based classes in 34 schools, some students in 63 schools, and no students in 85 schools. The category in which all students are grouped by ability was taken as the reference category, and the other two categories were combined.

Student-Teacher Ratio (STRATIO): Student-Teacher ratio was obtained by dividing the number of students enrolled in the school (SC002) in the school level data set by the total number of teachers (TOTAT). The student-teacher ratio in schools in Türkiye was found to be minimum 2.34 and maximum 40.76. The Türkiye average for this variable was 13.46 with a median of 13.37.

School Size (SCHSIZE): It is equal to the sum of the number of male and female students enrolled in school (SC002) and is included in the school level data set. For Türkiye, the minimum school size was 26, the maximum was 3263, the average was 655.7 with a median of 612.

Ratio of Teachers with Master’s Degree (MASTER): In the school level data set, the rate of teachers with a master's degree (PROAT5AM), one of the variables containing the education levels of teachers at the school, was included in the analysis. The ratio of graduate teachers in Turkish schools was found to be minimum 0, maximum 1, average 0.15 and median 0.12.

Class Size (CLSIZE): The average class size in the school level data set consists of nine categories. The average class size in schools was found to be minimum 13 (15 or less), maximum 53 (50 or more), Türkiye average 41.26 (range 41-45) and median 48 (range 46-50).

ICC results of unconditional and conditional models

Unconditional ICCs (ICC_U) and conditional ICCs (ICC_C), calculated from the unexplained school-level and student-level variances in mathematics achievement, are given in Table 2, listed from largest to smallest unconditional ICC. The largest unconditional ICC was 0.61, for the Netherlands, close to the 0.58 reported in the technical report. The smallest, 0.08, belongs to Iceland and equals the value in the technical report. Türkiye had the second highest ICC, 0.57, almost identical to the reported value of 0.56. The mean of the unconditional ICCs for all countries was 0.33, close to the reported average of 0.32; the median was 0.32 and the standard deviation was 0.12.

The largest conditional ICC was 0.50 and belongs to the Netherlands. The smallest ICC was 0.05 and belongs to Iceland. Türkiye had the second highest ICC, 0.43. The mean of conditional ICCs for all countries was 0.21, the median was 0.21, and the standard deviation was 0.10. With the addition of the predictors to the model, there was a decrease in the ICCs of the countries by at least 0.01 (for Kazakhstan) and at most 0.28 (for Malta). With the inclusion of predictors in the model, proportional decreases compared to the unconditional ICC were between at least 5% (for Kazakhstan) and at most 73% (for Malta). The decrease in the unconditional ICC for Türkiye was 0.15 and proportionally 25%. It has been observed that the predictors included in the model explained school differences in mathematics achievement across countries, but the explanation rates vary greatly from country to country.
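The absolute and proportional decreases can be recomputed from the rounded ICCs in Table 2, as in the sketch below. The function name is illustrative, and because the inputs are rounded to two decimals, the results can differ in the last digit from figures computed on unrounded estimates.

```python
def icc_reduction(icc_u, icc_c):
    """Absolute and proportional (percent) drop from unconditional to conditional ICC."""
    absolute = icc_u - icc_c
    return absolute, 100 * absolute / icc_u


# (ICC_U, ICC_C) pairs as printed in Table 2.
table2 = {"Malta": (0.38, 0.10), "Kazakhstan": (0.22, 0.21), "Türkiye": (0.57, 0.43)}

for country, (icc_u, icc_c) in table2.items():
    drop, pct = icc_reduction(icc_u, icc_c)
    print(f"{country}: decrease {drop:.2f}, {pct:.1f}% of the unconditional ICC")
```

The proportional figure is the share of the unconditional between-school correlation that is accounted for once the common PISA predictors are in the model.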

Meta-analyses results

The random-effects model without moderators yielded an estimated average conditional ICC of 0.22 for the 79 PISA regions. This model resulted in an I² of 66.4% and a Q statistic of 234.06 (p < .01), indicating significant heterogeneity. Table 3 shows the results for the model with moderators, in which the relatively important moderators were GDP (B = 0.03, SE = 0.01, p = .053), segregation age (B = 0.04, SE = 0.01, p < .01), and uncertainty avoidance (B = 0.02, SE = 0.01, p = .061). Across the 100 imputations, I² ranged between 30.91 and 41.28 with a median of 35.18, and R² ranged between 62.96 and 76.41 with a median of 71.36. Q_E and Q_M were significant in all imputations: Q_E ranged between 93.26 and 113.38 with a median of 100.72, and Q_M ranged between 64.30 and 90.99 with a median of 80.45. Overall, the moderators explained a substantial amount of the heterogeneity.

Table 2. PISA 2018 Mathematics Achievement Unconditional and Conditional ICCs

Code	Country	ICC_U	ICC_C	Code	Country	ICC_U	ICC_C
NLD	Netherlands	0.61	0.50	MYS	Malaysia	0.32	0.19
TUR	Türkiye	0.57	0.43	LTU	Lithuania	0.31	0.18
HUN	Hungary	0.57	0.40	MEX	Mexico	0.31	0.19
SRB	Serbia	0.51	0.40	THA	Thailand	0.30	0.23
SVN	Slovenia	0.51	0.34	JOR	Jordan	0.30	0.24
BEL	Belgium	0.50	0.38	CHE	Switzerland	0.29	0.21
DEU	Germany	0.49	0.37	SGP	Singapore	0.29	0.18
ITA	Italy	0.48	0.38	SAU	Saudi Arabia	0.29	0.19
PAN	Panama	0.48	0.28	MAC	Macao	0.28	0.12
AUT	Austria	0.48	0.41	BLR	Belarus	0.27	0.12
ARE	United Arab Emirates	0.47	0.28	KOR	Korea	0.27	0.15
ISR	Israel	0.47	0.37	TAP	Chinese Taipei	0.27	0.14
BRA	Brazil	0.47	0.26	GEO	Georgia	0.26	0.16
JPN	Japan	0.46	0.35	GBR	United Kingdom	0.26	0.13
FRA	France	0.46	0.36	RUS	Russian Federation	0.24	0.19
ROU	Romania	0.46	0.22	MNE	Montenegro	0.24	0.18
PHL	Philippines	0.45	0.23	AUS	Australia	0.24	0.15
ARG	Argentina	0.44	0.23	POL	Poland	0.24	0.10
CHL	Chile	0.44	0.24	QRT	Tatarstan (RUS)	0.24	0.19
LBN	Lebanon	0.44	0.38	UKR	Ukraine	0.24	0.14
BRN	Brunei Darussalam	0.44	0.22	BIH	Bosnia and Herzegovina	0.23	0.16
CRI	Costa Rica	0.43	0.24	USA	United States	0.23	0.10
QAT	Qatar	0.43	0.22	SWE	Sweden	0.23	0.14
URY	Uruguay	0.43	0.23	KAZ	Kazakhstan	0.22	0.21
QCI	B-S-J-Z (China)	0.42	0.29	KSV	Kosovo	0.21	0.14
LUX	Luxembourg	0.41	0.15	EST	Estonia	0.21	0.11
COL	Colombia	0.40	0.24	LVA	Latvia	0.20	0.13
CZE	Czech Republic	0.40	0.27	CAN	Canada	0.19	0.13
PER	Peru	0.39	0.21	IRL	Ireland	0.18	0.09
GRC	Greece	0.38	0.23	NZL	New Zealand	0.18	0.06
DOM	Dominican Republic	0.38	0.23	ALB	Albania	0.16	0.12
MLT	Malta	0.38	0.10	MDA	Moldova	0.16	0.08
HRV	Croatia	0.38	0.25	QAZ	Baku (Azerbaijan)	0.16	0.10
IDN	Indonesia	0.38	0.29	DNK	Denmark	0.16	0.08
BGR	Bulgaria	0.37	0.21	ESP	Spain	0.14	0.08
SVK	Slovak Republic	0.36	0.19	QMR	Moscow City (RUS)	0.14	0.11
PRT	Portugal	0.36	0.22	FIN	Finland	0.14	0.06
MKD	North Macedonia	0.35	0.22	NOR	Norway	0.10	0.08
MAR	Morocco	0.35	0.25	ISL	Iceland	0.08	0.05
HKG	Hong Kong	0.32	0.24

Table 3. Results of Meta-analysis with Moderators for Conditional ICC

	B	SE	t	p
Intercept	0.15	0.02	8.56	<.01*
HDI	-0.00	0.02	-0.12	.90
GINI	0.01	0.02	0.25	.80
GDP	0.03	0.01	1.98	.05
GEX	-0.01	0.01	-0.45	.65
Segregation Age	0.04	0.01	5.30	<.01*
Number of School Types	0.02	0.01	1.43	.16
Hofstede measures:
Power distance	-0.00	0.02	-0.05	.96
Individualism	-0.00	0.02	-0.15	.88
Masculinity	-0.02	0.01	-1.19	.24
Uncertainty avoidance	0.02	0.01	1.91	.06
Long term orientation	-0.02	0.02	-1.34	.19
Indulgence	0.00	0.02	0.04	.97

* p<.05

Türkiye specific multilevel model results

In addition to the student- and school-level predictors in the model analyzed for all countries, the variables described in the section on the Türkiye specific multilevel model, which are in the PISA 2018 school-level data set and are not missing for all observations, were included in the model as predictors (i.e., the full model). Türkiye's ICC for this model was 0.26. The decrease compared with the unconditional ICC was 54%, and the decrease compared with the previous conditional ICC was 40%. This shows that the school-level predictors added to the model explain most of the school-level variance that could not be explained in the previous stages.

At this stage, a stepwise backward elimination of predictors based on the p value was used. After eight steps, the final model was obtained, in which all remaining predictors were significant. As Table 4 shows, although eight predictors (six school-level and two student-level) were removed from the model, there was no significant increase in the unexplained variance at either the student level or the school level. The ICC calculated for the final model was 0.28.

In the final model in Table 4, the significant predictors of mathematics achievement were socio-economic status at the student level (B = 5.16, SE = 1.29) and, at the school level, educational material shortage (B = -9.33, SE = 3.93), student behaviors hindering learning (B = -21.50, SE = 4.24), teacher behaviors hindering learning (B = 14.60, SE = 5.11), student-teacher ratio (B = -1.50, SE = 0.70), the community in which the school is located (B = 23.72, SE = 8.20), school ownership (B = -35.31, SE = 11.20) and the academic type of the school (B = -85.78, SE = 14.96 for Anatolian HSs; B = -136.13, SE = 15.41 for Vocational HSs; B = -161.08, SE = 17.15 for Other HSs).

The variables eliminated in the backward selection procedure were, at the student level, the number of classes skipped by the student in the last two weeks and the age of beginning primary education and, at the school level, the ratio of male students in the school, the ratio of teachers with a master's degree, staff shortage, the way career guidance is given at the school, school size and class size.

Table 4. Türkiye Specific Full and Final Model Results

			Full Model		Final Model
			B	(SE)	B	(SE)
Within Level
	ESCS		5.18	(1.30)***	5.16	(1.29)***
	I Skipped Some Classes in the Last Two Weeks		-0.57	(2.40)
	Primary Education Beginning Age		4.82	(3.35)
	Residual Variance		3595.50	(129.02)***	3600.93	(128.16)***
Between Level
	Ratio of Male Students in the School		-29.53	(15.14)
	Ratio of Teachers with Master’s Degree		14.65	(25.91)
	Educational Material Shortage		-10.53	(4.15)**	-9.33	(3.93)*
	Staff Shortage		0.62	(3.83)
	Student Behaviors Hindering Learning		-18.35	(4.01)***	-21.50	(4.24)***
	Teacher Behaviors Hindering Learning		10.74	(4.31)**	14.60	(5.11)**
	Career Guidance (Voluntarily or Planned)		6.27	(7.65)
	Are Students Grouped by Ability into Different Classes?		18.82	(10.58)
	Student-Teacher Ratio		-1.89	(0.65)**	-1.50	(0.70)*
	School Size		0.01	(0.01)
	Class Size		-4.29	(7.46)
	The Community in Which Your School is Located?		24.78	(8.65)**	23.72	(8.20)**
	School Ownership (Public - Private)		-30.30	(10.93)**	-35.31	(11.20)**
	Stratum as School's Academic Type
		Others	-157.39	(17.30)***	-161.08	(17.15)***
		Anatolian High School	-85.54	(14.96)***	-85.78	(14.96)***
		Vocational and Technical Anatolian High School	-135.48	(16.10)***	-136.13	(15.41)***
	Residual Variance		1263.69	(189.00)***	1395.06	(220.65)***

* p<.05, ** p<.01, *** p<.001
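As a check on the reported ICCs, the sketch below recomputes them from the residual variances in Table 4, assuming the usual definition of the ICC as the school-level (between) variance divided by the sum of the school-level and student-level (within) variances. The function name is illustrative.

```python
def icc(between_var, within_var):
    """Share of the total residual variance that lies between schools."""
    return between_var / (between_var + within_var)


# Residual variances from Table 4 (between level, within level).
full_model = icc(1263.69, 3595.50)
final_model = icc(1395.06, 3600.93)

print(round(full_model, 2), round(final_model, 2))  # prints 0.26 0.28
```

Both values agree with the ICCs of 0.26 and 0.28 reported for the full and final models.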

The results of this study should provide information for future studies in education policies. In this study, firstly, the ICC values of the countries participating in the PISA 2018 cycle were obtained as an indicator of the homogeneity of mathematics achievement among schools. A low ICC indicates relatively small differences between schools, indicating that schools tend to perform at comparable levels. A higher ICC, on the other hand, indicates greater differences between schools. This means that some schools achieve very high levels of performance while others remain at very low levels.

Interpretation of ICC values

The ICC values of mathematics achievement in the 79 countries participating in the PISA 2018 cycle were calculated and the homogeneity/heterogeneity between schools was studied. The largest ICC value for the unconditional model belongs to the Netherlands (0.61) and the smallest to Iceland (0.08). Türkiye has the second highest ICC value (0.57). The mean of unconditional ICCs for all countries is 0.33, the median is 0.32, and the standard deviation is 0.12. ICC values for Finland, Albania, Iceland, Norway, Russia (Moscow region), Ireland, Spain, Denmark, Azerbaijan (Baku), Canada, New Zealand and Moldova are below 0.20, and schools in these countries perform relatively similarly. In countries with ICC values of .50 or larger, such as the Netherlands, Türkiye, Hungary, Serbia, Slovenia and Belgium, schools perform substantially differently, which raises questions of inequality in learning opportunities. Analyzing the 2003 and 2012 PISA cycles, Brunner et al. (2018) reported that the ICC value in mathematics achievement varied between 0.06 (observed in Finland) and 0.61 (observed in the Netherlands), with a median of 0.41. Compared with these values, the country with the highest ICC in PISA 2018 is still the Netherlands, but the median ICC decreased from 0.41 to 0.32. Zopluoğlu (2012), who analyzed the TIMSS cycles between 1995 and 2007, reported that the between-school share of variance in mathematics achievement for 8th graders varied between 0.30 and 0.36, similar to the findings of this study. For Türkiye, the unconditional ICC of PISA 2018 mathematics achievement is 0.57, while Zopluoğlu (2012) reported unconditional ICCs between 0.30 and 0.40 for TIMSS mathematics achievement between 1995 and 2007. These results for Türkiye can be interpreted as showing that achievement differences between schools carry on into higher grades.

The conditional ICC model included student-level variables (socio-economic status, skipping some classes, age at starting primary education, and immigration status) and school-level variables (school ownership type (public-private), lack of educational staff, student behaviors that prevent learning, teacher behaviors that prevent learning, and the community in which the school is located). When these predictors were added, the largest ICC value was 0.50 (Netherlands) and the smallest was 0.05 (Iceland). The conditional ICC value for Türkiye decreased to 0.43. The mean and the median of the conditional ICCs across all countries were both 0.21, and the standard deviation was 0.10.
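The drop from the unconditional to the conditional ICC can be expressed as a relative reduction. The sketch below works this out from the rounded values quoted in the text, so the percentages are approximate (about 18% for the Netherlands and about 25% for Türkiye); the helper name `relative_reduction` is hypothetical.

```python
# Unconditional and conditional ICCs as quoted in the text (rounded to two decimals).
reported = {
    "Netherlands": (0.61, 0.50),
    "Türkiye": (0.57, 0.43),
    "Iceland": (0.08, 0.05),
}


def relative_reduction(unconditional: float, conditional: float) -> float:
    """Proportion by which the ICC shrinks once the predictors are added."""
    return (unconditional - conditional) / unconditional


reductions = {
    country: relative_reduction(uncond, cond)
    for country, (uncond, cond) in reported.items()
}
for country, value in reductions.items():
    print(f"{country}: {value:.1%}")
```

Because the inputs are rounded, small ICCs such as Iceland's give less precise percentages than large ones.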

Principals of disadvantaged schools in more than half of the countries and economies participating in PISA reported that their schools' capacity to provide instruction was hindered by a lack of educational materials and by a lack or inadequacy of teaching staff (OECD, 2019). In these systems, students face a double disadvantage: disadvantages from their own families and disadvantages created by the school system.

Interpretation of meta-analysis results

By including moderators in the meta-analysis, the variance components in mathematics achievement that could not be explained by the variables in the PISA 2018 data set were examined. The moderators selected for this purpose were tracking age, HDI, Gini index, GDP, GEX and Hofstede's six cultural dimensions (individualism, masculinity, uncertainty avoidance, long-term orientation, power distance, and indulgence). The highest correlation between the moderators used in the meta-analysis was 0.76, between individualism and HDI, which suggests that multicollinearity was not a serious concern. HDI is an indicator of human development consisting of the dimensions of a long and healthy life, knowledge and a reasonable standard of living, and individualism is an indicator of individuals' feeling of belonging to the society they live in and its impact on cultural differences. It can be said that in countries with a high human development index, the rate of individuals feeling like they belong to society is also high.

The meta-analysis results revealed some factors that explained the heterogeneity in mathematics achievement across schools. The predictors in the meta-analysis explained approximately 71% of the heterogeneity in conditional ICC values. The important moderators were as follows: (1) the tracking age applied by the countries in their education systems, (2) the uncertainty avoidance index, and (3) the gross domestic product of the countries.

The average ICC in countries where the tracking age was 16 was 0.15, and the ICC value increased by 0.04 units for every 1-year decrease in the tracking age. For example, relative to countries where the tracking age was 16, the ICC value was higher by 0.04 units in countries where the tracking age was 15, by 0.08 units where it was 14, and by 0.12 units where it was 13. Hanushek and Wössmann (2006) obtained similar results in their study comparing the variability in achievement between countries that implement tracking and those that do not. Using country-level data from PIRLS, TIMSS and PISA, the authors found that early tracking significantly increases inequality. Their specific findings suggest that low performers in the achievement rankings are harmed more by early tracking than high performers. Montt (2011) showed that reducing variability in learning opportunities within the school system (in the form of no tracking system) reduces achievement inequality. In addition, there are studies showing that schools with high achievement levels tend to be more homogeneous in their achievement distribution, but that even among schools with the same achievement level, score distributions differ depending on classroom practices (Kim & Choi, 2008). Schnepf (2003), on the other hand, suggested that children from families with higher socioeconomic status perform better. This discrimination can be harmful to children from disadvantaged backgrounds (Jackson, 2013). The lower the tracking age, the more likely families are to have a strong influence on their children's educational choices (Checchi & Flabbi, 2007; Contini & Scagni, 2011). Educational inequalities have been shown to decrease strongly in education systems that change from tracking to comprehensive education. In particular, findings have shown that the reform in England and Wales was effective in reducing inequalities in mathematics performance (Van De Werfhorst, 2018).
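The tracking-age arithmetic reported above can be written as a simple linear rule. The sketch below is only an illustration of that rule: it uses the two figures quoted in the text (0.15 at a tracking age of 16, and 0.04 per year) and assumes that the effect is linear and that the other moderators are held fixed. The names `BASELINE_AGE`, `BASELINE_ICC`, `SLOPE_PER_YEAR` and `expected_icc` are hypothetical.

```python
BASELINE_AGE = 16      # reference tracking age quoted in the text
BASELINE_ICC = 0.15    # average ICC at the reference tracking age
SLOPE_PER_YEAR = 0.04  # ICC increase per one-year decrease in tracking age


def expected_icc(tracking_age: int) -> float:
    """Expected ICC under a linear tracking-age effect (assumption),
    with all other moderators held fixed."""
    return BASELINE_ICC + SLOPE_PER_YEAR * (BASELINE_AGE - tracking_age)


for age in (16, 15, 14, 13):
    print(age, round(expected_icc(age), 2))
```

Under these assumptions a country that tracks at age 13 would be expected to have an ICC of about 0.27, that is, 0.12 units above a country that tracks at age 16.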

Strello et al. (2021) examined the effect of tracking age on educational, social and performance outcomes using PISA, PIRLS and TIMSS data. They showed that early tracking increased the proportion of students who did not reach the TIMSS intermediate achievement threshold or level 2 in PISA by approximately 1.5%. Although they did not find evidence that tracking increased performance levels, they found that an extra year spent in the tracking system reduced the standard deviation of countries' achievement scores by approximately 0.77 points. Based on these findings, they showed that delaying the tracking age by 5 years (for example, postponing the tracking age from the fourth grade to the eighth grade) dramatically reduced the inequality of achievement scores. Similarly, our study revealed that each additional year of exposure to the tracking system is associated with an increase of 0.04 units in the ICC value; that is, each additional year spent in the tracking system increases the variance between schools and contributes to educational inequality.

Hofstede's Uncertainty Avoidance Index is a cultural dimension that measures the degree to which a society or culture tolerates or avoids uncertainty and risk. This dimension evaluates how behaviors and values differ in situations of uncertainty across cultures. When the uncertainty avoidance indexes for 2018 are examined, high scores are obtained for countries such as Uruguay, Portugal, Russia, Malta, Belgium, Japan, Korea, Poland, Serbia, Peru, Panama, Slovakia, Costa Rica, Mexico, Argentina, France, Spain, and Türkiye. The German-speaking countries Austria, Germany and Luxembourg have medium-high scores. Canada, the United Kingdom, Hong Kong, Ireland, Malaysia, the Philippines, New Zealand, China, Switzerland, the United States and Singapore have average or low scores. The results of the meta-analysis showed that for every 1 standard deviation increase in the uncertainty avoidance index, the ICC value in mathematics achievement increased by 0.02 units. This means that the uncertainty avoidance index is positively associated with the difference in mathematics achievement between schools. The relationship between Hofstede's Uncertainty Avoidance Index and education can be complex and contextual. In cultures with a high uncertainty avoidance index, well-defined curricula, strict rules and regulations, and a focus on traditional and institutionalized knowledge may be preferred. For example, Geert Hofstede made some observations in a course he gave within the scope of the International Teachers Program (ITP). German participants preferred structured learning situations with clear objectives, detailed assignments, and strict timelines, favored situations where there was only one correct answer, and expected to be rewarded for correct answers. These preferences are typical for countries where uncertainty avoidance is stronger.
On the other hand, most British participants preferred open-ended learning situations with vague goals, extensive assignments and no timetable, and the idea that there could be only one correct answer was taboo for them. In addition, they expected to be rewarded for originality. These characteristics are typical for countries with weak uncertainty avoidance. Cultures with a low uncertainty avoidance index may be more open to innovative educational approaches, while cultures with high uncertainty avoidance have been found to prefer authoritarian teaching styles in which teachers are seen as experts and students are expected to follow rules and instructions. It can be said that in cultures where uncertainty avoidance is low, critical thinking and creativity are encouraged with a more participatory and open approach to education. The content of the curriculum may also be affected by uncertainty avoidance. High uncertainty avoidance cultures may prioritize topics that provide clear answers and practical skills, while low uncertainty avoidance cultures may be more open to exploring abstract and theoretical concepts. A high uncertainty avoidance index in a culture may lead individuals to be risk averse in their career choices and educational endeavors. Therefore, they may prefer fields and careers that offer predictability. In contrast, low uncertainty avoidance cultures may have greater risk tolerance, leading to more diverse career choices and educational opportunities. In today's global world, people from different cultural backgrounds often work together. Understanding the meaning and implications of the uncertainty avoidance index can help individuals and institutions manage cultural differences in educational and professional environments (Hofstede et al., 2010b). Minelgaitė et al. (2019) conducted a regression analysis (n = 70 countries) to examine how cultural values and a country's economic prosperity affect two administrative indicators in education, leadership and autonomy. They showed that the cultural dimensions measured by the Hofstede model are important predictors of both indicators. In their study, a negative relationship was found between school autonomy and uncertainty avoidance. Based on this result, it can be argued that the uncertainty avoidance index is high in countries with low school autonomy.

GDP is an economic indicator that measures the total value of goods and services produced by a country during a specific period. It provides information about a country's economic growth, welfare level and economic performance. When the GDP values for 2018 are examined, the countries with the highest values include the United States, China, Japan, Germany, the United Kingdom, France, Italy, Brazil and Canada, in that order. Türkiye ranks 19th in the GDP ranking. In the meta-analysis, for every 1 standard deviation increase in GDP, the ICC value increased by 0.03 units. Since improving the quality of schools has high economic returns, there is a positive relationship between the quality of education and GDP (Hanushek & Woessmann, 2012). Education is an important factor for the economic growth and development of a country. A good education system contributes to the development of human resources, increases workforce productivity and fosters the emergence of innovation potential. In their article, Hanushek and Woessmann (2012) presented evidence that the strong relationship between cognitive skills and economic growth reflects the causal effect of cognitive skills and that effective school policy supports economic growth. The authors argue that improved schools play an important policy role in promoting economic growth. They conclude that differences in cognitive skills, which are greatly influenced by schools, have a strong and positive impact on economic growth and that improving the quality of schools can lead to higher rates of economic growth. The authors also emphasize the importance of both basic literacy and high achievers in supporting economic growth. They found that providing broad primary education as well as guiding more students to higher levels of achievement has strong economic returns.
Based on these results, it can be argued that higher economic growth rates can be achieved through policies that effectively increase student success and improve the quality of schools. Individuals with a higher level of education generally have the opportunity to access better jobs and higher incomes. Psacharopoulos (1994) stated that education investments can contribute to economic growth and to reducing income inequality. The same study also discusses the social benefits of education and its effects on the general welfare of society, and reports an average rate of return of 10% for one more year of education. This means that, on average, individuals experience a 10 percent increase in earnings for each additional year of education. In his research on TIMSS data, Zopluoğlu (2012) placed countries at four different GDP levels and calculated a two-level unconditional ICC estimate across countries at each GDP level, but he could not find a systematic relationship between the GDP of the countries and the unconditional ICC values at the 8th grade level. However, unconditional ICC estimates for mathematics and science performance in 2003 and 2007 tended to be lower in countries with higher GDP. Similarly, Mohammadpour and Abdul Ghafar (2014), in their multilevel analysis of 2007 TIMSS data, did not find a significant connection between national per capita income and mathematics achievement, which is inconsistent with the result obtained by Chiu (2007). However, they found that the correlation coefficient between these two factors was positive and significant. In contrast, they found that national-level socio-economic status (ESCS) was strongly associated with achievement. This suggests that in higher ESCS countries, per capita income affects student achievement indirectly through indicators of socio-economic status (such as parents' education and educational resources at home).
Meyer and Schiller (2013) found that countries' GDP and expenditure per student in secondary education account for approximately two-thirds of the variance in average PISA scores between countries. The standardized coefficients show that a one standard deviation change in GDP is associated with a difference of around 0.63 standard deviations in mean PISA scores. They also found that countries' spending per student in secondary education makes a significant difference in average PISA performance. These results suggest that a country's economic wealth is associated with PISA results and support the idea that economic resources are an important out-of-school factor affecting PISA performance. This finding supports the results of our study. An educated workforce increases productivity by contributing to the development of higher technology and knowledge-based sectors. Therefore, a country's level of education is often associated with economic growth and prosperity. While a higher GDP is often associated with a better education system, a better education system also supports economic growth.

Interpretation of ICC from the Türkiye specific model

When the results in Table 2 are compared, the ICC values decrease after the predictors are included in the model, but unexplained variance in mathematics achievement between schools remains. For this reason, school-level predictors that are available in the PISA 2018 data, were not included in the analysis for the first research question, and are thought to help explain the variance in mathematics achievement were added to the Türkiye specific model: academic school type, the way career guidance is given at school, student-teacher ratio, school size, class size, the ratio of teachers with a master's degree, and whether students are grouped by ability into different classes. With these variables included, the ICC of the full model was 0.26. The final model, which was obtained by a stepwise backward selection procedure, includes ESCS at the student level and, at the school level, educational material shortage, student behaviors hindering learning, teacher behaviors hindering learning, number of students per teacher, the community in which the school is located, the ownership type of the school and the academic type of the school as significant predictors of mathematics achievement. After the elimination of eight variables, the ICC of the final model was 0.28.

To check the results for the significant variables in a simple way, continuous variables were recoded into four categories representing quartiles, while nominal and ordinal variables were left as they were. The mean of the first plausible value (PV1) for mathematics achievement was then calculated for each category, and the categories were compared.
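The descriptive check described above (recode a continuous predictor into quartiles, then compare the PV1 mean in each category) can be sketched as follows. This is an illustration only: the data are synthetic, the variable names are hypothetical, and the sketch does not use sampling weights or the remaining plausible values, so it is not a reproduction of the analysis reported here.

```python
import random
import statistics
from bisect import bisect_left

random.seed(0)

# Synthetic data for illustration only: (predictor value, PV1 mathematics score).
students = []
for _ in range(400):
    x = random.gauss(0.0, 1.0)
    pv1 = 450.0 + 30.0 * x + random.gauss(0.0, 60.0)
    students.append((x, pv1))

# Three cut points that split the continuous predictor into four quartile groups.
cuts = statistics.quantiles([x for x, _ in students], n=4)


def quartile(value: float) -> int:
    """Return the quartile group (1 to 4) of a value given the cut points."""
    return bisect_left(cuts, value) + 1


# Collect PV1 scores by quartile group and average them.
groups = {q: [] for q in (1, 2, 3, 4)}
for x, pv1 in students:
    groups[quartile(x)].append(pv1)

pv1_means = {q: statistics.mean(scores) for q, scores in groups.items()}
for q in sorted(pv1_means):
    print(f"Q{q}: n={len(groups[q])}, PV1 mean={pv1_means[q]:.1f}")
```

Comparing the first and fourth quartile means in this way gives a quick descriptive contrast, but it is not a substitute for the multilevel model.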

ESCS was the only significant predictor of mathematics achievement at the student level. The PV1 mean for the students with the highest ESCS level was 499.93, while the mean for those with the lowest ESCS level was 426.62. Similar to our findings, in many studies (Alacacı & Erbaş, 2010; Aslanargun et al., 2016; A. Aydın et al., 2012; Demir et al., 2009; Ersan & Rodriguez, 2020; Gelbal, 2010; Karaman & Atar, 2019; Özdemir, 2015; Özel et al., 2013; Özkan & Güvendir, 2014; Schneider, 2018; Sevgi, 2009; Suna & Özer, 2021) the student's socio-economic level appears as a significant predictor in explaining the variance in mathematics achievement. Anderson et al. (2009) examined five studies by Goh (2006), Gu (2007), Ross (2008), Hsu (2007), and Kennedy (1999), which investigated variables that have an impact on educational performance through multilevel modeling using PISA 2003 data. They found that student socioeconomic background was a consistently important variable, but that the magnitude of the relationship varied from country to country. They argued that the impact of socio-economic status on achievement is therefore not universal and needs to be carefully considered within a particular educational context. Other student-level variables, such as age at starting primary education and skipping some classes, were not found to be significant predictors of the difference in PISA 2018 mathematics achievement.

Considering educational material shortage, one of the school-level predictors, the PV1 mean was 465.91 for schools that reported the least material shortage and 420.72 for schools that reported the most. It is noteworthy that the difference between the achievement means of these two groups of schools is 45 points. In their research with PISA 2015 data, Yetişir and Batı (2021) found that educational material shortage was not a significant variable in explaining the variance in science achievement scores. In this study, by contrast, educational material shortage was found to be a significant predictor in explaining the variance in PISA 2018 mathematics achievement.

For the variables of student and teacher behaviors that hinder learning, higher values indicate greater hindrance to learning. Based on the variable of student behaviors hindering learning, the PV1 mean of students in schools with the most positive school climate (first quartile) was 489.83, and the mean of students in schools with the least positive school climate (fourth quartile) was 411.41. Similarly, based on the variable of teacher behaviors hindering learning, the mathematics PV1 mean of students in the first quartile (most positive behaviors) was 458.12, and the mean of students in the fourth quartile (least positive behaviors) was 434.98. As the scores of student and teacher behaviors hindering learning increase, the average achievement in mathematics decreases. Ayebale et al. (2020) examined twenty-six articles on assessing student achievement in mathematics. In five of these studies (Emeke & Benedicta, 2015; Mji & Makgato, 2006; Paksu, 2008; Roh, 2003; Schenkel, 2009) student behaviors hindering learning and in ten of these studies (Kearney & Garfield, 2019; Kele, 2018; Mazana et al., 2020; Mji & Makgato, 2006; Onderi et al., 2015; Paksu, 2008; Rasid et al., 2020; Sule & Yusuf, 2018; Tamukong Ndifor Mariana Ngeche, 2017; Wamala et al., 2013) teacher behaviors hindering learning were significant predictors. Similar to the findings in our study, Yetişir and Batı (2021) found that student behaviors hindering learning negatively affect PISA 2015 science performance and are a significant predictor; contrary to our findings, however, teacher behaviors hindering learning were not a significant predictor in their study.

According to the OECD (2020) report, among the countries and economies participating in PISA 2018, the relationship between mathematics performance and guidance at school was statistically significant after taking into account the gross national product per capita, but no significant relationship was found with reading performance. Therefore, the report states that more research is needed to better understand the relationship between career guidance and student performance. In this study, the career guidance variable was not a significant predictor in explaining the variance in mathematics achievement for the Turkish sample.

Student-teacher ratio was another significant predictor in our Türkiye specific model. Among studies based on PISA mathematics performance in Türkiye, and unlike our findings, A. Aydın et al. (2012) did not find student-teacher ratio to be a significant variable in explaining the variance in mathematics performance in PISA 2012, whereas Alacacı and Erbaş (2010) found it to be a significant variable explaining the variance in mathematics achievement in PISA 2006. The number of students per teacher in Türkiye decreased from 2006 to 2018: it was 18 in 2006 (MEB, 2007), 16 in 2012 (MEB, 2013) and 12 in 2018 (MEB, 2018). One reason for the inconsistency among the studies examined may be this decrease in the number of students per teacher over the years, and another may be a non-linear association between achievement and student-teacher ratio. The reasons for this inconsistency can be studied in future research.

Another significant variable in explaining the variance in mathematics achievement is schools' academic type. In some education systems, low performers tend to be dispersed across a variety of schools, while in others low performers tend to cluster in particular schools, often compounded by social disadvantages. In systems where low-performing students are mostly placed in certain types of schools, such as Germany, Hungary, Israel, Lebanon, the Netherlands, the Slovak Republic and Türkiye, it is especially important to ensure that low-performing schools receive adequate education (Schleicher, 2018). In Türkiye, students in the 8th grade of secondary school can optionally take the central exam at the end of this grade. Students who fall within a certain percentile in the exam can choose Anatolian High Schools, Anatolian Imam Hatip High Schools, Science High Schools, Social Sciences High Schools, and Multi-Program Anatolian High Schools. Fine Arts High Schools and Sports High Schools accept students through aptitude tests. Anatolian High Schools that accept students without examination do so based on address. Average mathematics scores vary between 320.26 and 595.33 depending on school type.

Schools were grouped into four categories in the Türkiye specific model. The highest PV1 average, 552.93, was obtained for Science and Social Sciences High Schools combined; taken separately, their PV1 averages were 595.34 and 510.89, respectively. Students studying in these high schools performed above the OECD average in mathematics (MEB, 2019). Anatolian High Schools and Vocational and Technical Anatolian schools were included as separate categories in the analysis, with PV1 averages of 481.83 and 412.64, respectively. Finally, the "others" category, which combined Lower Secondary, Anatolian İmam Hatip, Multi Program Anatolian and Fine Arts High Schools, had a PV1 average of 415.65; the separate PV1 averages were 320.26, 430.96, 373.94 and 393.13, respectively. It can be concluded that the high mathematics performance of the first group is due to the fact that these schools accept the students with the highest scores on the central exam. Suna et al. (2020) examined the change in the proportion of students with basic and advanced literacy competencies at the school type level in Türkiye in the PISA cycles between 2003 and 2018. The results showed that almost all students in Science High Schools and Social Sciences High Schools had basic competencies in all three literacy areas. Their study also found that although the rates of basic literacy competencies among Anatolian High School and Anatolian İmam Hatip High School students increased over the years, the rates of students reaching advanced competency levels were low in all school types except Science High Schools. The findings of Suna et al. (2020) are in line with the findings of this study on the effect of the academic type of the school on mathematics achievement. Gümüş and Atalmış (2012) examined 2003 and 2009 PISA data and reported consistent findings: students in Science and Anatolian High Schools showed high performance in mathematics, while the achievement of other schools was low.

Whether schools are public or private is another significant predictor explaining the variance in mathematics achievement. Among Science and Social Sciences High School students, the average PV1 was 593.55 for those who attended private schools and 548.89 for those who attended public schools. On the other hand, the averages of Anatolian and Vocational High School students who attended private schools were 463.63 and 400.95, respectively, while the averages of Anatolian and Vocational High School students who attended public schools were 485.75 and 413.29, respectively. Thus, in Science and Social Sciences High Schools the averages of students attending private schools are higher than those of students attending public schools, whereas the opposite is the case in Anatolian and Vocational High Schools. Suna et al. (2020) used analysis of covariance to compare the mean scores of students attending public and private schools, controlling for students' socioeconomic status, and found that students in private schools performed significantly higher than other students in language, mathematics and science tests. A similar finding was obtained by Aksit (2007) for 2003 PISA mathematics performance. Even though our findings are consistent with Suna et al. (2020) and Aksit (2007), a closer look at the averages by school type invites researchers to study the interaction between school ownership and school type.

The school location variable has been examined in many studies conducted in Türkiye (Alacacı & Erbaş, 2010; Berberoğlu & Kalender, 2005; Ertem, 2021; Gümüş & Atalmış, 2012; Özdemir, 2015; Özkan & Güvendir, 2014; Sevgi, 2009; Topçu et al., 2015; Yetişir & Batı, 2021), and in this study it was found to be an important variable explaining the variability in PISA 2018 mathematics achievement. The mathematics PV1 mean of students in rural schools was 426.97, while the mean of students in urban area schools was 466.14. These findings show that mathematics achievement differs between rural and urban schools.

Finally, the proportion of male students in the school was found to be marginally significant. The mathematics PV1 mean of students in schools with no or a limited number of male students was 455.76, while the mean of students in schools with no or a limited number of female students was 408.75. The proportion of male students in a school is thus negatively correlated with the school's average mathematics achievement.

Although the variables added to the conditional model caused a certain amount of decrease in ICC values, it is seen that there is still unexplained variance in mathematics achievement. This suggests that there are other predictors that affect variability in mathematics achievement that cannot be measured by PISA student and school surveys.

In our study, school and student level variables related to the variation in 2018 PISA mathematics performance were investigated using multilevel model and meta-analysis frameworks, both internationally and specifically for Türkiye. Unlike other studies in the literature, this study conducted a meta-analysis with moderators that were not included in the PISA 2018 survey.

Our findings showed that the largest ICC value for the unconditional model was observed in the Netherlands and the smallest in Iceland, while Türkiye had the second highest ICC value. In countries with ICC values above 0.50, such as the Netherlands, Türkiye, Hungary, Serbia, Slovenia and Belgium, schools do not perform equally, which raises questions about educational inequality. In contrast, countries such as Finland, Albania, Iceland, Norway, Russia (Moscow region), Ireland, Spain, Denmark, Azerbaijan, Canada, New Zealand and Moldova had ICC values below 0.20, and schools in these countries have roughly equal averages. ICC values were also studied in a conditional model. The student-level variables added to the model were socio-economic status, age at starting primary education, and immigration status. The school-level variables were school ownership type (public-private), educational staff shortage, student behaviors hindering learning, teacher behaviors hindering learning, the residential area where the school is located, and the proportion of male students in the school. When these variables were added to the model, the largest ICC value was again obtained in the Netherlands and the smallest in Iceland, and Türkiye remained the country with the second highest ICC value. Although the variance in mathematics achievement between schools decreased with the predictors, there are still countries with a high share of unexplained variance. Türkiye is one of these countries, with a high ICC value of 0.43 even after controlling for school and student level PISA variables. For this reason, a country specific multilevel model was created for Türkiye.

In addition to the variables previously added to the model for all countries, the model constructed for Türkiye included school-level predictors such as school academic type, guidance at school, the procedure for placing students into classes, student-teacher ratio, school size, the ratio of teachers with a master's degree in the school, and class size, and the change in the ICC value was investigated. The ICC value calculated in this model was 0.26. Among the student-level variables initially included in the analysis, socio-economic status was found to be a significant predictor. Among the school-level variables, educational material shortage, student behaviors hindering learning, teacher behaviors hindering learning, school academic type, student-teacher ratio, school ownership type and the location of the school were found to be significant predictors. Although the predictors in the final model explained a significant amount of variability in mathematics achievement for the Turkish sample, unexplained variance between schools remained. When non-significant predictors were removed from the model, the ICC value was 0.28.

Although the variables added to the conditional model for all countries reduced the ICC values to some extent, there was still unexplained variance in mathematics achievement. This suggests that there are other predictors affecting variability in mathematics achievement that cannot be measured by the PISA student and school surveys. For this reason, some variables that are thought to explain the variability in mathematics achievement were selected based on the literature and a meta-analysis was conducted. The predictors included in the meta-analysis explained approximately 71% of the heterogeneity in conditional ICC values. Our results showed that the moderators that substantially explained this heterogeneity were the tracking age, the uncertainty avoidance index and the gross domestic product of the countries. Our findings showed that the ICC value increases by 0.04 units for every 1-year decrease in the tracking age. As a result, it can be argued that each additional year spent in the tracking system increases the variance in mathematics performance between schools and contributes to educational inequality. The conditional ICC increased by 0.03 units for every 1 standard deviation increase in GDP. Accordingly, it can be argued that a country's economic wealth is associated with PISA results and that economic resources are an important out-of-school factor affecting PISA performance. For the uncertainty avoidance index, every 1 standard deviation increase in the index was associated with an increase of 0.02 units in the ICC value. While cultures with a low uncertainty avoidance index may be more open to innovative educational approaches, it has been observed that cultures with high uncertainty avoidance prefer authoritarian teaching styles.
It can be said that in cultures with low uncertainty avoidance, critical thinking and creativity are encouraged with a more participatory and open approach to education (Hofstede et al., 2010).
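The moderator effects reported above can be read as slopes of a linear meta-regression. As a minimal sketch, assuming the effects are additive and all other moderators are held fixed, the hypothetical helper below (not part of the authors' shared code) turns a change in the moderators into an expected change in the conditional ICC, using only the three coefficients quoted in the text.

```python
# Slopes quoted in the text: change in conditional ICC per unit of moderator.
SLOPE_TRACKING_AGE = -0.04  # per 1-year INCREASE in tracking age
SLOPE_GDP_SD = 0.03         # per 1 SD increase in GDP
SLOPE_UAI_SD = 0.02         # per 1 SD increase in uncertainty avoidance


def expected_icc_change(d_tracking_age_years: float = 0.0,
                        d_gdp_sd: float = 0.0,
                        d_uai_sd: float = 0.0) -> float:
    """Expected change in conditional ICC under an additive linear model."""
    return (SLOPE_TRACKING_AGE * d_tracking_age_years
            + SLOPE_GDP_SD * d_gdp_sd
            + SLOPE_UAI_SD * d_uai_sd)


# Illustration: tracking starts two years earlier, other moderators unchanged.
print(round(expected_icc_change(d_tracking_age_years=-2), 2))  # 0.08
```

Such extrapolations describe cross-country associations in the meta-regression; they are not predictions of what a reform in a single country would achieve.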

Our findings showed that student and teacher behaviors that hinder learning have a significant effect on differences in mathematics achievement; programs to build supportive classroom environments, or teacher training on this issue, could be organized. Given that the mean scores of schools reporting greater shortages of educational materials are lower than those of schools reporting fewer shortages, policies for sharing educational materials and for cooperation between schools may be recommended. The population size of the area where the school is located also has an impact on school and student success: the achievement of schools in less populated regions is lower than that of schools in more densely populated regions. There may be more competition among students in areas with large populations, which may encourage students to show more motivation and effort. It is important that educational institutions and administrators take these effects into account and manage them well to achieve positive results. There are large achievement differences between schools’ academic types in Türkiye. One reason may be that students who already show higher academic success are placed in schools that admit students by examination. To equalize the quality of education across school types, we suggest setting national standards and providing education in accordance with them, distributing educational resources equally, organizing continuous training and support programs for teachers in all school types, and regularly evaluating and monitoring schools. In addition, the scope for transferring between school types should be discussed in more detail.

Limitations and future research

In this study, only variables thought to have an impact on PISA 2018 mathematics performance were investigated. The study can be extended to the 2018 science and reading literacy results, and the same analysis can be performed on the PISA 2022 mathematics results and compared with the results of this study. Our analyses have some minor limitations. Our models did not include non-linear or interaction effects among predictors; for example, the interaction between school ownership and school type was not included in the Türkiye-specific model. We also merged categories of nominal variables when needed, given that our primary focus was on ICC values rather than on predictors of achievement scores. Nevertheless, we share the data sets and the code for those interested in alternative model-building strategies. We are also aware that backward selection based on p-values is controversial; however, as stated earlier, our primary goal was the computation of ICCs rather than the estimation of specific predictor effects. Findings from the meta-analysis, especially the effect of tracking age, should invite other researchers to investigate this issue. Tracking age clearly has an impact on the ICC value; future research can investigate whether giving students the chance to switch between school types would reduce the negative effects of early tracking on achievement.

Author contributions: YA contributed to country-level data collection and writing (particularly introduction and discussion). GC contributed to multilevel data-analysis and writing (particularly methods and results). BA contributed to design and implementation of the research.

Funding: This publication is funded by the Open Access Publication Fund of the Leuphana University Lüneburg.

Availability of data and materials: The main PISA data are available at http://www.oecd.org/pisa/data/. All R and Mplus syntax and the data set for the meta-analyses are shared as supplementary material.

Declarations

Ethics approval and consent to participate: Not applicable.

Consent for publication: Not applicable.

Competing interests: The authors declare that they have no competing interests.

References

Ainscow, M. (2016). Diversity and equity: A global education challenge. New Zealand Journal of Educational Studies, 51, 143–155.
Aksit, N. (2007). Educational reform in Turkey. International Journal of Educational Development, 27(2), 129–137. https://doi.org/10.1016/j.ijedudev.2006.07.011
Akyüz, G. (2014). The effects of student and school factors on mathematics achievement in TIMSS 2011.
Alacacı, C., & Erbaş, A. K. (2010). Unpacking the inequality among Turkish schools: Findings from PISA 2006. International Journal of Educational Development, 30(2), 182–192.
Anderson, J., Milford, T., & Ross, S. (2009). Multilevel modeling with HLM: Taking a second look at PISA. In Quality Research in Literacy and Science Education: International Perspectives and Gold Standards (pp. 263–286). https://doi.org/10.1007/978-1-4020-8427-0_13
Aslanargun, E., Bozkurt, S., & Sarıoğlu, S. (2016). Sosyo ekonomik değişkenlerin öğrencilerin akademik başarısı üzerine etkileri. Uşak Üniversitesi Sosyal Bilimler Dergisi, 9(27/3), 201–234.
Aydın, A., Sarıer, Y., & Uysal, Ş. (2012). Sosyoekonomik ve sosyokültürel değişkenler açısından PISA matematik sonuçlarının karşılaştırılması. Eğitim ve Bilim, 37(164).
Aydın, M. (2015). Öğrenci ve okul kaynaklı faktörlerin TIMSS matematik başarısına etkisi [Doctoral thesis, Necmettin Erbakan Üniversitesi]. https://acikerisim.erbakan.edu.tr/xmlui/handle/20.500.12452/767
Ayebale, L., Habaasa, G., & Tweheyo, S. (2020). Factors affecting students’ achievement in mathematics in secondary schools in developing countries: A rapid systematic review. Statistical Journal of the IAOS, 36, 73–76. https://doi.org/10.3233/SJI-200713
Berberoğlu, G., & Kalender, İ. (2005). Öğrenci Başarısının Yıllara, Okul Türlerine, Bölgelere Göre İncelenmesi: ÖSS ve PISA Analizi. Journal of Educational Sciences & Practices, 4(7).
Brunner, M., Keller, U., Wenger, M., Fischbach, A., & Lüdtke, O. (2018). Between-school variation in students’ achievement, motivation, affect, and learning strategies: Results from 81 countries for planning group-randomized trials in education. Journal of Research on Educational Effectiveness, 11(3), 452–478.
Caro, D. H., & Biecek, P. (2017). intsvy: An R Package for Analyzing International Large-Scale Assessment Data. Journal of Statistical Software, 81(7 SE-Articles), 1–44. https://doi.org/10.18637/jss.v081.i07
Checchi, D., & Flabbi, L. (2007). Intergenerational mobility and schooling decisions in Germany and Italy: The impact of secondary school tracks.
Cheung, M. W. L., & Jak, S. (2016). Analyzing big data in psychology: A split/analyze/meta-analyze approach. Frontiers in Psychology, 7, 182798.
Chiu, M. M. (2007). Families, economies, cultures, and science achievement in 41 countries: Country-, school-, and student-level analyses. Journal of Family Psychology, 21(3), 510–519. https://doi.org/10.1037/0893-3200.21.3.510
Chudgar, A., & Luschei, T. F. (2009). National Income, Income Inequality, and the Importance of Schools: A Hierarchical Cross-National Comparison. American Educational Research Journal, 46(3), 626–658. https://doi.org/10.3102/0002831209340043
Close, S., & Shiel, G. (2009). Gender and PISA mathematics: Irish results in context. European Educational Research Journal, 8(1), 20–33.
Cobb-Clark, D. A., Sinning, M., & Stillman, S. (2012). Migrant Youths’ Educational Achievement: The Role of Institutions. The Annals of the American Academy of Political and Social Science, 643(1), 18–45.
Coleman, J. S. (1966). Equality of Educational Opportunity [Summary report]. U.S. Department of Health, Education, and Welfare, Office of Education. https://books.google.com.tr/books?id=ZiciAQAAIAAJ
Contini, D., & Cugnata, F. (2020). Does early tracking affect learning inequalities? Revisiting difference-in-differences modeling strategies with international assessments. Large-Scale Assessments in Education, 8(1), 14. https://doi.org/10.1186/s40536-020-00094-x
Contini, D., & Scagni, A. (2011). Inequality of opportunity in secondary school enrolment in Italy, Germany and the Netherlands. Quality & Quantity, 45(2), 441–464. https://doi.org/10.1007/s11135-009-9307-y
Demir, İ., Kilic, S., & Depren, O. (2009). Factors Affecting Turkish Students’ Achievement in Mathematics. Online Submission, 6(6).
Demir, İ., Kılıç, S., & Ünal, H. (2010). Effects of students’ and schools’ characteristics on mathematics achievement: Findings from PISA 2006. Procedia-Social and Behavioral Sciences, 2(2), 3099–3103.
Dinçer, M. A., & Uysal Kolaşin, G. (2009). Türkiye’de Öğrenci Başarısında Eşitsizliğin Belirleyicileri.
Domina, T., Penner, A., & Penner, E. (2017). Categorical inequality: Schools as sorting machines. Annual Review of Sociology, 43, 311–330.
Dustmann, C., Frattini, T., & Lanzara, G. (2012). Educational achievement of second-generation immigrants: An international comparison. Economic Policy, 27(69), 143–185.
Emler, T. E., Zhao, Y., Deng, J., Yin, D., & Wang, Y. (2019). Side effects of large-scale assessments in education. ECNU Review of Education, 2(3), 279–296. https://doi.org/10.1177/2096531119878964
Erdoğan, E., & Acar Güvendir, M. (2019). Uluslararası Öğrenci Değerlendirme Programında Öğrencilerin Sosyoekonomik Özellikleri ile Okuma Becerileri Arasındaki İlişki. Eskişehir Osmangazi Üniversitesi Sosyal Bilimler Dergisi, 20, 493–523. https://doi.org/10.17494/ogusbd.548530
Ersan, O., & Rodriguez, M. C. (2020). Socioeconomic status and beyond: A multilevel analysis of TIMSS mathematics achievement given student and school context in Turkey. Large-Scale Assessments in Education, 8, 1–32.
Ertem, H. Y. (2021). Examination of Turkey’s PISA 2018 reading literacy scores within student-level and school-level variables. Participatory Educational Research, 8(1), Article 1. https://doi.org/10.17275/per.21.14.8.1
Fox, J., & Weisberg, S. (2019). An R Companion to Applied Regression (Third). Sage. https://socialsciences.mcmaster.ca/jfox/Books/Companion/
Gelbal, S. (2010). Sekizinci sınıf öğrencilerinin sosyoekonomik özelliklerinin Türkçe başarısı üzerinde etkisi. Eğitim ve Bilim, 33(150).
Gumus, S., & Atalmis, E. H. (2012). Achievement gaps between different school types and regions in Turkey: Have they changed over time. Mevlana International Journal of Education, 2(2), 50–66.
Gürsakal, S. (2012). PISA 2009 Öğrenci Başarı Düzeylerini Etkileyen Faktörlerin Değerlendirilmesi. Suleyman Demirel University Journal of Faculty of Economics & Administrative Sciences, 17(1).
Hallinan, M. T. (1994). Tracking: From theory to practice. Sociology of Education, 67(2), 79–84.
Hallquist, M. N., & Wiley, J. F. (2018). Mplus Automation: An R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling, 25(4), 621–638. https://doi.org/10.1080/10705511.2017.1402334
Hanushek, E. A. (1997). Assessing the effects of school resources on student performance: An update. Educational Evaluation and Policy Analysis, 19(2), 141–164.
Hanushek, E. A., & Woessmann, L. (2011). The Economics of International Differences in Educational Achievement. In Handbook of the Economics of Education (Vol. 3, pp. 89–200). Elsevier. https://doi.org/10.1016/B978-0-444-53429-3.00002-8
Hanushek, E. A., & Woessmann, L. (2012). Do better schools lead to more growth? Cognitive skills, economic outcomes, and causation. Journal of Economic Growth, 17, 267–321.
Hanushek, E. A., & Wössmann, L. (2006). Does educational tracking affect performance and inequality? Differences‐in‐differences evidence across countries. The Economic Journal, 116(510), C63–C76.
Hofstede, G. H., Hofstede, G. J., & Minkov, M. (2010). Cultures and organizations: Software of the mind: intercultural cooperation and its importance for survival (3rd ed). McGraw-Hill.
IEA. (2024). History About IEA. https://www.iea.nl/about/org/history
Jackson, M. (2013). Determined to succeed?: Performance versus choice in educational attainment. Stanford University Press.
Kalaycioglu, D. B. (2015). The influence of socioeconomic status, self-efficacy, and anxiety on Mathematics achievement in England, Greece, Hong Kong, the Netherlands, Turkey, and the USA. Educational Sciences: Theory and Practice, 15(5), 1391–1401.
Kaplan, J. (2023). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables. https://CRAN.R-project.org/package=fastDummies
Karaman, P., & Atar, B. (2019). The Effects of Student and School Level Characteristics on Academic Achievement of Middle School Students in Turkey. Journal of Measurement and Evaluation in Education and Psychology, 10(4), 391–405.
Kim, J., & Choi, K. (2008). Closing the gap: Modeling within-school variance heterogeneity in school effect studies. Asia Pacific Education Review, 9, 206–220.
Kılıç, S., Çene, E., & Demir, İ. (2012). Comparison of Learning Strategies for Mathematics Achievement in Turkey with Eight Countries. Educational Sciences: Theory and Practice, 12(4), 2594–2598.
Laukaityte, I., & Rolfsman, E. (2020). Low, medium, and high-performing schools in the Nordic countries. Student performance at PISA Mathematics 2003-2012. Education Inquiry, 11(3), 276–295.
Laukaityte, I., & Wiberg, M. (2017). Using plausible values in secondary analysis in large-scale assessments. Communications in Statistics - Theory and Methods, 46(22), 11341–11357. https://doi.org/10.1080/03610926.2016.1267764
Lavy, V. (2015). Do Differences in Schools’ Instruction Time Explain International Achievement Gaps? Evidence from Developed and Developing Countries. The Economic Journal, 125(588), F397–F424. https://doi.org/10.1111/ecoj.12233
Lee, V. E. (2000). Using hierarchical linear modeling to study social contexts: The case of school effects. Educational Psychologist, 35(2), 125–141.
MEB. (2007). Millî Eğitim İstatistikleri, Örgün Eğitim 2006-2007. Milli Eğitim Bakanlığı. https://sgb.meb.gov.tr/istatistik/meb_istatistikleri_orgun_egitim_2006_2007.pdf
MEB. (2013). Millî Eğitim İstatistikleri, Örgün Eğitim 2012-2013. Milli Eğitim Bakanlığı.
MEB. (2018). Millî Eğitim İstatistikleri, Örgün Eğitim 2017/’18. Millî Eğitim Bakanlığı.
MEB. (2019). PISA 2018 Türkiye Ön Raporu. http://pisa.meb.gov.tr/www/pisa-2018-turkiye-on-raporu-yayimlandi/icerik/3
Meyer, H.-D., & Schiller, K. (2013). Gauging the role of non-educational effects in large-scale assessments: Socio-economics, culture and PISA outcomes. PISA, Power, and Policy: The Emergence of Global Educational Governance, 207–224.
Minelgaitė, I., Nedzinskaitė, R., & Kristinsson, K. (2019). Societal cultural dimensions and GDP as predictors of educational leadership and school autonomy indicators’ in PISA. Pedagogika, 131(3), 233–251. https://doi.org/10.15823/p.2018.44
Mohammadpour, E., & Abdul Ghafar, M. N. (2014). Mathematics Achievement as a Function of Within- and Between-School Differences. Scandinavian Journal of Educational Research, 58(2), 189–221. https://doi.org/10.1080/00313831.2012.725097
Montt, G. (2011). Cross-national differences in educational achievement inequality. Sociology of Education, 84(1), 49–68.
Muthen, L. K. (2015, September 9). Mplus Discussion. Missing Data with Categorical Variables. http://www.statmodel.com/discussion/messages/22/7554.html?1477079819
Muthén, L. K., & Muthén, B. (2017). Mplus Version 8 User’s Guide. Muthén & Muthén.
Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in Health Sciences Education: Theory and Practice, 15(5), 625–632. https://doi.org/10.1007/s10459-010-9222-y
OECD. (2004). Learning for Tomorrow’s World: First Results from PISA 2003 (pp. 159–205).
OECD. (2006). “Between- and within-school Variation in the Mathematics Performance of 15-year-olds”, in Education at a Glance 2006: OECD Indicators. OECD Publishing. https://doi.org/10.1787/eag-2006-6-en.
OECD. (2009). Plausible Values. https://doi.org/10.1787/9789264056275-7-en
OECD. (2012). Equity and Quality in Education: Supporting Disadvantaged Students and Schools. OECD. https://doi.org/10.1787/9789264130852-en
OECD. (2018). Global Competency for an Inclusive World. OECD.
OECD. (2019). PISA 2018 Technical Report. https://www.oecd.org/pisa/data/pisa2018technicalreport/
OECD. (2020). PISA 2018 Results (Volume V): Effective Policies, Successful Schools. OECD. https://doi.org/10.1787/ca768d40-en
OECD. (2023). Frequently Asked Questions. https://www.oecd.org/pisa/pisafaq/
OECD. (2018a). Equity in Education: Breaking Down Barriers to Social Mobility. OECD. https://doi.org/10.1787/9789264073234-en
Ötken, Ş. (2021). PISA 2012’DE Öğrencilerin Matematik Başarısını Sınıflayan Değişkenlerin Belirlenmesi. The Journal of Social Science, 5(9), 241–249. https://doi.org/10.30520/tjsosci.871481
Özdemir, C. (2015). Relationship between equity and excellence in education: Multilevel analysis of international student assessment data with a focus on turkey.
Özel, Z. E. Y., Özel, S., & Thompson, B. (2013). Türkiye’deki Sosyoekonomik Seviyeye Bağlı Matematik Başarı Farklılıklarının Avrupa Birliği Ülkeleri ile Karşılaştırılması. Eğitim ve Bilim, 38(170).
Özkan, Y. Ö., & Güvendir, M. A. (2014). Socioeconomic Factors of Students’ Relation to Mathematic Achievement: Comparison of PISA and ÖBBS. International Online Journal of Educational Sciences, 6(3).
Parker, P. D., Marsh, H. W., Jerrim, J. P., Guo, J., & Dicke, T. (2018). Inequity and Excellence in Academic Performance: Evidence From 27 Countries. American Educational Research Journal, 55(4), 836–858. https://doi.org/10.3102/0002831218760213
Perry, L. B., & McConney, A. (2010). Does the SES of the school matter? An examination of socioeconomic status and student achievement using PISA 2003. Teachers College Record, 112(4), 1137–1162.
Piopiunik, M. (2014). The effects of early tracking on student performance: Evidence from a school reform in Bavaria. Economics of Education Review, 42, 12–33. https://doi.org/10.1016/j.econedurev.2014.06.002
Psacharopoulos, G. (1994). Returns to investment in education: A global update. World Development, 22(9), 1325–1343. https://doi.org/10.1016/0305-750X(94)90007-8
R Core Team. (2021). R: A Language and Environment for Statistical Computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org
Rhemtulla, M., Brosseau-Liard, P. É., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17(3), 354–373. https://doi.org/10.1037/a0029315
Robitzsch, A. (2020). Why Ordinal Variables Can (Almost) Always Be Treated as Continuous Variables: Clarifying Assumptions of Robust Continuous and Ordinal Factor Analysis Estimation Methods. Frontiers in Education, 5. https://www.frontiersin.org/articles/10.3389/feduc.2020.589965
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys (1st ed.). Wiley. https://doi.org/10.1002/9780470316696
Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International Large-Scale Assessment Data: Issues in Secondary Analysis and Reporting. Educational Researcher, 39(2), 142–151. https://doi.org/10.3102/0013189X10363170
Scherer, R., Siddiq, F., & Nilsen, T. (2024). The potential of international large-scale assessments for meta-analyses in education. Large-Scale Assessments in Education, 12(1), 1–35.
Schleicher, A. (2018). Insights and interpretations. Pisa 2018, 10.
Schneider, B. (Ed.). (2018). Handbook of the sociology of education in the 21st century. Springer Berlin Heidelberg.
Schnepf, S. V. (2003). Inequalities in secondary school attendance in Germany.
Sevgi, S. (2009). The connection between school and student characteristics with mathematics achievement in Turkey.
Snijders, T. A. B., & Bosker, R. J. (2011). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. SAGE Publications. https://books.google.com.tr/books?id=N1BQvcomDdQC
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling.
Strello, A., Strietholt, R., Steinmann, I., & Siepmann, C. (2021). Early tracking and different types of inequalities in achievement: Difference-in-differences evidence from 20 years of large-scale assessments. Educational Assessment, Evaluation and Accountability, 33, 139–167.
Sullivan, G. M., & Artino, A. R. (2013). Analyzing and Interpreting Data From Likert-Type Scales. Journal of Graduate Medical Education, 5(4), 541–542. https://doi.org/10.4300/JGME-5-4-18
Suna, H. E., & Özer, M. (2021). Türkiye’de sosyoekonomik düzey ve okullar arası başarı farklarının akademik başarı ile ilişkisi. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 12(1), 54–70.
Suna, H. E., Tanberkan, H., Gür, B., Perc, M., & Özer, M. (2020). Socioeconomic status and school type as predictors of academic achievement. Journal of Economy Culture and Society, 61, 41–64.
Suna, H. E., Tanberkan, H., & Özer, M. (2020). Türkiye’de öğrencilerin okuryazarlık becerilerinin yıllara ve okul türlerine göre değişimi: Öğrencilerin PISA uygulamalarındaki performansı. Journal of Measurement and Evaluation in Education and Psychology, 11(1), 76–97.
Thien, L. M. (2016). Malaysian students’ performance in mathematics literacy in PISA from gender and socioeconomic status perspectives. The Asia-Pacific Education Researcher, 25(4), 657–666.
Topçu, M. S., Arıkan, S., & Erbilgin, E. (2015). Turkish students’ science performance and related factors in PISA 2006 and 2009. The Australian Educational Researcher, 42, 117–132.
van Buuren, S., Groothuis-Oudshoorn, K., Robitzsch, A., Vink, G., Doove, L., & Jolani, S. (2015). Package ‘mice’. Computer software.
Van de Werfhorst, H. G. (2018). Early tracking and socioeconomic inequality in academic achievement: Studying reforms in nine countries. Research in Social Stratification and Mobility, 58, 22–32. https://doi.org/10.1016/j.rssm.2018.09.002
Van de Werfhorst, H. G., & Mijs, J. J. (2010). Achievement inequality and the institutional structure of educational systems: A comparative perspective. Annual Review of Sociology, 36, 407–428.
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36, 1–48.
Viechtbauer, W. (2022). The Metafor package. Multiple Imputation with the mice and metafor Packages [The metafor Package]. https://www.metafor-project.org/doku.php/tips:multiple_imputation_with_mice_and_metafor
West, B. T., Welch, K. B., & Galecki, A. T. (2014). Linear Mixed Models: A Practical Guide Using Statistical Software, Second Edition. CRC Press.
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org
Woessmann, L. (2016). The Importance of School Systems: Evidence from International Differences in Student Achievement. Journal of Economic Perspectives, 30(3), 3–32. https://doi.org/10.1257/jep.30.3.3
World Bank Open Data. (2018). World Bank Database. https://data.worldbank.org
World Population Review. (2018). World Population Review. https://worldpopulationreview.com/
Yavuz, H. Ç., Demirtaşlı, R. N., Yalçın, S., & Dibek, M. İ. (2017). Türk öğrencilerin TIMSS 2007 ve 2011 matematik başarısında öğrenci ve öğretmen özelliklerinin etkileri. Eğitim ve Bilim, 42(189).
Yetişir, M. İ., & Batı, K. (2021). The effect of school and student-related factors on PISA 2015 science performances in Turkey. International Journal of Psychology and Educational Studies, 8(2), 170–186.
Zopluoğlu, C. (2012). A cross-national comparison of intra-class correlation coefficient in educational achievement outcomes. Journal of Measurement and Evaluation in Education and Psychology, 3(1), 242–278.


ICCSAM2024LSAEsupp.docx
