Data source
In this study, data sets from 79 of the 80 countries participating in the PISA 2018 cycle were used; Vietnam could not be included because its student achievement data were unavailable. PISA collects educational data from students, teachers, school administrators and parents through surveys. The data obtained from these surveys and scales are available in the OECD database as student and school level files and contain many variables at both levels.
Countries, regions or economies can participate in PISA whether or not they are OECD members. PISA uses a two-stage sampling design to ensure national representativeness. In the first stage, schools are selected randomly; in the second stage, the aim is to randomly select 35 students from among the 15-year-old students in each selected school. Excluding Vietnam, a total of 606627 students from all countries participated in PISA 2018. The total number of schools from the countries participating in the study is 21752. In some countries, data on some variables were not collected. For example, there are no data on the number of boys and girls in schools for Austria, New Zealand, and Canada, or on school ownership type (i.e., SCHLTYPE) for Belgium and Ireland. Another exceptional situation is that the variance of some independent variables is zero in some country data sets. For example, for Israel the school type is the same for all schools. In both cases, variables with all observations missing or with zero variance could not be included in the analysis for the relevant countries (see section 1 in the supplementary material for additional details).
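The exclusion rule described above can be illustrated with a minimal Python sketch. It is not the authors' R code; the function name `usable_predictors` and the toy records are hypothetical and only show the logic of dropping predictors that are entirely missing or have zero variance within a country.

```python
def usable_predictors(rows, predictors):
    """Return the predictors that have observed data and non-zero variance."""
    keep = []
    for name in predictors:
        # Collect the non-missing values of this predictor (None marks missing).
        observed = [row[name] for row in rows if row.get(name) is not None]
        # Keep the predictor only if at least two distinct values were observed;
        # this drops all-missing and zero-variance predictors in one check.
        if len(set(observed)) > 1:
            keep.append(name)
    return keep


# Toy school-level records for one country (made-up values, not PISA data).
schools = [
    {"SCHLTYPE": 1, "EDUSHORT": 0.2, "BOY": None},
    {"SCHLTYPE": 1, "EDUSHORT": -0.5, "BOY": None},
    {"SCHLTYPE": 1, "EDUSHORT": 1.1, "BOY": None},
]
kept = usable_predictors(schools, ["SCHLTYPE", "EDUSHORT", "BOY"])
print(kept)
```

In this toy case SCHLTYPE is constant and BOY is entirely missing, so only EDUSHORT would remain in the country-specific model.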
Analytical approach
For the first research question, conditional ICC values were obtained with multilevel models and then examined within a meta-analytic framework. These steps are known as split, analyze, meta-analyze (Cheung & Jak, 2016) and are further classified as a two-stage approach by Scherer, Siddiq and Nilsen (2024). For the second research question, a multilevel model was built specifically for Türkiye. The following paragraphs explain these models in turn.
After the unconditional ICC comparison, the conditional ICC was calculated by adding the predictors to the model. The conditional ICC can be interpreted as the correlation between the dependent variable values of two students in the same school once the predictor variables are controlled for. Predictor variables added at both levels are expected to explain some of the variability in the dependent variable. Accordingly, a conditional ICC that is not lower than the unconditional ICC indicates that the added predictors do not help explain the share of school-level variance in the total variance of the dependent variable. To answer the first research question, the conditional ICC was calculated by including those predictors in the PISA data set that were identified as important at both levels in the literature review and could be included in the model in the same way for all countries.
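For reference, the two quantities can be written in the usual two-level random-intercept notation. The symbols are introduced here for illustration and are assumptions, not notation taken from the study: \tau_{00} is the school-level (between) variance and \sigma^{2} the student-level (within) variance of the null model, and the subscript X marks the residual variances that remain after the predictors are added.

```latex
\rho_{\text{unconditional}} = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}},
\qquad
\rho_{\text{conditional}} = \frac{\tau_{00 \mid X}}{\tau_{00 \mid X} + \sigma^{2}_{\mid X}}
```

Under this notation, predictors that mainly explain between-school differences lower the numerator and hence the conditional ICC, whereas predictors that only explain within-school differences lower the denominator and can leave the ICC unchanged or even raise it.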
Computation of ICC
Before calculating the ICCs for all countries, the data were read into R (R Core Team, 2021). The student and school level data sets, downloaded separately from the OECD database, contain the dependent variables (i.e., PV1MATH-PV10MATH), the independent variables explained below, and the student and school level weights (W_FSTUWT and W_FSCHWT). The separate data sets were merged at the student level using the intsvy package (Caro & Biecek, 2017).
The data set was divided into 79 data sets on a country/region basis. Each of these was then divided further, yielding 79 x 10 = 790 data sets in total, each containing one plausible value. The Mplus (Muthen & Muthen, 2017) syntax for the two-level model used to compute the ICC was generated automatically for each data set with the MplusAutomation package (Hallquist & Wiley, 2018). The robust maximum likelihood estimator (i.e., MLR) was chosen (see section 2 in the supplementary material).
The multiple imputation method (as recommended by OECD, 2009) was used to combine the results from the analysis of each plausible value to obtain the unconditional and conditional ICCs. A single input file was prepared for the null model, and the plausible values and the student and school level sample weights were used in the analysis. With the help of an R script (see section 3 in the supplementary material), the student and school level residual variances were extracted from the Mplus output files.
In the next stage, the conditional ICC coefficient was calculated by adding student and school level predictors to the model. Nominal or ordinal categorical variables were included in the analysis by creating dummy variables. As stated in the previous paragraphs, the conditional model analyses were carried out in two phases because, for some countries, some predictors had all observations missing or had zero variance. Since neither situation was observed in the predictor variables for 62 countries, these data sets were analyzed with a single input file. For each of the remaining 17 countries, a separate input file was prepared (excluding variables with zero variance and/or with all observations missing) and the analyses were carried out. To handle missing data, full information maximum likelihood (FIML) was used, as suggested by Hox et al. (2015). Mplus assumes a normal distribution when applying FIML to categorical variables (Muthen & Muthen, 2017). The same applies to the dummy variables used in the conditional model. Muthen (2015) stated that when dummy predictors are added to the model and missing data are handled with FIML, normality is assumed for these variables, and that in simulation studies accepting this assumption did not significantly change the results. In the first phase, with the help of the R script written for the 62 countries, the student and school level residual variances were extracted from the Mplus outputs and the conditional ICC coefficient was calculated for each of these countries. Conditional ICC coefficients for the remaining 17 countries were calculated one by one.
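The splitting step can be sketched in Python as follows. This is an illustration only, not the authors' R and MplusAutomation workflow; the function name `split_by_country_and_pv`, the output column name `MATH`, and the use of `CNT` and `CNTSCHID` as country and school identifiers are assumptions made for the example.

```python
def split_by_country_and_pv(rows, n_pv=10):
    """Group student rows into one data set per (country, plausible value)."""
    datasets = {}
    for row in rows:
        for k in range(1, n_pv + 1):
            record = {
                "CNTSCHID": row["CNTSCHID"],   # school identifier (assumed name)
                "W_FSTUWT": row["W_FSTUWT"],   # student weight
                "MATH": row[f"PV{k}MATH"],     # keep exactly one plausible value
            }
            datasets.setdefault((row["CNT"], k), []).append(record)
    return datasets


# Toy rows with made-up values; real files hold many more columns.
rows = [
    {"CNT": c, "CNTSCHID": s, "W_FSTUWT": 1.0,
     **{f"PV{k}MATH": 500.0 + k for k in range(1, 11)}}
    for c, s in [("TUR", 1), ("TUR", 2), ("FIN", 3)]
]
datasets = split_by_country_and_pv(rows)
print(len(datasets))
```

With two toy countries and ten plausible values the sketch yields 20 data sets; applied to 79 countries it would yield the 790 data sets described above.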
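A minimal sketch of how the variance components from the ten plausible-value analyses could be combined into a single ICC is given below. It assumes that the point estimates are averaged across plausible values before the ratio is formed; the source does not state the exact order of operations, so this is one plausible reading rather than the authors' script. The function name `pooled_icc` and the numbers are hypothetical.

```python
from statistics import mean


def pooled_icc(between_vars, within_vars):
    """Pool variance components across plausible values, then form the ICC."""
    tau = mean(between_vars)    # pooled school-level (between) variance
    sigma = mean(within_vars)   # pooled student-level (within) variance
    return tau / (tau + sigma)


# Made-up variance estimates for three plausible values (the study used ten).
between = [4000.0, 4200.0, 3800.0]
within = [6000.0, 5900.0, 6100.0]
icc = pooled_icc(between, within)
print(round(icc, 2))
```

Averaging the ICCs computed separately for each plausible value would be an alternative; with similar estimates across plausible values the two approaches give nearly identical results.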
Meta-analysis
The data set for the meta-analyses included the conditional ICCs, the school sample sizes and 12 moderator variables for the 79 PISA countries/regions, as explained in the Measures section. To meta-analyze the ICCs, we followed the steps outlined by Martinez-Romero et al. (2020) and used the metafor package (Viechtbauer, 2010) to estimate a random effects model, with the mice package (van Buuren et al., 2015) to address missing data. Specifically, we used the escalc function to transform the ICC values to Fisher’s z, treating the number of PISA participant schools in each country as the sample size. Before adding moderators to the model, heterogeneity was assessed using the Q statistic and the I² index. To assess moderator effects, we used the rma function and ran the model with the 12 moderators for each of the 100 imputed data sets. As suggested by Viechtbauer (2022), sampling variances (i.e., vi) were excluded from the predictor matrix and, to combine the results adequately, the coefficients table was created from the output of the pool function. Across the 100 model results, we also reported the minimum, maximum and median values of I², R², residual heterogeneity (QE) with 66 degrees of freedom and the Q statistic for the moderators (QM) (see section 4 in the supplementary material).
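The transformation and the heterogeneity statistics can be sketched in plain Python. This is not the metafor implementation: it assumes the standard Fisher's z formulas (z = atanh(r), sampling variance 1/(n - 3)) and uses the simple Q-based version of I², whereas metafor by default derives I² from the estimated between-study variance. The function names and the toy values are hypothetical.

```python
import math


def fisher_z(icc, n_schools):
    """Transform an ICC to Fisher's z and return it with its sampling variance."""
    yi = math.atanh(icc)          # 0.5 * ln((1 + icc) / (1 - icc))
    vi = 1.0 / (n_schools - 3)    # usual large-sample variance of z
    return yi, vi


def q_and_i2(effects):
    """Cochran's Q and the Q-based I^2 (in percent) for (yi, vi) pairs."""
    weights = [1.0 / vi for _, vi in effects]
    pooled = sum(w * yi for w, (yi, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (yi - pooled) ** 2 for w, (yi, _) in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Made-up ICCs and school counts, not results from the study.
toy = [fisher_z(0.43, 186), fisher_z(0.20, 250), fisher_z(0.55, 150)]
q, i2 = q_and_i2(toy)
print(round(q, 2), round(i2, 1))
```

Pooled estimates on the z scale can be returned to the ICC metric with `math.tanh`.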
Computation of ICC for Türkiye
In the previous stages, the unconditional ICC for Türkiye was found to be 0.57 and the conditional ICC 0.43. While calculating the conditional ICC, predictors were added at the student level and the school level. At this stage, school-level variables with the potential to explain the differences between schools were identified and a model specific to Türkiye was built. No additions were made to the student-level predictors. School-level variables that are included in the PISA 2018 data set and may have a direct impact on mathematics achievement were sought. The variables included in the model are given under the Türkiye specific model heading in the Measures section.
The analysis was carried out by including the student and school level variables used in the previous analyses plus the additional school level variables; this model is referred to as the full model for Türkiye. We also used a stepwise backward selection method based on statistical significance: the non-significant predictor with the highest p value was removed and the analysis was repeated. This process continued until all variables in the model were significant, yielding what is referred to as the final model for Türkiye. The ICC coefficient was calculated for the full and final models. Data processing was done in R (R Core Team, 2021) using the dplyr package (Wickham et al., 2023) for renaming variables, the car package (Fox & Weisberg, 2019) for recoding variables and the fastDummies package (Kaplan, 2023) for creating dummy variables. The models were analyzed with Mplus (see section 5 in the supplementary material).
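The backward selection loop can be sketched as follows. The sketch is generic: `fit` stands for any hypothetical routine that refits the model and returns a p value per predictor (in the study this role was played by Mplus), and the threshold `alpha=0.05` is an assumption, since the source does not state the significance level.

```python
def backward_eliminate(predictors, fit, alpha=0.05):
    """Repeatedly drop the least significant predictor until all are significant."""
    current = list(predictors)
    while current:
        p_values = fit(current)  # refit the model with the current predictors
        worst = max(current, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break                # every remaining predictor is significant
        current.remove(worst)    # drop the predictor with the highest p value
    return current


# Stand-in for a real model fit: fixed, made-up p values. In a real refit the
# p values of the remaining predictors would change after each removal.
TOY_P = {"STRATIO": 0.40, "SCHSIZE": 0.03, "MASTER": 0.20, "CLSIZE": 0.001}


def toy_fit(names):
    return {name: TOY_P[name] for name in names}


final = backward_eliminate(list(TOY_P), toy_fit)
print(final)
```

The sketch treats each predictor separately; dummy variables belonging to the same categorical variable may need to be kept or dropped as a set.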
Measures
Variables in the multilevel model built for all countries
Mathematics Achievement (PV1MATH-PV10MATH): The dependent variable of the study. In PISA studies, students take only a part of the relevant achievement test, not the entire test, hence creating the basis for plausible values (Laukaityte & Wiberg, 2017; OECD, 2009; Rubin, 1987; Rutkowski et al., 2010). PISA determined 10 plausible values for each competency area per student in 2018. These values had a mean of 500 and a standard deviation of 100 across all participating countries.
Student level predictors
ESCS: Derived from three variables related to family background: parents' highest level of education, employment status and household possessions, including books. ESCS averages of the countries included in the analysis vary between -1.91 and 0.54. The average of all countries was found to be -0.28 and the median was -0.18.
Number of Classes Skipped in the Last Two Weeks (ST062Q02TA): A variable with four categories in which the student is asked how many classes they missed in the last two weeks. Considering all students who answered this item in the PISA 2018 data set, 67% of the students reported that they had not missed any lessons in the last two weeks, 23% had missed one or two lessons, 6% had missed three or four lessons, and 4% had missed five or more lessons.
Primary Education Beginning Age (ST126Q01TA): Seven categories were presented to students for the age of starting primary education, from category 1, "3 or earlier", to category 7, "9 or later". Categories 1 to 7 indicate ages 3 to 9 years. Based on the studies of Norman (2010), Robitzsch (2020) and Sullivan and Artino (2013), the variable "primary education beginning age" was included in the analysis as a continuous variable because it has seven categories, its marginal distribution is not skewed (Rhemtulla et al., 2012) and the sample sizes are large enough. At the country level, the minimum average category value of the age for starting primary education is 2.75, the maximum average category value is 5.02, the average category value of all countries is 4.22 and the median is 4.18.
Immigration Status (IMMIG): An immigration history index derived from items (ST019) that ask about the countries of birth of the student and their parents. It consists of three categories: native, second generation and first generation. Considering all students in the PISA 2018 data set, 88% were native, 6% were second generation and 6% were first generation.
School level predictors
School Ownership (SCHLTYPE): Schools are defined in three different types of ownership: public school, state-supported private school, and private school. However, since in some countries there are no state-supported private schools or private schools, or the number of such schools is very small, these two categories were combined and included in the analysis. Of all the schools in the data set, 17% were private and 88% were public schools.
Educational Material Shortage (EDUSHORT): An index of the lack of educational resources, derived from school principals' responses to four items regarding educational material (e.g. textbooks, IT equipment, library, or laboratory material) and physical infrastructure (e.g. building, floors, heating/cooling, lighting and acoustic systems). At the country level, the minimum average was -1.07, the maximum average was 1.19, and the average of all countries was 0.13 with a median of 0.11.
Staff Shortage (STAFFSHORT): Derived from the items related to faculty shortage, insufficient or low-qualified faculty, shortage of auxiliary staff, and insufficient or low-qualified auxiliary staff. At the country level, the minimum value was -1.00, the maximum value was 0.94, the average was -0.03 with a median of 0.01.
Student Behavior Hindering Learning (STUBEHA): Derived from six items regarding the school principal's perceptions of the school climate, particularly student behavior that may affect teaching at the school. The country level minimum average was –1.21 and maximum average was 1.11. The average was 0.03 with a median of 0.06.
Teacher Behavior Hindering Learning (TEACHBEHA): Derived from five items regarding school principals' perceptions of the school climate, particularly teacher behaviors that may affect teaching in the school. The country level minimum average was –1.01 and the maximum average was 1.11. The average was 0.10 with a median of 0.10.
Community in Which the School Is Located (SC001Q01TA): It includes five categories: village or rural area, small town, town, city and metropolitan. This variable was recoded into three categories: small, medium-sized and large communities. The distribution of schools participating in the PISA 2018 study across these categories was 35%, 27% and 39%, respectively.
Ratio of Male Students in School (BOY): Because the literature reports that the proportion of male students has an impact on mathematics achievement, the variable BOY, which expresses the proportion of male students enrolled in the school, was computed from the number of male students (SC002Q01TA) and female students (SC002Q02TA) in the school. At the country level, the minimum BOY value was 0.48, the maximum was 0.55, and the average was 0.51 with a median of 0.51.
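As a small illustration of the BOY variable, the following Python sketch computes the proportion from the two enrollment counts. The function name `boy_ratio` and the handling of missing or zero enrollment (returning `None`) are assumptions made for the example, not details given in the source.

```python
def boy_ratio(n_boys, n_girls):
    """Proportion of male students; None when it cannot be computed."""
    if n_boys is None or n_girls is None:
        return None  # one of the enrollment counts is missing
    total = n_boys + n_girls
    return n_boys / total if total > 0 else None


# Made-up enrollment counts for one school.
print(boy_ratio(306, 294))
```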
Variables used in meta-analysis
Tracking/First Segregation Age: The age at which students are first selected into schools that implement a tracking system. Across the countries and economies participating in PISA 2018, students in 31 education systems are first selected for different programs when they turn 15. In 19 education systems, the tracking age was 16; in 8 education systems it was 14; and in 15 systems it was 13. The countries that select students at the youngest age, 10, are Austria, Germany and Hungary. The Czech Republic, Slovak Republic and Türkiye track at 11 years (OECD, 2020). In the analyses, segregation age was recoded as 16 minus the age, resulting in 0 when segregation was at 16 and 6 when it was at 10.
The Number of School Types: The number of school types for each country was taken from the OECD (2020) report; it ranges from 1 to 6, with a median of 3. In the analyses, this variable is centered at 3.
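The two recodings described above amount to simple shifts, sketched here in Python. The function names are hypothetical and the sketch is not the authors' R code.

```python
def recode_tracking_age(age):
    """Recode first segregation age as 16 minus the age (16 -> 0, 10 -> 6)."""
    return 16 - age


def center_school_types(n_types, center=3):
    """Center the number of school types at the median value of 3."""
    return n_types - center


# Tracking ages and school-type counts mentioned in the text.
print([recode_tracking_age(a) for a in (16, 15, 11, 10)])
print([center_school_types(n) for n in (1, 3, 6)])
```

After recoding, a value of 0 on either moderator corresponds to a system that tracks at 16 or has three school types, which makes the model intercept interpretable for that reference profile.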
Human Development Index (HDI): The Human Development Index is a metric compiled by the United Nations Development Program and used to measure a country's average achievement in the three key dimensions of human development: a long and healthy life, knowledge and a decent standard of living. HDI was first measured in 1990 and has been published every year since then, with the exceptions of 2012 and 2020/21. Most developed countries have an HDI score of 0.8 or above, placing them in the very high human development tier. These countries have stable governments, widespread and affordable education and healthcare, high life expectancies and quality of life, and growing, strong economies. In contrast, the "low human development" category includes the world's least developed countries with HDI scores below 0.55. Less developed countries face unstable governments, widespread poverty, lack of access to healthcare, and inadequate education. Additionally, these countries have low income, low life expectancy and high birth rates. HDI scores for countries were provided by the World Population Review and data from 2018 were used in this study (World Population Review, 2018). In the meta-analysis this variable is standardized to have a mean of 0 and standard deviation of 1.
Gini coefficient: Also called the Gini index or Gini ratio, it is the most widely used measure of income distribution. The higher the Gini, the greater the difference between the incomes of a country's richest and poorest people. The Gini coefficient helps identify high levels of income inequality, which can have many undesirable political and economic effects. It varies between 0 and 1 but is usually given as a percentage. For example, if a nation had absolute income equality and every individual earned the same amount, its Gini score would be 0 (0%). On the other hand, if one person earned all the income in a country and the rest earned no income, the Gini coefficient would be 1 (100%). While the Gini coefficient is a useful tool for analyzing the distribution of wealth or income in a country, it does not indicate the overall wealth or income of that country. Some of the world's poorest countries, such as the Central African Republic, have the highest Gini coefficients (61.3). A high-income country and a low-income country may have the same Gini coefficient. Gini coefficients for countries were provided by the World Population Review and data from 2018 were used in this study (World Population Review, 2018). In the meta-analysis this variable is standardized to have a mean of 0 and standard deviation of 1.
Gross Domestic Product (GDP): The total monetary value of all goods and services produced by a country. GDP helps inform individuals about how well the economy is doing. It is primarily used to measure the strength of a country's economy and is an indicator of how fast a country is growing (World Bank Open Data, 2018). The data were provided by the World Bank, and data from 2018 were used in this study. In the analysis this variable is standardized to have a mean of 0 and standard deviation of 1.
Government Expenditure on Education (GEX): It refers to the expenditures made from the general government budget on education and is expressed as a percentage of GDP. The data was provided by the World Bank and data from 2018 was used in this study. In the meta-analysis this variable is standardized to have a mean of 0 and standard deviation of 1.
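The standardization applied to the country-level moderators can be sketched as follows. The sketch assumes the sample standard deviation (divisor n - 1, as in R's scale function); the source does not say which divisor was used. The function name `standardize` and the values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Made-up index values for four countries, not data from the study.
z_scores = standardize([0.70, 0.80, 0.90, 0.95])
print([round(z, 2) for z in z_scores])
```

After standardization, a regression coefficient for a moderator expresses the change in the effect size per one standard deviation of that moderator, which makes coefficients comparable across moderators measured on different scales.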
Hofstede's Cultural Dimensions: The Dutch psychologist Hofstede and colleagues (Hofstede et al., 2010) explained the impact of the culture established in a society on the values of its members and its connection with their behavior. They defined culture as “the collective programming of the mind that distinguishes members of one group or category of people from others.” They put forward six basic dimensions along which societies organize themselves and named these the dimensions of culture: individualism, masculinity, uncertainty avoidance, long-term orientation, power distance and tolerance. Each dimension is expressed on a scale ranging from 0 to 100, but in the meta-analysis each dimension is standardized to have a mean of 0 and standard deviation of 1. These dimensions were obtained by comparing many, if not all, countries in the world.
Individualism: The extent to which individuals feel they belong to the society they live in and the impact of this on cultural differences. For example, if individuals think about their own interests and feel independent from the society, it is classified as an individualist society; if individuals make their decisions according to the value judgments of the society and do not seek an interest independent of the interests of the society, it is classified as a collectivist society.
Masculinity: Whether a society makes its decisions within a logical or an emotional framework determines which type of society it is. If a society makes its decisions within the framework of logic, it is classified as a masculine society.
Uncertainty avoidance: A judgment about societies has been reached by examining the reactions of individuals in society to the uncertain situations they experience. It has been observed that societies that avoid uncertainty in events with uncertain and unpredictable outcomes are more tense, stressed and therefore less productive. It has been stated that societies that do not avoid uncertainty and think that uncertainty is a phenomenon that must exist, act more comfortably in situations where the outcome is unclear and unpredictable.
Long-term orientation: In a long-term oriented culture, the basic idea about the world is that it is in a state of change and there is always a need to prepare for the future. In a short-term oriented culture, the world is essentially as it was created, so the past provides a moral compass and it is morally good to hold on to it. This dimension concerns life philosophies, religiosity, and predictions of educational success.
Power distance: Power Distance is the degree to which less powerful members of institutions and organizations (such as the family) accept and expect unequal distribution of power. This dimension explains how behaviors are shaped in proportion to the importance societies attach to hierarchy. It can be observed that in societies where power distance is low, subordinates are more comfortable in their relations with their superiors and have more say in the decisions made and can act critically, while in cases where this distance is large, subordinates do not question their situation and act more accepting.
Tolerance: This dimension emphasizes the freedom of individuals' behavior. In societies classified as tolerant, individuals act according to their own enjoyment and impulses; in societies classified as restricted, individuals limit their actions according to the morality and value judgments of the society.
Variables in the Türkiye specific multilevel model
In addition to the variables included in the model created to calculate ICC for all countries, the following school-level variables were included in the Türkiye specific model.
Stratum as School's Academic Type (STRATUM): The school sample for Türkiye in PISA 2018 was determined by the stratified random sampling method (MEB, 2019). 186 schools and 6890 students representing Türkiye participated in the PISA 2018 application. This variable was recoded to obtain the school types in the STRATUM variable. Recoded STRATUM is a nominal variable with eight categories, and the numbers of high schools (HSs) and students in each category are given in Table 1 below.
Table 1. Number of Schools and Students in Different School Types in Türkiye
School Type | Category | Schools | Students
Anatolian High Schools | 2 | 78 | 3013
Vocational and Technical Anatolian High Schools | 3 | 56 | 2143
Anatolian Imam and Preacher High Schools | 4 | 25 | 943
Multi-Program Anatolian High Schools | 4 | 8 | 273
Lower Secondary Schools | 4 | 6 | 22
Science High Schools | 1 | 6 | 226
Social Sciences High Schools | 1 | 6 | 228
Fine Arts High Schools | 4 | 1 | 42
Because mathematics achievement levels were close to each other and the numbers of schools and students sampled were small, some school types were combined into a single category. Science HSs and Social Sciences HSs, as the high achiever schools, were grouped into a single category, which served as the reference category in the analysis. Anatolian HSs and Vocational and Technical Anatolian HSs were used as they are. The remaining four school types were combined into the “Others” category. After these adjustments, the STRATUM variable is a nominal variable with 4 categories.
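The grouping and the resulting dummy coding can be sketched in Python. This is an illustration rather than the authors' car/fastDummies code; the group labels, the function names and the dummy layout (three indicators, with the high achiever group as the all-zero reference) are assumptions consistent with the description above.

```python
# Mapping from the eight school types to the four analysis categories.
GROUPS = {
    "Science High Schools": "High achiever",
    "Social Sciences High Schools": "High achiever",
    "Anatolian High Schools": "Anatolian",
    "Vocational and Technical Anatolian High Schools": "Vocational and Technical",
}
# Non-reference levels; "High achiever" is the reference category.
DUMMY_LEVELS = ["Anatolian", "Vocational and Technical", "Others"]


def recode_stratum(school_type):
    """Collapse a school type into one of the four categories."""
    return GROUPS.get(school_type, "Others")


def dummy_code(category):
    """Return 0/1 indicators; the reference category is coded all zeros."""
    return {level: int(category == level) for level in DUMMY_LEVELS}


for school_type in ("Science High Schools", "Fine Arts High Schools"):
    print(school_type, dummy_code(recode_stratum(school_type)))
```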
Career Guidance at School (SC162Q01SA): It is a variable with two response categories (“planned interviews are held for all students” or “students conduct interviews voluntarily”) to the question about the career guidance services of the school guidance service. 109 schools provide career guidance services on a planned (reference category) basis, and 51 schools provide them on a voluntary basis.
Student Placement by Ability into Different Classes (SC042Q01TA): It is a nominal variable with three categories (all students, some students, no students) that contains information about whether students are educated in different classes according to their ability/success status. The number of schools where all students are divided into success classes according to their ability/success status is 34, the number of schools where some of the students are divided into success classes is 63, and the number of schools where there are no success classes is 85. The category into which all students were divided into success classes was taken as the reference category and the other two categories were combined.
Student-Teacher Ratio (STRATIO): Student-Teacher ratio was obtained by dividing the number of students enrolled in the school (SC002) in the school level data set by the total number of teachers (TOTAT). The student-teacher ratio in schools in Türkiye was found to be minimum 2.34 and maximum 40.76. The Türkiye average for this variable was 13.46 with a median of 13.37.
School Size (SCHSIZE): It is equal to the sum of the number of male and female students enrolled in school (SC002) and is included in the school level data set. For Türkiye, the minimum school size was 26, the maximum was 3263, the average was 655.7 with a median of 612.
Ratio of Teachers with Master’s Degree (MASTER): In the school level data set, the rate of teachers with a master's degree (PROAT5AM), one of the variables containing the education levels of teachers at the school, was included in the analysis. The ratio of graduate teachers in Turkish schools was found to be minimum 0, maximum 1, average 0.15 and median 0.12.
Class Size (CLSIZE): The average class size in the school level data set consists of nine categories. The average class size in schools was found to be minimum 13 (15 or less), maximum 53 (50 or more), Türkiye average 41.26 (range 41-45) and median 48 (range 46-50).