Tracking and educational inequality: a longitudinal analysis of two school reforms in Switzerland

Abstract
This paper examines the effects of different forms of tracking on learning inequalities in compulsory education. We use longitudinal monitoring of four cohorts of students over a four-year period, from their entry into secondary 1 education until they enter secondary 2 education. Our data include 18,706 students. We use multilevel regression analyses to test the effects of different forms of tracking after controlling for students’ sociodemographic and academic characteristics, and for contextual factors. Our results suggest that the effect of tracking is not only a composition effect or a peer effect: tracking per se explains a significant part of the variance in learning in the two educational contexts in our study. Our results confirm that educational policies and institutions play a pivotal role in the construction of learning inequalities and social reproduction through school in contemporary societies.

Keywords: School inequalities; tracking systems; social reproduction; school segregation; educational policies; streaming

Introduction
This paper examines the effects of different forms of tracking on learning inequalities in compulsory education. In contrast to comprehensive school systems, the tracking system involves assigning students to different tracks according to their academic level, early on in their education. In many cases, tracks are hierarchized and provide different curricula. This separation of students, often referred to as 'orientation', can be implemented at various levels of schooling. In some countries, including Germany and Austria, and in many cantons in Switzerland, the first orientation takes place at the end of primary school. In other countries, the first orientation takes place later, at the end of compulsory education, when students are around fifteen or sixteen years old.
The tracking of students is often analysed in relation to the consequences it has on efficiency and equity. Is tracking more effective than other modes of grouping in improving students' learning? Is tracking effective for all types of students? In line with the pivotal work of Boudon (1974), much of the literature on this topic shows that an early separation produces greater inequalities in learning between students from different social and migratory backgrounds. In the 1980s, a large longitudinal survey conducted by Kerckhoff (1986) in the UK demonstrated that in schools with ability-based groupings, the learning inequalities between students were reinforced. Similar evidence can be found in other national contexts, for example, in the United States (Gamoran and Mare 1989; Oakes 1994; Hallinan 1994a; 1994b), in Switzerland (Bauer and Riphahn 2006; Felouzis and Charmillot 2013), as well as in international comparisons (Burger 2016; Hanushek and Wößmann 2006; Pomianowicz 2021). Furthermore, according to Dupriez's literature review (2010), tracking systems are not particularly effective in terms of student achievement. Indeed, one of the findings from PISA surveys since 2000 is that 'comprehensive education systems, where all students follow a similar path through education, regardless of their academic performance, often perform better and are more equitable than education systems that rely on horizontal stratification (e.g. tracking students based on ability or interests).' (OECD 2019b, 44).
While the literature is consistent on this issue, it should be noted that there are many forms of tracking across countries and education systems. For example, by comparing the different cantonal education systems in Switzerland, Felouzis and Charmillot (Felouzis and Charmillot 2013; Charmillot and Felouzis 2020) show that (1) there are various types of tracking across the cantons and (2) the most important factor to explain the effects of tracking is the level of segregation in the different tracks. This means that seemingly similar school systems may have different effects depending on the degree of segregation they involve.
Furthermore, the segregation of second-generation migrants does not have the same effect in different school systems. Baysu and de Valk (2012) conducted a comparative survey across four European countries and found that the consequences of segregation for second-generation migrants differ considerably depending on the school system. They note that 'in more open educational systems, such as Sweden and Belgium, the segregation experience is less negative, absent or even positive for the academic school careers of the Turkish and Moroccan second generation'. This suggests that 'when segregation is not accompanied by an early selection into differential grouping of students in terms of parental and individual background (such as in highly stratified systems) positive effects of segregation may outweigh its negative effects' (794).
These findings raise questions about educational policies and the consequences of different forms of tracking. In order to answer them, we need to observe in detail the nature of school systems to understand the extent to which they produce inequalities. It is also essential to compare different tracking systems in order to better understand the mechanisms by which these inequalities are produced.

What does 'tracking' mean?
The aim of this section is to define tracking and its different forms. In the literature, tracking appears as one option among others for organizing compulsory education and managing differences in student learning at the end of primary school. In addition, early differentiation of curricula is not the only form of differentiation used. LeTendre, Hofer, and Shimizu (2003) suggest a typology of curriculum differentiation based on an international survey in Germany, the United States and Japan. The authors identify five types of differentiation.
The first is by school type (for example, vocational vs. academic schools). The second is by option and special program, the third is by track, the fourth is by ability grouping and the fifth is by geographical location. This means that curriculum differentiation exists in most school systems, including comprehensive school systems, and that tracking is one of several possible forms of differentiation, not the only one.
In line with this typology, Dupriez (2010) identifies different ways of grouping students at school. The first is to divide students up into classes within a given school according to their ability level in one or several academic subjects (ability grouping). The second is to send students to different schools based on the institutions' curricula or the characteristics of the urban area. The third is tracking, which is a specific way of organizing schooling that involves dividing students up into different streams. However, the author notes that while these typologies distinguish between different ways of grouping students, they do not fully reflect the complexity of education systems: 'Be that as it may, most of this research is based on an almost dichotomous representation of education systems, hinging on the age at which students are channeled towards a particular track. Recently, several authors have drawn attention to the fact that the metabolism of education systems is more complex and that one should consider mechanisms other than tracks when managing students of mixed ability.' (74-75)

The effects of different forms of tracking
Chmielewski (2014) proposes to go beyond the comparison between tracking and comprehensive systems. The author uses data from the PISA 2003 survey to compare two types of curriculum differentiation. The first is course-by-course within-school differentiation (i.e. 'ability grouping') and the second is the differentiation into separate school buildings to distinguish between academic and vocational streams. The author shows that the two systems are very similar and differ 'more in degree than in kind' (318). However, 'SES [Socioeconomic status] segregation between tracks is higher in academic/vocational streaming than in course-by-course tracking' (318) and the achievement gap is larger in the first system than in the second one.
These results show that the nature of tracking affects the extent of the achievement gap between students from different social groups. Pomianowicz (2021) compares three forms of tracking based on the degree of separation between students. The first involves separating students into different tracks and schools; the second, separating students into different tracks within the same schools; and the third, separating students into ability groups within the same classroom. The author focuses on the achievement gap between second-generation and non-migrant students: does the gap between these two populations depend on the form of tracking in the school system? To answer this question, the author uses data from the PISA 2018 survey for 28 countries. The results confirm those of Chmielewski (2014): 'a higher tracking degree leads to substantial reading performance disadvantages for second-generation compared to non-immigrant students' (21).
In addition, the nature and form of school systems do not influence the educational careers of different groups of students in the same way. One reason, as Pásztor (2010) shows, may be girls' higher motivation for long studies. Thus, according to Crul and Schneider (2009), 'institutional arrangements affect ethnic minority groups differently from their native peers, and they also shape the trajectories of men and women in different ways (…) the more open educational system seems to offer better opportunities for young Turkish women.' (1522) Finally, the comparisons of different forms of tracking from the PISA surveys demonstrate the link between the degree of separation of students and the degree of social, ethnic or racial inequality. Thus, according to Brunello and Checchi (2007), tracking reduces equality of opportunity; following Chmielewski, Dumont, and Trautwein (2013), we can add that the effect of tracking depends on the tracking type. It might therefore be relevant to examine in more detail what we mean by 'tracking type'.

Could the devil be in the details?
One of the main issues related to tracking concerns the features of tracking that explain its effects on learning inequalities. To determine this, it is necessary to examine the actual implementation of tracking in different contexts. In a comparative study of cantons, Stadelmann-Steffen (2012) indicates that it is not tracking itself that affects inequalities between students, but rather the degree of permeability between tracks. The cantons in which students switch more frequently from one track to another during their studies are also those in which social inequalities are lower. This is also what Felouzis and Charmillot (2013) note when they observe that cantons with education systems that are comparable in terms of structure can differ greatly, not only in terms of the skills produced, but also in terms of equity. The authors show that the cantons where social inequalities in skills are the highest are also those where the social segregation of tracks is the most pronounced. This suggests that it is in part the way in which tracking is actually organized that explains the extent of school inequalities. In other words, it is not how students are grouped per se that is relevant, but rather the characteristics that accompany this grouping.
Another aspect that is important for understanding the effects of tracking is determining what criteria influence track placement. The main question is the extent to which track assignment is based not only on academic criteria, but also on non-academic considerations (especially social background, ethnicity and gender). The impact of students' social or ethnic characteristics on tracking has been widely demonstrated empirically: students from disadvantaged backgrounds and minority groups are proportionally more likely to be directed towards less demanding tracks (Felouzis, Charmillot, and Fouquet-Chauprade 2011; Lucas and Berends 2002; Lucas 1999; Mickelson 2003; Oakes 2005). Their over-representation in the tracks with low requirements is not entirely justified by differences in academic level. Indeed, when academic level is statistically controlled, the effect of students' socioeconomic status and ethnic background on track placement decreases, but does not disappear entirely (Hallinan 1994b). In other words, assuming the same level of ability, disadvantaged and minority students are significantly more likely to be assigned to a less demanding track.
In the Swiss context, Kronig (2007) shows that teachers' orientation decisions are strongly influenced by the context of the school and by the characteristics of the students (gender, social or migratory origin). This suggests that the over-representation of students from disadvantaged and migrant backgrounds in low requirement tracks is not solely the result of their academic level. It is also the result of biases in judgements when orientation decisions are made. Thus, Kronig argues that: 'the link between selection decisions and social origin is decidedly strong. For equal performance, the actual chances of pursuing a school career are significantly better for students from privileged, native-born families. This […] raises considerable doubts about the legitimacy of the school as a certifying and allocating institution' (Kronig, quoted in Meyer 2008, 69).

Research questions
In this literature review, we have demonstrated the relevance of comparing different tracking systems to understand the mechanisms by which inequalities are produced. The debate about tracking should therefore no longer be about comparing it to a comprehensive system, but rather about understanding which features are relevant to explain the effects of different forms of tracking. All the more so as the trend in education policies, especially in Switzerland, is to mix different solutions: within-school tracking coupled with ability grouping in some subjects, gateway systems between tracks, a mix between the comprehensive school system and the tracking system, etc.
This paper questions the effects of these 'mixed solutions' by comparing two French speaking cantons in Switzerland. In each of them, a reform of compulsory education (secondary 1 education) was implemented in the 2010s with the aim of limiting inequalities in learning and orientation.
In the canton of Naven, 1 the reform has strengthened tracking. Before the reform, this canton had a mixed system: most schools had a two-track system, while a few others had heterogeneous classes, with ability groups for some subjects. With the reform, all schools switched to a three-track system. The canton of Berg went in the opposite direction. The reform enabled a change from a three-track system to a system with only two tracks (basic requirements and high requirements), but with ability groups for students in the track with basic requirements.
We use these regional differences in educational policies to analyse the consequences of different forms of tracking on social and academic segregation and, consequently, on inequalities of learning: Which way of organizing tracking produces which learning inequalities? Empirically, we have one cohort of pupils before and one cohort after the implementation of the reform in the two cantons. It is therefore possible to evaluate the effects of these two reforms in terms of inequalities of learning.

Data
The data for our analysis come from official statistical sources in the two cantons. These two databases keep track of students when they enter the canton's education system. Updated on a yearly basis, they make it possible to monitor students throughout their school career, up until they leave the education system.
This study is based on the longitudinal monitoring of four cohorts of students over a four-year period, from their entry into lower secondary education (secondary 1 education) until the first year of secondary 2 education. In total, our data include 18,706 students. The first two cohorts are made up of students who started secondary 1 education before the reform in each canton (N = 3,507 in Naven; N = 5,879 in Berg). The other two cohorts comprise students who started their secondary education after the reform (N = 3,594 in Naven; N = 5,726 in Berg).

Secondary 1 education in Switzerland
Because of Switzerland's federal structure, three political bodies share the educational tasks: the Confederation (central government), the cantons and the municipalities. Compulsory schooling (primary and secondary 1 education) is mainly under the jurisdiction of the cantons. They are more or less free to organize the education system as they see fit, providing that they meet the general objectives set at the federal level.
In the two cantons under study, compulsory education spans 11 years. Primary school, from year 3 to year 8 (for students from 6 to 11 years old), is split into heterogeneous classes, irrespective of academic level. Secondary 1 education lasts for 3 years (year 9 to year 11); students are typically between 12 and 15. At this stage, students are grouped into different tracks within the school according to their academic level.

Secondary 1 education in the canton of Naven
Before the reform, the education system in Naven offered two organizationally contrasting modes of grouping students: depending entirely on their catchment area, students attended either schools which ran a tracking system or schools with a system presented as comprehensive.
• In 17 of the schools, students were grouped into two tracks, based on their academic level at the end of primary school: students with a good academic level were oriented towards classes with high requirements, while those with a lower academic level were assigned to small classes with basic requirements.
• In three of the schools, students were grouped in heterogeneous classes; however, certain subjects were divided into ability groups (mathematics, French and German).
The reform, which took place during the 2011-12 school year, unified the education system in Naven. All schools switched to a system with three tracks (basic requirements; intermediate requirements; high requirements). In addition, the conditions for entry into the different tracks were strengthened and gateways were implemented to facilitate transfers between tracks. The reform has thus reinforced selectivity at the beginning of secondary 1 education, by increasing the number of tracks and by making it more difficult to enter tracks with high and intermediate requirements, but at the same time it has made it easier to move from one track to another, especially to a more demanding track.

Secondary 1 education in the canton of Berg
The canton of Berg borders the canton of Naven. However, its educational system works very differently. Before the reform, students were divided into three tracks with different levels of requirements (basic requirements; intermediate requirements; high requirements). Since the reform, which took place during the 2013-14 school year, students have been grouped into two tracks (basic requirements; high requirements). However, in the basic track, the teaching of certain subjects (mathematics, French and German) is divided into two ability groups (high level, basic level). The canton of Berg has therefore gone in the opposite direction to that of Naven, since the reform has made the system more heterogeneous by reducing the degree of separation of students into different classes.
It should be noted that in both cantons, the reform has completely changed the process of admission into the different tracks. Before the reform, both quantitative and qualitative criteria were used to make a decision on orientation: the decision was based on the academic level at the end of primary school, on teachers' professional judgment and on parents' preferences. The reform has completely discarded subjective elements related to the perceptions of teachers and parents, in order to focus exclusively on the objective criteria of academic performance.

Dependent variable
The aim was to estimate the effects of different forms of tracking on students' academic level at the end of secondary 1 education, which was measured by scores on standardized tests in French and mathematics. These tests are designed to assess students' acquired knowledge and skills relative to the learning outcomes defined in the study plan, which is the same in both cantons. 2 For each cohort, the content, pass requirements, correction methods, and grading scales of these tests are standardized. However, it should be noted that these tests are not the same in the two cantons and that they also vary from one year to the next. As they are not created using item response theory, which utilizes anchor test items to ensure comparability of tests across different groups, it is not possible to directly compare the scores for each cohort. To get around this problem, we chose to focus on the difference in scores rather than comparing raw scores. The goal is to assess whether the reform in each canton contributed to widening the score gap between students. This analytical approach is inspired by studies on the 'achievement gap', which typically use the difference in scores to compare results measured with different tests (see, for example, Clotfelter, Ladd, and Vigdor 2006; Fryer and Levitt 2004; Reardon 2018; Reardon and Galindo 2009). More precisely, we standardized the scores in French and mathematics within each cohort, relative to the cohort mean. We then used the resulting z-score to measure each student's distance from the cohort mean in standard deviation units.
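As an illustration of the within-cohort standardization described above, the following is a minimal sketch rather than the procedure actually run on the cantonal data. The record layout, the field names (`cohort`, `score`) and the toy scores are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores_by_cohort(records):
    """Return a copy of each record with a 'z' field: the score's distance
    from its own cohort mean, expressed in cohort standard deviations."""
    scores = defaultdict(list)
    for rec in records:
        scores[rec["cohort"]].append(rec["score"])

    # Assumption: population standard deviation (pstdev); the paper does not
    # say whether the population or sample estimator was used.
    stats = {c: (mean(v), pstdev(v)) for c, v in scores.items()}

    out = []
    for rec in records:
        mu, sd = stats[rec["cohort"]]
        out.append({**rec, "z": (rec["score"] - mu) / sd})
    return out


# Hypothetical toy data: two cohorts graded on different scales.
toy = [
    {"cohort": "A", "score": 10.0},
    {"cohort": "A", "score": 20.0},
    {"cohort": "A", "score": 30.0},
    {"cohort": "B", "score": 40.0},
    {"cohort": "B", "score": 60.0},
]
standardized = zscores_by_cohort(toy)
for row in standardized:
    print(row["cohort"], row["score"], round(row["z"], 3))
```

Because each cohort is centred and scaled on its own distribution, the z-scores describe relative positions within a cohort; they say nothing about whether one cohort performed better than another in absolute terms.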

Independent variables
Individual characteristics of students were taken into account using the following variables:
• Gender was made up of two categories: male and female.
• Immigration status was measured with two variables: student's first language, which distinguishes between French speakers (i.e. students who have French as their first language) and non-French speakers, and nationality, which differentiates between Swiss and foreign students. Combining these two variables, immigration status has four modalities: Swiss and speaks French, Swiss and speaks another language, foreigner and speaks French, foreigner and speaks another language.
• Academic level at the end of primary school was measured by the students' scores on standardized tests in French and mathematics. Because the tests were not identical across cohorts, we focused on differences in scores, as we did for the dependent variable, using z-scores rather than raw scores.
• The track taken in secondary 1 education was measured by a variable with four modalities: basic requirements, intermediate requirements, high requirements, and heterogeneous classes with ability groups. It should be noted that, in Berg after the reform, students in the basic requirements track were split into different ability groups (high level; basic level) according to their academic level in the three main subjects (French, German and mathematics). This results in potentially eight different categories (two ability groups for each of the three main subjects). In order to make the comparison with other cohorts possible, we coded students with at least two main subjects at the basic level as belonging to the 'track with intermediate requirements, low ability groups' and those with at least two main subjects at the high level as belonging to the 'track with intermediate requirements, high ability groups'.
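The coding rule for Berg after the reform can be sketched as a simple majority rule over the three main subjects. This is a hedged illustration only: the function name, the dictionary layout and the subject keys are hypothetical, while the two category labels are taken from the text.

```python
from itertools import product

SUBJECTS = ("french", "german", "mathematics")
LOW_LABEL = "track with intermediate requirements, low ability groups"
HIGH_LABEL = "track with intermediate requirements, high ability groups"


def code_ability_profile(levels):
    """Map a student's ability groups ('high' or 'basic' in each of the three
    main subjects) to one of the two categories used in the analysis."""
    for subject in SUBJECTS:
        if levels[subject] not in ("high", "basic"):
            raise ValueError(f"unexpected level for {subject}: {levels[subject]!r}")
    n_high = sum(1 for subject in SUBJECTS if levels[subject] == "high")
    # With three subjects and two levels, 'at least two at the high level' and
    # 'at least two at the basic level' are mutually exclusive and exhaustive.
    return HIGH_LABEL if n_high >= 2 else LOW_LABEL


# Enumerate the eight possible profiles and count how each is coded.
counts = {LOW_LABEL: 0, HIGH_LABEL: 0}
for combo in product(("high", "basic"), repeat=len(SUBJECTS)):
    profile = dict(zip(SUBJECTS, combo))
    counts[code_ability_profile(profile)] += 1
print(counts)
```

Since the number of subjects is odd, no profile is left ambiguous: each of the eight combinations falls into exactly one of the two categories.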
Along with these individual variables, we also included variables measuring the composition of the class (average percentage of girls per class, average percentage of students who are Swiss and speak French per class, average percentage of students with low academic level per class, etc., see table in Appendix 1). Table 1 shows the individual characteristics and academic level of students in each track for each cohort.

Descriptive statistics
In terms of gender differences, girls tend to be overrepresented in the track with high requirements and underrepresented in the track with basic requirements. In Naven before the reform, 52.4% of the students in the track with high requirements are girls. After the reform, the overrepresentation of girls in the track with high requirements increases further (55.6%). In contrast, girls are clearly underrepresented in the track with basic requirements (43.8% before the reform and 36.4% after the reform). The trend is similar in Berg.
Differences related to immigration status can also be highlighted: the track with high requirements has a high share of students who are Swiss and speak French, while the track with basic requirements has a high percentage of students who are foreigners and do not speak French. In Naven, the percentage of students who are Swiss and speak French in the track with high requirements is 60.9% before the reform and 58.6% after. The same trend is observed in Berg, with 74.3% of students who are Swiss and speak French in the track with high requirements before the reform and 69.4% after the reform. By contrast, whether in Naven or in Berg, students who are foreigners and speak another language are heavily overrepresented in the track with basic requirements. The comparison of the two cantons shows that in Naven there was an increase in the concentration of foreign and non-French speaking students in the track with basic requirements (from 45.8% before the reform to 48.8% after the reform), whereas in Berg their presence in this track decreased slightly after the reform (from 34.9% to 33.7%).
The academic level of students was measured at two points of schooling: first at the end of primary school and then in the last year of secondary 1 education. As expected in a tracking system, the average academic level is higher in the track with high requirements. In Naven, the average score in this track is between 0.4 and 0.5 standard deviations above the mean, at the end of primary school and at the end of secondary 1 education. In Berg, the average score is between 0.7 and 0.8 standard deviations above the mean, suggesting that access to the track with high requirements in this canton is somewhat more selective than in Naven. In contrast, the track with basic requirements is made up of students with a very low academic level. If we consider, for example, the academic level at the end of secondary 1 education, in Naven the average score in the track with basic requirements is 1.16 standard deviations below the average before the reform and 1.66 standard deviations below it after the reform. In Berg, the average scores in this track are −1.20 and −0.96 respectively. This shows once again the opposite effects of the reform in the two cantons. In Naven, the reform, by reinforcing the grouping of underperforming students in the least demanding track, led to a decrease in the average score in the track with basic requirements (from −1.16 to −1.66), whereas in Berg, the average academic level in this track increased slightly (from −1.20 to −0.96).
To summarize, this table points out similarities and differences between the four cohorts. In regard to the similarities, it appears that the characteristics of the students in each track are very similar from one cohort to the other. The track with high requirements has more girls, more students who are Swiss and speak French, and, logically, stronger students academically. Conversely, the track with basic requirements has a higher share of boys, of students who are foreigners and allophones, and of students who are very weak academically. Regarding the differences, the percentage of students in the track with high requirements is very different between the two cantons: more than 60% in Naven (64.1% before the reform; 69.4% after the reform) as opposed to only 43% in Berg. This demonstrates the contrasting educational context of the two cantons. Regarding the effects of the reform, in Naven it reinforced the separation between girls and boys, between Swiss and foreign students, between French and non-French speakers, and between the strongest and weakest students academically. In Berg, on the other hand, the reform led to a decrease in the inequalities related to students' individual and academic characteristics.

Analysis design
We used multilevel regression analysis to test the effects of different forms of tracking after controlling for students' sociodemographic and academic characteristics as well as for contextual factors.
Multilevel regression analysis is used to take into account the fact that individuals grouped in the same units 'are likely to be experientially and demographically similar to each other, but different from observations in other groups' (Bickel 2007, 61-62). In other words, this allows the hierarchical structure of the data to be considered: when individuals are grouped into larger units, it can be assumed that there is a correlation between the residuals within the groups. In our data, we have two hierarchical levels: level 1, which represents the students in the two cantons studied (N = 18,055) and level 2, which represents the class 3 in which the students are enrolled at the end of secondary 1 education (N = 1,181).
We built five regression models:
• Model 0 (empty) simply makes it possible to decompose the variance between level 1 (individual) and level 2 (class).
• Model 1 includes the effect of students' individual and academic characteristics.
• Model 2 introduces the effect of class composition.
• Model 3 takes into account the effect of the tracking system.
• Model 4 includes a cross-level interaction effect between students' initial academic level and the tracking system. The goal is to take into account the fact that students' initial academic level may have a different effect depending on the track they attend.
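The sequence of models listed above can be written schematically as a two-level random-intercept specification. The notation below is an assumption introduced for illustration, not the authors' own: $y_{ij}$ is the standardized score of student $i$ in class $j$, $x_{ij}$ stands for the student-level covariates (including initial academic level), $w_j$ for class composition, and $T_j$ for the track. Whether any slopes were allowed to vary across classes is not stated in the text, so only a random intercept is shown.

```latex
\begin{align*}
\text{Model 0:}\quad y_{ij} &= \gamma_{00} + u_{0j} + e_{ij},
  \qquad u_{0j} \sim N(0, \sigma^2_{u0}),\; e_{ij} \sim N(0, \sigma^2_{e}) \\
\text{Model 1:}\quad y_{ij} &= \gamma_{00} + \gamma_{10}\, x_{ij} + u_{0j} + e_{ij} \\
\text{Model 2:}\quad y_{ij} &= \gamma_{00} + \gamma_{10}\, x_{ij} + \gamma_{01}\, w_{j} + u_{0j} + e_{ij} \\
\text{Model 3:}\quad y_{ij} &= \gamma_{00} + \gamma_{10}\, x_{ij} + \gamma_{01}\, w_{j} + \gamma_{02}\, T_{j} + u_{0j} + e_{ij} \\
\text{Model 4:}\quad y_{ij} &= \gamma_{00} + \gamma_{10}\, x_{ij} + \gamma_{01}\, w_{j} + \gamma_{02}\, T_{j}
  + \gamma_{11}\, x_{ij} T_{j} + u_{0j} + e_{ij}
\end{align*}
```

In this sketch the cross-level interaction term $\gamma_{11}\, x_{ij} T_{j}$ is what lets the effect of initial academic level differ from one track to another.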

The effects of class composition and tracking organisation
The multilevel regression analysis has two purposes. First, to test which factors influence the academic level of students at the end of secondary 1 education. Following the literature, three kinds of factors are to be taken into account: the individual and academic characteristics of students, school segregation through class composition and school organisation through tracking systems. Second, to estimate how much of the variance in scores is explained at the class level.
For this second purpose, the aim is twofold. First, to determine whether the class plays a role in explaining the differences in scores between students. Second, to examine how the effect of the class changes when students' characteristics, class composition and tracks are taken into account. To calculate the share of score variance explained at the class level (level-2 variance), we used the intraclass correlation coefficient (ICC). It is given by the following formula:
$$\mathrm{ICC} = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e}}$$
where $\sigma^2_{u0}$ is the variance between classes (level 2) and $\sigma^2_{e}$ is the residual variance between students within classes (level 1).
Table 2 gives the level-2 variance for each regression model. Model 0, which does not include any explanatory variables, makes it possible to estimate the share of variance explained at the class level. For all four cohorts, the share of variance explained at the class level is high, which is expected with tracked systems: students are grouped in classes according to their academic level, which mechanically produces a strong differentiation in relation to the class. In Naven, the share of variance explained at the class level is 56.2% before the reform and 66% after the reform. In Berg, it is 71.3% and 43.7% respectively. This confirms that the reform resulted in a strengthening of the class effects in Naven, while it led to a strong decrease in these effects in Berg.
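To make the intraclass correlation concrete, the sketch below computes it from two variance components, using the standard definition (class-level variance divided by total variance). The variance components are estimated here with a one-way ANOVA estimator for balanced classes, which is a deliberately simple stand-in and not the multilevel estimation used in the study. The function names and the toy scores are hypothetical.

```python
from statistics import mean


def icc_from_components(var_class, var_student):
    """Share of total variance located at the class level."""
    return var_class / (var_class + var_student)


def anova_variance_components(classes):
    """One-way ANOVA estimates of (class-level, student-level) variance.
    Assumes balanced data: at least two classes, all of the same size (>= 2)."""
    size = len(classes[0])
    if len(classes) < 2 or size < 2 or any(len(c) != size for c in classes):
        raise ValueError("expected at least two classes of equal size >= 2")
    n_classes = len(classes)
    grand_mean = mean(x for c in classes for x in c)
    class_means = [mean(c) for c in classes]

    ms_between = size * sum((m - grand_mean) ** 2 for m in class_means) / (n_classes - 1)
    ms_within = sum(
        (x - m) ** 2 for c, m in zip(classes, class_means) for x in c
    ) / (n_classes * (size - 1))

    # The between-class estimate can be negative in small samples; clip at 0.
    var_class = max((ms_between - ms_within) / size, 0.0)
    return var_class, ms_within


# Hypothetical toy data: three strongly differentiated classes of three students.
toy_classes = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
var_class, var_student = anova_variance_components(toy_classes)
toy_icc = icc_from_components(var_class, var_student)
print(f"ICC = {toy_icc:.3f}")
```

When classes are formed by academic level, as in the toy data, most of the score variance lies between classes and the ICC is mechanically high, which is the pattern described for the empty model.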
When the effect of students' individual and academic characteristics is included (model 1), the share of variance explained at the class level decreases sharply. The most significant decrease occurs in Berg after the reform, where it falls from 43.7% to 15.3% between model 0 and model 1, a drop of 65% in the level-2 variance, indicating that class effects are more dependent on individual and academic characteristics than in the other three cohorts.
Model 2 takes into account the effect of class composition, which further reduces the share of variance explained at the class level. Berg pre-reform shows the largest decline: the level-2 variance is reduced from 57.9% to 20.4%, a 65% drop.
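The relative drops in level-2 variance quoted in this section for models 1 to 3 can be recomputed from the shares reported in the text. The sketch below is only an arithmetic check; the helper name `relative_drop` is hypothetical and is not part of the authors' analysis.

```python
def relative_drop(before: float, after: float) -> float:
    """Relative reduction, in percent, of the class-level share of variance."""
    return (before - after) / before * 100


# (cohort, model step, share before in %, share after in %), as quoted in the text
steps = [
    ("Berg post-reform", "model 0 -> model 1", 43.7, 15.3),
    ("Berg pre-reform", "model 1 -> model 2", 57.9, 20.4),
    ("Naven post-reform", "model 2 -> model 3", 26.0, 15.1),
]

for cohort, step, before, after in steps:
    drop = relative_drop(before, after)
    print(f"{cohort} ({step}): {before}% -> {after}%, a {drop:.0f}% drop")
```

The drop is expressed relative to the share in the preceding model, which is why a fall from 26% to 15.1% counts as a 42% reduction rather than an 11-point one.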
Model 3 includes the effect of the track, which again contributes to reducing the level-2 variance. This time, the variance decreases the most in Naven after the reform, from 26% to 15.1%, which represents a 42% drop.
In sum, Table 2 shows that in the two cohorts where tracking is most pronounced (Naven post-reform and Berg pre-reform), the class-level variance is more dependent on contextual and organisational factors, such as class composition and the track attended. In contrast, in Berg after the reform, where tracking is less prevalent, class variance is more related to the individual and academic characteristics of the students.
Table 3 shows the regression coefficients for model 4, which allows for a more detailed analysis of the effect of individual and academic characteristics of students, class composition and tracking on the score at the end of secondary 1 education (see note 4).
In terms of the individual characteristics of students, there is first a significant effect of immigration status: foreign and/or allophone students have a lower score at the end of secondary 1 than students who are Swiss and speak French. There are, however, some differences depending on the cohort considered. In Naven before the reform, only the students who are foreigners and who speak another language have a significantly lower score (b = −0.117; p < .001). After the reform, it is Swiss students who speak another language (b = −0.059; p = .042) and foreigners who speak French (b = −0.110; p = .013) who obtain a lower score.
In Berg, before the reform, all students who are not Swiss and French speaking have a significantly lower score (Swiss and speaks another language: b = −0.073; p = .007; Foreigner and speaks French: b = −0.096; p = .001; Foreigner and speaks another language: b = −0.110; p < .001). After the reform, only the students who are Swiss and speak another language have a significantly lower score (b = −0.072; p = .033).
Additionally, it can be noted that when immigration status is significant, its effect is stronger for students who are foreigners and speak another language than for other groups.
With respect to gender, there is no significant effect in Naven. In Berg, girls scored significantly higher than boys (before the reform: b = .069; p < .001; after the reform: b = .060; p = .001).
The effect of initial academic level is significant and positive for all cohorts, which shows, not surprisingly, that students who scored high on standardized tests at the end of primary school also tend to perform better at the end of secondary 1. In Naven, the effect of the initial academic level does not change: whether before or after the reform, for each one standard deviation increase in the initial score, the score at the end of secondary 1 increases by about 0.5 standard deviations. In Berg, on the other hand, the impact of the initial academic level increases noticeably after the reform: each time the initial score increases by one standard deviation, the score at the end of secondary 1 increases by 0.82 standard deviations, whereas before the reform the increase was only 0.26 standard deviations (see note 5). This suggests that after the reform, academic achievement is more dependent on students' ability level than on other individual, contextual or organisational factors.
The effects of class composition vary by cohort. In Naven before the reform, only the percentage of students with a high academic level per class is significant: each time it increases by one standard deviation, the score at the end of secondary 1 improves by 0.11 standard deviations (p = .006). After the reform, there are no significant effects of class composition in Naven. In Berg, two composition effects are significant before the reform: the percentage of students who are Swiss and speak French per class, which has a positive effect on the score at the end of secondary 1 (b = .041; p = .008), and the percentage of students with a low academic level per class, which has a strong negative impact (b = −0.128; p < .001). After the reform, only the effect of the percentage of Swiss students speaking French per class remains significant (b = .061; p = .001).
Regarding tracking, the effect is similar for all cohorts. After controlling for individual characteristics and initial academic level, students in the track with high requirements perform better at the end of secondary 1 than those in the track with intermediate requirements, who in turn perform better than those in the basic requirements track. Two findings can be highlighted. First, the effects of tracking are less pronounced in Berg after the reform than for the other cohorts: compared to the track with high requirements, the difference in score is only −0.23 standard deviations for students in the intermediate requirements track (p < .001) and −0.51 standard deviations for those in the basic requirements track (p < .001). Second, the comparison between the two cantons shows that the reform increased the gap between the basic and high requirements tracks in Naven (b = −0.77 before the reform; −1.79 after the reform; p < .001), while in Berg it had the opposite effect (from −0.969 to −0.511; p < .001).
Table 3 shows that two factors have a strong impact on the scores at the end of secondary 1: the initial academic level, especially in Berg after the reform, and the track, especially for students in the basic requirements track.
Table 3. Multilevel regression analysis (model 4). Significance threshold: ***p < .001, **p < .01, *p < .05.

Initial level of students and tracking system: what interaction effect?
To explore the effects of these two factors more thoroughly, the regression model includes an interaction effect between tracking and initial academic level. The aim is to estimate whether the effect of the initial academic level on the score at the end of secondary 1 varies depending on the track attended. Table 3 indicates that these interaction effects are always significant, except in Berg before the reform. Figures 1 and 2, which give the score predicted by the regression model depending on the track and the initial academic level, allow us to visualise these interaction effects by looking at the slopes of the regression lines.
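One way to write the model described above is sketched here. The notation is an assumption made for illustration, not the authors' own specification: y_{ij} is the standardized score at the end of secondary 1 for student i in class j, x_{ij} the standardized initial score, T_{k,ij} a dummy for track k (with the high requirements track taken as the reference category, as the reported comparisons suggest), \mathbf{z}_{ij} the remaining individual and class-composition covariates, and u_j a class-level random intercept.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij}
       + \sum_{k} \gamma_k T_{k,ij}
       + \sum_{k} \delta_k \left( x_{ij} \times T_{k,ij} \right)
       + \mathbf{z}_{ij}^{\top} \boldsymbol{\theta}
       + u_j + e_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad e_{ij} \sim \mathcal{N}(0, \sigma_e^2)
```

Under this reading, \beta_1 is the slope of the initial score in the reference track and \beta_1 + \delta_k the slope in track k, so a significant \delta_k corresponds to regression lines that are not parallel in Figures 1 and 2.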
In Naven before the reform (Figure 1), the regression line is steeper for heterogeneous classes with ability groups, which means that students in these classes progress more than those in the track with high or basic requirements. The regression model predicts that students with a high initial academic level in heterogeneous classes (score at the end of primary school between 0.5 and 1 standard deviations above the mean) do not have a score at the end of secondary 1 that is significantly different from those in the track with high requirements. For an initial score of 0.5, the regression model estimates that the score at the end of secondary 1 is equal to 0. These results show that, compared to the system with tracks, heterogeneous classes with ability groups contribute to reducing inequalities, especially for lower achieving students, and at the same time, they do not hinder the progress of academically stronger students.
A significant interaction effect for students in the track with basic requirements can also be found. Figure 1 shows that the difference in scores between the tracks with high and basic requirements tends to increase slightly as initial academic level rises. In other words, the model predicts that students make less progress in this track than in the track with high requirements.
In Naven after the reform, the regression line is the steepest for the high requirements track and the least steep for the basic requirements track. This means that the differences in scores between tracks tend to grow as the initial academic level increases. The regression model predicts that, for an initial score of −1, the score at the end of secondary 1 is −0.24 in the track with high requirements (95% CI [−0.31, −0.18]) and −1.59 in the track with basic requirements (95% CI [−1.70, −1.49]), a difference of −1.35 standard deviations. When the initial score is 0, the difference is −1.73 standard deviations; for an initial score of 1, it increases to −2.11 standard deviations.
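The three gaps quoted above are consistent with a gap that changes linearly with the initial score, which is what a linear interaction term implies. The sketch below derives the implied change in the gap per standard deviation from the reported predictions. This figure is inferred here rather than reported by the authors, and the variable and function names are hypothetical.

```python
# Predicted gaps (basic minus high requirements track) quoted in the text for
# Naven after the reform, keyed by the initial score in standard deviations.
reported_gap = {-1.0: -1.35, 0.0: -1.73, 1.0: -2.11}

# With a linear interaction, gap(x) = gap_at_zero + slope_difference * x.
gap_at_zero = reported_gap[0.0]
# Implied difference in slopes between the two tracks (inferred, not reported).
slope_difference = reported_gap[1.0] - reported_gap[0.0]


def predicted_gap(initial_score: float) -> float:
    """Gap between the basic and high requirements tracks under the linear sketch."""
    return gap_at_zero + slope_difference * initial_score


for x, gap in sorted(reported_gap.items()):
    print(f"initial score {x:+.0f}: reported {gap:.2f}, linear sketch {predicted_gap(x):.2f}")
```

The gap at an initial score of −1 also matches the difference between the two predicted scores given in the text (−1.59 minus −0.24).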
In sum, Figure 1 shows that in Naven the reform has not only reinforced the gap between tracks, as we have seen above, but it also penalises more strongly the students with an average and high initial academic level. This is likely to affect, in particular, the progress of students who are close to the threshold for admission into the track with high requirements.
In Berg before the reform (Figure 2), the interaction effects are not significant. In Figure 2, this is shown by the regression lines, which are parallel: the differences in scores between tracks remain constant and do not vary according to the initial academic level.
After the reform, the interaction effects are significant. In particular, Figure 2 shows that the slope is steeper in the track with high requirements: as the initial score increases, the difference in scores compared to the other two tracks increases. For example, the regression model predicts that for an initial score of 0, the score at the end of secondary 1 is equal to 0.04 (95% CI [−0.01, 0.10]) in the track with high requirements and −0.19 (95% CI [−0.23, −0.15]) in the track with intermediate requirements, a difference of 0.23 standard deviations. When the initial score is equal to 1, the score at the end of secondary 1 is equal to 0.86 (95% CI […]) […]. This means that the track with high requirements is particularly beneficial to academically strong students, who progress more in this track than in the others. As before, it can be assumed that it is the students who are just at the limit of the admission threshold who are likely to be most penalized by tracking.

Discussion
The purpose of this article was to examine the effects of different forms of tracking on academic inequalities in learning. To address this issue, we implemented a quantitative analysis of four cohorts of students over a four-year period, from the end of elementary school to the end of secondary 1 education (from approximately age 12 to age 15). Using multilevel regression analysis, we have shown that different forms of tracking produce different levels of learning and different levels of inequality. More precisely, first, we have shown that enrolling a student in a track with basic requirements is detrimental to his or her learning. For a similar initial academic level at the end of primary school, differences in learning can reach more than one standard deviation by the end of secondary 1. Second, we have shown that the 'strongest' tracking systems (those that separate students into closed and hierarchical tracks at the end of elementary school) are the most unequal. More specifically, it appears that students' achievement is the least dependent on the track in Naven before the reform and in Berg after the reform. Conversely, in Naven after the reform and in Berg before the reform, students' achievement is more strongly dependent on the way the education system is organised.
This multilevel regression analysis also reveals the mechanisms by which these inequalities are produced. These results suggest that the effect of tracking is not only a composition effect or a peer effect (van Ewijk and Sleegers 2010), but that tracking per se explains a significant part of the variance in learning in the two educational contexts in our study.
These learning inequalities then need to be explained. We put forward two hypotheses that are not mutually exclusive. The first centres on teachers' expectations: learning inequalities may be related to the stigmatization of students in the track with basic requirements, like a negative 'Pygmalion effect' (Rosenthal and Jacobson 1968) which may have strong consequences on the academic level (de Boer, Timmermans, and van der Werf 2018). The second hypothesis focuses on the supply of knowledge provided by teachers in each track and on the differences in teaching methods and teachers' skills. We can assume, first, that the goals and focus of teaching are very different in a high or low requirement track (Mazenod et al. 2019). Moreover, as Blömeke et al. (2022) show in the German context, we can also hypothesize 'a mediating role of teachers' skills and their instructional quality' (p. 1) in explaining students' learning progress in the different tracks. More generally, the teaching activity in the classroom is deeply relevant for understanding the construction of social inequalities at school, as Dunne and Gazeley (2008) show in the English case, or Rochex and Crinon (2011) in the French educational context.
To better address these two hypotheses, it would be relevant to undertake a qualitative study on teaching in the different tracks similar to the research done by Agnès van Zanten (2012) on teachers' practices in disadvantaged schools in France, or Çelik (2022) on the development of an oppositional culture among Turkish male students in secondary education in Germany. Beyond the composition effect, which is now well supported by the scientific literature (van Ewijk and Sleegers 2010), what is the impact of teachers' skills and teaching practices on the achievement gap in relation to the tracking system?
Another way to understand tracking systems might be to analyse the social construction of educational policies on tracking. This would answer the central question of the social role of tracking and why it still exists. Three sociological theories can explain its persistence in many school systems around the world. The first is the theory of social domination (Bourdieu and Passeron 1970; Ball 2021). According to Turner (1960), who compared education systems in England and the USA, early separation of students into differentiated tracks can be seen as a form of social reproduction through sponsorship. Morgan (1990) also emphasizes that this 'sponsored mobility norm favours selection rather than a prolonged open contest: the idea is to sort out those capable of meeting a high standard of education from those who are not, and to devote the available educational resources to those people capable of the most gain' (p. 40).
In this case, tracking would be an institutional translation of social domination and a tool to maintain it. The second is a functionalist theory: tracking provides an early differentiation of schooling to foster the development of vocational training, in order to meet the needs of the labour market. In this perspective, the long-term consequences of tracking on the links between school and work should be studied in order to understand the persistence of tracking in many national contexts (Imdorf et al. 2010). Finally, the third sociological model comes from political science. Specifically, the history of the institution produces a strong path dependency in education, as in all other sectors of society. Reforming an education system from tracking to a comprehensive school is very costly and requires strong political will and the ability to construct new frames of reference for public action in education (Wentzel et al. 2021).
The results of this study must be considered in light of certain limitations. These are mainly related to the nature of the data used. A first limitation is the use of different standardized tests to measure students' academic level for each cohort, as it may impact the reliability and comparability of the results. Reardon and Galindo (2009) highlight that such score differences are 'widely used in the literature' and enable 'approximate comparability' between tests, but also present 'several potential problems' that may lead to 'potentially erroneous inferences' (863-864). This should make us cautious when interpreting the results, particularly when making comparisons between the two cantons.
Another limitation concerns the use of administrative data. On the one hand, this makes it possible to include all the students and to follow their career paths in a precise way. On the other hand, some variables are missing, such as social origin, which is very relevant for understanding educational inequalities and social reproduction through schooling.
In the end, our results confirm the pivotal role played by educational policies and institutions in the construction of learning inequalities and, more broadly, of social reproduction through school in contemporary societies. As discussed in a recent paper (Felouzis 2021), educational inequalities are not only 'primary'-related to the primary socialization of individuals-but also 'secondary'-related to the structure and organization of schooling. Thus, comparing the different forms that these structures and organizations take provides insight into how to improve equity in education.

Notes
1. For confidentiality reasons, we use pseudonyms to name the two cantons studied.
2. The study plan and learning objectives are defined at the regional level. This means that all cantons in French-speaking Switzerland share the same study plan.
3. It is important to note that the class may not be consistent over time or between subjects. There may be changes in class assignments from one year to the next, although these changes are not common. In heterogeneous classes with ability groups, students need to change classes for the core subjects, but they maintain a reference class for other subjects.
4. The complete tables for models 0 to 4 are in appendices 2 to 5.
5. The increase in the effect of the initial academic level in Berg, from 0.26 before the reform to 0.82 after the reform, may seem surprising. It should be noted that a simple correlation analysis between the score at the end of primary school and the score at the end of secondary school shows a high correlation, both before the reform (r = 0.719) and after the reform (r = 0.752).

Disclosure statement
No potential conflict of interest was reported by the authors.

Appendix 3. Effect of individual and academic characteristics of students, class composition and tracking on the score on standardized tests at the end of secondary 1, in Naven after the reform. Multilevel regression analysis (models 0-4). Significance threshold: ***p < .001, **p < .01, *p < .05.
Appendix 4. Effect of individual and academic characteristics of students, class composition and tracking on the score on standardized tests at the end of secondary 1, in Berg before the reform. Multilevel regression analysis (models 0-4). Significance threshold: ***p < .001, **p < .01, *p < .05.
Appendix 5. Effect of individual and academic characteristics of students, class composition and tracking on the score on standardized tests at the end of secondary 1, in Berg after the reform. Multilevel regression analysis (models 0-4). Significance threshold: ***p < .001, **p < .01, *p < .05.