4.1 Introduction

From a didactical perspective, alignment between the intended and implemented curriculum (educational goals at the system and classroom level) and the achieved or attained curriculum (learning outcomes as gained by students) is considered to support students’ learning. Such alignment is thus seen as a vital characteristic of effective teacher practice (Daus et al., 2019) and as essential in offering equal opportunities to students across schools. These different levels of content coverage develop continuously, as educational reforms change national curricula, didactical principles develop, and teachers’ experiences and the materials they have access to shape how they implement the curriculum over time. For an example of the latter, one need look no further than how much the use of information technology (IT) in schools has changed between the most recent rounds of TIMSS. However, it can be difficult to compare the three levels of content coverage, as they describe the content in three different ways. A national curriculum describes the intended curriculum in relatively broad and abstract terms, extending across several school years. The implemented curriculum, in the form of teachers’ descriptions of the content covered when responding to the TIMSS teacher questionnaire’s section on content coverage, is based on somewhat technical terms associated with the subject, while teachers’ everyday descriptions are presumably closer to the terminology they use during lessons. Finally, the attained curriculum, in terms of students’ learning outcomes, is described empirically, using specific test items within the subject that may not be clearly associated with the different topics within the curriculum.

This chapter presents how these three curriculum levels are measured and the changes that have occurred based on the 2011, 2015, and 2019 cycles of the TIMSS fourth-grade study conducted in the Nordic countries. Furthermore, it comments where relevant on the curriculum of Iceland, the only Nordic country not participating in TIMSS.

4.2 TIMSS and Curricula in the Nordic Countries

The TIMSS goal of assessing and comparing student learning necessitates close collaboration with representatives from all participating educational systems. In general, the student assessments implemented in TIMSS and described in the TIMSS Assessment Frameworks (Mullis & Martin, 2013, 2017; Mullis et al., 2009) are considered to cover the participating countries’ curricula relatively well. However, the overlap between assessment frameworks and the national curriculum varies between countries (Wagner & Hastedt, 2022).

4.2.1 The Test-Curriculum Matching Analysis

To measure how well TIMSS covers a country’s curriculum, a Test-Curriculum Matching Analysis (TCMA) was conducted for each country, comparing student performance based on all items included in TIMSS with performance based only on those items considered within the country’s curriculum. A summary of the TCMA for the four Nordic countries participating in TIMSS is presented in Table 4.1 (mathematics) and Table 4.2 (science) for the years 2011 to 2019. The tables show the number of score points for test items within each domain and the number of score points considered as falling within each country’s curriculum in each cycle. In 2011 and 2015, the results of the analyses were presented as the average percentage of correct test-item responses for all test items compared to the average percentage correct for country-specific test items. This was changed in 2019 to a calculated student score based on all test items vs. a score based on the test items considered to be within the country’s curriculum. Thus, TCMA results from cycles before 2019 cannot be directly compared with the 2019 results.

Table 4.1 Test-curriculum matching analysis for the years 2011 to 2019 in mathematics
Table 4.2 Test-curriculum matching analysis for the years 2011 to 2019 in science
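The logic behind the pre-2019 TCMA presentation can be sketched in a few lines of code. The item identifiers, score values, and curriculum flags below are entirely hypothetical, chosen only to illustrate how the same response data yield one average percent correct over all items and another over the curriculum-restricted subset:

```python
# Illustrative TCMA-style comparison (hypothetical data, not actual TIMSS values).
# Each item has a maximum score (1 or 2 points) and a flag indicating whether
# the national curriculum is judged to cover it.

items = [
    # (item_id, avg_points_earned, max_points, in_curriculum)
    ("M011", 0.72, 1, True),
    ("M012", 0.55, 1, True),
    ("M013", 1.30, 2, True),   # two-point item; partial credit is possible
    ("M014", 0.40, 1, False),  # judged outside the national curriculum
    ("M015", 0.63, 1, False),
]

def percent_correct(item_set):
    """Average percentage of available score points obtained."""
    earned = sum(p for _, p, _, _ in item_set)
    available = sum(m for _, _, m, _ in item_set)
    return 100 * earned / available

all_items = items
curriculum_items = [it for it in items if it[3]]

print(f"All items:        {percent_correct(all_items):.1f}%")
print(f"Curriculum items: {percent_correct(curriculum_items):.1f}%")
```

In this toy example, restricting the calculation to curriculum items raises the percent correct, mirroring the intuition that excluded items tend to be harder for students who have not been taught the content; the 2019 cycle replaced this percent-correct comparison with scale scores estimated from the two item sets.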

The TIMSS 2019 curriculum questionnaire was administered online with the suggestion to draw “on the expertise of curriculum specialists and educators” to judge whether an item falls within the curriculum, but without documentation of the procedures used or measures for reliability checks (Martin et al., 2020).

TIMSS test items were usually assigned one score point for a correct answer, but in each cycle a few test items could be assigned up to two score points, with one point awarded for a partially correct answer. The number of score points assigned to the test items solved by students differed slightly between cycles. By contrast, according to responses to the curriculum questionnaires, there were considerable variations in the number of score points assigned to test items in a cycle that were considered within the national curriculum. In general, fewer test items were considered to be covered by the national curricula in science than in mathematics, and there was greater variation in the number of test items not reported as covered by the national curricula in science than in mathematics, both between countries and within countries between cycles. As around one-third of the test items were released and replaced with new items in each cycle, some variation should be expected. However, the swapping out of test items does not seem able to explain the changes seen in the differences between the total number of test-item score points and the score points for test items in the national curriculum. For example, Table 4.2 shows a fall from 149 score points for test items in the Norwegian national science curriculum in the 2011 cycle to 116 score points in the 2015 cycle, before increasing again to 146 score points in 2019. If the 2015 rise in the number of excluded test items was due to previous items being replaced by new items outside the curriculum, this should also be reflected in the TCMA for the 2019 cycle, which would still include items introduced in 2015. Moreover, as the tested grade level in Norway changed from fourth to fifth grade in the 2015 cycle, one would if anything expect more items to fall within the curriculum, making the Norwegian drop difficult to account for.
There were similar patterns in science for Sweden (score points 2011: 152, 2015: 107, 2019: 131) and in mathematics for Denmark (score points 2011: 179, 2015: 146, 2019: 176). It seems implausible that these fluctuations were caused by the swapping out of test items alone, as all test items were based on the TIMSS Assessment Framework, which only underwent minor revisions between 2011, 2015, and 2019 (Mullis & Martin, 2013, 2017).

Turning to the consequences for the measurement of students’ ability by the TIMSS assessments, the 2019 analyses showed no clear and significant differences between the countries’ average scale scores based on all test items and scale scores based only on country-specific test items. Observed differences ranged from zero points (Finland for both mathematics and science, and Norway for mathematics) to four points on the TIMSS scale (Sweden for mathematics). The remaining analyses showed differences of one (Denmark for mathematics) or two points on the TIMSS scale.

A significant difference was seen in the percentage of correct test-item score points between the 2011 and 2015 cycles. The difference varied between countries, subjects, and cycles, but there was a clear pattern of larger differences in the years when a country had marked relatively many of the test items as being outside the curriculum. The TCMAs showed no clear patterns of coherence between the TIMSS Assessment Framework and the test items considered part of the individual country’s curriculum in each cycle. In some cases, there appeared to be a strong and stable connection with little variation in the number of excluded items (e.g., for mathematics in Finland and Norway), while in other cases the differences seemed more substantial (e.g., the generally lower number of included items for mathematics in Sweden). Nonetheless, there was variation across cycles (e.g., for mathematics in Denmark).

4.2.2 Test Items Covered by the Nordic Curricula Over Time

As described above in Sect. 4.2.1, there were relatively large variations in the number of test items considered not covered by the national curricula. Examining these test items (see the respective appendices of the international reports on each TIMSS cycle; Martin et al., 2012, 2016; Mullis et al., 2012a, 2016a, 2020) revealed inconsistencies across the three cycles in all four countries in terms of which test items were counted as covered by the national curricula. In all countries, there are examples of test items that were considered part of the curriculum in one cycle but not in the next, and vice versa, with no obvious link to changes in the national curriculum. While some of these differences might be ascribable to adjustments in the respective curricula, as described in Sect. 4.3.1, it must be assumed that others were caused by national changes in interpretations of which test items fell within the curriculum.

Hence, the results of the TCMA should be regarded as an indicator of the degree to which the curriculum covered the assessment framework rather than a clear indication of whether or not specific test items were within the curriculum. Further, these inconsistencies demonstrate that such measures are based on subjective judgments rather than statements of objective truth. However, we agree with Wagner and Hastedt’s (2022) conclusion that TIMSS provides a relatively accurate measurement of student performance also in relation to the national curriculum. This certainly seems to be the case in the Nordic countries, with comparisons between the 2019 assessments of achievement based on all test items and achievement based solely on those test items included in the national curriculum showing only minor differences, even in cases where relatively many test items were considered outside the curriculum. Nonetheless, there was unexplained variance between the three cycles of TIMSS in the number of excluded items within countries.

4.3 TIMSS and the Intended Curriculum

The intended curriculum in a school or country changes over time, governed by changes in national legislation and, to the extent that local adaptation is permitted, changes in municipal or school-level curriculum frameworks. The following section describes broad overall changes in the national curricula for Denmark, Finland, Norway, and Sweden based on the reporting of the respective curricula in the Encyclopedia entries in the international reports for each cycle of TIMSS, including a few comments on the Icelandic curriculum based on national curriculum documents.

4.3.1 The Nordic Curricula and Changes Over Time

A simple overview of curricular development in the Nordic countries is provided in Table 4.3. For the 2019 cycle of TIMSS, all countries reported that a national curriculum was in place with some possibilities for local (municipal or school-level) adaptations.

Table 4.3 Overview of curriculum revisions in the years 2011 to 2019

All the Nordic countries presented their national curricula in a format describing the intended content for each subject across a range of grade levels. However, these grade-specific curriculum objectives were not aligned with the grade levels at which TIMSS was conducted, with the exception of science in Denmark and the 2011 TIMSS cycle in Norway. Thus, the curricula described what students should have achieved one (Finland) or two (Denmark for mathematics; Norway and Sweden for both subjects) grade levels later than those measured in TIMSS. Consequently, it is difficult to ascertain precise learning objectives for the point at which TIMSS was administered, as no particular order was stipulated for the implementation of the various elements of the curriculum.

Table 4.3 shows that the Nordic countries have changed their respective curricula at different times relative to the last three TIMSS cycles. In Norway, there was no reform of the curriculum between the three cycles, but while TIMSS 2011 assessed fourth grade students, later cycles assessed students in fifth grade. As such, it might be expected that students in the 2015 and 2019 cycles would have covered more of the curriculum objectives than students in the 2011 cycle. A reform of the Swedish curriculum was implemented after the 2011 cycle of TIMSS with some later minor revisions, and reforms were enacted in Finland and Denmark between the 2015 and 2019 cycles. In Iceland, a new curriculum came into force in 2014 following a revision of the school act in 2008.

Looking across the curriculum changes implemented in the four countries that participated in TIMSS, in all cases, a revision seemed to be curriculum-wide, implementing changes in both mathematics and science. According to the TIMSS Encyclopedias (Kelly et al., 2020; Mullis et al., 2012b, 2016b), the curricula all outlined a number of broad motivations and an overall purpose for the subject, as well as underlining the importance of nurturing students’ engagement and self-confidence in the subject. Further, they stipulated general goals, which was also the case for the Icelandic curriculum (Icelandic Ministry of Education Science & Culture, 2014). Revisions shifted descriptions towards more specific learning goals that students should reach by the end of the grade levels covered.

4.4 Students’ Opportunity to Learn—The Implemented Curriculum According to Teachers

Students’ opportunity to learn (OTL) specific content areas (Scheerens, 2017) was measured using the teacher questionnaire, which asked teachers whether they had taught specific content in previous years or during the current school year leading up to TIMSS, or whether they expected to cover it later. Consequently, teachers had no opportunity to indicate that a given content area was not part of the national curriculum. The National Research Coordinators (NRCs) also reported on whether or not the various content areas were part of the national curriculum at the fourth-grade level, with the same content areas included in the curriculum and teacher questionnaires. Appendix 1 shows the items used, indicates whether an area was considered within the national curriculum, and provides an overview of the content coverage as reported by teachers for the last three cycles of TIMSS (2011, 2015, and 2019). As the TIMSS Assessment Framework has changed over time, the corresponding questions to both teachers and NRCs concerning content coverage also changed slightly between cycles.
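An OTL coverage figure of this kind can be thought of as a student-weighted aggregation of teacher responses. The sketch below uses hypothetical response categories and class sizes (not the actual TIMSS codebook) to show how "covered in previous years" and "covered this year" responses combine into the percentage of students taught a topic:

```python
# Sketch of aggregating teacher questionnaire responses into an OTL coverage
# figure: the percentage of students whose teacher reported covering a topic
# either in previous years or during the current year. Category labels and
# class sizes are hypothetical, not taken from the TIMSS codebook.

TAUGHT = {"previous_years", "this_year"}  # "later" counts as not yet covered

# Each record: (teacher_response, number_of_students_in_class)
responses = [
    ("previous_years", 24),
    ("this_year", 28),
    ("later", 22),
    ("this_year", 26),
]

def coverage_percent(records):
    """Student-weighted share taught the topic, as teachers report it."""
    taught = sum(n for r, n in records if r in TAUGHT)
    total = sum(n for _, n in records)
    return 100 * taught / total

print(f"Coverage: {coverage_percent(responses):.0f}% of students")
```

Note how the response scale itself builds in the assumption that every topic belongs to the curriculum: "later" still implies eventual coverage, so teachers cannot signal that a topic is simply absent from what they are expected to teach.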

4.4.1 Content Coverage in Mathematics in the Intended and Implemented Curriculum

For the fourth-grade assessment in TIMSS, mathematics content was divided into three content domains across the three cycles. In 2019, the three content domains were labeled number, measurement and geometry, and data. In each cycle, 50% of the test items in the assessment related to the content domain number, while the proportion of test items assessing measurement and geometry decreased from 35 to 30% from 2015 to 2019, with a corresponding increase in test items concerning data from 15 to 20%.

The curriculum questionnaire and teacher questionnaire were used to determine whether topics within each of these three content domains were included in the national (intended) curriculum and covered in class (implemented curriculum), with 17 items in the 2011 and 2015 cycles and 18 items in the 2019 cycle. For example, within the content domain data, teachers and NRCs were asked whether students had or would be expected to have worked with “reading and representing data from tables, pictographs, bar graphs, line graphs, and pie charts”. As this item illustrates, the questions addressing whether a topic had been covered could address multiple sub-topics. An overview of the items is provided in Appendix 1, which also illustrates that there were minor changes in phrasing between cycles.

Based on responses from NRCs, there were fluctuations between countries and cycles concerning the number of topics that were expected to have been covered, with no clear link to changes to the curriculum or the number of topics considered outside the respective curriculum (Sect. 4.2.2). Norway, where there were no major revisions to the national curriculum between cycles, provides an illustrative case. Despite the switch from assessing grade four students to grade five students between the 2011 and 2015 cycles, there was no change in the number of topics that were expected to have been covered. This was followed by an increase in the number of excluded topics in 2019.

Integers and the four basic arithmetic operations seemed to comprise the core content within mathematics, considered part of the curriculum by NRCs across all cycles and countries except for Denmark in 2011, and taught to between 96 and 100% of students across all cycles. All other topics within mathematics seemed to be less central, with large variations. When a topic was not considered within the curriculum by the NRC, it tended to have lower coverage by teachers, but all topics had been presented to at least some students. However, in cases where the NRC indicated that a topic was included in the curriculum during one cycle but not the next, or vice versa, teachers’ responses regarding their implementation of this topic in classroom teaching did not reflect such fluctuations.

4.4.2 Content Coverage in Science in the Intended and Implemented Curriculum

Fourth grade science content was likewise divided into three content domains: life science, physical science, and earth science. These domains were covered by 45%, 35%, and 20% of the test items respectively. There was an increase in the number of items in the curriculum questionnaire and teacher questionnaire concerning whether various topics were included in the curriculum and covered in class, rising from 20 items in 2011 to 26 items in 2019. There were likewise changes in how each topic was worded. For instance, in 2011, the topic of fossils was referred to as “fossils of animals and plants (age, location, formation)”; in 2015, the word “understanding” was part of the question on the topic; and in 2019, it was rephrased as “fossils and what they can tell us about past conditions on Earth”, thus shifting the focus.

Among teacher responses, the content domain of life science had the highest coverage, with an apparent correlation between the number of students encountering each topic and whether the topic was considered part of the national curriculum. However, the content domain of life science differed from the content domain of number within mathematics in the sense that no topic was covered by teachers to the same extent as the central topics within number. The topic with the highest coverage within life science was “physical and behavioral characteristics of living things and major groups of living things (e.g., mammals, birds, insects, flowering plants)” with 93% coverage by the end of fourth grade among students in Finland in 2015 and 91% in 2019. However, the vast majority of topics had less than 90% coverage.

In general, more topics were considered outside the curriculum in science than in mathematics by the NRCs, with between 25 and 69% of topics considered not included in the country’s national curriculum in a given cycle. The exception was Sweden, which only reported one topic that was not included in each of the three cycles. Of the 26 topics identified in the 2019 questionnaires, 7 topics were considered not included in the Danish curriculum, and 17 not included in the Finnish and Norwegian curricula.

Compared to mathematics, larger variations were found in teachers’ coverage of topics within science, which was to be expected given the NRCs’ indication that fewer topics were included in the respective national curricula. However, teachers reported covering all topics to a limited extent, with some correlation as to whether or not NRCs indicated that the topic was included in the national curriculum.

4.5 The Attained Curriculum—Student Learning and Its Relationship with the Implemented Curriculum

In TIMSS, the attained curriculum is measured by an overall score and scores within each of the three content domains in both mathematics and science, based on all test items. This implies that the measure of attained curriculum covers more than the intended and implemented curriculum, as described above in Sect. 4.2. The following section describes how this measure relates to the implemented curriculum.

4.5.1 The Attained Curriculum in Mathematics

Figure 4.1 shows the development in teachers’ reported content coverage within the three different mathematical content domains across the three cycles, as well as student attainment. As described in the previous sections, there were some variations in content coverage within countries. It is notable that, at the national level, an increase or decrease in content coverage in one content domain was mirrored by similar changes in the other content domains. One notable exception can be observed in Finland, where coverage of the data domain decreased sharply from 2015 to 2019 following changes in the curriculum.

Fig. 4.1 Bar graphs showing achievement in the mathematics content domains for the years 2011 to 2019 for all countries. Note: One bar graph per country

Fig. 4.2 Structural equation model of the relationships between content coverage and scores. Note: Content coverage = percentage of students taught topics

Average student scores within the different content domains generally fluctuated in line with the overall score, with minor variations between cycles. The large increase in scores seen in Norway from 2011 to 2015 can be attributed to the shift in the target population from grade four to grade five students between the two cycles. However, it is noteworthy that this shift did not seem to lead to major changes in the reported content coverage in geometry, which decreased with each cycle from 2011 to 2019. While the content domains number and data saw minor increases in coverage from 2011 to 2015, the increase for data was not significantly greater than the increase between the 2015 and 2019 cycles, suggesting a general trend towards a stronger focus on topics related to processing and interpreting data rather than a direct result of an additional year of teaching.

The data used to produce Fig. 4.1 are presented in greater detail in the table in Appendix 2, divided into teachers with and without specialization as a mathematics teacher. This revealed differences in both content coverage and achievement scores, with specialized mathematics teachers in general covering slightly more of the curriculum and their students achieving slightly higher scores on average, although the differences went in both directions. It should be noted that not all countries require specialized mathematics training to teach the subject.

Appendix 3 presents the results of a structural equation model (SEM) of the relationship between content coverage in the three content domains and student scores in these domains for all three cycles, as described in Chap. 3. The model, shown in Fig. 4.2, controls for student socioeconomic background (SES) using the student-reported number of books in the home, as students in general are found to profit differently from teaching quality depending on their SES level (Atlay et al., 2019).
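As a deliberately simplified stand-in for the SEM, the idea of estimating the coverage-score relationship net of SES can be illustrated with a single ordinary least squares regression on synthetic data. All numbers below are invented; the actual chapter analysis is a structural equation model estimated on plausible values, not this two-predictor regression:

```python
# Simplified, single-equation illustration of the chapter's analytic idea:
# regress a domain score on content coverage while controlling for an SES
# proxy (books in the home), using synthetic data with known effects.

import numpy as np

rng = np.random.default_rng(0)
n = 500

books = rng.integers(0, 5, size=n).astype(float)  # SES proxy (0-4 categories)
coverage = rng.uniform(40, 100, size=n)           # % of domain topics taught
# Synthetic scores: both coverage and SES contribute, plus noise.
score = 400 + 0.8 * coverage + 15 * books + rng.normal(0, 30, size=n)

# OLS with an intercept: score ~ coverage + books
X = np.column_stack([np.ones(n), coverage, books])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

print(f"coverage effect: {beta[1]:.2f} score points per percentage point")
print(f"SES (books) effect: {beta[2]:.2f} score points per category")
```

The estimated coverage coefficient recovers the true effect (0.8) up to sampling noise; the SEM in Appendix 3 follows the same logic but estimates such relationships simultaneously across the three content domains while accounting for the plausible-value structure of the scores.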

Some clear relationships are seen in certain countries in some years. However, although the number of significant predictions (all being positive) exceeded what would be expected by chance alone, no clear pattern emerged in the analyses in terms of which content domain predicts scores or across countries.

4.5.2 The Attained Curriculum in Science

Content coverage for the three content domains life science, physical science, and earth science fluctuated slightly between cycles and between countries, with physical science having a lower degree of coverage than the other two content domains in all countries. A higher proportion of missing data was observed for content coverage in science than for mathematics in all countries and cycles, which means that the results should be interpreted with some reservations, especially in relation to Denmark where the rate of missing data exceeded 50 percent for all content domains in each cycle. Appendix 4 provides similar content to Fig. 4.1 for the subject science.

Examining student scores in the different content domains as well as the overall score revealed similar patterns to those found for mathematics. The scores followed similar trends within countries across cycles, with a minor exception being an increase in the Danish students’ average score in earth science between 2015 and 2019 while scores in the other domains decreased between these two cycles.

Dividing student scores and content coverage between teachers with and without subject specialization in science, as presented in Appendix 2, some variations were found in both content domain scores and content coverage—both within countries between cycles and within cycles across countries. Once again, it should be noted that subject specialization in science is not a requirement in all countries.

The SEM analyses assessing whether coverage in science content domains predicted student scores within these domains are presented in Appendix 4. These analyses revealed a slight deviation from the patterns previously described for mathematics. Content coverage in physical science significantly predicted student scores in all three content domains for Denmark in 2011 and Finland in 2015, as well as predicting Swedish students’ scores in physical science and earth science in 2019. However, a negative correlation was found between content coverage in earth science and scores in physical and earth science in Finland in 2019. Thus, while there seemed to be some significant correlations between content coverage and achievement within and across content domains, there were no consistent patterns across countries or cycles.

4.6 Conclusion

The starting point for this chapter is the didactical expectation that there is (and should be) a connection between the intended, implemented, and achieved curriculum. The analyses presented here corroborate the conclusion from Wagner and Hastedt (2022) that TIMSS can be used to draw inferences about the performance of education systems, including those of the Nordic countries, due to the study’s coverage of national curricula. At the same time, the analyses highlight difficulties in measuring and describing the intended, implemented, and achieved curriculum.

The chapter shows that the measures of implemented curriculum in TIMSS correlate with the measures of achieved curriculum in terms of students’ scores in mathematics and science (for further analyses, see Chap. 6). However, the results also illustrate divergence between the different measures of intended and implemented curriculum, which are less reliable than one might hope. Based on the analyses, a range of possible explanations for the low reliability of the measures can be identified.

The first of these explanations relates to differences in how content is defined across the different measures. National curricula are described in general terms with content covering broad areas that students should be taught—often within a time span covering a longer period than the grade where outcomes are measured in TIMSS. As a result, decisions as to whether or not a TIMSS measure of intended, implemented, or achieved curriculum falls within the national curriculum must be based on subjective judgment by the NRC.

Secondly, the terminology that teachers use in their daily work may differ from the terms used in the questionnaires developed to measure teachers’ implementation of the curriculum. This can once again introduce reliability issues by requiring teachers to interpret the questions, which may be complicated further by some questions being ambiguous, forcing teachers to make a judgment call about content coverage when only part of the content has been covered. As outlined in the description of curriculum development processes, local adaptations of national curricula can likewise muddy the waters.

Thirdly, whether or not a test item concerns a topic covered by the national curriculum is based on an assessment of whether or not it is included in the curriculum’s description of learning objectives and an estimate of the point at which it is taught, given that the curricula generally cover a period extending beyond the TIMSS assessment. For example, there are test items covering the order of operations when using parentheses in mathematics. While not directly mentioned in the Danish curriculum, this is something students would be expected to have learned by the end of the period covered by the curriculum, which ends with sixth grade, but it is difficult to determine more precisely whether this is a topic that will have been covered at the time of the TIMSS assessment. One possible solution might be to consult commonly used textbooks to determine whether the use of parentheses is generally introduced.

Fourthly, and especially relevant when measuring the intended curriculum, some measures are reported for the whole country by a single person, whether the NRC or someone delegated the task by the NRC. Thus, the uncertainties outlined in our previous point have more serious implications for reliability than they do for measures conducted at the teacher or student level, where the samples are much larger. Variations in these measures may also be explained by a change of NRC, or of the staff reporting on behalf of the NRC, bringing different content knowledge or different preferences when answering the questionnaire.

As indicated throughout this chapter, changes between cycles in the TIMSS Assessment Framework have led to changes in the phrasing of questions collecting information on the intended and implemented curriculum. One limitation of the analyses presented here is that they do not consider how such changes are implemented in the national translations of the teacher questionnaire or whether the translations have been revised without any changes in the international source.

The identified changes in the TIMSS Assessment Framework and its measures of content coverage at different levels are to be expected, as they reflect developments in national curricula and teaching practices. As such, we conclude that this framework is well-suited to measuring the attained curriculum in the form of Nordic students’ achievement in mathematics and science and to monitoring changes over time, despite reliability issues in measuring development in specific areas within the curriculum.