The Role of Digital, Formative Testing in e-Learning for Mathematics: A Case Study in the Netherlands

Repeated formative, diagnostic assessment lies at the heart of student-centred learning, providing students with a continuous stream of information on the mastery of different topics and making suggestions to optimize the choice of subsequent learning activities. When integrated into a system of e-learning, formative assessment can make that steering information instantaneous, which is a crucial aspect for feedback in student-centred learning. This empirical study of the role of formative assessment in mathematics e-learning focuses on the important merit of integrating these assessments into a system of state or national testing. Such tests provide individual students with crucial feedback for their personal learning, teachers with information for instructional planning, and curriculum designers with information on the strengths and weaknesses in the mastery states of students in the program and the need to accommodate any shortcomings. Lastly, they provide information on the quality of education at state or national level and a means to monitor its development over time. We shall provide examples of these merits based on data from the national project ONBETWIST, part of the Dutch e-learning program Testing and Test-Driven Learning.


Introduction
According to a recent, domain-overarching meta-analysis of empirical educational studies (Hattie, 2008), feedback is the most effective instructional mechanism. Feedback can have many different sources, and in student-centred learning, students' mastery, or lack of mastery, of a specific task is an important part of that feedback. Formative assessment is a means to repeatedly assess a student's mastery in order to establish the subsequent learning step, and its importance is extensively documented, in the context of both traditional learning (Donovan et al., 2005; Pellegrino et al., 2001) and e-learning (Juan et al., 2011). Recently, there has been some interest in systematically combining formative assessment with the use of state or national tests. In the U.S., this is termed 'interim assessment' (Beatty, 2010). According to the U.S. National Research Council, interim assessments "are assessments that measure students' knowledge of the same broad curricular goals that are measured in annual large-scale assessments, but they are given more frequently and are designed to give teachers more data on student performance to use for instructional planning. Interim assessments are often explicitly designed to mimic the format and coverage of state tests and may be used not only to guide instruction, but also to predict student performance on state assessments, to provide data on a program or approach, or to provide diagnostic information about a particular student. Researchers stress the distinction between interim assessments and formative assessments, however, because the latter are typically embedded in instructional activities and may not even be recognizable as assessments by students …" (Beatty, 2010, p. 6).
Continuous evaluation processes are at least as crucial in mathematics education as they are in other disciplines (Donovan et al., 2005; Taylor, 2008; Trenholm et al., 2011). Beyond assessment for development and assessment for achievement, both of which are generally recognized as important assessment functions, formative assessment in mathematics education functions as 'transition' or 'placement' assessment, particularly in the first year of university education (Taylor, 2008). In their comparative study of long-term online mathematics teaching experiences, Trenholm et al. (2011) provide four major case studies, all of which point to continuous assessment as a key factor of success.
However, empirical studies into the effect of formative assessment in mathematics education remain scarce (Wang et al., 2006).
In the Netherlands, SURF, the Dutch collaborative organisation for higher education institutions and research institutes aimed at innovations in ICT, initiated the nationwide program Testing and Test-Driven Learning to stimulate the design and use of such interim assessments, among other things. Part of this program is the ONBETWIST project (http://www.onbetwist.org/), focusing on mathematics learning, both in the transition from high school to university, and in the first year of university, using e-learning with the support of these interim assessments. The ONBETWIST project builds on earlier projects, such as SURF projects NKBW (http://www.nkbw.nl/) and TELMME (www.telmme.tue.nl), and EU projects S.T.E.P. (www.transitionalstep.eu/) and MathBridge (http://www.mathbridge.org/). All these projects focus primarily on the design and use of mathematics e-learning tools to facilitate the transfer from high school to university, e.g., for international students who have been educated in school systems whose premises differ considerably from those on which the university curriculum is built. In short, the main aim of all these initiatives is to offer flexible bridging courses in mathematics when the inflow of students is too heterogeneous in terms of prior mathematics mastery to start immediately with class-based regular university teaching. Reviews of some of these endeavours can be found in Brants et al. (2009), Rienties et al. (2011) and Tempelaar et al. (2008). In our companion paper, Tempelaar et al. (2011), we report on the outcomes of bridging education in the context of the NKBW project for one Dutch university. This university is a typical exponent of European internationalisation of higher education, where international students account for more than 70% of the total.
Although most of these students are not very international in terms of the geographical distance they have to bridge, there is huge diversity with respect to the high school education they have received. Secondary school systems, even in neighbouring countries like the Netherlands, Germany and Belgium, are very different, producing major heterogeneity in the mathematical knowledge and skills of prospective students. Such heterogeneity means that there is a considerable need for bridging education in the transfer from secondary to university education, and it offers an outstanding case to demonstrate the advantages of interim assessment.

The e-tutorial used in the summer course, ALEKS, makes use of server-based computing and can be characterised as supporting individual distance learning. The ALEKS system (see also Doignon & Falmagne, 1999; Falmagne et al., 2004; Tempelaar et al., 2006) combines adaptive, diagnostic testing with an e-learning and practice tutorial in several domains relevant to higher education. In addition, it provides lecturers with an instructor module, where students' progress can be monitored in both learning and assessment modes.
The ALEKS assessment module starts with an entry assessment in order to evaluate a student's knowledge of the domain. Following this assessment, ALEKS delivers a graphic report analyzing the student's knowledge within all curricular areas of the course. The report also recommends concepts on which the student can begin working; by clicking on any of these concepts or items, the student gains immediate access to the learning module. See Figure 1 for a sample of the learning report.
Some key features of the assessment module are that all problems require the student to produce authentic input, all problems are algorithmically generated, and assessment questions are generated from a carefully designed repertoire of items, thus ensuring comprehensive coverage of the domain.
The assessment is adaptive: the choice of each new question is based on the aggregate of responses to all previous questions. As a result, the student's knowledge state can be found by asking only a small subset of the possible questions (typically 15-25). Both the principles of the UM summer course, and the use of the e-tutorial ALEKS, are described in more detail in Tempelaar et al. (2011).
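The adaptive principle can be illustrated with a minimal Python sketch. It is a deliberately simplified, deterministic illustration and should not be read as the procedure ALEKS itself uses: the items `a` to `d`, the list of candidate knowledge states and the `answers_correctly` callback are all hypothetical, and the student is assumed to answer consistently with one of the listed states. Each question is chosen to split the remaining candidate states as evenly as possible, so every answer depends on all previous responses.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

State = FrozenSet[str]

# Hypothetical knowledge structure: every state is a set of mastered items
# that respects the assumed prerequisites a -> b -> c and a -> d.
STATES: List[State] = [
    frozenset(),
    frozenset({"a"}),
    frozenset({"a", "b"}),
    frozenset({"a", "d"}),
    frozenset({"a", "b", "d"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "c", "d"}),
]


def pick_item(candidates: List[State], asked: Set[str]) -> str:
    """Pick the unasked item that splits the candidate states most evenly."""
    items = set().union(*candidates) - asked

    def imbalance(item: str) -> int:
        mastered = sum(item in state for state in candidates)
        return abs(2 * mastered - len(candidates))

    # sorted() makes tie-breaking deterministic (alphabetical order).
    return min(sorted(items), key=imbalance)


def assess(
    answers_correctly: Callable[[str], bool],
    states: List[State] = STATES,
) -> Tuple[State, List[str]]:
    """Narrow the candidate states down to one, asking one item at a time."""
    candidates = list(states)
    asked: Set[str] = set()
    questions: List[str] = []
    while len(candidates) > 1:
        item = pick_item(candidates, asked)
        asked.add(item)
        questions.append(item)
        correct = answers_correctly(item)
        # Keep only the states that agree with the observed response.
        candidates = [s for s in candidates if (item in s) == correct]
    return candidates[0], questions


# Simulated student whose (hypothetical) true state is {a, b}.
state, questions = assess(lambda item: item in {"a", "b"})
print(sorted(state), questions)
```

In this toy structure the simulated student's state is identified after three of the four items, which mirrors, on a very small scale, the idea that only a subset of the possible questions needs to be asked.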

Participants
This study is based on the investigation of five cohorts, of about equal size, of first-year students at a Business and Economics School in the south of the Netherlands (academic years 07/08, 08/09, 09/10, 10/11 and 11/12). Programs offered by this school deviate from mainstream European university education in two important ways: the student-centred learning approach of problem-based learning and a strong international orientation (the programs are offered in the English language and mainly attract international students). Of the 3,900 students in these five cohorts, 71% have an international background (mostly European, and just over 50% from German-speaking European countries) and 29% are Dutch. Of these students, 36.7% are female and 63.3% are male. The mean age of the students was 20.12 years, with a range of 17-31 years, though most students were in their teens: the median age was 19.82 years. They were all enrolled on a business and economics program.
A large majority of these students (3,014) took part in at least one diagnostic entry test. A small minority (622) took part in the voluntary summer course; of these, 267 passed and 335 failed (did not achieve a 55% mastery level in ALEKS).
After finishing the summer course in late August, the regular program of bachelor's degree studies in International Business and International Economics started in early September. Both programs begin with two eight-week (half semester) integrated, problem-based learning designed courses, each having a 50% study load. The first course is an introduction to organizational theory and marketing. The second course, called Quantitative Methods I or QM1, is an introduction to mathematics and statistics. The very first activity in the QM1 course is to administer the mathematics entry test. The coverage of the QM1 course mirrors the circumstance that strong heterogeneity in mathematics mastery, due to students being educated in different national systems and at different mathematics levels, necessitates a fair amount of repetition. Most of the topics covered are repeats of those taught in grades 11 and 12 of Dutch secondary schooling, basic mathematics level (the last two years of high school), with some time devoted to new topics. There is no overlap between QM1 and the content of the summer course, since that content covers those topics taught in grades 7-10 of secondary schooling (middle school and first year of high school).
The major component of heterogeneity in mathematics mastery is caused by the level of mathematics schooling in high school. European countries generally distinguish between two different levels of high school mathematics: basic and advanced. Of the students in this study, those from the Dutch high school system were taught mathematics at one of two different basic levels (A1 or A1,2) or one of two advanced levels (B1 or B1,2). The lowest level, A1, prepares students for studies in arts and humanities, but does not qualify them to take social sciences studies such as business or economics, so what remains is only the higher basic level: DutchA12 (18.6%). Another two tracks are at advanced levels: DutchB1 (4.5%, preparing for life sciences studies) and DutchB12 (2.3%, preparing for technical studies). Due to a reform in mathematics education in the Netherlands, students taking the advanced track in high school from the last two cohorts (10/11 and 11/12) were educated in an undifferentiated advanced track: DutchB (5.4%). A majority of students (53.1%) were educated in a German-speaking high school system. That system again has two different levels of prior mathematics education: the advanced level, or Leistungskurs, and the basic level, or Grundkurs. Students taking the basic track can additionally choose whether to select mathematics as one of their four subjects in the final examination, or Abitur (students in the advanced track always do so). As a consequence, there is one advanced track, GermanLK (13.9%), and two basic tracks, GermanGKA (25.0%) and GermanGKnA (13.8%), where the last category has opted out of mathematics in the final examination. Again, in the last two cohorts, a new but very small category of students can be distinguished owing to a reform in mathematics education in some of the German states: the merger of basic and advanced tracks into one single, undifferentiated level of mathematics education, GermanUndif (0.8%).
In comparison to other European universities, there is a rather large share of students with an International Baccalaureate (IB) diploma (6.9%). IB again allows one advanced level (HL) to be distinguished from two basic levels (SL and StudiesSL), generating the categories IBMathHL (1.5%), IBMathSL (5.1%) and IBMathSSL (0.3%, but excluded from this study due to its small size). The remaining students (11.9%) were educated within a national system outside the Dutch or German-speaking part of Europe. For this last category, students were asked to classify their own prior mathematics education at the level of either mathematics major or mathematics minor. This results in the categories othMathMajor (6.2%) and othMathMinor (5.7%).

Interim assessments
In this study, we investigate the role of two different interim assessments. Both are designed for use in the transfer from high school to university and, for that reason, are labelled as entry assessments in the two projects for which they were designed. We shall adhere to that convention.

Prior education and the 3TU and NKBW entry tests
We shall focus on the AlgebraicSkills component of the entry test in most of this section, comparing the main prior mathematics education groups in our study, since this component is at the heart of the project. However, the analysis of total scores in the entry test results in rather similar outcomes, with identical patterns, but at a slightly lower level. When the entry tests were administered for the very first time in 2007, we were surprised to find such a major underperformance of national (Dutch) students compared to international students. For example, national students with the most advanced prior mathematics education, DutchB12, scored no more than 60%, against a 62% score for German students educated at basic level (GermanGKA) and a 77% score for German students educated at advanced level (GermanLK).
Needless to say, the scores of Dutch students from the less advanced tracks were even lower: 41% for DutchA12 and 53% for DutchB1. Indeed, they were lower than those for all other types of prior education. However, given the raison d'être of our national bridging project, this outcome was not that surprising. In fact, it did provide justification for the project, since Dutch secondary education proved to fall short in preparing students for university, especially in the area of algebraic skills, not only in an absolute sense, but also in a relative sense, when comparing Dutch students to international students.
Since 2007, several remarkable developments have occurred. School reforms in Dutch secondary mathematics education have improved the performance of advanced track students year after year, for both the B1 and B12 tracks. The merger of both of these tracks into one DutchB track was another successful step in terms of mastery of algebraic skills: students from that broad track achieved 72% and 79% scores, higher than ever before. And by doing so, they approached the score of German advanced track students (74%, GermanLK). Scores of Dutch basic track students, however, remained at the very lowest level.
Amongst the three different types of international prior education, radically different developments can be observed. Mastery levels of the advanced tracks are relatively stable and high (greater variability is present in the scores of IBMathHL, though that may simply be due to sampling variability, given the smaller size of this group: 15 on average). The othMathMajor category seems to demonstrate decreasing scores, but, being a residual category, this is not easy to interpret.
Mastery levels among basic track students do, however, signal a decline over the years for both German and IB students, with very marked developments for the GermanGKnA and IBMathSL groups. As a consequence, mastery levels among all tracks of basic mathematics education (except othMathMinor) converge to worryingly low levels, ranging between 40% and 50%, that have been present in the Netherlands for some time. In contrast to the success of the Dutch educational reform, the reform in Germany of removing different tracks to create an undifferentiated system seems to be less successful: the score is certainly not higher, and is more likely to be lower, than that of the basic track still in existence in other states. However, this group is too small to place much trust in this outcome.
The assessment of the German educational reform also depends on the type of entry test applied: in the NKBW entry test, which is based rather more on conceptual understanding and somewhat less purely on skills, German undifferentiated system students score midway between the basic and advanced tracks (60% versus 59% and 69%). In addition, the other educational reform is assessed differently: the new DutchB group scores similarly to, or even slightly lower than, students from the advanced track the year before. Besides being more conceptually oriented, the NKBW entry test is clearly less difficult (scores are uniformly higher) and less discriminative between the basic and advanced tracks than the 3TU entry test: see Figure 3.

Where the AlgebraicSkillsNo3 scores are not beyond guessing level in some group and year combinations, we at least observe improved mastery over time, especially for the Dutch students, who were the weakest in 2007. In contrast, scores for AlgebraicSkillsNo2 in the Dutch basic track are even lower than the guessing rate, and do not show any sign of improvement over time: students continue to be strongly attracted to the third answer option, apparently following the strategy of eliminating equal quadratic terms in numerator and denominator. Beyond very strong differences between outcomes for the Dutch and other European educational systems, both items, but especially the first one, also demonstrate considerable mastery differences between basic and advanced track students. This is remarkable in its own right, since algebraic skills are typically taught in junior high school, to students in both the advanced and basic tracks.

Summer course participation and the 3TU and NKBW entry tests
Since the mathematics bridging course offered by the program runs in the summer, participation is voluntary, which allows student performance in the entry tests to be compared for three different groups of students: successful summer course participants, unsuccessful summer course participants and non-participants. Figures 6 and 7 contain student performance scores in both types of entry test in the AlgebraicSkills section and, as reference material, in two other topics in the NKBW entry test.

Figure 6. AlgebraicSkills mastery in the 3TU entry test, by summer course participation (axes: AlgebraicSkills scores by class year)

For the unsuccessful summer course participants, the mastery level in three of five cohorts is significantly lower than the mastery level of the non-participants, indicating that these students initially made the right decision to register for the bridging course, but failed to follow through on that decision.
The strongest effects are visible in Figure 9, which contains the passing rates for the QM course. Since most students score in the region of 55% (the score required to pass), the effects of summer course participation are stronger on passing rates than on absolute scores. Differences in passing rates between successful summer course participants and non-participants are statistically significant at the 1% level in all class years except 2008, where significance is at the 10% level.
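The text does not state which test was used to compare passing rates. As one possible illustration, the following Python sketch implements a pooled two-proportion z-test; the function name and the counts in the usage example are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    pass_a: int, n_a: int, pass_b: int, n_b: int
) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    # Pooled passing rate under the null hypothesis of equal rates.
    pooled = (pass_a + pass_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 50 of 60 successful summer course participants pass,
# against 300 of 500 non-participants.
z, p_value = two_proportion_z_test(50, 60, 300, 500)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("significant at the 1% level:", p_value < 0.01)
```

The normal approximation behind this test is only reasonable when each group contains enough passing and failing students; for small cohorts an exact test would be a safer choice.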

Prior education and summer course participation, and the 3TU entry test
In order to properly disentangle the combined effects of prior mathematics education and summer course participation, it is necessary to analyse the effect of bridging education separately for students of each of the different types of prior education. Figure 10 provides the outcomes of one sample of such an analysis.

[Figure 10: categories DutchA12, GermanGKnA, GermanGKA and GermanLK, each split into SCFail, NoSC and SCPass]

Many German students interrupt their studies after finishing high school and go to university only after a break of often two or more years. These students, even if educated at advanced level, regard the summer course as an effective refresher of their mathematics mastery. For all four prior education categories, Figure 10 contains three panels, corresponding to failing, non-participation and passing the summer course. As expected, we observe that entry test scores demonstrate both a prior education effect and a summer course participation effect. The summer course effect seems to be weakest among advanced track students, which is no surprise: beyond some refreshment, these students cannot gain much from participating in the bridging course. Stronger effects are to be found for students educated at the basic level. But beyond the systematic differences, there is more sampling variability present, due to smaller samples, which makes these decomposed data harder to interpret.
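The decomposition described above amounts to computing mean scores per combination of prior education and summer course status. A minimal Python sketch follows; the category labels come from Figure 10, but the scores are invented for illustration and are not results from the study, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Record = Tuple[str, str, float]  # (prior education, summer course status, score in %)

# Invented illustration data, NOT results from the study.
RECORDS: List[Record] = [
    ("DutchA12", "SCPass", 50.0),
    ("DutchA12", "SCPass", 60.0),
    ("DutchA12", "NoSC", 40.0),
    ("DutchA12", "NoSC", 44.0),
    ("DutchA12", "SCFail", 35.0),
    ("GermanLK", "SCPass", 80.0),
    ("GermanLK", "SCPass", 78.0),
    ("GermanLK", "NoSC", 76.0),
    ("GermanLK", "NoSC", 78.0),
]


def cell_means(rows: List[Record]) -> Dict[Tuple[str, str], float]:
    """Mean score for every (prior education, summer course status) cell."""
    cells: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for education, status, score in rows:
        cells[(education, status)].append(score)
    return {key: mean(values) for key, values in cells.items()}


def summer_course_effect(means: Dict[Tuple[str, str], float], education: str) -> float:
    """Passers minus non-participants, within a single prior education track."""
    return means[(education, "SCPass")] - means[(education, "NoSC")]


means = cell_means(RECORDS)
for track in ("DutchA12", "GermanLK"):
    print(track, summer_course_effect(means, track))
```

Comparing within a track keeps the prior education effect out of the summer course comparison. With cells this small, any real analysis would also need to report cell sizes alongside the means.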

Cluster analysis of 3TU entry test scores
A very different approach to analysing entry test data is to look at groups of students with similar score patterns for the different items in the test. We did this by applying cluster analysis; Figure 11 contains a graphical representation of the outcomes of such a cluster analysis. The analysis is performed on all test attempts together, pooling all five cohorts. In the analysis, each student is allocated to one of three clusters, where the clusters are calculated to maximize variation between clusters and to minimize variation within clusters. Such cluster analysis can be repeated within each of the prior education groups; in this contribution, we shall limit ourselves to the outcomes of cluster analysis applied to all groups together. In most of these cluster analyses, distinguishing three clusters works quite well and, in most cases, these clusters are easily interpretable. As one can see from Figure 11, the clusters represent high scoring students, low scoring students, and a group of students whose performance is between the two. This middle group is by far the most interesting one, especially since the students perform similarly to the high scoring students for some items, and similarly to the low scoring students for other items. In Figure 11, students in the middle cluster score similarly to those of the high cluster for items belonging to the AlgebraicSkills section, with the third item (discussed earlier) as a potential exception. In contrast, students in the middle cluster score similarly to students in the low cluster, or even lower, in items in the Log&power section. They score highly again in the Equations section, especially on the third item, which requires them to find the zeros of a standard quadratic equation.
Deviant patterns appear for the second question, which acts as a kind of trick question: it asks for the number of different zeros of a third-order polynomial in which two zeros coincide. The same holds for the last question, where, beyond solving an equation, students need to know how to find a tangent line.
In short, middle cluster students act on the same level as high cluster students when items can be solved by the straightforward application of regular solution strategies taught in high school, but fall back to the level of low cluster students when items deviate from the regular pattern of class exercises.
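The clustering algorithm is not named in the text. K-means is one common method whose objective matches the description (small variation within clusters, large variation between them), so the sketch below uses it purely as an illustration. The section names are those used in the entry test; the score profiles and the choice of starting centres are invented.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points: List[Point]) -> List[float]:
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(
    points: List[Point], centres: List[Point], max_iter: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means: alternate assignment and centre update until stable."""
    current = [list(c) for c in centres]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each student goes to the nearest centre.
        new_labels = [
            min(range(len(current)), key=lambda k: squared_distance(p, current[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for k in range(len(current)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centre
                current[k] = centroid(members)
    return labels, current


# Invented section scores per student, as fractions correct:
# (AlgebraicSkills, Log&power, Equations).
profiles = [
    (0.9, 0.8, 0.9), (0.8, 0.9, 0.8),  # high on every section
    (0.8, 0.3, 0.8), (0.9, 0.2, 0.7),  # high on routine sections only
    (0.3, 0.3, 0.2), (0.2, 0.2, 0.3),  # low on every section
]
labels, centres = k_means(profiles, [profiles[0], profiles[2], profiles[4]])
print(labels)
```

K-means is sensitive to its starting centres, so in practice one would run it from several random starts and keep the solution with the lowest within-cluster variation; the fixed start here only keeps the example deterministic.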

Conclusions and discussion
The repeated use of formative, diagnostic tests is a crucial component of any mathematics e-learning program, providing the feedback required for the optimal steering of individual learning. The use of broad 'interim' assessments for these purposes brings many additional advantages. First, it allows the strengths and weaknesses of different prior education tracks to be distinguished for programs attracting large numbers of international students educated in very different high school programs.
Second, when the heterogeneous inflow is accommodated by implementing bridging education, it allows the effects of prior education and of remedial education to be properly disentangled. Finally, it allows different clusters of students with very different patterns of mathematics mastery to be distinguished. Doing so provides important information, beyond that for the individual students, for instructional planning, for regular curriculum design, for the implementation of bridging programs, for the streaming of education and even for admission regulations. Inferential statistical analyses indicate that first-year students using these formative assessments and participating in the summer course (based on such a formative assessment strategy) are substantially more successful (in the sense of statistical significance) than students who do not. Both students and instructors evaluate the facilities of online formative assessment as highly positive. However, it is difficult to assess the evaluation of the developmental and placement functions of formative assessment separately from the evaluation of the achievement functions.
As in many other programs, online formative assessment is introduced together with online low-stakes testing in the form of quizzes. Positive evaluation of formative assessment therefore cannot be separated from the appreciation of low-stakes testing and the availability of online tools to prepare for these quizzes.
Future research will have a dual focus. First, formative tests, specifically entry tests, provide crucial feedback with regard to the mathematics proficiency of students from different backgrounds. Given the recent major reform in Dutch secondary mathematics education, longitudinal monitoring of the mathematics proficiency of prospective students from different secondary education systems will continue to serve an important function. Second, future research will focus on the role formative tests can play in both providing continuous and instantaneous feedback to students, and in making education itself more adaptive, with the aim, in both instances, of optimising the learning process.