Predicting first-year university progression using early warning signals from accounting education: A machine learning approach

ABSTRACT In this paper, we examine whether early warning signals from accounting courses (such as early engagement and early formative performance) are predictive of first-year progression outcomes, and whether this data is more predictive than personal data (such as gender and prior achievement). Using a machine learning approach, results from a sample of 609 first-year students from a continental European university show that early warnings from accounting courses are strongly predictive of first-year progression, and more so than data available at the start of the first year. In addition, the further the student is along their journey of the first undergraduate year, the more predictive the accounting engagement and performance data becomes for the prediction of programme progression outcomes. Our study contributes to the study of early warning signals for dropout through machine learning in accounting education, suggests implications for accounting educators, and provides useful pointers for further research in this area.


Introduction
First-year university progression is an acute concern in some countries, notably those in continental Europe, where up to 50-60% of students can drop out in the first year (Arias Ortiz & Dehon, 2013). This high dropout rate is partly a consequence of the institutional and regulatory setting in which European universities operate. In particular, many continental European universities do not require entrance exams for admission to university programmes such as (business) economics (Broos et al., 2020), as they operate a government-mandated 'open gate' admission policy. Predicting first-year university progression, and thereby the likely dropout rate, is therefore important for several reasons. First, the earlier the university knows which students are at risk of dropping out, the longer the window of opportunity to remedy the situation (Wakelam et al., 2020). Instructors may use a range of remedial or corrective strategies to interact with at-risk students and assist them in their learning process (Chen et al., 2021). Second, high dropout rates may have financial consequences, as some universities are funded by their governments on the basis of the number of students graduating. Such 'no graduation, no pay' funding regimes promote selection 'after' the gate, with policies such as compulsory academic dismissal after non-performance in the first year (Sneyers & De Witte, 2017). Third, information about dropout is perhaps of most value to the students themselves. Dropping out is a profoundly negative experience: students may suffer personal stress and trauma as a result, and face financial consequences and a potential impact on their families (Wakelam et al., 2020). Consequently, dropout is an important topic on the agenda of many universities.
Students entering the university embark on a journey in their first undergraduate year, with several possible destinations at the end. Broadly defined, we can identify three types of first-year university progression outcomes: students may (a) drop out and leave the university at the end of the first year, (b) fail some courses and decide to stay to repeat these courses, or (c) pass all courses and proceed to the next year. Several data points become available during their journey, including midterm test scores, exam scores on first semester courses, exam scores on second semester courses, and retake exam scores.
An area of longstanding interest is the role of accounting courses in the first-year business economics curriculum (e.g. Bence & Lucas, 1996). The desired contents and structure of these introductory accounting courses have been the subject of study for some time (Geiger & Ogilby, 2000; Saudagaran, 1996). Financial accounting is considered to be a challenging course by students (Jaijairam, 2012; Stewart & Dougherty, 1993) and some programmes have two consecutive accounting courses. Apart from providing students with their first exposure to accounting, these accounting courses may have another valuable role: they may foreshadow the overall outcome of the student in their first year of university study. Accounting educators often pick up valuable signals of early engagement and early performance in these introductory accounting courses. We call these signals of early engagement and early performance the early warning signals. We would intuitively expect early signals in accounting education to be a predictor of first-year outcomes (dropout, repeat, pass), and the extent to which this is the case is the focus of the current study. Therefore, the first research question of our study is to explore whether the early warning signals of a financial accounting course can predict the progression of first-year university students.
The literature on dropout at universities is extensive and mainly focuses on a set of variables that students 'bring with them' when entering the university. We call these sets of variables backpack data. They include demographic variables, such as gender and age (Araque et al., 2009; Murtaugh et al., 1999; Paver & Gammie, 2005) and variables capturing their intelligence or prior academic achievement (Duff, 2004). This data may or may not be shared by the students with the university, depending on privacy regulations. Furthermore, if the open-gate admission system does not set restrictions in terms of the majors attended in secondary education, heterogeneity among the students is large. Therefore, the second research question of this study is to examine whether 'early warning' signals in accounting education can be more successful in predicting the progression for students in their first year of university study than the 'backpack' variables.
Predicting student outcomes remains understudied (Namoun & Alshanqiti, 2020), and recent technological advances such as learning analytics or artificial intelligence are increasingly being used to predict academic performance. Namoun & Alshanqiti (2020) showed that models that predict student learning outcomes have been on the rise since 2017, with a significant proportion of articles published in computer science and information systems journals. In line with these recent advances, this paper uses machine learning to predict dropout in the first year. Recent literature shows that machine learning is a suitable approach to predict student dropout (Tomasevic et al., 2020). An advantage of machine learning is that some algorithms can show the relative predictive strength of each variable, which allows university policy makers to focus on the most predictive variables. Machine learning has also been used to detect both at-risk and excellent students in the early stages of a course (Riestra-Gonzales et al., 2021).
A supervised machine learning approach splits the sample into two parts: one is used for training the algorithm (the training set) and the other is used to test the predictive power of the algorithm (the test set). In this paper, we apply the random forest algorithm to the training set to generate the predictive classification function. We then apply this function to the test set to evaluate its performance. The third research question of this paper is to explore whether a machine learning approach can be used effectively to predict student progression in the first year.
In summary, to explore the predictive value of early warning signals in accounting education on first-year university outcome, we study three research questions: (1) To what extent can 'early warning' signals in accounting education help to predict progression of first-year university students? (2) Are 'early warning' signals in accounting education better able to predict progression than more general 'backpack' data? (3) Can a machine learning approach be used effectively to classify outcomes using both backpack and 'early warning' signals in accounting education?
To address these research questions, we use a dataset of first-year undergraduates in business economics, with a financial accounting course in both semesters of the first year.

Contributions
The current study predicts student journeys throughout their first year, based on so-called backpack variables and early warning signals from accounting courses (tests and accounting exams). This study makes three important contributions. First, this paper is one of the first papers in accounting education to employ machine learning to predict dropout. Second, unlike other studies that used machine learning to predict learning outcomes (e.g. Tsiakmaki et al., 2020), the current study focuses on the outcome of the entire first year in a business economics programme, as opposed to one single course. We also include the three categories that are important from the students' perspective, i.e. dropout (leave the university), repeat year (repeat one or more courses) or pass (credit for all courses), instead of using a dichotomous (pass/fail) outcome variable. Third, this study combines several variables to predict outcome, i.e. backpack variables (characteristics of the student entering the university) and early warnings from accounting education (which include academic performance on tests and exams). By combining these variables, we are able to compare the predictive value of each group of predictors, an analysis that has not previously been done in the context of accounting education. We show how it is possible to provide information about the strength of the predictors at different 'calling stations' along the journey. The sooner faculty recognise at-risk students, the more likely it is that the situation can be remedied.
The remainder of the paper is organised as follows. The 'Supervised machine learning and prediction of dependent variables' section provides information about machine learning. The 'Prior literature on predicting student performance' section reviews prior literature. The 'Methodology' section describes the methodology, followed by a presentation of the results in the 'Results' section. A discussion is provided in the 'Discussion and conclusion' section. Finally, the 'Limitations and future research' section gives an overview of the limitations, avenues for further research, and conclusions.

Supervised machine learning and prediction of dependent variables
Machine learning, an application of artificial intelligence, is widely used in a variety of application domains (Kucak et al., 2018). Relevant domains include university dropout (Lykourentzou et al., 2009) and student performance (Hämäläinen & Vinni, 2010). The number of published articles in the field of machine learning has surged over the last decade (Kolachalama & Garg, 2018), and it would be impossible to review them all here. Machine learning enables algorithms to learn without functions being explicitly programmed (Samuel, 1959). The branch of machine learning relevant to our study is supervised learning, in which algorithmic models learn through a systematic study of labelled training data, and improve their performance on tasks through experience, as more training data becomes available (Mitchell, 1997).
Applying this approach to our domain of interest, a model learns to predict the outcomes of a first group of students (the training set). At the end of the training process, a classification function is ready to use. This classification function is then used on a second group of students (the test set) to test the predictions produced by the function. Figure 1 visualises the steps of the machine learning approach.
The out-of-sample prediction approach should be distinguished from a within-sample analysis (Bao et al., 2020). A within-sample approach does not focus on prediction of out-of-sample cases, but instead is primarily concerned with the relationship between independent and dependent variables. A technique such as regression analysis can be used in both approaches, either as the algorithm to connect independent and dependent variables in the within-sample approach, or to generate the prediction function in the out-of-sample approach (van der Heijden, 2023). In this study, we use the random forest classification method (Breiman, 2001; Ho, 1995) to generate the classification function. This method constructs a number of decision trees (in effect, sets of 'if-then' statements) to predict the output variable from a set of input variables. Each tree is fitted to a random resample of the training set, and the final function combines the predictions of all trees into a single classification.
To evaluate the efficacy of the out-of-sample prediction approach, different metrics are used, such as accuracy, precision and recall. Precision is the share of cases predicted to belong to a class that actually belong to it, so it falls as the number of false positives rises. Recall is the share of cases actually belonging to a class that are predicted as such, so it falls as the number of false negatives rises. A summary performance metric used to assess whether the function accurately classified the outcome is the F1 statistic (Geron, 2019). F1 is the harmonic mean of precision and recall; it therefore takes both types of error into account and gives equal weight to precision and recall.
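The metrics above can be made concrete with a short worked sketch. The confusion counts below are invented purely for illustration and are not results from this study; the macro-averaging of F1 across the three outcome classes is likewise an assumption, as the text does not state which averaging was applied.

```python
# Illustrative only: these counts are invented, not results from the study.
LABELS = ["dropout", "repeat", "pass"]

# confusion[i][j] = number of students whose true class is LABELS[i]
# and whose predicted class is LABELS[j]
confusion = [
    [50, 15, 5],
    [20, 30, 16],
    [5, 10, 85],
]


def per_class_scores(matrix, k):
    """Return (precision, recall, f1) for class index k, treated one-vs-rest."""
    size = len(matrix)
    tp = matrix[k][k]
    fp = sum(matrix[i][k] for i in range(size) if i != k)  # predicted k, truly other
    fn = sum(matrix[k][j] for j in range(size) if j != k)  # truly k, predicted other
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


scores = {label: per_class_scores(confusion, k) for k, label in enumerate(LABELS)}

# Assumption: unweighted (macro) average of the per-class F1 values
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(LABELS)
accuracy = sum(confusion[k][k] for k in range(len(LABELS))) / sum(map(sum, confusion))

for label, (p, r, f1) in scores.items():
    print(f"{label:8s} precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print(f"macro F1={macro_f1:.2f} accuracy={accuracy:.2f}")
```

Reporting precision and recall per class, rather than accuracy alone, shows whether a classifier that looks adequate overall is in fact missing most students in one particular outcome group.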

Prior literature on predicting student performance
The study of dropout at universities is extensive, and dates back to the early 1970s when lack of social and academic integration were identified as early causes of dropout (Tinto, 1975). The set of variables used to predict dropout are often those that are available at the start of academic study. They include demographic variables (such as gender and age), psychological variables (such as perceived confidence) and variables related to prior academic achievement. We will refer to these variables collectively as 'backpack' variables, in the sense that students carry these variables with them on the first day of the academic year. This group is in contrast to the 'early warning' variables in accounting education, which become available over time as the students progress through their first-year journey.
Before discussing these backpack variables in more detail, it is worth mentioning that dropout is just one of several types of student departure. Students can leave university education altogether, switch to another degree at the same university, or transfer to another university. Studies show that what is sometimes considered dropout is actually transfer to another university (Hovdhaugen, 2009). This is relevant in our study too, as there are several options for dropout at the end of the first year.

Gender
In terms of demographic variables, the study of gender has had mixed results. It has been found that male and female students leave university in the US for different reasons, with female students placing more emphasis on student dissatisfaction than male students (Bean, 1980). A UK study found that gender influenced the reasons for dropping out (Johnes, 1990), and a similar gender effect was found in a more recent study at a Belgian university, with men more likely to drop out than women (Arias Ortiz & Dehon, 2013). Others suggest that a gender effect is hard to tease out and may be dependent on the stage in the student's career (DesJardins et al., 1999).
The impact of gender on accounting education performance has been similarly mixed. Some studies did not generally find a significant relationship between gender and first-year academic performance (Byrne & Flood, 2008). Other studies show an influence of gender on surface versus deep learning, which in turn affects performance (Everaert et al., 2017).

Age
Age is another well-known and well-studied demographic variable. Students who are more mature have a higher chance of dropping out (DesJardins et al., 1999). Non-traditional students such as older and/or working students also have a higher chance of dropping out than traditional students (Carreira & Lopes, 2019).

Prior academic achievement
Another backpack variable is high school performance. Studies have consistently shown high school performance, as a proxy for ability, to be a factor. Lower ability students drop out earlier than higher ability students (Bean, 1980; DesJardins et al., 1999). Prior studies have shown that high school performance is also a strong predictor of accounting performance (Byrne & Flood, 2008; Eskew & Faley, 1988). Similar to overall high school performance, having a strong mathematical profile during high school also reduces the probability of dropping out in the early years (Arias Ortiz & Dehon, 2013). This may be of special relevance to accounting, where the requirements for maths may not be complex, but are not trivial either. Studies show mixed results for actually studying accounting in high school (Baldwin & Howe, 1982).

Self confidence
Related to these variables are students' expectations of university study and their perceived probability of success and graduation. Studies show that students with a lack of confidence in their skills and abilities have poorer academic performance in the first year (Byrne & Flood, 2008). Studies also show that students without clear educational goals may be more likely to depart (Tinto, 1988).

Early engagement
We now move on from backpack variables to variables that are collected in the early stages of university study, specifically in courses such as accounting. We will call these variables the 'early engagement' variables. The literature in this area is more fragmented than the backpack literature, as the authors of these studies did not necessarily consolidate their contributions under the 'early engagement' umbrella term. We provide a number of examples of early engagement studies that we believe are relevant to our study.
One study looked at the role of frequency of Virtual Learning Environment logins and frequency of exercise completion in an IT course. The authors were able to predict pass rates of a course with an accuracy of 60.8% (Jokhan et al., 2019). This study is of interest because it looks both at early engagement (logging in) and early performance (exercise completion).
Class attendance in more general terms has also been studied and has been identified as a dropout factor, alongside general problems, such as low identification with being a 'student', and low achievement motivation (Georg, 2009). In an accounting setting more specifically, studies also looked at the impact of learning style (surface versus deep learning) and time spent by the student on academic performance (Everaert et al., 2017). Time spent during the course and attendance of classes are both examples of early engagement with the course and wider university context. Furthermore, some instructors introduce intermediate tests into their course to provide early feedback on how the student is engaging with the course, especially in its early and middle stages (Day et al., 2018).
Of note is one study which looked, among other things, at attendance of a freshmen orientation course (Murtaugh et al., 1999). This study found that those who attended the freshmen orientation course had a reduced risk of dropping out.

Early performance: exam scores of accounting courses
The final set of variables of relevance to this study are first-year examinations. These exams are taken in the early stages of the university career, and are often the students' first ever accounting exams. Studies have shown that first-year examinations at university have much higher predictive value for non-graduation than high school performance (Johnes, 1990). As these scores are markedly different from other, 'softer' early engagement variables, we will treat these early exam scores separately in this paper.
We arrive then at a set of three groups of variables: backpack data (known at the start of the academic year), early engagement and performance (known in the early months), and early exam scores (known in the later months of the first year). We hypothesise that variables in each of these groups contribute to the prediction of the final outcome at the end of the year.
We have a number of expectations regarding the relative predictive value of each of the variable groups. First, intuition would suggest that the early warning variables in accounting education have incremental predictive value, over and above the backpack variables. Second, we would expect the highest predictive value to be generated by all three groups together as, by definition, the combined set contains the most variance. Third, we would expect the variables that become available latest in the year to be the most predictive, as they leave the shortest window of opportunity for the university progression to change.
The next section will discuss the details of our methodology and the setting in which our study took place.

Methodology
In this section, we will review the educational setting used in this study. We will also provide more information about the timeline of events that were included in the study. Next, we will offer information about the sample studied in this article. A description of the variables and the classification method concludes the section.

Setting
Many studies in accounting education are conducted in selective Anglo-Saxon institutions, where freshman students pass rigorous selection procedures based on academic and non-academic criteria (e.g. prior scores on national standardised exams, personal statements or selection interviews). In contrast, the current study was conducted at a large university in Belgium. The transition to higher education in Belgium does not require formal screening and, apart from a high school diploma, there are no admission criteria (Pinxten et al., 2019). 1 In Belgium, higher education is almost entirely publicly financed, with negligible tuition fees (i.e. under 1000 euro per year at the time of writing) (Broos et al., 2020). Access to higher education is open, and there is no 'selection at the gate' procedure. As a consequence, there is a large degree of heterogeneity among incoming students in terms of prior knowledge, attitudes and skills (Pinxten et al., 2019). A substantial number of students enrol with a weak mathematical background. This issue is particularly challenging for business economics programmes given that mathematics and statistics are among the main courses of the first-year programme, alongside economics and accounting.
Our study was conducted with data from the freshman undergraduate year during the academic year 2018-2019. This was the last full academic year before Covid-19. Students were enrolled for the first year in economics, business economics or commercial engineering. In the first year, no distinction is made between these programmes. 2 A financial accounting course is scheduled in each semester: Accounting A in the first semester and Accounting B in the second.

Timeline of events in the first year
Figure 2 summarises all educational activities in accounting during the first year. Every year in September, a week before the start of the academic year, a pre-session for accounting is organised. Over four days, students who choose to participate follow an introductory accounting week. At the end of the pre-sessional accounting week, the academic year starts with a Welcome Day, and then the first semester starts with 12 weeks of classes. At the end of the semester, there is a study period of four weeks, after which the written exams (including Accounting A) take place over four weeks. Immediately following the exams, there is a one-week spring break, during which students receive their grades from the first semester. The second semester begins in February and includes 12 weeks of classes and 4 weeks of study. There are final exams (including Accounting B) at the end of the second semester. For students who fail, there is a retake opportunity in August for courses of both the first semester (e.g. Accounting A) and the second semester (e.g. Accounting B).
In this study, we take into account five measurement moments during the academic year: (1) participation in the pre-sessional, (2) and (3) the two intermediate tests taken during the teaching weeks of Accounting A in the first semester, (4) the exam for the course Accounting A in January 2019, and (5) the exam for the course Accounting B in June 2019. Of course, students who pass do not attend the resit exams. Resit data is not included in this study because of endogeneity concerns, and because this data becomes available too late in the journey to be meaningful.
A pre-sessional on accounting is organised during the week prior to the academic year. Data on attendance at this pre-sessional was collected by the administration. Attendance at this pre-sessional is an indicator of early engagement and may predict student dropout, even at this early stage. The intermediate tests are taken in the first and the last week of November respectively. The tests address the material covered in the preceding weeks. They are taken online and consist of both open and multiple-choice questions. Participation is voluntary, but rewarded with partial course credit, regardless of the accuracy of the answers. Specifically, students can earn 1 point out of the 20 on their exam upfront by completing the two tests. Students get four days per test to complete it at home, and after each test feedback is given in a plenary session with the possibility of asking questions afterwards.
As summative assessments, all accounting exams consist of four extended exercises (2.5 hours of exam time) and 20 multiple-choice questions (1.25 hours of exam time). Both parts of the exam are organised back-to-back.
At the Welcome Day, at the beginning of the first semester, a short survey was administered. We asked students about the number of hours of mathematics in the last year of high school. In addition, we asked students about their perceived probability of success (How high do you estimate your own success rate for the first year at university?). Students had to choose one of the following alternatives: 0-30%; 31-50%; 51-70%; 71-85%; 86-100%; no idea. At the beginning of the second semester another survey was administered during official class time, providing data on the backpack variables. In this second survey, we asked students about their high school graduation percentage (What was your graduation percentage in your senior year of high school?). Students had the option to fill in their percentage or to check a box indicating that they had forgotten it or did not want to share it. The questionnaires were administered during official class hours at the university. One of the authors was present during the collection process.
Data from different surveys, test results, and exam results were aggregated and combined using unique university student identifiers. All respondents gave their informed consent and their permission to link their results and data. Students were anonymous throughout, and data was processed confidentially and for research purposes only.
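As a minimal sketch of the linking step described above, the records from each source can be joined on the student identifier, keeping every enrolled student and leaving fields empty where a student did not take part in a survey. The identifiers, field names and values below are hypothetical placeholders, not the study's data, and the choice to keep missing values as None is an assumption; the text does not describe how missing responses were handled.

```python
# Hypothetical records keyed by an anonymised student identifier.
welcome_survey = {
    "s001": {"maths_hours": 6, "perceived_success": "51-70%"},
    "s002": {"maths_hours": 4, "perceived_success": "no idea"},
}
february_survey = {
    "s001": {"high_school_pct": 72.0},
}
exam_scores = {
    "s001": {"accounting_a": 12},
    "s002": {"accounting_a": 7},
    "s003": {"accounting_a": 15},
}


def link_records(roster, *sources):
    """Left-join every source onto the roster of enrolled students.

    Fields that a student did not provide stay None, so non-response
    remains visible instead of silently dropping the student.
    """
    fields = sorted({f for src in sources for rec in src.values() for f in rec})
    linked = {}
    for student_id in roster:
        row = dict.fromkeys(fields)  # every field starts as None
        for src in sources:
            row.update(src.get(student_id, {}))
        linked[student_id] = row
    return linked


roster = ["s001", "s002", "s003"]
linked = link_records(roster, welcome_survey, february_survey, exam_scores)
for student_id, row in linked.items():
    print(student_id, row)
```

Joining onto the full roster, rather than keeping only students present in every source, matters here because the surveys were voluntary and therefore cover fewer students than the exam records.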

Sample
Our sample consists of 609 students who all commenced a study of business economics at a large Belgian university in the academic year 2018/2019 (see Figure 3). Students who had to retake the course and consequently were also enrolled in the previous year were removed from the source data, as this study aims to predict dropout of new first-year students. As a consequence, only 'new' first-year students were included in the data. All students had two accounting courses in their first year of study (Accounting A and B), although the number of students participating differed between the two courses.

Measurement of the variables: dependent variable
Our dependent variable in this study is first-year university progression. The first-year university progression is coded into three discrete values: dropout, repeat year, and pass. Dropout means that the student left the university. Repeat year means that the student did not qualify for pass, but decided to take a repeat year, re-attending the courses in which they obtained a mark below 10 out of 20. Finally, pass means that the student qualified for progression (since they earned a credit for all courses). Of the 609 students, 192 students dropped out, 166 took up a repeat year, and 251 students progressed through to the second year.
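The coding rule above can be sketched as a small function. This is an illustration under stated assumptions rather than the procedure actually used: the function name, its arguments and the treatment of a course as credited at a mark of 10 or more out of 20 are inferred from the description. The class counts are those reported in the text.

```python
PASS_MARK = 10  # marks are out of 20; assumed credited at 10 or more


def progression_outcome(final_marks, re_enrolled):
    """Code first-year progression as 'pass', 'repeat' or 'dropout'.

    final_marks: final marks (0-20) for all first-year courses (hypothetical input)
    re_enrolled: True if the student enrolled again the following year
    """
    if all(mark >= PASS_MARK for mark in final_marks):
        return "pass"
    return "repeat" if re_enrolled else "dropout"


# Class counts as reported in the text
counts = {"dropout": 192, "repeat": 166, "pass": 251}
total = sum(counts.values())
shares = {outcome: round(100 * n / total) for outcome, n in counts.items()}

print(progression_outcome([12, 14, 10], re_enrolled=True))
print(progression_outcome([12, 8, 10], re_enrolled=True))
print(progression_outcome([12, 8, 10], re_enrolled=False))
print(total, shares)
```

The three classes are moderately imbalanced, which is one reason to stratify any split of the sample so that each subset preserves these proportions.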

Measurement of the variables: independent variables
To predict the first-year university outcome for every student, we use three groups of independent variables. The first group is a set of background variables, or 'backpack' variables. The second group is a set of early warning variables to do with early engagement and performance in accounting education. The third group contains the exam scores of the first two accounting exams in the first year.
The backpack variables that we were able to obtain in this study were gender, high school performance, hours of maths per week in high school, and perceived likelihood of graduation.
Gender was collected from the administration, coded as 0 for male and 1 for female.
High school final performance is measured as a percentage. This performance was self-reported by the students in the February survey. As this survey was voluntary, we were only able to record 365 datapoints.
Contact hours of maths per week in high school was a self-reported measure captured at the Welcome Day.The number of datapoints for this measure is 440 as not all students attended the Welcome Day.
Probability of success describes the perceived probability of success, administered as a 5-point categorical variable (0-30%, 31-50%, 51-70%, 71-85%, 86-100%), collected at the Welcome Day.This measure is included as a 'backpack' variable because conceptually it is also a data point that is inherent to the student before university education starts.
Age: The dataset only includes students who just finished secondary education (at 18 years of age). Older and/or working students are not present in the current dataset. Therefore this variable was excluded from the analysis.
The early engagement variables included in this study were attendance at the pre-sessional and the scores on the two intermediate tests.
Pre-sessional attendance is one of the earliest signals of engagement that can be captured. The pre-sessional was voluntary and open to all students, took place in the summer of 2018, lasted four days, and dealt specifically with concepts also covered in the first accounting courses.
Test 1 and Test 2. Both tests took place in the middle of the first term. They were formative assessments, and were scored on a scale of 0-1.
The third group contains the exam scores of the financial accounting courses.
Exam scores were collected for both Accounting A (1st semester) and Accounting B (2nd semester) courses. We included here the earliest accounting exam scores (i.e. of January and of June) and left out the resit exam data. The exam scores were scored on a scale of 0-20.
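To illustrate how the three variable groups line up with the stages of the first-year journey, the sketch below builds the set of predictors available at each stage by accumulating the groups in the order in which they become known. The column names are hypothetical placeholders for the variables described above, and the helper function is not part of the study's own procedure.

```python
# Column names are hypothetical placeholders for the variables described above.
VARIABLE_GROUPS = {
    "backpack": ["gender", "high_school_pct", "maths_hours", "perceived_success"],
    "early_engagement": ["presessional_attended", "test_1", "test_2"],
    "exam_scores": ["exam_accounting_a", "exam_accounting_b"],
}


def cumulative_feature_sets(groups):
    """Yield (stage, features known up to and including that stage).

    Relies on the dict preserving insertion order, so groups must be
    listed in the order in which they become available during the year.
    """
    known = []
    for stage, columns in groups.items():
        known = known + columns
        yield stage, list(known)


stages = list(cumulative_feature_sets(VARIABLE_GROUPS))
for stage, features in stages:
    print(f"{stage}: {len(features)} predictors -> {features}")
```

Fitting the same classifier on each successive feature set is one way to compare how much predictive value each group adds over the information already available.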

Classification methods
The next section will cover the results of our classifications. As mentioned in the introductory section, we used a machine learning approach to classify the results. This approach follows procedures as described in Geron (2019). The steps in this procedure first involve splitting the sample into a training and a test set. The training and test sets are stratified, meaning that the percentages of the different outcomes are approximately the same in the training set, the test set and the total set. Given the relatively small size of our initial sample, we used a 60/40 split of the data. The more typically used 80/20 split (Geron, 2019) would generate a test set that would be too small to be credible. We then trained the classification algorithm on the training set (60% of the total set). This training leads to a classification function based on the independent variables. This function is then applied to the test set (40% of the data), and a measure is calculated on the basis of how many times the algorithm correctly classified an outcome in the test set.
The random forest algorithm (Breiman, 2001; Ho, 1995) was implemented using the scikit-learn library (Pedregosa et al., 2011). The algorithm was used in this study with its default parameters. Tuning these parameters (a process known as hyperparameter tuning) might have led to better results, but would also have required an additional test set, as we would otherwise be optimising on the test set.
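The procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code or data: the three predictors, their distributions, the random seeds and the choice of a weighted F1 average are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the study's data).
rng = np.random.default_rng(0)
n = 600
outcome = rng.choice(["dropout", "repeat", "pass"], size=n, p=[0.32, 0.27, 0.41])

# Hypothetical effect: better outcomes go with somewhat higher scores.
effect = {"dropout": 0.0, "repeat": 0.15, "pass": 0.30}
shift = np.array([effect[o] for o in outcome])
X = np.column_stack([
    np.clip(rng.normal(0.4 + shift, 0.2), 0, 1),       # test 1, scale 0-1
    np.clip(rng.normal(0.4 + shift, 0.2), 0, 1),       # test 2, scale 0-1
    np.clip(rng.normal(8 + 20 * shift, 3.0), 0, 20),   # exam A, scale 0-20
])

# Stratified 60/40 split: outcome shares are preserved in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, outcome, test_size=0.40, stratify=outcome, random_state=0
)

# Random forest with library defaults (only the seed is fixed for repeatability).
clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# The averaging scheme is an assumption; the text does not specify it.
score = f1_score(y_test, y_pred, average="weighted")
print(f"test-set F1 (weighted): {score:.2f}")
```

Because no parameters are tuned, the test set is used only once, for the final evaluation, which is the reason given above for not needing a further held-out set.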

Results
We start the results section with an overview of the descriptive results. Next, we discuss the three research questions, namely: (1) To what extent can early warning signals in accounting education help to predict progression of first-year university students? (2) Are early warning signals in accounting education better able to predict progression than more general 'backpack' data? (3) Can a machine learning approach be used effectively to classify outcomes using both backpack and early signals in accounting education?

Descriptives
In total, 609 students are included in this study, 66% male and 34% female. As shown in Table 1, 41% passed all courses, 32% left the university (dropout) and 27% started a repeat year with one or more courses to repeat.
Panel B of Table 1 shows the perceived probability of success by outcome and offers some insight into the students' ability to accurately assess their probability of success: 113 students who dropped out prematurely thought they were more likely to succeed than not (i.e. estimated their probability of success higher than 51%).
The descriptives of the continuous variables are shown in Table 2. It shows the mean, the variance and the number of students in each of the groups (A = Drop-out, B = Repeat Year and C = Pass). High school performance is highest for the pass group (75%, rounded). The high school performance of the repeat year and drop-out groups is almost the same (70%, rounded). Also, the number of contact hours of maths per week in high school is highest on average for the pass group (6.4), compared to 6.0 hours for the repeat year group and 5.7 hours for the drop-out group.
The second group of independent variables are all indicators of early engagement in accounting education specifically. They are pre-sessional attendance and two early test scores. Panel C of Table 1 crosstabulates pre-sessional attendance by outcome. Of the students who attended the pre-sessional, 47% passed, while of the students who did not attend the pre-sessional only 39% passed. Table 2 describes the mean scores and their variance for Test 1 and Test 2. Both tests took place in the middle of the first term. They were formative assessments and were scored on a scale of 0-1. Students who passed scored higher on both Test 1 and Test 2. The number of students providing their probability of success drops substantially due to missing values. This question is not straightforward and many students ticked the 'don't know' option.
The third group of independent variables involves exam scores of the financial accounting courses. The Exam A score is obtained at the end of the first semester, the Exam B score at the end of the second semester. These exams were scored on a scale of 0-20. The mean score for Exam A is higher for the students in the pass group (mean = 14.19) than for students in the repeat year group (mean = 10.81). In addition, this last group scored higher than students in the drop-out group (mean = 8.72). A similar pattern is found for the mean score for Exam B (mean = 14.10 for pass, mean = 9.15 for repeat year, mean = 6.76 for drop-out). Of interest is the very low mean for the drop-out group for the Exam B score.
The next section will cover the results of the statistical tests that we performed on each individual variable, to examine if the independent variables are significantly different among the three groups with different university progression.

Univariate statistics
We first examine whether any of the individual independent variables differs significantly across the three first-year university outcomes (drop-out, repeat year, pass). For the nominal independent variables we use a χ² test (showing whether the values are equally distributed across the different groups), and for the metric variables we use an ANOVA test. Table 3 presents the results of each test.
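As an illustration of the two tests, the sketch below runs a χ² test on a crosstabulation and a one-way ANOVA on three groups of scores using SciPy. The counts and scores are hypothetical placeholders, not the figures from Tables 1 to 3, and the choice of SciPy is an assumption made for the example.

```python
import numpy as np
from scipy.stats import chi2_contingency, f_oneway

# Hypothetical counts (NOT Table 1): rows = attended / did not attend,
# columns = drop-out, repeat year, pass.
attendance_by_outcome = np.array([
    [55, 50, 95],
    [140, 115, 155],
])
chi2, p_chi2, dof, expected = chi2_contingency(attendance_by_outcome)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p_chi2:.3f}")

# Hypothetical test scores per outcome group (scale 0-1), for the ANOVA.
rng = np.random.default_rng(2)
dropout = rng.normal(0.45, 0.2, 150)
repeat = rng.normal(0.50, 0.2, 130)
passed = rng.normal(0.65, 0.2, 200)
f_stat, p_anova = f_oneway(dropout, repeat, passed)
print(f"F = {f_stat:.2f}, p = {p_anova:.3g}")
```

A nominal variable with two levels crossed with three outcomes gives (2 - 1) × (3 - 1) = 2 degrees of freedom for the χ² test.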
These tests provide a number of useful insights. The gender balance does not differ significantly across the groups. Likewise, the pre-sessional attendance ratio does not differ significantly across the groups. As these variables do not differ significantly across the groups, they are unlikely to have much value in predicting the group as a backpack or early warning signal, as will be confirmed later on.
In contrast, all other variables show promising significance across the outcome groups. Combining these results with the descriptive tables, we may start to picture the groups as follows. Those who drop out (Group A) have lower high school performance, have had fewer contact hours of maths in high school and perceive success to be less probable. In terms of accounting education, they also score lower on both tests and on the two exams than the other groups. Conversely, students who pass (Group C) have higher high school performance, have had more contact hours of maths in high school and perceive success to be more probable. In terms of accounting education, they score higher on the tests and exams than the other groups. Group B falls between these two groups.

Machine learning results
We now report the classification results. Table 4 summarises the results of six models. Each model is a different combination of the three types of variables that are collected along the journey (backpack, early engagement and exam scores). To provide a clear indication of the period to which the variables refer, the week numbers are indicated in the first column. For instance, model 0 only includes the backpack data, i.e. the student characteristics in week 0. Model 1 only includes the early engagement data from when the student entered the university, i.e. information on the student gathered from weeks 1-6. Similarly, model 2 includes the early engagement data from weeks 1-9 (i.e. also including the scores on the second test, which means that the student is 4 weeks further along the journey than in model 1). Model 3 includes the early engagement data, as well as the early accounting exam scores (i.e. collecting information from week 1 till week 20). The student is now at the end of the first semester (January). Model 4 adds the exam scores at the end of the second semester (June), including information from week 1 till week 40. Finally, model 5 includes all variables from week 0 to week 40, i.e. the backpack data, the early engagement variables and the exam scores for both accounting courses.
We report, for each model, the total sample size for which we have data and how many of the students were put in the training and test sample. For both samples, we excluded any student with a missing value. As some of this data was captured later in the academic year (for instance the backpack data), sample sizes vary for each model.
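The effect of excluding incomplete cases per model can be illustrated with a small sketch. The column names, values and model labels below are hypothetical; the point is only that the usable sample depends on which predictors a model requires.

```python
import numpy as np
import pandas as pd

# Hypothetical records; NaN marks a missing observation.
students = pd.DataFrame({
    "hs_percentage": [72.0, np.nan, 68.0, 80.0, 75.0],
    "test1":         [0.6, 0.4, np.nan, 0.9, 0.7],
    "test2":         [0.5, 0.3, 0.4, np.nan, 0.8],
    "exam_a":        [12.0, 7.0, 9.0, 15.0, np.nan],
    "outcome":       ["pass", "dropout", "repeat", "pass", "pass"],
})

# Each model uses its own subset of predictors (labels are illustrative).
model_features = {
    "backpack only": ["hs_percentage"],
    "test 1": ["test1"],
    "tests 1 and 2": ["test1", "test2"],
    "tests and exam A": ["test1", "test2", "exam_a"],
}

# Keep only students with complete data for the columns a model needs.
sample_sizes = {
    name: len(students.dropna(subset=cols + ["outcome"]))
    for name, cols in model_features.items()
}
print(sample_sizes)
```

Models that require more variables can only lose cases relative to models that require a subset of them, which is why the sample size differs from model to model.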
We then report two F1 scores. The first one is a naïve baseline, and calculates how many false positives and false negatives we would have if we used the mode (i.e. most common) value for each student. As the mode value is always 'pass', this is effectively the same as predicting that each student would pass. The naïve F1 does not have much value in and of itself, but is useful as a basis for comparison against other classifiers. For instance, in model 1, using the prediction that 'all students will pass', only one in four students is correctly classified. Any classifier that we discuss next should at least improve on this baseline performance.
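To make the naïve baseline concrete, the sketch below computes a class-weighted F1 score for a classifier that always predicts 'pass', using the full-sample outcome shares reported in the descriptives (41% pass, 32% dropout, 27% repeat year). The weighted averaging is an assumption, as the averaging scheme is not stated in the text, and the per-model samples differ from the full sample, so the result is indicative only.

```python
def weighted_f1_constant_prediction(shares, predicted):
    """Weighted F1 when every case is assigned the class `predicted`.

    For the predicted class, recall is 1 and precision equals its share.
    All other classes are never predicted, so their F1 is 0.
    """
    p = shares[predicted]
    precision, recall = p, 1.0
    f1_predicted = 2 * precision * recall / (precision + recall)
    # Weight each class F1 by its share; only the predicted class contributes.
    return p * f1_predicted


shares = {"dropout": 0.32, "repeat": 0.27, "pass": 0.41}
baseline = weighted_f1_constant_prediction(shares, "pass")
print(round(baseline, 3))
```

Under these assumptions the baseline comes out at roughly 0.24, which is of the same order as the reported benchmark of about 25%. Note that plain accuracy for the same rule would simply equal the share of passing students.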
The next F1 score is calculated when we apply the random forest algorithm to the training set, and use the resulting classification function on the remaining test data. For instance, for model 0, the random forest classifier can classify about 1 in 2 students correctly. For models 1 and 2, the random forest classifier can classify 33% and 38% (rounded) of students correctly into one of the three outcome groups. This is substantially higher than the benchmark (naïve baseline) of 25% for each of the two models. In model 3, where the score of the first accounting exam is included, the random forest classifier classifies 55% of the students correctly (based on only four variables), compared to the benchmark of 26%. When the score of the second accounting exam is also included, 61% of the students are correctly classified (model 4), which cannot be improved by adding the backpack variables back in (the final model 5). Finally, the last column of Table 4 shows the predictive power of each of the individual variables, provided by the random forest classifier. The random forest function is able to provide a weighting of variables, with the highest weightings having the most predictive value (this is called feature importance). For instance, for model 0, the last column shows that, among the backpack variables, the high school percentage has the highest distinguishing power (i.e. 58%) to predict the correct outcome groups, whereas the number of mathematics hours in high school only has 19% of the predictive power. It is interesting to note that the predictive power is a relative measure and changes depending on the number and type of variables that are included in the model. For instance, the high school percentage only has a predictive power of 14% in model 5, because many other variables are included.
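The relative nature of feature importance can be seen in a short sketch. In scikit-learn, a fitted random forest exposes `feature_importances_`, whose values sum to one, so adding a predictor necessarily redistributes the shares. The data and variable names below are synthetic and hypothetical, not the study's variables.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 400
y = rng.integers(0, 3, size=n)  # 0 = dropout, 1 = repeat year, 2 = pass

# Hypothetical predictors with different strengths of relation to the outcome.
informative = y + rng.normal(0, 1.0, n)
weaker = 0.3 * y + rng.normal(0, 1.0, n)
extra = y + rng.normal(0, 0.5, n)


def importances(columns, names):
    """Fit a default random forest and return importance per variable name."""
    X = np.column_stack(columns)
    forest = RandomForestClassifier(random_state=0).fit(X, y)
    return dict(zip(names, forest.feature_importances_))


imp_small = importances([informative, weaker], ["informative", "weaker"])
imp_large = importances(
    [informative, weaker, extra], ["informative", "weaker", "extra"]
)

for label, imp in [("two predictors", imp_small), ("three predictors", imp_large)]:
    print(label, {name: round(value, 2) for name, value in imp.items()})
```

Since the importances in each fitted model sum to one, a variable's share should be read within its own model and not compared as an absolute quantity across models.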

Summary of results
The first research question asked to what extent early warning signals in accounting education can help to predict the progression of first-year students. As mentioned before, the early warning signals include both early engagement in the course and early exam performance. Whereas backpack data may not be readily available, data collected in the first accounting course is more readily available. To answer this research question, we look at models 1, 2 and 3, using data from the pre-sessional and Test 1 in model 1, which is expanded with Test 2 in model 2 and with the exam scores for the first semester course Accounting A in model 3. For model 1, we arrive at a classification result of approximately 33% (including data from weeks 1-6), which is better than the naïve estimation of 25% (the benchmark, where we assumed that all students would pass). For model 2, the results show a classification result of 38% (including data from weeks 1-9), again better than the benchmark of 25%. For model 3, which includes the exam scores of the first-semester course Accounting A, the results improve, with 55% of the students classified correctly. Thus, early engagement and early performance data can correctly classify 55% of all students. Knowing that model 3 can be constructed at an early 'calling station' in the student journey (i.e. before the end of the first semester), this result is promising. By looking only at data collected during the financial accounting course, more than half of the students can be classified in the correct outcome group, bearing in mind that these outcomes become reality more than eight months later.
Looking at the type of signals, the data show that attendance at the pre-sessional has less predictive power than the test and exam results. This is not surprising given that the frequency distribution of pre-sessional attendance was not significantly different among the outcome groups. In model 3, both test results as well as the exam performance have almost equal predictive power (Test 1: 31%, Test 2: 31%, Exam Accounting A: 34%). This shows that none of the three early warning signals in accounting has dominant predictive power, but all contribute almost equally to predicting the progression of the student for the programme as a whole.
The second research question asked whether the early warning signals in accounting education are better able to predict progression than the more general 'backpack' data. The data show that the early warning signals in accounting education are just as effective at predicting the outcome variable as the backpack variables, but not substantially better. As shown in Table 4, model 3 (including early engagement and early exam performance) leads to a 55% accuracy, and model 0 leads to a 52% accuracy.
Looking at the predictive power of the individual variables in model 0, the high school scores, maths hours, and probability of success all contribute to this result, with high school performance as the predictor with the highest feature importance (58%), followed by hours of mathematics (19%) and self-confidence (14%).
To answer research question 2, we elaborate on two additional insights. What happens to the comparison with the backpack model (1) if we include only the very early signals and (2) if we also include the exam performance of the second semester accounting course? When we compare model 0 (backpack data) with the early engagement variables, without considering the exam score, the superiority of the early warning model over the backpack model no longer holds. Here we can conclude that data from the pre-sessional attendance and the two tests (models 1 and 2) are not yet sufficient to match the backpack model (model 0). Going in the other direction, when we compare model 0 (backpack data) with the early engagement and all accounting exam scores (model 4), the superiority of the accounting data over the backpack model holds and the difference in explanatory power even increases (model 4: 61% versus model 0: 52%). Even before students have the opportunity to take a resit exam, the accounting engagement and performance data collected from week 1 till week 40 can classify 61% of the students in the correct group of dropout, repeat year or pass for the whole programme.
Looking at the feature importance of model 4 reveals interesting results. At this point in time, we are at the end of the second semester. Many students (those who failed a few courses) now need to make an important decision on whether or not to take the resit exams. The random forest machine learning algorithm highlights that the latest signal (exam score of Accounting B in June) provides 33% of the predictive value, and Accounting A (exam score in January) provides 19%, while the scores on Test 1 and Test 2 provide a relative predictive power of 20% and 22%.
The third research question addressed whether a machine learning approach can be used effectively to classify outcomes using both backpack and early signals in accounting education. As shown in Table 4, we have the best fit with the final two models. The last model (model 5) is a combination of the backpack variables, early engagement variables and exam scores, providing 61% accuracy. The backpack variables do not add very much over early engagement and exam scores alone (see model 4), resulting in a similar accuracy of 61%. It is worth noting that the number of students drops in this model, because not all backpack data were available. Combining all available variables (model 5) shows that the only backpack variable that holds its ground is high school performance (with a predictive power of 14%), whereas the hours of maths drops to 3%. This is a remarkable result since the number of hours of maths per week is considered to be the best predictor for success in this institution. Also in model 0, with the backpack data only, the number of hours of maths per week in high school contributed far less than performance in high school. Hence, the machine learning approach provided new insights into the different models and independent variables.
To further illustrate the machine learning approach, Table 5 presents the classification tables for the test set in models 4 and 5. The diagonal presents all actual outcomes that were correctly predicted. The numbers off the diagonal all represent predictions that differed from the actual outcomes.
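A classification table of this kind can be produced with scikit-learn's `confusion_matrix`. The eight labelled cases below are invented for illustration and are unrelated to Table 5.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["dropout", "repeat", "pass"]

# Hypothetical actual and predicted outcomes for eight students.
y_true = ["dropout", "dropout", "repeat", "repeat", "pass", "pass", "pass", "dropout"]
y_pred = ["dropout", "repeat", "repeat", "pass", "pass", "pass", "repeat", "dropout"]

# Rows are actual outcomes, columns are predicted outcomes, in `labels` order.
table = confusion_matrix(y_true, y_pred, labels=labels)
print(table)

correct = int(np.trace(table))       # sum of the diagonal
accuracy = correct / table.sum()     # share of correctly classified students
print(correct, accuracy)
```

In this toy example five of the eight cases lie on the diagonal, so 62.5% of the students are classified correctly; every off-diagonal cell counts one specific type of misclassification.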

Ordered logistic regression
To show the differences between machine learning and a regression, an ordered logistic regression was run for each of the six models, as shown in Table 6. The dependent variable is the outcome variable (ordered from dropout, repeat year to pass) and the independent variables are the predictors as used before in the different models. This logistic regression is applied to the whole dataset. The results show which variables are statistically significant in each of the models. The Pseudo R² shows how much of the variance in the progression variable is explained by the independent variables. The results show that models 1 and 2 have rather low R². In models 3 and 4, the explanatory power of the model as a whole, measured by the Pseudo R², is much higher (47.8% and 54.5%). These models also include exam scores and this seems to increase the explanatory power of the model as a whole. Model 5, which is a combination of all backpack variables, early warnings and exam scores, shows the highest R² (56.6%).
When looking at the independent variables, the results show that in model 1 (and in model 2), Test 1 is significant in explaining the outcome variable (p = .000). Interestingly, in model 2, the Test 2 score is not significant (p = .281). When adding exam scores for Accounting A in model 3, Test 1 is no longer significant, while the exam score is significant (p = .000). Similarly, in model 4, the two independent variables of exam score for Accounting A and Accounting B are significantly related to progression (p = .000 each time), while the test scores are both insignificant. In model 5, high school percentage, hours of mathematics and the two exam scores are significant (p = .004, p = .016, p = .024 and p = .000 respectively). In sum, the different ordered logits provide insight into the significance of the independent variables separately. However, this analysis does not show the relative importance of the predictors. For instance, in model 5, high school performance and hours of maths per week are both significantly related to the outcome variable, but it is not clear from this analysis which of the two variables has the higher predictive power. By contrast, in the machine learning approach, the random forest classifier provides the predictive power for each of the independent variables, as discussed above and as shown earlier in the last column of Table 4.
To conclude the results section, we can summarise that machine learning is helpful and effective in predicting dropout. The accuracy is higher than for a simple baseline approach and the feature importance provides information that cannot be obtained from a logit regression.

Discussion and conclusion
In this study, a machine learning approach is used to predict first-year university outcomes of business economics students, using backpack data, engagement data and accounting exam scores for two consecutive semesters. Outcome is considered as a three-level variable, with the levels dropout (i.e. the student leaves the programme), repeat the year (i.e. the student has at least one course for which he/she has received no credit) and pass (i.e. the student has passed all 15 courses in the programme). The backpack data is available at the point where students embark on their journey of the first undergraduate year at the university (here gender, hours of maths, high school performance and perceived probability of success). The engagement data and the exam scores come from the financial accounting courses and become available along the journey of the first year. The main question is whether these accounting data can predict the outcome for the student of the whole programme. Early engagement data include participation in a pre-sessional in accounting and two midterm test scores. Early performance data include the exam score for the first semester course in financial accounting. At the end of the second semester, the exam score for the second semester course in financial accounting becomes available. Then the resit period starts, for which no data is collected to avoid endogeneity concerns. The objective of this paper is to evaluate whether these early warning signals (engagement and performance) can predict progression (RQ1), whether the early warning signals are better or worse than the backpack data in predicting the outcome (RQ2) and whether machine learning can be used to classify outcomes (RQ3). In the next paragraphs the eight key results are summarised.
First, the descriptive statistics show that 41% of the students included in the current sample succeed in their first year. This is in line with previous results in similar university settings (e.g. Broos et al. 2020, Arias Ortiz & Dehon, 2013). Furthermore, 27% of the students do a repeat year and 32% of the students drop out. This high dropout rate highlights that student progression should remain high on the agenda of universities. This study found that dropouts have lower high school performance, have had fewer contact hours of maths in high school, and perceive their success to be less likely. In terms of accounting education, they also score lower on the tests (in week 5 and week 9) and on the exams (in week 20 and week 40) than the other groups. In contrast, students who succeed perform better in high school, have had more contact hours of mathematics in high school, and consider themselves more likely to succeed. In terms of accounting education, they score higher on both tests during the semester and on both exams than the other groups. This is in line with expectations.
Second, in answering research question 1, the different models of the machine learning analysis show that the early warning signals (i.e. early engagement and early exam scores) in accounting can predict the outcome of first-year university students. Compared to the benchmark of 25% correctly classified, the random forest classifier was able to classify 33% of the students correctly, based on data that becomes available in week 5 of the first semester (i.e. the Test 1 score and attendance at the pre-sessional). The accuracy of the classifier improves to 38% when including the Test 2 score (week 9) and further improves to 55% at the end of the first semester when including the exam score for the first semester accounting course.
Third, in answering research question 2, the models also show that using only early warning signals has more or less the same predictive value as using only backpack variables. This information is useful for both the teachers and the students themselves. Based on this information, teachers can use a variety of remedial or corrective strategies to deal with at-risk students and provide them with assistance to improve their learning (Chen et al. 2021). Targeted encouragement and advice can contribute to study success, and encouraging exploration of alternative study paths may also be beneficial. Students may not lose an entire academic year when faster reorientation is possible. However, since only one in two are affected, accounting educators will still need to invest in obtaining learning data and analytics to take full advantage.
Fourth, it is worth exploring whether the prediction information can be provided to the students themselves in a manageable format, subject to appropriate caveats. As suggested by Broos et al. (2020), information about students' journeys could be placed in an online learning analytics dashboard, as it could alert students and help guide the learning process (Verbert et al., 2013). This would be especially interesting given the limited resources available to collect backpack data and the scarcity of available indicators early in the first year of study. Student advisors and faculty members could use this information to identify at-risk students to whom they should reach out (Broos et al., 2020).
Fifth, the different models show that prediction improves as more accounting data becomes available along the journey. Our findings from the machine learning technique indicate that the accuracy of the models improves when newly collected information is added. For instance, adding data on Test 2 increases prediction accuracy. Adding the results of the early exam scores for the first semester course further increases the accuracy of the prediction. Finally, adding the results of the exam scores for the second semester course increases the accuracy of the prediction to 61%. These findings suggest that early warning signals from accounting education specifically are valuable additions to solving the challenge of predicting first-year university outcomes.
As a sixth point, it is of interest to see that the closer the recording of the variable is to the end of the year, the more predictive it becomes. In other words, and perhaps unsurprisingly, the 'later' the early warning, the more predictive the variable. These 'late warning' variables are more predictive, but, equally, they are less useful given that any remedial window of opportunity will be smaller at the time these values become available. In addition, adding the backpack data at the final data collection point does not increase the accuracy of the models.
When answering research question 3 (point seven), our findings indicate that machine learning can be used to classify outcomes using both backpack data and data from accounting education. Each of the five models results in higher accuracy than the benchmark of the naïve baseline. In addition, and as mentioned, the accuracy of the models improves when newly collected information is added. For instance, adding data on Test 2 increases prediction accuracy from 33% (model 1) to 38% (model 2). Adding the results of the early exam scores for the first semester course increases the accuracy of the prediction to 55% (model 3). Adding the results of the exam scores for the second semester course increases the accuracy of the prediction to 61% (model 4).
Eighth, the classification performance of random forest suggests that in the current study about 60% of the students' journeys can be correctly predicted by the set of independent variables. This is supported by the ordered logistic regression analyses. However, machine learning with random forest also provides information about the relative importance of each of the variables. Consequently, this study can provide guidance as to which parameters have the best predictive value and which parameters should be surveyed or collected during the first semester at the university. When considering the predictive strength of the predictors in each of the models, the predictive value of a predictor changes along the journey. For instance, when including backpack data with the data from the two accounting courses, only high school performance holds its predictive ground, and all other backpack data, including gender, probability of success and hours of maths in high school, recede into the background. Similarly, when exam data become available on Accounting B, the predictive value of the exam score for Accounting A drops.
In summary, this leads to the following conclusions:
1. Early warning variables and early exam scores in accounting education have similar predictive value compared to the backpack variables gender, high school performance and self-confidence to predict among fail, repeat and pass (model 3 compared to model 0).
2. Variables in the latest part of the year (i.e. including exam scores for accounting) have more predictive power than the early warning variables to predict among fail, repeat and pass (models 3 and 4 compared to models 1 and 2).
3. The combination of early warning variables in accounting and all exam scores for accounting has the highest predictive power to predict among fail, repeat and pass. Adding backpack data at that point in time does not increase accuracy (model 4 compared to model 5).
There are some relevant caveats to make, because not all early warning variables considered were predictive, in particular pre-sessional attendance. It is possible that this signal is simply too early in the year, with any potential issues not having had time to manifest themselves. Moreover, because this is a binary variable, it may also carry insufficient variance to make any meaningful predictive impact. In terms of gender, our research concurs with most previous research in that no direct impact could be found (Byrne & Flood, 2008). It is possible, and an open question, that gender may have an indirect impact on other variables.

Limitations and future research
There are some limitations to the generalisability of this study. First, a limitation is that we did not take into account all potentially relevant backpack variables in this study. Age was not included as a backpack variable because the dataset only includes similarly aged students who had just finished secondary education. Older and/or working students are also not present in the current dataset. Furthermore, previous studies found that dropout rises depending on peer group availability (Johnes & McNabb, 2004) and ethnic background (DesJardins et al., 1999). Students who were the first in the family to go to university are also more likely to drop out than those with parents who did go to university (Ishitani, 2003). Other studies have also pointed to university-specific variables such as the advice offered, the transparency of exam regulations, and the quality of teaching (Georg, 2009). There has been less research on these university-specific variables, possibly because most empirical studies use students from a single university, where these factors would be invariant. Second, the context in which the research took place should be recognised: the study covered student data from Belgium, and some relevant aspects specific to this setting may not translate to other contexts. These include the student demographics (mostly local students with few outliers) and the national university entry approach (admission without selection, and comparatively high first-year dropout). The conclusions may not hold in settings that are substantially different from this context. In particular, universities with gated admissions are likely to find the level of dropout much less severe than in our study.
There are some interesting avenues for future research in the study.Given the success of some early warning variables, we encourage further researchers to seek out more indicators of engagement and performance.With virtual learning environments now pervasive in university education, perhaps more evidence of digital engagement can be collected from these environments and used.Another line of further research might be in the direction of courses other than accounting in the first year.Given the success of accounting-specific performance variables, perhaps other mid-term performance in other courses will be equally informative.Furthermore, other variables might be included.The arrival of first-year students at university is often accompanied by several challenges to adjust and integrate, such as leaving home, making new friends, handling academic expectations and developing new learning styles (Conley et al., 2014;Vinson et al., 2010).These struggles with academic, social, and personal-emotional adaptation relate closely to students' learning experiences and academic satisfaction (Baik et al., 2019), and eventually their academic performance (Petersen et al., 2009) and university degree completion (Holliman et al., 2018).
This study is an extension of previous research in accounting education on the use of machine learning on dropout rates.This research also contributes to the further study of appropriate prediction methods for predicting student academic performance in the first year.It would be interesting to use second year data to examine the effects of multi-year data on prediction accuracy, using the same machine learning techniques.This could provide instructors with even more information to guide and teach students.Moreover, in addition to first-year students, it would also be interesting to see if we have the same results from more mature students, such as those in the continued professional education programmes of professional bodies.We invite colleagues to extend our study in a broader context.
In conclusion, this study has highlighted the value of early warning signals in accounting education for predicting the type of student journey in the first year of university study. In doing so, it is one of the first studies to focus specifically on accounting performance as a type of early warning. We hope that these promising results will encourage other researchers to look more closely at early warning signals and the types of outcome that they may predict.

Figure 1. Flowchart of the machine learning approach.
Figure 2 summarises all educational activities in accounting during the first year. Every year in September, a week before the start of the academic year, a pre-session for accounting is organised. For four days, voluntarily participating students follow an introductory accounting week. At the end of the pre-sessional accounting week, the academic year starts with a Welcome day, and then the first semester starts with 12 weeks of classes. At the end of the semester, there is a study period of four weeks, after which the written exams (including Accounting A) take place over four weeks. Immediately following the exams, there is a one-week spring break, during which students receive their grades from the first semester. The second semester begins in February and includes 12 weeks of classes and four weeks of study. There are final exams (including Accounting B) at the end of the second semester. For students who fail, there is a retake opportunity in August for courses of both the first semester (e.g. Accounting A) and the second semester (e.g. Accounting B). In this study, we take into account five measurement moments during the academic year: (1) participation in the pre-sessional, (2) intermediate Test 1 for the course Accounting A after five weeks in the first semester, (3) intermediate Test 2 after nine

Figure 2. Timeline of the research design.

Figure 3. Flowchart of the sample.

Table 1. Frequency table: outcome split by gender, probability of success and pre-sessional attendance.

Table 2. Descriptives and group means depending on outcome (A = Drop-out, B = Repeat Year, C = Pass).

Table 3. Testing with χ² and ANOVA whether the three outcome groups have statistically different proportions (χ²) or levels (ANOVA) in the respective variables.

Table 4. Machine learning using the classification method random forest.

Table 5. Classification table with actual and predicted outcome.

Table 6. Ordered logistic regression on total sample.