Decision Support

Uplift modeling is an approach for estimating the incremental effect of an action or treatment at the individual level. It has gained attention in the marketing and analytics communities for its ability to model the effect of direct marketing actions via predictive analytics. The main contribution of our study is the application of the uplift modeling framework to maximize the effectiveness of retention efforts in higher education institutions, i.e., the improvement of academic performance by offering tutorials. The objective is to improve the design of retention programs by tailoring them to students who are more likely to be retained if targeted. Data from three different bachelor programs of a Chilean university were collected. Students who participated in the tutorials are considered the treatment group; the remaining students are assigned to the nontreatment group. Our results demonstrate the virtues of uplift modeling in tailoring retention efforts in higher education over conventional predictive modeling approaches.


Introduction
Student dropout is a genuine concern in private and public institutions of higher education because of its negative impact on the well-being of students and the community in general. Early dropout from undergraduate programs causes not only monetary losses to educational institutions, in terms of tuition fees paid either by the students or by the state through scholarships, but also social costs.
Universities have focused in recent years on the design of retention campaigns as a means to prevent student withdrawal. Dropout arises from different context-specific academic and nonacademic factors. Academic achievement and institutional habitus [1], as well as demographics, social interaction, financial constraints, motivation, and personality, play a vital role [2]. Each risk factor can be addressed in numerous ways, such as academic assistance (e.g., tutoring, counselling, and mentoring [3,4]), social engagement and individual attachment to the institution [5][6][7], purpose for completing school (e.g., vocational education, part-time job placements, internships), and financial assistance.
The success of retention campaigns is subject not only to appropriately understanding the factors associated with student attrition, but also to accurately identifying and targeting students who are most likely to respond to interventions. Unlike research on factors related to student withdrawal, this paper aims to extend the current student dropout literature by introducing uplift modeling as a decision-making tool to support the design of student dropout prevention strategies.
Uplift modeling is a predictive analytics technique that estimates the effect of a treatment on the behavior of an individual. Conventional predictive models for churn prediction aim at identifying and targeting individuals who are more likely to attrite. However, targeting on the basis of risk does not consider that each individual responds differently to retention strategies, as risk of dropping out and sensitivity to the intervention are not necessarily related [8].
This study proposes a novel framework for preventing student attrition using uplift modeling. Our main contributions can be summarised as follows:
• We apply uplift modeling to the student dropout problem. Two special considerations are made: addressing self-selection bias and the low risk of triggering student attrition. The former refers to the design of the retention campaign, as the university makes an open call to all students who want to participate. The latter alludes to the low risk of harming students by targeting them with tailored programs.
• We analyze a retention program designed to improve academic performance and engagement. Although the program has a positive effect on student retention, we show the benefits of designing a customized program, i.e., targeting only the students who are most likely to be retained by the program.
• Model comprehensibility is addressed by segmenting students according to their estimated uplift, and later observing the characteristics of each segment. This allows us to gain insight into the application and develop better retention policies. The analysis of the variables that are relevant for defining a segment of students to target is discussed in the empirical section.
The remainder of this paper is organized as follows: A literature review on student dropout is presented in Section 2. The proposed framework for student retention using uplift modeling is presented in Section 3. Experimental results are given in Section 4. Finally, Section 5 provides the main conclusions, while also addressing future developments.

Prior work on student dropout
Student dropout in higher education has been studied for several years. Dropout occurs when an individual enrolled at an educational institution decides to voluntarily abandon studies [9,10].
The foundations of research on student dropout were established in the 1970s and 1980s by [9][10][11][12], whose approaches are still used today as a starting point for new developments [13][14][15][16][17]. [9] proposed an interdisciplinary approach, using psychological variables to model student attrition as the interaction between a student and the educational environment. Later, [10] proposed a parsimonious model that reflects the relationship between prematriculation attributes and the interaction with the environment (academic and social systems). Last, [12] extended the former approaches by incorporating additional elements related to the interaction between students and the educational institution.
The early work on student dropout stimulated different research approaches that were widely discussed in the last decade. A first research stream refers to the time of attrition, since associated attributes may vary throughout the academic program [16]. That is, the scope may include freshman to sophomore years [18], from sophomore to junior years [19] or different predetermined periods [17]. By contrast, other studies do not focus on the time perspective, but rather analyze the dropout phenomenon from a systematic standpoint [20].
Furthermore, studies discussing factors associated with dropping out, such as the influence of socioeconomic determinants, have yielded contradictory results. While [15] confirmed the influence of gender on student attrition, [21] did not find a significant relationship. Similarly, [17] claimed that low-income students are less likely to abandon their bachelor program, in contrast to [22], which suggested that this group has a higher risk of dropping out. These divergences may indicate that student dropout depends largely on contextual elements.
Predicting student dropout via statistical and machine learning techniques has gained increasing attention, leading to a rich research area known as learning analytics or educational data mining (EDM) [23]. Recent studies on student dropout mainly focus on applications of different machine learning techniques for this task. Examples include semi-supervised learning [24], unsupervised learning [25], and ensemble learning [26].
As mentioned above, the majority of machine learning applications on student dropout prevention have focused on identifying students with high propensity to attrite, i.e., targeting those at high risk. However, to the best of our knowledge, no study has focused on customizing the assignment of retention actions to students on the basis of their expected sensitivity to an intervention. Therefore, we propose uplift modeling as a tool to support the design of student dropout prevention strategies.

The uplift modeling framework and student dropout
This section formally defines uplift modeling, and subsequently presents the main approaches for estimating and evaluating uplift models. At the end, we discuss some considerations that must be addressed when applying uplift modeling to the context of student dropout.

Uplift modeling
Uplift modeling is a predictive analytics technique that estimates the individual treatment effect (ITE), i.e., the effect of an action or treatment on an outcome of interest. This task differs from the estimation of average causal effects, since it considers that causal effects vary with observable characteristics. Uplift modeling is analogous to the problem of treatment effect heterogeneity [27] and individualized treatment rule estimation [28], as it aims to determine the degree to which treatments have differential causal effects on individuals. The goal is to customize the assignment of treatments by prescribing the action that maximizes a given objective. Therefore, uplift models identify individuals for whom the exposure to an action is expected to lead to a favorable outcome.
Uplift modeling has been applied in a wide variety of domains. Initial applications mainly focused on maximizing the effectiveness of marketing campaigns [29][30][31][32]. Nonetheless, uplift modeling has also received attention in the fields of personalized medicine [33] and price optimization [34,35].
Formally, uplift models predict the ITE in terms of the potential outcomes framework [36]. The problem consists of learning, based on a sample of N independent and identically distributed students, whether student i should be treated (i.e., take the tutorials), given the set of pretreatment characteristics, X_i, the binary treatment indicator, T ∈ {0, 1}, with T = 1 meaning treatment, and the binary outcome variable, Y ∈ {0, 1}, where Y = 1 represents no attrition. The potential outcomes, Y_i(1) and Y_i(0), are the future states of the outcome for the i-th student with and without the treatment, respectively. Then, the ITE of treatment against nontreatment on Y is the difference between the two potential outcomes, τ_i = Y_i(1) − Y_i(0). Since the ITE varies with observable characteristics, it can be defined in terms of the conditional average treatment effect (CATE):

CATE_i = P(Y = 1 | do(T = 1), X_i) − P(Y = 1 | do(T = 0), X_i).    (1)

Eq. (1) defines the CATE_i as a comparison between the conditional likelihoods of no attrition under two different regimes, i.e., treatment and nontreatment. Although predictive modeling consists of estimating outcomes as a function of observed variables, the uplift modeling task is not to predict the outcome variable, but its variation due to the treatment. The do(⋅) operator [37] is commonly used in causal calculus to indicate that T = t denotes an intervention, i.e., an interventional conditional distribution, rather than the observed values taken by T, i.e., an observational conditional distribution. Uplift modeling employs machine learning techniques to estimate the potential outcomes. Hence, the difference of the two conditional probabilities is a continuous score known as the uplift score, τ̂_i. Students for whom τ̂_i > 0 are considered treatment responders and, therefore, should be targeted.
We estimate the CATEs under certain assumptions, since the tutorials are not assigned at random. A first assumption is that the treatment assignment is as good as random once we control for the observed variables; that is, under unconfoundedness, the potential outcomes are independent of the treatment conditional on the observed variables, Y(T) ⊥ T | X [38]. A second assumption, common support, guarantees that the conditional treatment probability is bounded away from zero and one, 0 < P(T = 1 | X) < 1, which is necessary to find appropriate matches of treated and untreated students. Finally, satisfying the stable unit treatment value assumption (SUTVA) [39] ensures that the potential outcomes are not influenced by treatments given to other students.

The uplift modeling literature distinguishes between two approaches to estimate the uplift: data preprocessing and data processing [40]. The data preprocessing approach consists of modifying, prior to training, (1) the outcome variable or (2) the input space. In contrast, the data processing approach refers to methods that estimate the uplift either indirectly or directly. Uplift is estimated indirectly when two separate predictive models (i.e., one for the treated and one for the untreated) are trained. Direct estimation refers to machine learning algorithms that have been modified to estimate the uplift directly.
The modified outcome approach (MOA) was introduced by [41,42] to estimate conditional average treatment effects on the treated based on the difference-in-differences estimator and the ITE, respectively. In the uplift modeling literature this approach was presented by [43] and has been extended by [29,44,45]. For example, the MOA by [43] relabels the outcome variable by taking into account the four types of individuals present in the data set: treatment responders (TR), treatment nonresponders (TN), nontreatment responders (NR) and nontreatment nonresponders (NN). The transformed outcome variable considers TR and NN as positive cases, since these individuals are positively affected by the treatment. In contrast, NR and TN are regarded as negative cases, since these individuals have either an unfavourable or no response to the treatment. Hence, the uplift modeling problem is reduced to a conventional classification model, with the transformed outcome Z_i = T_i Y_i + (1 − T_i)(1 − Y_i) and, when treatment and nontreatment groups are of equal size, the uplift computed as

τ̂(X_i) = 2 P(Z = 1 | X_i) − 1.

Although the main advantage of the MOA is that it can use existing learning methods entirely off-the-shelf, the approach can also be inefficient, since the information of the treatment indicator is used only for the construction of the transformed outcome [46].
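The class-variable transformation above can be sketched as follows. This is a minimal illustration on synthetic data, assuming a balanced treatment assignment, P(T = 1) = 1/2, under which the uplift score reduces to 2 P(Z = 1 | x) − 1; the helper name `moa_transform` is ours, not from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def moa_transform(y, t):
    """Class-variable transformation: Z = 1 for treated responders (TR)
    and untreated nonresponders (NN); Z = 0 otherwise (TN, NR)."""
    return (y * t + (1 - y) * (1 - t)).astype(int)

# synthetic illustration with a roughly balanced treatment assignment
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
t = rng.integers(0, 2, size=1000)   # treatment indicator
y = rng.integers(0, 2, size=1000)   # 1 = retained (no attrition)
z = moa_transform(y, t)

# any off-the-shelf classifier can model P(Z = 1 | x)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, z)
uplift = 2.0 * clf.predict_proba(X)[:, 1] - 1.0   # uplift score in [-1, 1]
```

When treatment proportions differ from one half, the transformation is typically combined with inverse-propensity weighting, as in the extensions cited above.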
The second preprocessing method is the modified covariate approach (MCA), also referred to as S-learner [47] or R-learner [48]. The MCA was presented by [49] to model interactions between the treatment indicator and the observed pretreatment variables. In the uplift literature, Lo [32] proposed to train a single predictive model that includes within the input space, aside from the pretreatment variables, the indicator of treatment as a dummy, D, and interaction terms between the dummy and the pretreatment variables, D × X. The uplift is then obtained from this single model as

τ̂(X_i) = P(Y = 1 | X_i, D = 1) − P(Y = 1 | X_i, D = 0).

A potential drawback of the MCA is that the enlargement of the input space can result in multicollinearity problems [29].
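A minimal sketch of Lo's interaction model on synthetic data might look as follows; the `augment` helper is hypothetical, but the scoring step (predicting with D forced to 1 versus 0) follows the description above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def augment(X, d):
    """Input space of the interaction model: [X, D, D * X]."""
    return np.hstack([X, d[:, None], X * d[:, None]])

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
d = rng.integers(0, 2, size=1000)   # treatment dummy D
y = rng.integers(0, 2, size=1000)   # 1 = retained

model = LogisticRegression(max_iter=1000).fit(augment(X, d), y)

# uplift: score every student under D = 1 and under D = 0, then subtract
ones = np.ones(len(X), dtype=int)
zeros = np.zeros(len(X), dtype=int)
uplift = (model.predict_proba(augment(X, ones))[:, 1]
          - model.predict_proba(augment(X, zeros))[:, 1])
```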
Estimating uplift indirectly refers to the separate model approach (SMA), also referred to as T-learner [47,48] or Q-learner [28]. The SMA is the most intuitive method, as the predictive models learned on the treated and untreated are used to predict the uplift score of each test case based on the conditional probabilities of treatment and nontreatment. This methodology is simple and implements standard machine learning techniques, but can be suboptimal since the modeling objective of the two predictive models is not to directly estimate the uplift [50]. Nonetheless, [45] concludes that the SMA may perform competitively under certain conditions.

Methods to model uplift directly aim to offset the main drawbacks of the previous approaches. The objective is to implement a single training scheme to estimate the uplift by adapting the objective function of conventional machine learning algorithms (a complete overview is given by [40,44]). The literature on heterogeneous treatment effects has proposed the causal tree [46], causal Bayesian additive regression trees (BART) [51], causal forest [52], causal boosting [53], and generalized random forest [54] algorithms with modified splitting procedures that partition the data according to treatment effect heterogeneity.
Similarly, in the uplift literature, adapted k-nearest neighbour classifiers are proposed by [55,56], and modifications to the splitting and pruning criteria of decision tree classifiers are found in [31,50,57,58]. Modified random forest algorithms are suggested by [30,34,35] to offset the instability of a single decision tree. For example, [59] employs a modified random forest algorithm to estimate the effect of motivational e-mail campaigns. Last, [60] presented a support vector machine for uplift modeling and [61] proposed a reinforcement learning approach.
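Among these estimators, the separate model approach is the simplest to sketch. The following is a minimal illustration on synthetic data (the variable names are ours): one classifier is fit per treatment regime, and the uplift score is the difference of the two predicted retention probabilities.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))
t = rng.integers(0, 2, size=1000)   # treatment indicator
y = rng.integers(0, 2, size=1000)   # 1 = retained

# one model per regime: treated students and untreated students
model_t = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[t == 1], y[t == 1])
model_c = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[t == 0], y[t == 0])

# uplift score: difference of the two conditional retention probabilities
uplift = model_t.predict_proba(X)[:, 1] - model_c.predict_proba(X)[:, 1]
```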
The evaluation of uplift models cannot be performed by means of a loss function, as with conventional predictive models, due to the fundamental problem of causal inference [62]: it is impossible to observe the true effect of the treatment for each student. Instead, the data set is split into a training set and a test set, preserving the proportions of treated and untreated students. The uplift model is constructed on the training set and then applied to the test set to obtain the potential outcomes for each test case. The predicted uplift score is computed as illustrated in Eq. (1). Subsequently, test cases are ranked according to the predicted uplift score in descending order. Last, test cases are segmented into groups of equal size (i.e., bins), and the segment-wise treatment effect is calculated as the difference between the response rates of treated and untreated subjects. The intuition behind this approach is that a model with outstanding performance is expected to allocate to the top segments those students whose propensity to attrite will be reduced by participating in the tutorials.
Formally, test cases are ranked in descending order according to their uplift scores, yielding the ordered set π. Let π_k be the k-th segment of test cases in π, so that the numbers of treated and untreated test units within the segment can be calculated, respectively, as

N_{π_k}^T = Σ_{i ∈ π_k} [T_i = 1]  and  N_{π_k}^C = Σ_{i ∈ π_k} [T_i = 0],

where the Iverson bracket [⋅] is equal to one if the logical proposition between the brackets is satisfied, and zero otherwise. In addition, the numbers of treated and untreated test cases who do not drop out within the π_k segment are obtained as

R_{π_k}^T = Σ_{i ∈ π_k} [T_i = 1][Y_i = 1]  and  R_{π_k}^C = Σ_{i ∈ π_k} [T_i = 0][Y_i = 1].

Last, the segment-wise uplift [63] is calculated as follows:

uplift(π_k) = R_{π_k}^T / N_{π_k}^T − R_{π_k}^C / N_{π_k}^C.    (7)

The performance of an uplift model can be visualized by an uplift curve [50] (see Fig. 1). The uplift curve shows the cumulated segment-wise uplift as a function of the fraction of targeted students. It illustrates the trade-off between targeting larger proportions of the population and the resulting uplift. The overall effect of the treatment is the uplift resulting from targeting 100% of the test set. A straight line connecting the two extremes of the uplift curve (dashed line in Fig. 1) serves as a baseline and represents the uplift achieved when students are randomly exposed to the treatment (i.e., random selection instead of selection using an uplift model). The farther the uplift curve lies above the diagonal line, the better the model. Moreover, the curve is expected to increase steeply until all responders are identified, and then to flatten and potentially move downward as treating more students becomes ineffective. Given its straightforward interpretation, decision-makers can use the uplift curve to decide the optimal proportion of students to target.

The Qini measure is a quantitative performance metric that facilitates the comparison of different uplift models. Similarly to the area under the receiver operating characteristic curve (AUC or AUROC) computed for evaluating binary classification models, the Qini is the area between the uplift curve and the diagonal.
Thus, the larger the Qini, the better the performance of the model.
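The evaluation procedure above can be sketched as follows. The helper names are ours, the inputs are synthetic, and the Qini is computed as one common variant: the trapezoidal area between the cumulative uplift curve and the random-targeting diagonal.

```python
import numpy as np

def segment_uplift(y, t, scores, n_bins=4):
    """Observed segment-wise uplift after ranking by predicted uplift score."""
    order = np.argsort(-scores)                    # descending uplift
    out = []
    for idx in np.array_split(order, n_bins):
        y_b, t_b = y[idx], t[idx]
        r_t = y_b[t_b == 1].mean() if (t_b == 1).any() else 0.0
        r_c = y_b[t_b == 0].mean() if (t_b == 0).any() else 0.0
        out.append(r_t - r_c)                      # treated minus untreated response rate
    return np.array(out)

def uplift_curve(y, t, scores, n_bins=10):
    """Cumulative uplift when targeting the top k/n_bins fraction, k = 0..n_bins."""
    order = np.argsort(-scores)
    curve = [0.0]
    for k in range(1, n_bins + 1):
        idx = order[: int(len(order) * k / n_bins)]
        y_b, t_b = y[idx], t[idx]
        r_t = y_b[t_b == 1].mean() if (t_b == 1).any() else 0.0
        r_c = y_b[t_b == 0].mean() if (t_b == 0).any() else 0.0
        curve.append(r_t - r_c)
    return np.array(curve)

def qini_area(curve):
    """Trapezoidal area between the uplift curve and the random-targeting diagonal."""
    x = np.linspace(0.0, 1.0, len(curve))
    gain = curve - curve[-1] * x                   # subtract the diagonal baseline
    return float(np.sum((gain[1:] + gain[:-1]) / 2.0 * np.diff(x)))
```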

Important considerations for student dropout
Uplift modeling considers different segments of individuals depending on their response to an intervention [64]. Treatment responders are labeled as persuadables, whereas individuals who are harmed by the treatment are known as do-not-disturbs. Moreover, there are also individuals who either will never be persuaded (the lost causes) or will respond no matter the action (the sure things). Therefore, our interest is in identifying persuadables and withholding the treatment from the other categories.
There are important differences between the student dropout task and conventional uplift applications. First, there is a low risk of triggering students to drop out by targeting them with a retention effort, i.e., do-not-disturb students. This is because students with already outstanding performance who take tailored programs will still benefit from the intervention, and neither their performance nor their engagement will be negatively affected.
Furthermore, conventional uplift modeling avoids treating lost causes, which in our context could pose an ethical problem: underperforming students could be regarded as "lost causes" and therefore left untreated. Our objective, however, is not to withhold the intervention from students who might benefit from it, especially those with low academic performance. We seek to design customized programs for a specific segment of students using an uplift model. This does not imply cancelling the current program that is available to all students.
Another issue that arises in this particular case is self-selection bias. The university makes an open call to all students who are willing to participate in the tutorials; the decision to participate, however, rests with each student. Nonetheless, students with grades below average on the standardized test for college admission and/or relatively poor performance in the first semester of the bachelor program are more likely to accept the invitation to the tutorials. This may bias the estimation of causal effects and yield an erroneous estimate of the uplift. Therefore, we verify the presence of selection bias before training and, if needed, correct the imbalance between the pretreatment characteristics of the treatment groups.

Experimental analysis
First, this section describes the data set and the data preprocessing and transformation methods. Later, it presents the preliminary analysis consisting of the assessment and correction of selection bias. Last, it displays and discusses the Qini values of the twelve different uplift models and the uplift curve of the model with the best performance.

Data set description
We gathered a data set of 3362 students who enrolled between 2012 and 2016 in three bachelor programs of a business school. The complete list of variables is depicted in Table A1 in the appendix. Two main sources of variables were combined to perform this study: prematriculation information and academic performance. These variables were collected during the first year of the programs and their two sources are described next.

• Prematriculation information
- Sociodemographic data consists of information about gender, age, marital status, occupational status, working hours, expected type of funding in higher education, family income level, and residence.
- Family background refers to data about the number of family members, the educational level of both father and mother, the number of parents alive, the occupational status of both parents, the number of members working, the number of members enrolled in educational institutions, and an indicator of the head of the family (e.g., father, mother, uncle, grandparent, among others).
- Standardized admission test data indicate the scores of the Chilean standardized test (known as PSU) used for university admissions. It includes the subjects of mathematics, verbal, science, and history.
- High school features include the type of high school (i.e., single-gender or mixed-gender education) and the type of funding received by the institution (i.e., public, private, or state-subsidized private).
• Academic information was collected along the first year of the bachelor programs. It consisted of data related to: dropout (i.e., temporary leave or absence, voluntary dropout, and expulsion), academic performance (i.e., final grades and credits for each course), entry type, declared bachelor program preferences, and the participation in the tutorials offered by the program for academic support (PAA).
The PAA seeks to improve the academic performance of students and to reduce the risk of student dropout by offering tutorials in subjects such as economics, mathematics, statistics, and English. The PAA started in 2012, and since then, students are invited each year at the beginning of the second semester to participate. Offering tutorials is a retention strategy whose objective is to prevent student dropout. We introduce a binary variable in the data set to indicate whether a student participated in any of the tutorials. Participants were labeled as treated, whereas nonparticipants as untreated.
The data set comprises 60 variables and includes academic performance information up to the end of the first semester. The outcome variable (i.e., dropout indicator) is defined on the basis of whether a student voluntarily abandoned the bachelor program within the one-year period after finishing the first semester. This time frame is chosen based on the starting point of the tutorials (i.e., at the beginning of the second semester) and the literature on student dropout.

Data preprocessing and transformation
First, we remove 72 observations without records on academic performance during the first semester (e.g., students who did not take any course), and two variables with numerous missing values. Second, we impute missing values with the average or the mode, depending on the variable type. That is, missing values in numerical variables are replaced by the conditional average based on the entry cohort, whereas categorical variables are either given an additional category, "unknown" (those with more than 10% missing observations), or substituted with the mode. An overview is provided in Table A2 in the appendix. Last, we apply dummy encoding to nominal variables and aggregate similar categories. The final data set includes 60 pretreatment variables.
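The imputation scheme described above can be sketched with pandas. The column name `entry_cohort` and the `impute` helper are hypothetical, but the branching (cohort-conditional mean for numeric variables; an "unknown" level above the 10% threshold, the mode otherwise) follows the description.

```python
import pandas as pd

def impute(df, cohort_col="entry_cohort", max_missing=0.10):
    """Mean-impute numeric columns within entry cohorts; for categorical
    columns, add an 'unknown' level when >10% is missing, else use the mode."""
    df = df.copy()
    for col in df.columns:
        if col == cohort_col or df[col].isna().sum() == 0:
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            # conditional average based on the entry cohort
            df[col] = df.groupby(cohort_col)[col].transform(lambda s: s.fillna(s.mean()))
        elif df[col].isna().mean() > max_missing:
            df[col] = df[col].fillna("unknown")    # new category for heavy missingness
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df
```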

Results
This subsection presents the preliminary analysis and the results of the uplift models trained on the student dropout data set. We assess and correct the balance across the pretreatment covariates of the treatment and nontreatment groups to mitigate the effects of selection bias. Then, different uplift models from the data processing and the data preprocessing approaches are employed to estimate the uplift, and later their performance is evaluated.

Preliminary analysis
The effect of a treatment on an outcome of interest cannot be accurately estimated when treatments are not randomly assigned to individuals, since the treatment groups may not be comparable. We assess the balance across the pretreatment variables of the treatment and the nontreatment groups to mitigate the effects of selection bias, since the tutorials were not assigned to students at random. The balance assessment can be performed either by considering theoretical evidence or by applying statistical tests. Statistical approaches consist of the estimation of: 1) the normalized difference between the treatment and nontreatment groups for each of the pretreatment variables, and 2) an omnibus chi-square test whose null hypothesis is that the pretreatment variables are balanced across both groups.
Formally, the normalized difference Δ_x is defined as the difference in averages by treatment status, scaled by the square root of the sum of the variances divided by two:

Δ_x = (X̄_t − X̄_c) / √((s_t² + s_c²) / 2).    (8)

Here, X̄_t and s_t are the sample mean and the sample standard deviation of the pretreatment variable x for the treatment group, respectively. Analogously, X̄_c and s_c represent the mean and standard deviation for the untreated individuals. A rule of thumb in the literature is that a normalized difference larger than 0.25 is an indication of imbalance in that particular pretreatment variable.
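Eq. (8) and the 0.25 rule of thumb can be sketched directly (the helper names are ours):

```python
import numpy as np

def normalized_difference(x_t, x_c):
    """Difference in group means scaled by sqrt((s_t^2 + s_c^2) / 2), as in Eq. (8)."""
    x_t = np.asarray(x_t, dtype=float)
    x_c = np.asarray(x_c, dtype=float)
    scale = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2.0)
    return (x_t.mean() - x_c.mean()) / scale

def is_imbalanced(x_t, x_c, threshold=0.25):
    """Rule of thumb: |normalized difference| > 0.25 signals imbalance."""
    return abs(normalized_difference(x_t, x_c)) > threshold
```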
Table 1 displays the ten pretreatment variables with the largest normalized differences. We observe considerable differences between the students who participated in the tutorials and those who did not in terms of their family income, PSU scores in mathematics, type of healthcare affiliation, and parents' educational level, among others. This disparity may indicate that the beneficiaries of the program were mainly students from low-income families. Therefore, an unbiased uplift estimation requires reducing the imbalance between the treated and untreated students by making both groups as homogeneous as possible.
In addition, we performed the omnibus test proposed by [65], whose implementation is available in the RItools package in R. The p-value of 5.55e-156 indicates that at least one pretreatment variable is imbalanced. This result is consistent with the normalized differences analysis.
This study employs propensity score matching (PSM) to reduce the effect of imbalanced pretreatment variables on the uplift estimation [38]. Among alternatives such as multivariate regression, synthetic control, and instrumental variable estimators, PSM is one of the most widely used approaches for causal inference in observational studies, since it separates the confounding-adjustment phase from the treatment-effect-estimation phase, excludes from the analysis individuals for whom no comparison can be made, allows one to formally verify whether the resulting data set is balanced [66], and is not sensitive to the number of pretreatment variables [67]. Nonetheless, PSM requires large samples to achieve overlap between the treated and untreated individuals, and it only controls for observed confounding variables [68].
PSM seeks to balance the overall distribution of the pretreatment variables by pairing "similar" treated and untreated students. The measure of similarity is the propensity score (PS), i.e., the likelihood of an individual being treated as a function of the pretreatment characteristics, P(T = 1 | X_i) [38]. Thus, treated and untreated individuals whose PSs are sufficiently close are considered similar and are matched. This is done by means of nearest neighbour matching, which selects the untreated individual whose PS is closest to that of a treated individual. Table 2 illustrates the improvement in balance for the previously shown ten pretreatment variables after using PSM. The imbalance of some variables, such as gross family income, PSU score in mathematics, and private school, remains large, but it is considerably reduced by PSM. The p-value of the omnibus test (0.987) demonstrates that the matched set is statistically balanced.
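The matching step can be sketched as follows. This is a simplified illustration on synthetic data, assuming 1-nearest-neighbour matching with replacement and a logistic-regression propensity model; the `psm_match` helper is hypothetical, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def psm_match(X, t):
    """1-nearest-neighbour matching on the propensity score (with replacement)."""
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    # for each treated unit, pick the untreated unit with the closest PS
    pairs = [(i, control[np.argmin(np.abs(ps[control] - ps[i]))]) for i in treated]
    return pairs, ps

# synthetic illustration: treated students drawn from a shifted distribution
rng = np.random.default_rng(4)
t = np.r_[np.zeros(30, dtype=int), np.ones(20, dtype=int)]
X = rng.normal(size=(50, 3))
X[t == 1] += 0.5   # induces the imbalance that matching should mitigate
pairs, ps = psm_match(X, t)
```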

Uplift modeling techniques
This study includes a selection of uplift models from the data preprocessing and the data processing approaches. Two well-established learners were chosen for classification, random forest and boosted trees (i.e., xgboost), as it has been empirically observed that ensemble methods considerably reduce the risk of overfitting without compromising the bias error [69]. The baseline uplift methodology is the SMA. The MOA and MCA are implemented as in [43] and [32], respectively. Last, uplift is estimated directly by implementing the CTS [35], KL, ED, Chi [58], X-learner, and R-learner [70] algorithms.

Uplift models performance
We use ten-fold cross-validation to evaluate predictive performance, a well-established approach for model validation. The data set is split into ten folds of the same size. Stratification is applied to preserve within each fold the observed overall response rate. In every iteration, one fold is left out for testing and the remaining folds are used to train the model. The overall performance is the average of the results of each round, and the standard deviation of the results indicates the stability of the model.

Section 3.1 defines uplift modeling as the estimation of the net effect of a treatment on an outcome of interest. The predicted individual uplift score can be seen as an indicator of how sensitive an individual is to the treatment. Hence, the treatment assignment consists of targeting individuals whose uplift score is positive. A well-performing uplift model accurately identifies treatment responders and prioritizes the treatment allocation to those whose net treatment effect is the largest. This study empirically evaluates the performance of uplift modeling by ranking test cases in descending order based on their predicted uplift scores, then segmenting the sorted test set into four bins of equal size, and calculating the "observed bin uplift" as indicated in Eq. (7). The larger the observed bin uplift, the larger the net effect of the treatment within that particular bin. Hence, the best-performing uplift models have a larger observed uplift in the top bins than in subsequent bins.
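The stratified splitting described above can be sketched with scikit-learn. This is a minimal sketch on synthetic data; stratifying jointly on treatment and outcome is one way (our assumption, not necessarily the paper's exact scheme) to preserve both the treatment proportion and the response rate within each fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 5))
t = rng.integers(0, 2, size=400)   # treatment indicator
y = rng.integers(0, 2, size=400)   # retention outcome

# joint strata over (treatment, outcome): 0, 1, 2, 3
strata = 2 * t + y
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

fractions = []
for train_idx, test_idx in skf.split(X, strata):
    # an uplift model would be trained on train_idx and scored on test_idx;
    # here we only record the treated fraction per fold to check stratification
    fractions.append(t[test_idx].mean())
```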
Response modeling and uplift modeling both aim to optimally identify individuals to maximize the effect of a targeting decision by employing predictive analytics. The difference is that response models build a predictive model using only the treatment group, while uplift models incorporate the information available in both the treatment and the nontreatment groups into the analysis. That is, uplift modeling targets students based on the predicted net effect of the tutorials, whereas targeting in response modeling is made on the basis of students' likelihood to attrite.
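For context, a common single-model way to incorporate both groups is the modified outcome (class variable transformation) approach. Assuming this is the transformation underlying the MOA, and a balanced treatment assignment, a minimal sketch on synthetic data is:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 5))
t = rng.integers(0, 2, size=n)            # roughly balanced assignment
p = 1 / (1 + np.exp(-(X[:, 0] + 0.8 * t * (X[:, 1] > 0))))
y = (rng.random(n) < p).astype(int)       # 1 = retained

# Modified outcome: z = 1 for treated responders and untreated
# nonresponders; with P(t = 1) = 1/2, uplift(x) = 2 P(z = 1 | x) - 1.
z = y * t + (1 - y) * (1 - t)
model = GradientBoostingClassifier(random_state=0).fit(X, z)
uplift = 2 * model.predict_proba(X)[:, 1] - 1
```

A single learner on the transformed target thus yields an uplift estimate directly, without fitting separate models per group.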
We corroborate the advantage of uplift models over response modeling in treatment customization by contrasting the predicted uplift achieved when the tutorials are assigned as suggested by the MOAxgboost uplift model (Panel A) and by two conventional response models: the random forest (Panel B) and the XGboost (Panel C) algorithms (Fig. 2). We display the results of the MOAxgboost because, as shown below, this approach outperforms the other uplift modeling methodologies trained on the student dropout data set. The performance of these models is evaluated according to the per-bin observed uplift. In Fig. 2, the y axis shows the observed uplift, whereas the x axis indicates the bins. We can see that the uplift model prioritizes the assignment of tutorials to students who are expected to be positively affected. As more students are targeted, the effect of the program decreases and even becomes negative or null in the last bins. By contrast, the response models fail to rank treatment responders correctly, as the effect of the tutorials in the first quartiles is lower than that achieved by the MOAxgboost, or even negative. The study by [8] aligns with this finding, as it concludes that individuals whose risk of churning is high are not in all cases the best targets for retention campaigns.
The Qini metric summarizes the performance of the different uplift modeling approaches employed in our experiments; the larger the Qini, the better the performance of the uplift model. Table 3 presents the Qini values for each technique at different targeting percentages, since uplift models may perform differently depending on the targeted fraction of students. The Qini values of the best-performing model are in bold. The MOAxgboost achieves the best performance when targeting both small and large groups of students; that is, the MOAxgboost is the most appropriate approach to personalize the assignment of tutorials to students. Moreover, the variability of the Qini values increases for all models as more students are selected to participate in the tutorials, indicating the instability of uplift modeling when targeting large samples.
As previously mentioned, the performance of uplift models is visualized by means of an uplift curve, which plots the cumulative uplift as a function of the proportion of targeted individuals. Fig. 3 shows the uplift curve of the MOAxgboost. Overall, targeting according to the predictions of the MOAxgboost model boosts the effect of the program compared to targeting at random. The uplift curve increases, albeit irregularly, up to targeting 80% of the sample, as the uplift model correctly identifies students whose dropout is prevented by their participation in the tutorials. Beyond that point it moves downward, as the tutorials would be given to students whose intention to attrite will not change. The advantage of using an uplift curve is that it favors model comprehensibility and supports program designers in choosing the optimal fraction of students to target.
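Both quantities can be sketched as follows for hypothetical arrays of scores, treatment indicators, and outcomes. Note that several conventions for the Qini metric exist; the variant below simply measures the area between the cumulative uplift curve and the random-targeting baseline.

```python
import numpy as np

def uplift_curve(scores, treated, outcome):
    """Cumulative uplift when targeting the top-k scored individuals,
    for every cutoff k (ranked by descending predicted uplift)."""
    order = np.argsort(-scores)
    t, y = treated[order], outcome[order]
    n_t = np.cumsum(t)                    # treated seen so far
    n_c = np.cumsum(1 - t)                # nontreated seen so far
    rate_t = np.cumsum(y * t) / np.maximum(n_t, 1)
    rate_c = np.cumsum(y * (1 - t)) / np.maximum(n_c, 1)
    k = np.arange(1, len(scores) + 1)
    return (rate_t - rate_c) * k          # observed top-k uplift, scaled by k

def qini(scores, treated, outcome):
    """Area between the uplift curve and the random-targeting line
    (a discrete approximation, normalized by the sample size)."""
    curve = uplift_curve(scores, treated, outcome)
    n = len(curve)
    random_line = np.linspace(curve[-1] / n, curve[-1], n)
    return float((curve - random_line).mean() / n)

# Toy check: a ranking aligned with the true effect beats a shuffled one.
rng = np.random.default_rng(2)
n = 1000
true_effect = np.linspace(1.0, 0.0, n)     # strongest effect first
treated = rng.integers(0, 2, size=n)
outcome = (rng.random(n) < 0.2 + 0.6 * true_effect * treated).astype(int)
qini_good = qini(true_effect, treated, outcome)
qini_rand = qini(rng.permutation(n).astype(float), treated, outcome)
```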
The results of the MOAxgboost model motivate the usage of uplift modeling as a support tool to assist decision-makers in designing retention strategies to reduce student dropout. The MOAxgboost uplift technique performs well in the majority of the analyzed segments and, therefore, personalizing the assignment of tutorials according to its predictions can prevent future cases of student dropout. We, however, advise training different uplift modeling approaches, as there is no single uplift modeling technique with outstanding performance for all problems (i.e., the no-free-lunch theorem) [40]. Last, we are also interested in analyzing, in terms of the pretreatment variables, to what extent students identified by the MOAxgboost uplift model as treatment responders differ from those who are treatment nonresponders. Our interest lies not only in the accuracy of the model but also in its comprehensibility. Therefore, we form four different profiles on the basis of the quartiles shown in Fig. 2, Panel A. Since test set students are ranked according to their likelihood to respond positively to the program, i.e., the individual uplift scores of the MOAxgboost model, segments of students with high treatment effects are those at the top.
The student profiles result from averaging the values of the ten most important predictors of the MOAxgboost. The importance score of each attribute is obtained by fitting a MOAxgboost over ten folds and subsequently averaging the values. We use the Gain criterion, as it indicates how valuable the attribute is at the splits during the construction of the trees. Although more variables could be selected, inspecting a few variables allows us to maintain clarity in the visualizations. Fig. 4 illustrates the variation of the average across the four quartiles for the chosen variables. Students with the highest likelihood to respond to the retention program are those in the top segments. We can conclude from the figure that the main differences between treatment responders and nonresponders are observed in their PSU scores in mathematics, the number of members in the family, attendance at a private school, the overall performance in the first semester, and the performance in English courses.
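The fold-averaged importance ranking can be sketched as follows. For a self-contained example we use scikit-learn's gradient boosting and its impurity-based importances as a stand-in for xgboost's Gain scores (with xgboost itself one would query the booster's gain-based importance), and both the data and the feature names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(3)
features = ["psu_math", "family_size", "private_school",
            "sem1_gpa", "english_grade"]           # hypothetical names
n = 500
X = rng.normal(size=(n, len(features)))
# Synthetic target driven mainly by the first and fourth features.
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Fit one model per fold and average the per-fold importance scores.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_importances = []
for train_idx, _ in cv.split(X, y):
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_importances.append(model.feature_importances_)

avg_importance = np.mean(fold_importances, axis=0)
ranking = sorted(zip(features, avg_importance), key=lambda kv: -kv[1])
```

The top entries of `ranking` are then the attributes used to build the profiles.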
The studies by [15,17,19] report an association between academic performance variables from the first semester of bachelor programs and student dropout. Although the model suggests targeting students with relatively good performance, their proficiency in mathematics is among the lowest; math test scores in particular are related to dropout during the first academic year [71]. In addition, treatment responders belong to households with relatively few family members and graduated from nonprivate high schools. This indicates that retention campaigns can take a proactive rather than a reactive approach, since prematriculation information may also be used for treatment customization.
The radar charts in this study are intended to facilitate the interpretation of uplift modeling estimates for decision-makers. They are valuable for understanding the needs of students and for emphasizing the attributes that differentiate treatment responders from nonresponders.
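A radar chart of segment profiles can be drawn with matplotlib's polar axes. The attribute averages below are invented for illustration, and in practice each attribute would first be scaled to a common range.

```python
import matplotlib
matplotlib.use("Agg")                     # off-screen rendering
import numpy as np
import matplotlib.pyplot as plt

# Invented, already-normalized attribute averages for two segments.
attrs = ["PSU math", "family size", "private school",
         "sem. 1 GPA", "English grade"]
segments = {
    "Q1 (responders)": np.array([0.35, 0.40, 0.20, 0.75, 0.70]),
    "Q4 (nonresponders)": np.array([0.70, 0.65, 0.55, 0.60, 0.50]),
}

# One axis per attribute; repeat the first angle to close each polygon.
angles = np.linspace(0, 2 * np.pi, len(attrs), endpoint=False)
angles = np.concatenate([angles, angles[:1]])

fig, ax = plt.subplots(subplot_kw={"polar": True})
for label, values in segments.items():
    closed = np.concatenate([values, values[:1]])
    ax.plot(angles, closed, label=label)
    ax.fill(angles, closed, alpha=0.15)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(attrs)
ax.legend(loc="lower right")
fig.canvas.draw()                         # render without a display
```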

Conclusions
This article applies the uplift modeling framework to the problem of student dropout prevention. We demonstrate that focusing retention efforts, i.e., offering tutorials, on students with the largest likelihood to be retained due to the intervention boosts the effect of the program. We achieve a higher uplift with the best machine learning model designed for this purpose than with the alternative of targeting at random.
Self-selection bias is tested and corrected as part of the modeling process to avoid bias in the uplift estimation. Subsequently, we train twelve different uplift modeling approaches to predict students' response to the retention strategy, and assess feature relevance to better understand the characteristics of students who are likely to be retained by such a program. This knowledge translates into a better design of tailored retention efforts. In particular, the importance of prematriculation attributes indicates that the design of retention efforts can take a proactive rather than a reactive approach. There are several opportunities for future research. First, a further step is to target students according to the uplift model and subsequently corroborate the effectiveness of the customized targeting assignment. This task, however, requires setting aside a holdout set of students who should not be treated, as estimating causal effects requires the comparison of alike individuals, particularly in terms of their characteristics and likelihood to respond to the intervention. Second, student dropout is a context-specific phenomenon and retention strategies comprise, but are not limited to, offering tutorials. Therefore, applying the uplift modeling framework to different institutional contexts, i.e., collecting prematriculation and academic information at other universities, would enrich the understanding of the effectiveness and limitations of this approach in the customization of retention programs. Third, incorporating academic information from subsequent semesters may enhance model estimates and the comprehension of long-term program effects. Last, profit metrics for business analytics can be adapted to assess the benefits and costs of student dropout, as retaining students leads to social benefits and positive externalities.

Fig. 3. Uplift curve of the MOAxgboost. The model boosts the effect of the program compared to targeting at random, as students who will not drop out due to the intervention are prioritized. The uplift curve bends downward as the tutorials are assigned to students whose intention to attrite will not change.

D. Olaya, et al., Decision Support Systems 134 (2020) 113320