Effectiveness of Artificial Intelligence Models for Predicting School Dropout: A Meta-Analysis

School dropout is a major concern in the educational systems of all countries. In recent years, artificial intelligence has played an important role in predicting school dropout across the different stages of formal education. In this context, it is crucial to know whether these predictions are accurate and understandable. This meta-analytic study investigates the effectiveness of dropout prediction models in studies conducted until May 2022. The databases used are Web of Science, Scopus, PubMed, ERIC, PsyInfo, Dialnet and Scielo. Fifteen studies with a combined sample of 199,015 participants are analyzed. The meta-analysis uses a random-effects proportions model with a 95% confidence interval. Statistical evidence indicates that artificial intelligence models performed well (91%; 95% CI = 89-93%) in predicting dropout; specifically, the Decision Tree model (95.3%; 95% CI = 93-98%) predicts dropout significantly better than other models such as Random Forest, Artificial Neural Network, Support Vector Machines, Logistic Regression and Stacking Ensemble. Consequently, more models should be applied in the dropout field with larger numbers of participants to confirm these findings and improve the quality of education.

Artificial intelligence (AI) used for prediction is an ever-expanding field of current relevance for creating both early detection and personalized recommendation systems that optimize the teaching-learning process and educational quality (Du et al., 2020; Xiao et al., 2018). According to the 2019 Horizon Report (Educause, 2019), the application of AI to educational processes will grow by more than 43%.
AI is a diverse set of methods and technologies (data mining, neural networks, algorithms, etc.) that allow computers to simulate tasks generally associated with the human mind (Baker & Smith, 2019). With this technology, computers should offer answers no worse than those that a human being could provide (Dobrev, 2012). Together with Turing's thesis, which states that what is relevant is not whether a machine thinks, but whether it really seems to think, AI is reduced to a behaviorist process based on imitation (Mira et al., 2003). These authors argue that, for a technology to be defined as AI, learning must take place.
This learning occurs through a series of algorithms, which are abstract machines that follow a sequence of steps to obtain results (Moschovakis, 2001) and, additionally in the case of supervised AIs, through a training process. In this process, the AI is provided with a series of input data together with the response it is expected to provide and, through the steps indicated in its algorithm, its internal structure is modified to produce the expected results. In this way, it can provide appropriate outputs for new input data whose output is unknown. This is defined as generalization (Aldabas-Rubira, 2002).
One of the methods of AI is machine learning (Popenici & Kerr, 2017) which facilitates the creation of, among other things, predictions from data, such as school dropout rates.
School dropout refers both to excused or unexcused absences from classes at any stage of education (Kearney, 2008) and to leaving the educational system without a degree (Mahoney, 2018). This paper focuses on non-degree completion as an unresolved problem in the educational system (Kim & Kim, 2018).
In the context of students dropping out of school in the different educational stages that make up formal education, AI is relevant for educational systems in all countries (principals, managers and politicians). School dropout is one of the most concerning issues for educational policies, especially after the implications of the measures taken to contain the COVID-19 pandemic, with the closure of schools and the change from face-to-face to distance learning (Day et al., 2021). In this sense, Reuge et al. (2021), within the framework of research proposed by UNICEF, explain how students who belong to groups at risk of social exclusion drop out of school permanently, a situation aggravated by the impact of the measures taken to stop the spread of the COVID-19 pandemic. However, belonging to a group at risk of social exclusion is not the only risk factor for dropping out of school. Other circumstances, such as individual factors, or those of the educational systems themselves, which are not able to respond to this problem, must also be taken into account (UNICEF, 2017). Studies on school dropout (Heublein, 2014; Wilcoxson et al., 2011) identify several factors involved, such as academic performance, cognitive potential, economic limitation, lack of motivation, study habits, personal circumstances, a poor choice of study or wrong expectations. In fact, UNICEF itself proposes a warning system based on a series of indicators (Early Warning System) to detect the risk of dropping out of school in contexts where dropout exceeds 10% of the total number of students (UNICEF, 2018). Educational institutions can take action to reduce the dropout rate, which leads to the professional, economic and social impoverishment of countries. These actions require a significant allocation of financial and organizational resources. For this reason, the most accurate early identification and preventive action (Balfanz et al., 2007) would be essential for any educational institution and country. One way to address this issue is through data mining (Pereira & Zambrano, 2017) and, in particular, explainable prediction-based machine learning models.
In the formal educational context, different studies were found about school dropout prediction. Alban and Mauricio (2019), using Artificial Neural Networks (ANN) with data from 2,670 university students over three years, predicted dropout with 96.3% and 96.8% accuracy. Chung and Lee (2019) used Random Forest (RF) with data from 165,715 high school students with 95% accuracy. Pereira and Zambrano (2017), using Decision Tree (DT), obtained an accuracy above 80%. Fernández-García et al. (2021) used several models from the beginning of enrolment to the fourth semester, identifying 91.5% of those who would drop out by the end of the fourth semester. Hutagaol and Suharjito (2019) used K-Nearest Neighbor (KNN), Naïve Bayes (NB) and DT, reaching 79.12% accuracy in predicting college dropout. Kiss et al. (2019), using Gradient Boosted Tree (GB), XGB and ANN, had an accuracy of up to 85.8%. Adnan et al. (2021), in a sample of 32,593 university students, achieved an accuracy of 91.9% with RF. The same algorithm was used by Berriri et al. (2021) on 154 students and by Uliyan et al. (2021) on 949 students, obtaining, respectively, 80% and 93% accuracy. Ahmad and Shahzadi (2018), El Fouki et al. (2019) and He et al. (2020) used the ANN algorithm and, with samples of 300, 496 and 32,593 university students respectively, obtained accuracy rates of 93.2%, 92.54% and 91%. The DT algorithm was also used in the research of Barros et al. (2019) (n = 7,718), Freitas et al. (2020) (n = 1,549), Hamim et al. (2021) (n = 395), Nuankaew (2019) (n = 389) and Tan and Shao (2015) (n = 62,375), obtaining accuracy data of 95.39%, 99.34%, 94.07%, 87.21% and 94.63% respectively. Mourdi et al. (2020), with a sample of 49,551 university students, achieved an accuracy of 89.8% with the Support Vector Machines (SVM) algorithm. Nabil et al. (2021) reached 76.2% with 4,266 students using Logistic Regression (LR) and Niyogisubizo et al. (2022) reached 92.89% with a smaller sample of 261 university students using a Stacking Ensemble (SE). Jadric et al. (2010) indicated that the most accurate models for predicting dropout were DT, LR and ANN. In a systematic review on the subject, Cardona et al. (2020) point out that the most commonly used AI models for predicting dropout in formal education are ANN, DT, LR, SVM, KNN, RF and NB; these same authors indicate that the most accurate models are ANN (71.5%-94%), DT (65.3%-81.3%), LR (50.1%-83%) and SVM (57.6%-86.4%). In summary, it is evident that in recent years the most frequent explanatory machine learning models applied to the prediction of school dropout are ANN, RF, DT, KNN, NB, GB, SVM, LR and SE.
In this context, the central research questions are: are the predictions of AI models of school dropout accurate? And which are the most accurate AI models in predicting the dropout rate? To address these questions, the aim of this meta-analysis is to analyze the effectiveness of dropout prediction models. To do so, the empirical evidence from various international databases that employ explainable machine learning models in the prediction of school dropout in formal education is reviewed. The increase in the scientific literature published in recent years requires an updated assessment of the subject. The justification for this meta-analysis lies in the fact that it is the first research which analyzes the effectiveness of the prediction of different AI models for school dropout and compares different models, revealing the most effective ones for this prediction.
However, a meta-analysis combines all the results of the collected AI studies on school dropout with the same focus, omitting those studies whose analyses are not based on concrete statistics. Moreover, the papers included in the meta-analysis are highly heterogeneous in the way the original research is conducted, and in practice this is always a complex challenge for the meta-analysis itself, which makes the answers to the questions only approximately certain. In the present research, the meta-analysis is subject to the variability of multiple factors such as the nature of the selected variables or factors used for training (academic, demographic and social), the type of parameters and/or hyperparameters used in each algorithm, the type of target group, the heterogeneity of the educational systems, the size and characteristics of the sample, the format and typology of the questions, as well as all those contextual variables that directly or indirectly influence or modulate the results. In summary, these aspects highlight the difficulties of conducting a meta-analysis on the detection of school dropout by means of AI. However, the scientific and academic field needs objective information that brings together all the existing research to know the accuracy of AI models in school dropout. This is why the results of the meta-analysis will provide knowledge to help principals, managers and politicians make decisions to prevent school dropout and keep students in educational institutions. Specifically, they provide information on the most accurate explainable AI model they can use to predict the dropout rate of their educational institution. Thus, they can anticipate their students' school dropout and design the most appropriate measures to face that situation.

Methodology
In order to achieve the objectives of this study, a meta-analysis was carried out following the methodology proposed by Glass (1976), which allows an objective statistical analysis and greater generalization of the results when studies from different geographical areas with different backgrounds are used (Borenstein et al., 2009). The selection of articles for the meta-analysis was carried out following the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology (Page et al., 2021). PRISMA contains 27 indicators that constitute a checklist and a flow chart with three phases (identification, screening and included) to facilitate the design and development of a protocol and critical evaluation (Page et al., 2021).

Search Criteria
This study uses both inclusion and exclusion eligibility criteria. Inclusion criteria are: (a) empirical research (articles, conferences) of AI in education, (b) focused on the prediction of school dropout, (c) in the context of formal education, (d) whose content is open access, and (e) in the social sciences field of knowledge.
The exclusion criteria are: (a) theoretical research (articles, book chapters and books), (b) in the context of non-formal or informal education, (c) not open access, (d) with a non-educational focus, and (e) the research does not provide the accuracy data necessary to perform the meta-analysis.
The reason for choosing these inclusion and exclusion criteria lies in the fact that this is a quantitative study, so we had to exclude theoretical research (exclusion criterion "a") and select exclusively empirical research (inclusion criterion "a"). On the other hand, although AI, after appropriate training, can be used to predict a wide range of situations, the present research has focused on the educational context (exclusion criterion "d"); in particular, the research has been directed at school dropout (inclusion criterion "b") and, therefore, within the formal education school context (inclusion criterion "c" and exclusion criterion "b"). Inclusion criterion "d" and exclusion criteria "c" and "e" ensured that the research content and, in particular, the effectiveness data of the trained models could be accessed in order to successfully carry out the meta-analytical study.

Search Strategies
The search, carried out in May 2022, was conducted in English and Spanish in the databases Web of Science (WOS), Scopus, PubMed, ERIC, PsyInfo, Dialnet and Scielo without limiting it to any time interval or language. In addition, the bibliographical references of the selected articles were reviewed to locate possible studies that met the inclusion criteria. The descriptors used in the title, abstract and keyword search were: "artificial intelligence", dropout and prediction in English and Spanish, combined with the Boolean operators AND and OR. No secondary sources or grey literature have been explored in this work. Thus, the resulting search string was the following:

("artificial intelligence" OR "inteligencia artificial") AND (prediction OR predicción) AND (abandono OR dropout)
The search returned 494 publications in all databases, specifically 354 from WOS, 117 from Scopus and 23 from ERIC. Dialnet, PubMed, PsyInfo and Scielo returned no results. This total of 494 scientific productions was reduced to the 15 articles that finally constitute the meta-analysis. The process followed from the 494 records is as follows: the publications were checked for full access (n = 211), restricted to articles (n = 111), restricted to the field of social sciences (n = 91), restricted to research focused on formal education (n = 24) and, finally, restricted to those containing the data required for the meta-analysis (n = 15). No duplicate studies were detected in the sample. Fig. 1 shows the flow chart according to the PRISMA guidelines (Page et al., 2021) for the selection of publications included in the meta-analysis.
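The screening arithmetic can be checked with a short sketch. The counts below are the ones reported in the text; the stage labels and the helper name `screening_flow` are illustrative choices, not part of the original protocol.

```python
# Records retrieved per database, as reported in the text.
identified = {"WOS": 354, "Scopus": 117, "ERIC": 23}

# Records remaining after each successive screening step (from the text).
stages = [
    ("full access", 211),
    ("articles", 111),
    ("social sciences", 91),
    ("formal education", 24),
    ("accuracy data available", 15),
]


def screening_flow(total, stages):
    """Return (stage, remaining, excluded_at_this_stage) for each step."""
    rows, previous = [], total
    for name, remaining in stages:
        rows.append((name, remaining, previous - remaining))
        previous = remaining
    return rows


total = sum(identified.values())
flow = screening_flow(total, stages)
for name, remaining, excluded in flow:
    print(f"{name:<25} kept={remaining:>3} excluded={excluded:>3}")
```

Summing the per-stage exclusions and the 15 retained studies recovers the number of records identified, which is a quick consistency check on the reported flow.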

Codification Procedure
This study follows the guidelines of the systematic review manual (Higgins & Green, 2008), which calls for clear objectives, the use of specific search terms and pre-defined eligibility criteria. PRISMA (Page et al., 2021) was used to search for empirical publications on the topic. The search process was conducted by two independent researchers with 100% agreement. A coding protocol was used to resolve ambiguities, reflect on the proposals and settle disagreements between the two researchers, reaching 98% agreement.
Table 1 summarizes the information of all the studies included in the work, indicating authors/year, country of affiliation, country of data, sample size, educational level, type of data, AI models, metrics and effect size. The researchers built this table independently in Microsoft Excel and then matched the data.
For the moderator variables the coding was done as follows: secondary educational level (value = 0) or university educational level (value = 1); the number of variables used for the AI models, based on the average of the variables used in the 15 selected articles, is less than fifteen (value = 0) or greater than fifteen (value = 1); the number of algorithms used in AI, based on the average of the algorithms used in the 15 selected articles, is less than five (value = 0) or greater than five (value = 1); the number of moments used is one (value = 0) or greater than one (value = 1); and the sample size is less than one thousand (value = 0) or greater than one thousand (value = 1).
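The moderator coding described above can be sketched as a small function. The `Study` record and its field names are hypothetical. The text does not say how a value exactly equal to a threshold (for example, exactly fifteen variables) was coded, so this sketch assumes such ties are coded 0.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical record layout; field names are illustrative only.
    level: str          # "secondary" or "university"
    n_variables: int    # predictors fed to the AI models
    n_algorithms: int   # algorithms compared in the study
    n_moments: int      # data collection moments
    sample_size: int


def code_moderators(study: Study) -> dict:
    """Dichotomize each moderator with the cut-offs given in the text.

    Assumption: values exactly equal to a cut-off are coded 0.
    """
    return {
        "level": 0 if study.level == "secondary" else 1,
        "variables": int(study.n_variables > 15),
        "algorithms": int(study.n_algorithms > 5),
        "moments": int(study.n_moments > 1),
        "sample": int(study.sample_size > 1000),
    }


# Made-up study used only to show the call.
example = Study("university", 20, 3, 1, 32593)
print(code_moderators(example))
```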

Effect Size and Statistical Analysis
This work had a sample of 15 papers, with a continuous variable (accuracy) and proportions as the effect size. Although other measures of AI effectiveness better compensate for the deviations produced by imbalanced samples, the authors of this article decided to analyse accuracy ((True Positive+True Negative)/(True Positive+True Negative+False Negative+False Positive)) because it was the only continuous variable shared by all the studies. In addition, calculating other measures better suited to imbalanced samples, such as the F1-Score (2*(Precision*Sensitivity)/(Precision+Sensitivity)), proved impossible, as several studies did not publish True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) data. All accuracy data were transformed to absolute frequencies and a random-effects model was used to calculate the effect size (Borenstein et al., 2009). This type of model allows for greater generalization of findings and an estimation of the effects of different sources of variation (moderating variables) (Borenstein et al., 2009). In this meta-analysis, the DerSimonian and Laird method (Egger et al., 2001) was used. The raw proportion was calculated and, for each proportion, the standard error, the 95% confidence interval (CI) and the p-value.
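The following minimal sketch shows why accuracy can look strong on an imbalanced sample while the F1-Score does not, and how a reported accuracy can be turned back into an absolute frequency. The F1-Score is taken in its standard form, the harmonic mean of precision and sensitivity. The confusion-matrix counts are invented for illustration. The helper `accuracy_to_counts` and its rounding rule are assumptions, since the text does not describe how the transformation was done.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity (recall)."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return 2 * precision * sensitivity / (precision + sensitivity)


def accuracy_to_counts(acc, n):
    """Assumed transformation: correctly classified cases out of n."""
    return round(acc * n), n


# Invented imbalanced sample: 1,000 students, 50 of whom drop out.
# The model detects only 10 of the 50 dropouts.
tp, fn, fp, tn = 10, 40, 5, 945
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"F1       = {f1_score(tp, fp, fn):.3f}")

# Accuracy and sample size reported in the text for Adnan et al. (2021).
print(accuracy_to_counts(0.919, 32593))
```

In the invented sample the model misses four out of five dropouts, yet accuracy stays above 95% because the majority class dominates the count of correct predictions.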
To test the null hypothesis of homogeneity among the studies in the sample, Cochran's Q test was used, where a significant value indicates the presence of heterogeneity among the studies in the sample. To calculate the magnitude of this heterogeneity, the I² statistic was used to indicate the proportion of variability (the proportion of observed variance that reflects non-random differences between the effect sizes of the papers in the sample) (Higgins et al., 2003; Higgins et al., 2008). If I² reaches 25%, heterogeneity is considered low (differences due to random sampling and similar results between studies); at 50% it is considered medium and at 75% high (real differences due to different study designs) (Higgins et al., 2003). To analyze the effect of moderator variables on accuracy, a meta-regression analysis was used; for each moderator variable the estimation parameters, standard error, CI, Q and I² were calculated.
To test for significant differences between the prediction models, the Chi-squared test recommended by Campbell (2007) and Richardson (2011) and the confidence interval according to the method of Altman et al. (2000) were used.
Although several international databases were searched, it is possible that relevant works were missed due to the search strategies used or to some kind of bias. Two methods were used to assess publication bias: Rosenthal's fail-safe number and the regression test. According to Borenstein et al. (2009), Rosenthal's test indicates the number of missing studies with null effects that would make the observed effect non-significant; when the values are large there is no bias. In the regression test there is no bias when the regression is not significant.
The software packages used for data analysis in this work are Microsoft Excel, Jamovi version 1.6 and MedCalc (a free online statistical calculator).
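As a rough illustration of Cochran's Q, I² and DerSimonian and Laird pooling, the sketch below works on raw proportions with the binomial variance p(1 - p)/n. This is an assumption: the text does not state whether the software applied a transformation (such as logit or arcsine) before pooling, so the numbers it produces are not expected to match the paper's. The function name and return keys are illustrative.

```python
import math


def dersimonian_laird(props, ns):
    """Random-effects pooling of raw proportions (DerSimonian-Laird).

    Assumes at least two studies and 0 < p < 1 for each of them,
    so every within-study variance is strictly positive.
    """
    k = len(props)
    variances = [p * (1 - p) / n for p, n in zip(props, ns)]
    w = [1 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * p for wi, p in zip(w, props)) / sum(w)

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (p - fixed) ** 2 for wi, p in zip(w, props))
    df = k - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # I^2: percentage of variability not attributable to sampling error.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * p for wi, p in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "tau2": tau2,
        "I2": i2,
    }


# Accuracies and sample sizes quoted in the introduction for three RF studies
# (Adnan et al., 2021; Berriri et al., 2021; Uliyan et al., 2021).
result = dersimonian_laird([0.919, 0.80, 0.93], [32593, 154, 949])
print(result)
```

The three studies are used only to exercise the function; this is not a reproduction of the analysis reported in the Results.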
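A comparison of two proportions of this kind can be sketched as follows. Campbell (2007) is commonly cited for the "N-1" variant of the chi-squared test on a 2x2 table; that the analysis used exactly this variant is an assumption here, and the counts in the example are invented. The function name is illustrative.

```python
import math


def n_minus_1_chi2(correct_a, n_a, correct_b, n_b):
    """'N-1' chi-squared test for two independent proportions.

    Returns the statistic and its p-value on 1 degree of freedom.
    """
    a, b = correct_a, n_a - correct_a      # model A: correct / incorrect
    c, d = correct_b, n_b - correct_b      # model B: correct / incorrect
    n = n_a + n_b
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = (n - 1) * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: model A classifies 95 of 100 correctly, model B 80 of 100.
chi2, p = n_minus_1_chi2(95, 100, 80, 100)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```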
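Rosenthal's fail-safe number can be illustrated with a few lines of code. The sketch uses the usual formulation, in which the z-scores of the k included studies are summed and compared with the one-tailed critical value 1.645; the z-values in the example are invented and the function name is illustrative.

```python
def rosenthal_fail_safe_n(z_scores, z_alpha=1.645):
    """Number of unpublished null-result studies needed to bring the
    combined result down to non-significance (one-tailed alpha = 0.05)."""
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k


# Invented z-scores for three studies.
n_fs = rosenthal_fail_safe_n([2.0, 2.5, 3.0])
print(f"fail-safe N = {n_fs:.1f}")
```

A fail-safe N that is very large relative to the number of included studies is what the text reads as absence of publication bias.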

Results
The results of the study are organized in three sections: the descriptive analysis of the data, the statistical analysis of the meta-analysis and the analysis of publication bias.

Description of the Studies Included
Fifteen papers were selected for the meta-analysis, all of them experimental studies. The publication range is between 2015 and 2022, with most papers published in 2021 (33%, 5/15); 6.7% were published in 2015, 6.7% in 2018, 20% in 2019, 26.7% in 2020 and 6.7% in 2022. Sample data for the research were obtained from 12 different countries. Data from two research studies each were obtained from Brazil and the United Kingdom, while data from only one research work each were obtained from China, France, Morocco, Pakistan, Portugal, Saudi Arabia, Slovakia, Spain, Thailand and the United States. In one of the studies (Nabil et al., 2021), the sample belonged to 13 different countries.

Figure 2 AI Algorithms Used in the Sample
In 11 of the 15 studies analyzed, the algorithms were tested in isolation, without using any procedure that combined and decided between the results of several algorithms (ensembles). In those where some form of ensemble was used, it was treated as another algorithm that was compared with the rest. For example, in the study by Barros et al. (2019) a Balanced Bagging technique with different DTs is used. In the paper by Fernández-García et al. (2021), one of the algorithms used for comparison is an ensemble that combines the results of a GB, a RF and a SVM. Similarly, a Stacking Ensemble is considered as a further option for comparison in the paper of Niyogisubizo et al. (2022). However, the study by Mourdi et al. (2020) is the only one that uses an ensemble in which the results of the SVM, KNN, DT, NB and LR algorithms are voted on, but since the study provides the data independently per algorithm, these have been analyzed in isolation for the present research.

Results of the Random-Effects Model Meta-Analysis
In the meta-analysis, a random-effects model with the accuracy of all studies was used. The accuracy shows a mean proportion effect size of 91% (95% CI = 89-93%). The forest plot is shown in Figure 3. The variability between the different samples was significant (Q = 2843.805, p < 0.001; I² = 99.51%). The type of model used (p < 0.001) had a significant contribution to heterogeneity, indicating that the dropout effectiveness ratio decreases using RF, ANN, SVM, LR and SE models. The other moderating variables used in this paper did not have a significant contribution to heterogeneity: educational stage (p = 0.158), number of algorithms used (p = 0.836), number of moments used (p = 0.682), number of variables (p = 0.468), sample size (p = 0.849). This high variability is explained by other factors not considered in this study.
In the DT model, which is the most effective at predicting dropout, possible moderating variables have been analyzed, but none of them contribute significantly to explain the heterogeneity: educational stage (p = 0.742), number of algorithms (p = 0.309) and number of variables (p = 0.334). This high variability is explained by other factors not considered in this study. In the RF and ANN models, as there are fewer than 3 studies, the analysis of moderating variables cannot be performed.

Analysis of Publication Bias
Rosenthal's fail-safe number (Fail-safe N = 7837746.00, p < 0.001) and regression test (Z = -1.321, p = 0.187) were used for this analysis. Neither procedure showed any evidence of publication bias in the studies. Figure 4 shows the funnel plot, each point representing a study included in the meta-analysis.

Figure 4 Funnel plot dropout accuracy
For studies using DT, neither Rosenthal's fail-safe number (Fail-safe N = 1670610.000, p < 0.001) nor the regression test (Z = -0.235, p = 0.814) shows evidence of publication bias.

Discussion
In recent years, several AI models have been applied to predict the dropout rate at different educational levels of formal education (mostly higher education). So far, no meta-analysis has systematized all empirical research on the subject and compared the AI models used. Therefore, finding out whether AI models are accurate in different countries and identifying the best predictive model of early school dropout is an internationally relevant issue for all governments and educational leaders. This research aimed to analyze the effectiveness of dropout prediction models in formal education. Specifically, this meta-analysis found that the most commonly used AI machine learning models in scientific research are highly accurate. Moreover, the most accurate and effective AI model for dropout prediction is DT, followed by ANN and RF.

Are the Predictions of AI Models of School Dropout Accurate?
The results of this paper show that the explanatory AI models used in the studies perform well in predicting school dropout. This statement should be taken with caution, since the only continuous variable shared by all the studies analyzed was accuracy, whose limitations are described in studies such as that of Sokolova et al. (2006), and since, as previously described, most of the studies analyzed did not provide the data needed to calculate other more appropriate measures. However, this finding is in line with Pereira and Zambrano (2017), who indicate that the analyzed models offer high-confidence predictions of school dropout (Delen, 2011; Dissanayake, Robinson & Al-Azzam, 2016). They also indicate that these are efficient methods for predicting the dropout rate in educational institutions. Studies such as those of Martinho et al. (2013) and Kostopoulos et al. (2017) show a high success rate in detecting dropout at a very early stage, exclusively with data extracted from the enrolment process or at an early point in the course schedule. This finding is internationally relevant because this dropout indicator represents a measure of the success of the educational institution and of the students.
Five factors were quantitatively assessed in the analysis of moderating variables. Only one of them, the type of AI model, was relevant, in that it indicated that the dropout prediction ratio improves using DT compared to the other models (RF, ANN, SVM, LR and SE). According to Adejo and Connolly (2018), the advantage of DT over other models is based on its flexibility to model non-linearity and its computational speed; it also offers results that are more visual and easier to understand. For Dissanayake et al. (2016) the DT model performs better when using original data rather than principal components, and for Cardona et al. (2019) the DT is an effective AI model for predicting dropout.
The remaining moderating variables were not significant. The educational stage (high school vs. university), the number of variables used for the AI models, the number of algorithms used in AI, the number of data collection moments used, and the sample size did not explain the variability in the prediction of school dropout. Therefore, there are other factors not captured in this meta-analysis that moderate and explain the heterogeneity in dropout prediction. This implies that the results of this paper should be interpreted with caution and in the context of the sample used.

Which are the Most Accurate AI Models when Predicting the Dropout Rate?
When comparing the different AI models, the DT model, when predicting dropout, has a significantly higher proportion (95.3%) than the other models (RF, ANN, SVM, LR and SE) used in the study of the subject. The findings of the meta-analysis indicate that the DT model is the most accurate predictor of dropout in formal education. This result is in line with the research conducted by other authors such as Sreenivasa et al. (2018), who found that the DT model was the most accurate compared to other models such as RF, NB and J48 when analyzing first-year undergraduate students; Freitas et al. (2020) demonstrated that DT achieves the best results in accuracy, recall, precision and F1 Score when analyzing undergraduate students using socioeconomic data and different algorithms (DT, LR, SVM, KNN, MLP, DNN). For Hamim et al. (2021) DT provided the most accurate performance in the context of traditional education. On the other hand, the results of this work are contrary to those obtained by Delen (2011) who, with a sample of 25,224 first-year university students and using the ANN, DT and LR models, found that ANN is the model that performed best when predicting the dropout rate, indicating that the economic factor and past and present academic performance are the most relevant parameters. Oztekin (2016) used data from university students over a time span of four years, and compared three models (ANN, DT and SVM) to predict school completion. He found that the most accurate model was SVM.
As for the comparison between RF and ANN models, the most accurate prediction was made by ANN. These results are in line with several studies. The research by Mduma and Machuve (2021) obtained higher results in the case of ANN (83.6%), compared to RF (77.6%). The study carried out by Sani et al. (2020) showed that there was hardly any difference in accuracy values between RF and ANN algorithms (95.93% and 95.86% respectively).

Conclusion
This meta-analysis identifies the most accurate machine learning (ML) models for predicting dropout in formal education. It found that the AI models currently being used in dropout rate prediction are accurate and perform well. Moreover, the prediction accuracy ratio is higher using the DT model versus other models (RF, ANN, SVM, LR and SE), and using DT versus ANN and RF, when academic (academic achievement, previous academic achievement, e-learning performance, educational background), social (economic, parental education, living place) and demographic (gender, age, nationality, academic level, marital status) factors are used. If we look at the characteristics of these factors, which are heterogeneous and have different scales of measurement, we can find one of the reasons why the DT algorithm has proved to be more effective than others in predicting school dropout. In this sense, Pal and Mather (2003) state that DT can handle data measured with different scales. Moreover, it does not behave as a black box, as other algorithms do; rather, an analyst can examine the decision tree after the learning process, where each of the previously described factors (academic, social or demographic) can constitute a node of the tree that leads to the final prediction of dropout or non-dropout.
The practical implications are that educational systems can implement appropriate measures at the optimal time for students to continue studying and, finally, graduate. In the field of management by principals and managers of educational institutions, the main contribution is to guide and improve the decision-making procedure by controlling the supply of updated information and assessment of the documentation of the centres. In this sense, it can be a tool to reduce risk and uncertainty when making an investment decision on school enrolment and prognosis of vulnerable dropout conditions. One example is deciding on the promotion or repetition of students who have failed a grade and proposing other alternatives for failing students, such as re-examination or repeating the failed grade. At the same time, this field of management entails the design of educational strategies and the implementation of academic actions. On the one hand, early warning systems can be set up to detect students who may be at risk during their first year at university or high school. Tsao et al. (2017) designed an early warning system using DT and a heuristic model that is effective for the educational institution they analyzed. On the other hand, individualized intervention measures can be designed through counselling for each student by providing appropriate actions to keep them in the educational system, improve their academic performance and increase the success rate of the educational institution, e.g. motivational and attention-grabbing strategies to keep students focused on their studies. Heublein (2014), in his review, proposes extending the support offered to students at the start of their studies, better information and more flexible curricula in higher education.
In general, AI models will be applied to the educational context more and more frequently, so leaders, teachers and researchers should stay constantly updated to provide the most appropriate actions for each student, avoiding school dropout and optimizing their academic and personal success. According to the results of this meta-analysis, DT is pertinent for dropout prediction in empirical investigations when used with optimal hyperparameters as identified or recommended in previous studies (Gomes et al., 2018). In this way researchers can obtain maximum information by improving the model architecture, because adjusting hyperparameter values is one way to achieve optimal predictive performance, i.e. DT will make better predictions on unseen data, which it has not learned from, and overfitting is reduced.

Limitations
To interpret the results of this work, it should be noted that this meta-analysis focuses on 15 empirical articles that met the inclusion criteria. Therefore, the sample was limited to 15 independent primary studies. The results cannot be generalized to other research that does not report accuracy. This limitation is relevant because some articles do not provide accuracy and use the F1-score instead, since it is less affected by imbalanced data. Imbalanced data is a major challenge in classification problems handled by AI algorithms, because during training the effectiveness of the algorithm increases for the more frequent category and decreases for the less frequent one (Jiang et al., 2012; Su et al., 2006). This problem has been addressed in different ways in some of the papers studied. For example, Barros et al. (2019) describe the downsampling and upsampling techniques used to mitigate the imbalance problem. Berriri et al. (2021) use two ways to balance the sample prior to AI training: on the one hand, grouping the more numerous classes of students into better-balanced subclasses and, on the other hand, adjusting the parameters of the RF algorithm so that classes with fewer samples receive a higher weight, inversely proportional to their frequency of appearance in the data. Nabil et al. (2021) compare different techniques for sample balancing, such as SMOTE, ADASYN, ROS and SMOTE-ENN.
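The contrast between accuracy and the F1-score under class imbalance, and the idea of weighting classes inversely to their frequency, can be illustrated with a minimal sketch. The cohort below (90 students who stay, 10 who drop out) and both function names are hypothetical and are not taken from any of the primary studies. The weighting formula, n_samples / (n_classes * class_count), is one common "balanced" heuristic and is assumed here for illustration only; it is not necessarily the exact scheme used by Berriri et al. (2021).

```python
from collections import Counter


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class) from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    # F1 is defined as 0 here when there are no positives and no positive predictions.
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count) (assumed heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical cohort: 90 students who stay (0) and 10 who drop out (1).
y_true = [0] * 90 + [1] * 10
# A trivial "model" that predicts that every student stays.
y_majority = [0] * 100

acc, f1 = accuracy_and_f1(y_true, y_majority)
weights = inverse_frequency_weights(y_true)

print(f"accuracy = {acc:.2f}, F1 (dropout class) = {f1:.2f}")
print(f"class weights = {weights}")
```

In this toy case the trivial model reaches high accuracy while detecting no dropouts at all, which is why some primary studies prefer the F1-score; the weights show how the minority class can be made to count more during training.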
A further limitation related to the sample of primary studies is the educational level, which is mostly higher education. Research focuses more on university students and, to a lesser extent, on secondary or other lower educational stages. The main reason is that the highest dropout rates are found in higher education; the OECD (2019) estimates a 30% dropout rate at university across OECD member countries. This implies an economic and social cost for universities and governments.
In the search descriptors, AI is included, but machine learning and supervised/unsupervised learning are not. AI is used in this research because it is considered a more generic descriptor that could return more results. However, this is possibly a limitation and some studies may have been overlooked, so it may be useful to take this into account in future meta-analyses on the subject.
One aspect that depends on the primary studies used in the meta-analysis is the higher representation of certain Asian countries: according to their affiliation data, 46.15% of the authors belonged to Asian countries, followed by 21.54% of authors from African countries. This may be related to the funding received for this type of AI-based study or to the interest of certain countries in developing this type of technology. It may bias the results and should be taken into account when interpreting this meta-analysis.
Another factor to take into account is the high heterogeneity among the studies, which is not explained by the moderating variables analyzed in this paper, except for the type of model. This implies interpreting the results in the context of the sample used and conducting an in-depth analysis.
The last important point refers to the variables that each study uses to construct the AI model of dropout prediction. Some studies focus more on academic factors (Berriri et al., 2021; El Fouki et al., 2019; Nabil et al., 2021; Niyogisubizo et al., 2022; Nuankaew, 2019), others combine academic and demographic data (Adnan et al., 2021; Ahmad & Shahzadi, 2018; Fernández-García et al., 2021; He et al., 2020; Mourdi et al., 2020; Tan & Shao, 2015; Uliyan et al., 2021), whereas a final, smaller group analyses academic, demographic and social data (Barros et al., 2019; Hamim et al., 2021). As a particular case that has not been included in any of the three previous groups, the study by Freitas et al. (2020) used demographic and social data, but not academic data. This heterogeneity in the factors used across the articles analyzed means that the results are not entirely homogeneous because they do not originate from the same type of data, i.e. findings based on academic data alone are mixed with findings based on academic and demographic data, on demographic and social data, and on all three types. In this sense, Tsao et al. (2017) concluded in their study that the selection of variables used to generate the data for the AI model considerably influences the performance of these prediction models. At the same time, we are aware of the relevance of generating an AI model from as much data as possible from different factors. In fact, dropout prediction depends not only on socioeconomic factors but also on other factors such as academic performance, personal circumstances or lack of motivation. For this reason, studies on AI models for dropout prediction could try to obtain the largest possible volume of data from different factors and categorise the results by independent and combined factors, so that data of the same nature can be compared (even knowing that, within the same factor, there is a diversity of data). In the coming years, when there are more articles on the subject, meta-analyses can be carried out that evaluate the same type of data, for example, demographic and social data alone, or academic and social data alone, and these can be compared to see which factors, and which categories within these factors, are the most effective for dropout early warning systems.

Prospective
Future lines of research need to be directed towards the application of DT as the most accurate model, and towards combining DT with the other models that prove most accurate in each educational institution; personalization in schools and universities is essential. At the same time, more research is needed in educational stages other than higher education to increase the transition of students to higher education.
For future work, it is recommended to incorporate grey literature, expand the search databases and extend research to recommendation systems and adaptive learning. Knowing the prediction data is paramount, but it must be followed by recommendation actions for each student and institution to improve the quality of education.

Table 1
Results of the Empirical Studies Included in the Meta-Analysis