Assessment of Students’ Achievements and Competencies in Mathematics Using CART and CART Ensembles and Bagging with Combined Model Improvement by MARS

The aim of this study is to evaluate students' achievements in mathematics using three machine learning regression methods: classification and regression trees (CART), CART ensembles and bagging (CART-EB), and multivariate adaptive regression splines (MARS). A novel ensemble methodology is proposed, based on combining CART and CART-EB models into a new ensemble that regresses the actual data using MARS. Results of a final exam test, control and home assignments, and other learning activities used to assess students' knowledge and competencies in applied mathematics are examined. The exam test combines problems on elements of mathematical analysis and statistics with a small practical project. The project is the new competence-oriented element, which requires students to formulate problems themselves, to choose among different solutions, and to decide whether or not to use specialized software. Initially, the empirical data are statistically modeled using six CART and six CART-EB competing models. The models achieve a goodness-of-fit of up to 96% to the actual data. The impact of the examined factors on the students' success in the final exam is determined. Using the best of these models and the proposed novel ensemble procedure, final MARS models are built that outperform the other models in predicting the achievements of students in applied mathematics.


Introduction
The quality of mathematics training in higher education is essential for the competitive future professional achievements of students in engineering, software, economics and other specialties. Alongside traditional teaching and learning methods in mathematics, various information technologies and computer- and mobile-based methods are increasingly applied using specialized software, as well as methodologies that involve project and team work, group discussions, role playing, blended learning, etc. [1,2]. In particular, in the last two decades, an overall vision for teaching mathematical subjects in connection with their possible practical applications has been actively developed around the concept of competence. The concept is defined in [3] as follows: "Mathematical competency is understood as the ability to understand, judge, do, and use mathematics in a variety of intra- and extra-mathematical contexts and situations in which mathematics plays or could play a role." In the context of higher education in engineering [3,4], the following eight key mathematical competencies are formulated: C1-thinking mathematically; C2-reasoning mathematically; C3-posing and solving mathematical problems; C4-modeling mathematically; C5-representing mathematical entities; C6-handling mathematical symbols and formalism; C7-communicating in, with, and about mathematics; C8-making use of aids and tools. Based on these competencies, specific teaching and learning methods for mathematics can be designed. This study examines the assessment of students in the Applied Mathematics course for the specialty of Business Information Technologies at the University of Plovdiv Paisii Hilendarski, Bulgaria, whose final exam also includes as a principal component a small practical project. The main predictors used in the analyses are students' grades from ongoing testing during the trimester (control works and home assignments), attendance at lectures and laboratory practice, as well as the scores on individual problems in the exam and the small practical project.
The modeling of the empirical data is performed with the methods CART and CART ensembles and bagging (CART-EB). To improve the result of the prediction of the exam test points, the best of these models are assembled with MARS. This is the first time that the CART-EB method is applied for statistical modeling of data in the field of education. Another contribution is the use of the MARS method to generate new ensemble models from other ensemble models.

Methodology
The main part of each training process in education is the assessment of knowledge and skills, acquired by the students at a given stage. Depending on the curriculum for a given mathematical subject, in order to pass the exam, the student attends a certain number of lectures and laboratory practices, takes intermediate tests (control tests), solves assignments at home, works on individual or group projects, prepares presentations etc. Usually this type of control is evaluated with a certain score. This combination of activities is denoted as preparatory. At the end of their education, the students take a final exam, which can be written, oral or a combination of the two, or another type of assessment. It is assumed that the grade from the exam is influenced by the combination of preparatory activities during the course of the education. In order to apply a competency-oriented approach to the assessment of acquired knowledge and skills, a small practical-oriented project is used as a component of the final exam. All preparatory activities and the components of the exam test can be assigned a certain type of measurement and the respective dataset can be derived, where the grades are presented as variables. Exam grades can be considered to be a dependent or target variable, and the rest are predictors. Potential predictors are, for example, homework grades, course project grades, reports, gender of the student, the high school he or she graduated from.
Our experimental empirical study sets out to perform the following tasks:
• Construction of the integrated competency-based test for the final exam in mathematics;
• Construction, analysis and improvement of predictive models for the evaluation of students' achievements using ML techniques;
• Application of the models to determine the importance of the preparatory activities and of the individual components of the exam for its assessment and, in particular, the importance of the project.
In essence, these tasks point to finding hidden similarities and patterns in the data using ML regression-type modeling techniques.

Machine Learning Methods Used for Statistical Analyses
The term ML (also referred to as learning analytics) denotes a class of methods and algorithms of artificial intelligence. Usually, ML is used for classification and regression problems, and self-learning is achieved through various algorithms for cross-validation, improvement of model accuracy and fitting quality. This is achieved by combining features of computational statistics tools, numerical methods, optimization methods, probability theory, graph theory, etc. ML methods are nonparametric and allow the detection of nonlinearities and relationships in the data without the need to model them explicitly; that is, they are data-driven. Their core advantage is the generation of numerous distribution-free and robust models, from which the most adequate and optimal model in a given sense can be selected. The following ML methods are widely used to model educational data: logistic regression, cluster analysis, decision trees (CART), support vector machines (SVM), multivariate adaptive regression splines (MARS), random forest (RF), neural networks (NN), fuzzy logic and others [10,24].

Classification and Regression Tree (CART)
The CART method [26] is a typical representative of decision tree algorithms and can be used for classification or regression. The main concept of the method is to classify the data from the training dataset through a recursive procedure into a binary tree structure with nodes. At each stage, the cases in the current node, called parent node, can be split into two child nodes according to the threshold value of some predictor variable. The predicted value in a terminal node is simply the average of the response values located in that node [24]. The threshold value is determined by a greedy algorithm, which checks all variables and their values so that the model minimizes the current selected type of summary error of the predicted values or other criteria in the terminal nodes of the tree. The splitting criterion for regression trees can be least squares or least absolute deviation. Once the tree is constructed, branches that do not contribute to the improvement of the model are removed and a final pruned tree is obtained. The researcher presets the settings and hyperparameters to select an optimal model, the type of cross-validation or other ML procedure, and adjusts the algorithm. For more details, see [27,28].
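The recursive least-squares splitting and pruning described above can be sketched as follows; scikit-learn is an assumed stand-in for the Salford Predictive Modeler used in the study, and the data are synthetic, not the students' scores:

```python
# Sketch of a CART-style regression tree; scikit-learn stands in for the
# Salford Predictive Modeler used in the paper, and the data are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 30, size=(68, 2))            # two score-like predictors
y = 0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 68)

# The default splitting criterion minimizes the squared error (least squares);
# ccp_alpha > 0 removes branches that do not improve the model enough.
tree = DecisionTreeRegressor(ccp_alpha=0.5, random_state=0).fit(X, y)

# The prediction in each terminal node is the mean response of its cases.
pred = tree.predict(X)
print(tree.get_n_leaves())
```

Raising `ccp_alpha` yields smaller pruned trees, mirroring the pruning of non-contributing branches described above.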

CART Ensembles and Bagging (CART-EB)
There are cases in which regression CART models may show instability in prediction under the influence of outliers, insignificant predictors, predictors with small variation, and other factors. Overfitting of the model may also arise [29]. In such cases, it is appropriate to use an ensemble of trees in combination with a bagging (also known as bootstrap aggregation) algorithm. Many ML methods involve these techniques. In the current study, the CART-EB algorithm was applied using the CART Ensembles and Bagger analysis engine of the Salford Predictive Modeler software suite [30]. Some other implementations in the literature can be found in [31,32]. The initial CART tree of the ensemble is constructed with the entire data sample and all predictors. It is then pruned using 10-fold cross-validation. Bagged trees are built independently of one another on bootstrap samples, with or without repeated cases. They use a random subset of predictor variables at each decision split, as in the RF algorithm. Ensemble trees can be built as exploratory (unpruned) maximal trees, or they can be pruned by cross-validation. The predicted value for a case is the average of the predictions of all the trees in the ensemble.
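A minimal bagging-of-trees sketch follows; scikit-learn's `BaggingRegressor` is an assumed stand-in for CART-EB, the random predictor subset per split is mimicked with `max_features` on the base trees, and the data are synthetic:

```python
# Bagged regression trees on bootstrap samples with repeated cases; the
# case-predicted value is the average over all trees in the ensemble.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 30, size=(68, 9))            # nine predictors, as in the study
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 68)

bagger = BaggingRegressor(
    DecisionTreeRegressor(max_features="sqrt", min_samples_leaf=2),
    n_estimators=10,                            # a small ensemble for n = 68
    bootstrap=True,                             # samples drawn with repetition
    random_state=1,
).fit(X, y)

print(round(bagger.score(X, y), 3))             # R^2 on the training sample
```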

Multivariate Adaptive Regression Splines (MARS)
MARS is a nonparametric data mining and machine learning method developed in [33]. If the dependent variable (here Exam) is y = y(X) and X = (X_1, X_2, ..., X_p) are p predictors with dimension n, the regression MARS model ŷ = ŷ^[M] has the following form:

$$\hat{y} = b_0 + \sum_{j=1}^{M} b_j \, BF_j(X), \qquad (1)$$

where b_0, b_j, j = 1, 2, ..., M are the coefficients of the model, BF_j(X) are its basis functions (BFs), and M is the number of BFs. The one-dimensional BFs are written in the form

$$BF_j(X_k) = \max(0, X_k - c_{k,j}) \quad \text{or} \quad BF_j(X_k) = \max(0, c_{k,j} - X_k), \qquad (2)$$

where the knots c_{k,j} ∈ X_k are determined by the MARS algorithm. For nonlinear interactions, BFs are built as products of other BFs. The control parameters chosen by the researcher are the maximum number of basis functions and the maximum number of their multipliers (i.e., the degree of interactions) in the BFs. The algorithm involves two steps. The first step starts by setting b_0 (for example, b_0 = min_{1≤i≤n} y_i), and then the model is complemented consistently by BFs of type (2).

Mathematics 2021, 9, 62

For each model with a given number of BFs, the MARS algorithm defines variables and knots so as to minimize a predefined loss function, such as the root mean square error. In the second step, BFs that do not contribute significantly to the accuracy of the model are removed. For more details, see [33].
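The hinge basis functions of type (2) are easy to illustrate directly; the knot, coefficients and inputs below are illustrative values only, not fitted by a MARS algorithm:

```python
# Hinge ("hockey-stick") basis functions, the building blocks of a MARS model.
import numpy as np

def hinge_pos(x, c):
    return np.maximum(0.0, x - c)   # max(0, x - c)

def hinge_neg(x, c):
    return np.maximum(0.0, c - x)   # max(0, c - x)

x = np.linspace(0, 30, 7)
# A toy degree-1 model  y_hat = b0 + b1*max(0, x - c) + b2*max(0, c - x)
b0, b1, b2, knot = 5.0, 1.2, -0.4, 11.25
y_hat = b0 + b1 * hinge_pos(x, knot) + b2 * hinge_neg(x, knot)
print(np.round(y_hat, 2))
```

Products of such hinges would give the interaction BFs mentioned above; a linear (degree-1) MARS model, as used later in this paper, contains no such products.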

Model Evaluation Metrics
In this study, the best ML models were selected using the highest coefficient of determination R² and the minimum value of the root mean square error (RMSE), given by the expressions

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}, \qquad (3)$$

where ŷ_i and y_i stand for the model-predicted and actual Exam values, respectively. The performance of the models was also evaluated using Theil's forecast accuracy coefficient U_II [34]:

$$U_{II} = \frac{\sqrt{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}}{\sqrt{\sum_{i=1}^{n} y_i^2}}. \qquad (4)$$

The lower the value of the coefficient, the better the accuracy of the model. The coefficient U_II is dimensionless and is used to compare models obtained by different methods, as well as to identify large values. The model is considered to be of good quality when (4) is less than 1.
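The three selection metrics can be computed directly; the vectors below are illustrative values, not the study's data:

```python
# R^2, RMSE and Theil's U_II as in expressions (3)-(4), implemented with NumPy.
import numpy as np

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, y_hat):
    return np.sqrt(np.mean((y_hat - y) ** 2))

def theil_u2(y, y_hat):
    # Dimensionless; lower is better, values below 1 indicate a good model.
    return np.sqrt(np.sum((y_hat - y) ** 2)) / np.sqrt(np.sum(y ** 2))

y = np.array([10.0, 20.0, 25.0, 30.0])
y_hat = np.array([11.0, 19.0, 24.0, 31.0])
print(round(r2(y, y_hat), 3), round(rmse(y, y_hat), 3), round(theil_u2(y, y_hat), 4))
```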
When choosing from nested models, the parsimony principle was applied [35].

Test Design
An experiment was conducted with the Applied Mathematics course. The final exam test combines three main components with problems in mathematical analysis, probability theory and applied statistics. It includes:
• Problems in math analysis (5 problems), 15 points, 50%;
• Problems in probability (2 problems), 5 points, 17%;
• A small practical project in applied statistics, 10 points, 33%.
The percentage indicates the relative weight within the total number of 30 points for the entire exam. Unsolved problems are evaluated with 0 points. A sample version of the exam test with 7 type variations is given in Figure 1. Each student works on an individual test. It needs to be noted that the problems in the first two components are of traditional type; these problems have been used in exams in this course of studies over the last 7-8 years. The added project includes some general instructions without explicitly stating how the problem is to be solved.
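As a quick check, the stated percentages follow directly from the 30-point total (a trivial sketch; the dictionary names are ours, not the paper's):

```python
# Relative weights of the three exam components out of 30 total points.
points = {"Math_An": 15, "Stat": 5, "Project": 10}
total = sum(points.values())                          # 30
weights = {k: round(100 * v / total) for k, v in points.items()}
print(total, weights)
```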
The exam was taken by 68 first-year students in the specialty of Business Information Technologies at the Faculty of Mathematics and Informatics, University of Plovdiv Paisii Hilendarski, Plovdiv, Bulgaria. According to the first trimester curriculum, these students have taken a linear algebra and analytic geometry course, and during the second trimester, the course in Information Technology for Mathematics, where students are trained to use Wolfram Mathematica to solve mathematical problems using a computer. The current course in Applied Mathematics is in the third trimester.
The results of the preliminary activities and the final exam, in numbers of points, are described with the variables: Exam (total exam points, up to 30), Math_An (mathematical analysis, up to 15), Stat (statistics, up to 5), Project (up to 10), A1_12 (home assignment 1, up to 12), A2_20 (home assignment 2, up to 20), CW1_30 (control work 1, up to 30), CW2_30 (control work 2, up to 30), Attn_Lect (attendance at lectures, up to 10) and Attn_Labs (attendance at labs, up to 10).
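These variables and their point maxima can be collected into a simple schema against which a dataset of the 68 students could be validated (a sketch; the variable names come from the paper, the validator is ours):

```python
# Scoring variables of the study and their maximum points, with a range check.
max_points = {
    "Exam": 30, "Math_An": 15, "Stat": 5, "Project": 10,
    "A1_12": 12, "A2_20": 20, "CW1_30": 30, "CW2_30": 30,
    "Attn_Lect": 10, "Attn_Labs": 10,
}

def in_range(record):
    """True if every supplied score lies between 0 and its variable's maximum."""
    return all(0 <= record[k] <= m for k, m in max_points.items() if k in record)

print(in_range({"Exam": 24, "Project": 8}))
```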

Measurement of Competencies by the Exam Test
Following the recommendations of [4], the experience in [5,36] and with the aid of a three-dimensional scale, we defined the correspondence between the eight competencies and the elements of the exam test, as shown in Table 1. Here T1-T5 mean subproblems A-E in problem 1; S1 and S2 in problem 2; P1-P4, the instructions to the project. It is shown that all competencies are included with the exception of C7 because the exam is individual and does not allow for any communication with other students and/or external sources. In addition, Figure 1 shows that the probability theory problem Stat requires a solution with pen and paper and does not duplicate the project. As a whole, the project is independent in terms of curriculum covered and supplements the competencies, which are not included in the first two test components. The level of solution of the project indicates the degree to which the students have acquired the necessary knowledge and skills in statistics in order to solve on their own a complete mathematical problem-from the data, through the analyses to the interpretation of the results obtained.
It should be noted that the students solved the project in different ways and with different methods. Some managed to produce only descriptive statistics, with differing choices of statistics. Other students continued with cluster analysis, factor analysis or principal component analysis. Most often, regression analysis was performed, in the one-dimensional or multidimensional case.

Initial Processing and Analysis of the Data
Descriptive statistics of the variables used in the study are given in Table 2, and the distributions, in the form of box plots of unstandardized data, are shown in Figure 2a,b. Table 2 and Figure 2b show that the mean values of the results for Stat and Project are quite low, and their medians are 0. The reason is that only 31 students, or 45%, worked on the project. In addition, Table 2 and Figure 2a,b lead us to the conclusion that most of the variables are not normally distributed (A2_20, CW_30, Stat, Project, etc.). This is evidenced by the relatively high values of the ratios of skewness to standard error of skewness and of kurtosis to standard error of kurtosis, as well as by the box plots. For example, for the target variable Exam we have the ratios Skewness/Std. Err. Skewness = 1.14/0.29 = 3.931 > 1.96 and Kurtosis/Std. Err. Kurtosis = 1.85/0.57 = 2.643 > 1.96, which indicates a non-normal distribution of the variable. In addition, a one-sample Kolmogorov-Smirnov test with Lilliefors significance correction was applied, which reaffirms that Exam does not follow a normal distribution, as the calculated p-value is 0.000. Furthermore, the relationships between the variables are hidden and possibly highly nonlinear.
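A rough version of this normality screen can be sketched as follows; SciPy's plain Kolmogorov-Smirnov test is an assumed stand-in for the Lilliefors-corrected version, the standard-error approximation sqrt(6/n) is ours, and the data are synthetic right-skewed scores:

```python
# Skewness over its standard error (ratios beyond ±1.96 suggest non-normality)
# and a KS test against a normal with the sample's own mean and SD.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
exam = rng.exponential(scale=8.0, size=68)      # Exam-like, right-skewed scores

skew = stats.skew(exam)
se_skew = np.sqrt(6.0 / len(exam))              # common large-sample approximation
print(round(skew / se_skew, 2))

stat, p = stats.kstest(exam, "norm", args=(exam.mean(), exam.std(ddof=1)))
print(round(p, 4))
```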

Results from the CART Models
Multiple regression trees were built using the CART method. The dependent variable is Exam and the factors on which its values depend are the remaining nine variables, i.e., Math_An, Stat, Project, A1_12, A2_20, CW1_30, CW2_30, Attn_Lect, and Attn_Labs. The objective was to define which independent factors have the strongest influence on the Exam and to what extent. Before applying the algorithm, hyperparameters m1 (minimum cases in parent node) and m2 (minimum cases in child node) were set. Regression tree procedure on the learn (training) set is carried out using 10-fold cross-validation, which is recommended for small samples [27,28]. The least squares method was selected as a splitting criterion.

For m1 = 5, m2 = 2, Figure 3a shows a diagram of the relative error of the generated CART models depending on the number of their terminal nodes. For the case of m1 = 6, m2 = 3, the scheme with the relative errors of the constructed models is presented in Figure 3b. Relative errors are calculated as the ratio of the least squares error of the current model to the root node error. Models whose relative errors differ by at most one standard error (1 SE) are colored in green. This means that all models in green in Figure 3a,b can be considered a set of competing models. From Figure 3a, it is evident that the model with a minimum relative error of 0.321 has 13 terminal nodes. We denote it by M1. In addition, two further models were analyzed: the model M2 with a minimum number of 9 nodes and the maximal model M3 with 22 nodes. In the same way, from Figure 3b we denote the model with a minimum relative error of 0.310 and 11 terminal nodes by M4, the model with 9 terminal nodes by M5, and the model with 19 terminal nodes by M6. Table 3 contains summary statistics for the six competing CART models M1, M2, ..., M6 selected.
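The sequence of subtrees and their relative errors can be reproduced in spirit with cost-complexity pruning; here relative error is training MSE over the root-node MSE (the variance of y), scikit-learn is an assumed stand-in for the Salford tooling, and the data are synthetic:

```python
# Subtrees along the cost-complexity pruning path and their relative errors.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 30, size=(68, 9))
y = X[:, 0] + 0.6 * X[:, 2] + rng.normal(0, 2, 68)

path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
root_mse = np.var(y)                             # error of the root-only model

for alpha in path.ccp_alphas[::5]:               # a few subtrees along the path
    t = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X, y)
    rel_err = np.mean((t.predict(X) - y) ** 2) / root_mse
    print(t.get_n_leaves(), round(rel_err, 3))
```

In practice the 1-SE comparison would be made on cross-validated, not training, relative errors.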
We compare the two optimal models M1 and M4. Although model M4 has larger constraints of m1 = 6 and m2 = 3, it shows the highest value of R²_Test = 0.698 and the minimum value of RMSE_Test = 2.694. At the same time, this model is inferior in the learning statistics, especially with the relatively large RMSE = 1.517, which is 16% higher than that of M1. From the first group of "finer" models, M1 is comparable to M4, with R²_Test = 0.621 compared to 0.698 for M4, and RMSE_Test = 2.743, a small difference of 0.049, or less than 2%. The goodness-of-fit R²_Learn = 0.928 of the M1 model is 3% higher than the M4 statistic (0.902). Next, we compare M1 and M6. The statistics of these two models are almost identical, but the M6 model is more complex, as it contains 19 terminal nodes compared to the 13 of M1. From the set of competing models considered, we reject M2 and M5 because of their most unsatisfactory statistics: the smallest R² and the largest RMSE_Learn. Model M3 has a less favorable relative error compared to M1 (by 4%), outperforming M1 narrowly on the main indicators (4) by 1 to 9%. Since the M3 model is the most complex, having 22 terminal nodes compared to the 13 of M1, we apply the parsimony principle [34,35] (see also Figure 3a). We will further consider the CART models M1 and M4. Note that all Theil's coefficients are sufficiently small.

Table 3. Summary statistics of the selected regression CART models for assessment of students' achievements.

Table 4 presents the values of the relative importance of the nine factors studied on the Exam points according to their participation in the exploratory Learn CART trees. Here, too, stability is clearly visible, with small differences. The factor with the largest importance (100 relative units) is Project: the points obtained in the exam for solving the small practical project. The next main factors, in descending order of relative importance, are the scores on Math_An (93-95 relative units) and A2_20 (50-54 units). The points from solving the statistics problems (Stat) have a small impact, within 20-22 relative units.
Table 4 also shows that the influence of the predictors in the chosen optimal model M1 is almost identical to that in model M3, as well as in the other maximal model M6. The calculation of the coefficients of importance in Table 4 is performed using sequential aggregation. At level 0, the mean value of the target (Exam) as predicted by the model is calculated for the entire sample, together with the RMSE. At the first split (shown in Figure 4), the CART algorithm selected Math_An as the splitting predictor, with a threshold value of 11.25. After the split, the predictions (mean values) are calculated in both child nodes along with their RMSEs. The relative improvement in accuracy is calculated against the root, and the value obtained is added to the coefficient of importance of the predictor. The process is repeated until the tree is built. The predicted scores obtained from the six CART models in Table 4 were statistically examined with a Wilcoxon signed-rank test for paired samples. All tests were statistically insignificant, and the differences between each pair of models had symmetrical distributions, an indication that the models do not differ significantly from each other. The rule for splitting the root cases into two is: "cases with a value of Math_An <= 11.25 go into the left child node and the rest into the right one". At the next (first) level, the splitting variable for both nodes is Project, with the splitting rule "Project <= 4.5", etc. The process generating the CART tree M1, shown in Figure 4, is stopped at depth 5, yielding 13 terminal nodes, marked with colored squares. The value predicted by the model for each case in a given terminal node is the arithmetic mean of the Exam points of the cases classified in that node.
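The rescaling of importances so that the top predictor receives 100 relative units can be sketched as follows; scikit-learn's impurity-based importances are an assumed stand-in for CART's sequential aggregation, and the data are synthetic:

```python
# Relative importance of predictors, rescaled so the top factor gets 100 units.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 30, size=(68, 3))            # e.g. Project-, Math_An-, Stat-like
y = 1.0 * X[:, 0] + 0.9 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 1, 68)

tree = DecisionTreeRegressor(min_samples_leaf=2, random_state=0).fit(X, y)
imp = tree.feature_importances_
rel = np.round(100 * imp / imp.max(), 1)        # top predictor = 100 units
print(rel)
```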
The tree of the M4 model shown in Figure 5 has an identical structure to the tree of the M1 model in Figure 4. An almost complete match is observed; therefore, the CART models are stable. Figure 6 shows the actual versus predicted values of the Exam obtained by model M1.
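The paired-sample comparison of two models' predictions can be sketched with SciPy's Wilcoxon signed-rank test; the two prediction vectors below are synthetic stand-ins for a pair of CART model outputs:

```python
# Wilcoxon signed-rank test on paired model predictions: a large p-value means
# no significant difference, and the paired differences here are symmetric noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
pred_a = rng.uniform(0, 30, 68)                 # predictions of one model
pred_b = pred_a + rng.normal(0, 0.5, 68)        # a nearly identical model

stat, p = stats.wilcoxon(pred_a, pred_b)
print(round(p, 3))
```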


Results from the CART Ensembles and Bagged Models
The same values of the hyperparameters as in the CART models were used in the construction of the CART-EB models. For the minimum number of cases in a parent node (m1) and the minimum number of cases in a child node (m2), two options were set: 5-2 and 6-3, respectively. All trees in the ensemble were trained with 10-fold cross-validation. Bagged trees were built with repeated cases. Due to the small sample size (n = 68), the ensembles were built with 10, 15, 20 and 25 trees. The family of these models is denoted by E1, E2, ..., E8. Of these, the first group E1, E2 and E3 is built using values of the hyperparameters m1-m2 equal to 6-3. For the second group with the remaining models, these parameters are 5-2. The first models of each group, namely E1 and E4, are initial CART trees. Table 5 presents the statistical indicators obtained for these models. The models E2 and E3 show relatively low R²_Test and the largest errors RMSE_Test and RMSE_Learn. The corresponding Theil's coefficients are also larger than those of the models with m1 = 5 and m2 = 2. Model E6 has the best test statistics, with R²_Test (0.922) and RMSE_Test (1.838). The next model, E7, has the best indicators for the training sample: R²_Learn (0.961) and RMSE_Learn (1.278). As the number of terminal nodes increases, the statistics become less favorable, as seen from those of model E8. Therefore, in further analysis we consider the models E6 and E7. The predictive properties of model E7 are illustrated in Figures 6b and 7. The results of the Wilcoxon signed-rank tests show that the built CART-EB models do not differ significantly from each other.
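The effect of the ensemble size can be illustrated by sweeping over 10, 15, 20 and 25 trees, as in the study, and scoring each bagged ensemble on a held-out split; scikit-learn is again an assumed stand-in for the Salford engine, and the data are synthetic:

```python
# Test-set R^2 of bagged tree ensembles of increasing size.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 30, size=(68, 9))
y = X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 2, 68)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=6)

for n_trees in (10, 15, 20, 25):
    model = BaggingRegressor(
        DecisionTreeRegressor(min_samples_leaf=2),
        n_estimators=n_trees, bootstrap=True, random_state=6,
    ).fit(X_tr, y_tr)
    print(n_trees, round(model.score(X_te, y_te), 3))
```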

Combination of CART and CART Ensembles and Bagged Models Using MARS
To improve the quality of prediction, MARS regression models of the dependent variable Exam were built using the selected four best models M1, M4, E6 and E7 as predictors. Due to the almost linear behavior of the Exam curve, only a linear MARS method was applied. The MARS models generated are denoted by MM1, MM2 and MM3. Furthermore, by finding the importance of the individual regression tree models, it can be determined which of them has the best predictive properties. The models were trained with 10-fold cross-validation. The summary statistics of the models obtained are presented in Table 6 and are almost the same. Using all four models (see MM3), we found that the greatest influence was exerted by model E7 (100 relative units).
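The combination step, in which the predictions of the selected tree models become the predictors of a final linear model, can be sketched as follows; a plain linear regression is an assumed stand-in for the linear (degree-1) MARS used in the paper, and all vectors are synthetic:

```python
# Stacking tree-model predictions into a final linear combiner.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
y = rng.uniform(0, 30, 68)                      # Exam-like target
# Stand-ins for the predictions of models M1, M4, E6 and E7 (noisier to cleaner):
preds = np.column_stack([y + rng.normal(0, s, 68) for s in (2.0, 2.2, 1.3, 1.2)])

combiner = LinearRegression().fit(preds, y)
print(round(combiner.score(preds, y), 3))       # fit of the combined model

# Coefficient magnitudes hint at which base model contributes most.
print(np.round(combiner.coef_, 2))
```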


Discussion and Conclusions
This study presents, models and analyzes the results of a competency-based exam in applied mathematics, together with the results of preparatory academic activities. The data are modeled using three ML methods: CART, CART ensembles and bagging (CART-EB), and MARS.
The CART method was applied first. Table 3 indicates that the six selected CART models have high goodness-of-fit indicators, with coefficients of determination R^2 over 90% and RMSE around 1.5, i.e., within 5%. As optimal models, we selected M1 and M4. M1 shows R^2 = 0.928 and RMSE = 1.298, as well as a small value of Theil's forecast accuracy coefficient, U_II = 0.0089.
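The goodness-of-fit indicators reported throughout the study can be computed as sketched below on synthetic data. Theil's coefficient appears in several variants in the literature; the form used here, RMSE normalized by the root mean squares of the actual and predicted values, is one common choice and is an assumption, not necessarily the exact formula used in the study.

```python
# Goodness-of-fit indicators (R^2, RMSE, a Theil-type U coefficient),
# sketched on synthetic data; the Theil formula is one common variant.
import numpy as np

def fit_indicators(y, yhat):
    resid = y - yhat
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    # Theil's U: 0 means a perfect fit, values near 1 a very poor one.
    theil_u = rmse / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(yhat ** 2)))
    return r2, rmse, theil_u

rng = np.random.default_rng(2)
y = rng.uniform(5, 30, 68)            # synthetic actual exam scores
yhat = y + rng.normal(0, 1.3, 68)     # synthetic model predictions
r2, rmse, u = fit_indicators(y, yhat)
print(round(r2, 3), round(rmse, 3), round(u, 4))
```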
The CART models allow the influence of the individual educational components on the exam results to be determined for the specific subject of applied mathematics. The importance of the predictors in model M1, in relative units, is Project (100), Math_An (95), A2_20 (54), CW1_30 (38), A1_12 (36), etc., as presented in Table 4. The solution of the project and of the mathematical analysis problems therefore determines the students' achievements to the greatest extent. Other important factors are the grades from the second home assignment, the first control test, etc. The influence of students' success on problem 2 in statistics (variable Stat) reaches only about 22% relative weight, which indicates an unsatisfactory level of theoretical knowledge in probability and statistics. Combining these results with the competencies in Table 1, it is apparent that, after reducing the competencies attached to the two Stat problems, the exam test mainly assesses the acquisition of the competencies with the most "+" and "0" marks: C3, C4, C6 and C8.
The data were also modeled using the CART ensembles and bagging method, and six CART-EB models were built. The analysis showed that the best statistical indicators were obtained for models E6 and E7, with 15 and 20 trees in the ensemble, respectively. As the optimal model, E7 achieved R^2 = 0.961 and RMSE = 1.278, as well as a Theil's accuracy coefficient U_II = 0.0086. Although the CART-EB models do not reveal the influence of the individual predictors, they serve to confirm and complement the CART models for predicting student achievement.
We then combined the four best models, two CART and two CART-EB, to regress the dependent variable Exam, applying a linear MARS method. Three models with predictors M1, M4, E6 and E7 were built, all showing very close goodness-of-fit indices. The final model selected, MM1, used M1 and E7 and achieved R^2 = 0.972, RMSE = 0.804 and a Theil's accuracy coefficient U_II = 0.0034. This model showed a significant improvement in the prediction of the lowest and highest exam scores.
The results obtained are comparable to those in [6], where regression-type CART models were constructed to predict the final exam results in linear algebra and analytical geometry for students in two other specialties at the same university, using a short mathematical competency test and mid-term test results. In [6], the models fit the actual data with R^2 up to 93%. The results obtained here are also similar to those in [11,13].
It should be noted that this is the first time the CART ensembles and bagging method has been applied to educational data. In addition, the predictions of the individual competing models are combined using the MARS method for the first time. The combined MARS models outperform the predictor models they include on all statistical indicators.
Essentially, our approach to modeling the data consists of two consecutive steps: (1) building regression-tree and regression-tree-ensemble models (Sections 3.4 and 3.5) using the initial predictors, and (2) building MARS models (Section 3.6), whose predictors are the resulting variables with the values predicted in step (1). On this basis, we can formulate some advantages and potential capabilities of this approach, namely:

• As part of the regression-tree family, the CART and CART-EB methods we use can successfully deal with uncertainty, qualitatively stated problems and incomplete, imprecise or even contradictory data sets, as stated in [10]. They can process both nominal and numerical data, handle multidimensional and multivariate data, and easily identify patterns and complex nonlinear relationships between the predictors, thus facilitating the interpretation of the models.
• At step (1), the variable importance of the initial predictors in the models is assessed directly, which allows insignificant predictors to be ignored or screened out. This is especially useful for reducing the dimensionality of the problem when the number of predictors is large.
• At step (2), numerical-type data are used, enabling the implementation of the MARS method, whereby it is combined with the predictions from step (1). In particular, the results of our study showed that the MARS models improve the predictions of the smallest and largest values of the target variable, including its outliers. In this manner, it is possible to eliminate or reduce the effect of this type of flaw, typical of all ensemble methods.
• The importance of the models from step (1), used as predictors, is determined with the help of MARS in step (2). In this manner, the best regression-tree model is identified. Indirectly, if it determines the influence of the initial predictors, additional useful information may be obtained to interpret the overall statistical analysis.
In addition to this, the proposed research method has several limitations. The models can be built only if at least 50 data records are available. Furthermore, the CART-EB algorithm used does not deduce the relative importance of the predictors in the model, which makes direct interpretation of its results difficult. Another disadvantage, typical of all ML methods, is that the results depend to a certain degree on the accuracy criterion and on the variable and model selection.
The methods and models proposed in this study can be used to direct and improve exam tests for students in subsequent courses, with changes made at the tutor's discretion. Changes can concern the educational content, the tests and the academic programs, as well as the management of other basic factors that influence grades, as determined using the models. This approach promises to uncover hidden relationships between the factors contributing to learning and teaching, and also benefits tutors and authorities by making predictions that help them take better decisions. Future research can be planned in this regard. Another crucial practical issue for further research, addressable with the proposed approach, is determining the relevant factors and predicting which students may drop out.
We can conclude that the use of small practical projects as a competency-oriented approach, combined with the application of powerful ML methods to the data set related to the learning process, is effective for assessing students' knowledge and competencies in mathematics.