Predicting Student Performance in Higher Education Institutions Using Decision Tree Analysis



I. Introduction
Recently, the increasing volume of data, and its use to improve students' performance, has become one of the major challenges facing higher education institutions (HEIs). HEIs are generally interested in the success of their students during their studies [1,2]. Over a lifetime of teaching, HEIs accumulate large data sets of student information in their databases. Storage is not the problem; handling the data, extracting relevant patterns, and discovering the knowledge stored in these massive databases are tremendously difficult. Accordingly, data mining can be considered a very promising tool for attaining these objectives [3].
Data mining is used to detect and extract relevant and valuable information from very large volumes of data. This process currently receives a great deal of attention from the information industry and from society. It has also attracted significant attention in data analysis, where it is recognized as a newly emerging analysis tool [4]. Predicting student performance is very beneficial in giving students and lecturers the assistance they need in the learning process. Predicting the likely outcomes of the learning process can help an organization improve the outcomes and performance of new students by adjusting the factors that contributed to past performance [5].
Predicting student performance also helps educational planners, decision makers, and administrators plan adequately for changes in the student population in any direction (i.e., find the factors that increase or decrease performance). However, manually devising the set of rules required to predict student performance is commonly difficult. For these reasons, there is a strong need for tools that achieve these goals [6]. Data mining tools in particular have become very popular among researchers and users because of their ease of use and availability; examples include Microsoft Excel, SPSS, Weka, Protege (a knowledge acquisition system), and RapidMiner, most of which are available online. Some of these tools (e.g., MS Excel-based ones) are freely available to HEI professors, who can benefit from their existing knowledge of Excel. Accessibility, availability, ease of use, and understandability are the main reasons for including these tools in the research. Weka was chosen as the data mining tool for supporting conclusions because its results are highly readable and understandable [7].
We aim to analyze student information collected through a questionnaire (built with LimeSurvey and Google Forms) and to classify the collected data in order to predict and categorize student performance. We also seek to elucidate the factors that affect student performance (success and failure rates) in relation to the other variables in the student data set by applying decision tree algorithms. The work investigates whether the results of decision tree algorithms can be relied upon to support academic decisions aimed at improving academic performance and to give students and lecturers a proper road map. To this end, this paper applies multiple classification and prediction methods and confirms and verifies the results with multiple Weka classifiers. The best result is selected in terms of accuracy and precision. This newly discovered knowledge can help students and instructors improve educational quality. Identifying likely underperformers at the beginning of the semester/year and devoting more attention to them will aid their educational process and improve their grades. Both underperformers and good performers can benefit from this study by exerting extra effort on quality projects and research with help and attention from their instructors.
The paper is organized as follows: Section II presents recent work related to academic performance; Section III explains the decision tree algorithms used in the model (J48, REPTree, and Random Tree); Section IV presents and explains the proposed model; and the final section presents the conclusions drawn from the model.

II. Related Work
Natek Srečko and Moti Zwilling [8] focused on data mining for small student data sets and answered their research questions by comparing two data mining tools. The best model was chosen based on an evaluation of the results. They compared their predictions of student performance for the academic year 2012/2013, made with the attribute "Final grade", against the actual final grades of the same students. The authors then chose the best parameters for populating the data set from sample data mining techniques. The conclusions of the work were very promising and encouraging for HEIs with respect to integrating data mining tools as a very important part of their knowledge management systems in higher education.
Alaa Khalaf Hamoud [9] constructed a model based on an experimental data set of Portuguese students from two courses: Mathematics (395 instances) and Portuguese language (659 instances). Paulo Cortez and Alice Silva of the University of Minho, Portugal, collected and analyzed the data. Three decision tree algorithms (J48, REPTree, and Hoeffding Tree (VFDT)) were applied in this work. The results confirmed that J48 was the most suitable algorithm for classifying and predicting students' willingness to pursue higher education and their success in their courses.
Mishra Tripti, Dharminder Kumar, and Sangeeta Gupta [10] used different classification algorithms to construct a performance prediction model based on academic and social integration as well as various emotional skills of students, factors that other studies had not considered thus far. The J48 (an implementation of C4.5) and Random Tree algorithms were applied to the records of students from colleges affiliated with Guru Gobind Singh Indraprastha University to predict third-semester performance. Random Tree was more accurate than J48 in predicting performance.
Muluken Alemu Yehuala [11] built and tested models using a sample data set of 11,873 regular undergraduate students. The WEKA software was used to build and analyze the models. The results offered supportive and constructive recommendations to academic decision makers in universities for improving decision support and decision making. Furthermore, the results can aid in rebuilding and modifying the curriculum structure to improve student academic performance. Based on these results and findings, students can decide on their preferred field of study before they register and enroll. The findings indicate that the Ethiopian Higher Education Entrance Certificate Examination (EHEECE) results, the number of courses per semester, gender, the number of students in the class, and the field of study are the considerable factors that affect student performance. Accordingly, the results can be used to control the level of student success and thereby help educational institutions avoid serious financial strain.

III. Decision Tree
A decision tree is a data mining classification technique that builds a top-down, tree-like model based on the attributes of a given data set. It is a predictive modeling technique used to predict, classify, or categorize data objects on the basis of a model previously generated from a training data set with the same features (attributes) [12].
The generated tree consists of a root node, internal nodes, and leaf (terminal) nodes. The root node is the topmost node; it has no incoming edges and one or more outgoing edges. An internal node has one incoming edge and one or more outgoing edges; each internal node denotes a test on an attribute, and each outgoing edge represents an outcome of that test. Finally, each leaf node holds the final predicted attribute value (class label) of a data object [13].
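To make these node roles concrete, the following Python sketch represents a tree as linked nodes and classifies a record by walking from the root to a leaf. The `Node` class, the `predict` function, the question names, and the class labels are hypothetical illustrations rather than Weka's internal representation; for simplicity the sketch splits only on nominal answers, whereas Weka's trees can also split numeric attributes at thresholds.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Root or internal node if `attribute` is set; leaf if `label` is set."""
    attribute: Optional[str] = None                             # attribute tested here
    children: Dict[str, "Node"] = field(default_factory=dict)  # one edge per test outcome
    label: Optional[str] = None                                 # predicted class at a leaf

    def is_leaf(self) -> bool:
        return self.label is not None


def predict(node: Node, record: Dict[str, str]) -> str:
    """Walk from the root, following the edge that matches the record's answer."""
    while not node.is_leaf():
        node = node.children[record[node.attribute]]
    return node.label


# Hypothetical tree: the root tests Q20 and one internal node tests Q35.
tree = Node(attribute="Q20", children={
    "Agree": Node(label="Pass"),
    "Disagree": Node(attribute="Q35", children={
        "Agree": Node(label="Pass"),
        "Disagree": Node(label="Fail"),
    }),
})

print(predict(tree, {"Q20": "Disagree", "Q35": "Disagree"}))  # Fail
```

A record needs answers only for the questions on its path: a student answering "Agree" to Q20 reaches a leaf without Q35 ever being consulted.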
The decision tree classification technique is performed in two stages [14]: tree building and tree pruning. Tree building proceeds top-down: the tree is recursively partitioned until the data items in each partition belong to the same class label. This stage is tedious and computationally expensive because the training data set is repeatedly reprocessed. Tree pruning proceeds bottom-up. It is performed to improve the prediction and classification accuracy of the algorithm by minimizing overfitting (fitting noise or excessive detail in the training data set), since overfitting can cause misclassification errors in the decision tree algorithm. Decision trees offer many advantages in data mining, including the following:
• A decision tree can be clearly understood by the analyst and by any end user.
• A decision tree can handle different kinds of input data, namely nominal, numeric, and textual.
• A decision tree can process data sets containing errors or missing or incomplete values.
• A decision tree achieves a high level of performance with minimal effort and time.
• A decision tree can work in data mining applications across a variety of platforms [15].
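The top-down building stage can be illustrated with a minimal ID3-style sketch that recursively partitions the data on the attribute with the highest information gain until each partition is pure or no attributes remain. It assumes nominal attributes and performs no pruning; the function names and the toy data are hypothetical, and this is not the exact procedure used by J48, REPTree, or Random Tree.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning on a nominal attribute."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    # Stop when the partition is pure or no attribute is left to test;
    # the leaf then predicts the majority class of the partition.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return (best, branches)


# Hypothetical answers: Q1 separates the classes perfectly, Q2 does not.
rows = [{"Q1": "A", "Q2": "X"}, {"Q1": "A", "Q2": "Y"},
        {"Q1": "D", "Q2": "X"}, {"Q1": "D", "Q2": "Y"}]
labels = ["Pass", "Pass", "Fail", "Fail"]
print(build_tree(rows, labels, ["Q1", "Q2"]))
```

Because every recursive call rescans its partition to score each remaining attribute, the sketch also shows why the building stage is described as computationally expensive.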
In this study, three decision tree algorithms, namely J48, Random Tree, and REPTree, were applied to the collected student data.

A. J48
J48 is used for both classification and prediction. For classification, J48 (based on the C4.5 machine learning algorithm) was chosen because it is one of the most widely used algorithms in Weka and offers a good balance among precision, speed, and interpretability of results. In addition, it presents the classification as a decision tree from which weak students can easily be identified. Classification learning, as part of Educational Data Mining (EDM), has also been applied to predict student performance [16].
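C4.5, on which J48 is based, selects splits by gain ratio: the information gain of a split divided by its split information (the entropy of the partition sizes), which penalizes attributes with many distinct values. The sketch below computes this criterion for a single nominal attribute; the function names and data are hypothetical, and J48's handling of numeric thresholds, missing values, and pruning is omitted.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute_values, labels):
    """C4.5 split criterion for a nominal attribute: information gain / split information."""
    n = len(labels)
    remainder = 0.0
    for v in set(attribute_values):
        subset = [lab for a, lab in zip(attribute_values, labels) if a == v]
        remainder += len(subset) / n * entropy(subset)
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)  # entropy of the partition sizes
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical data: a genuinely predictive answer vs. a unique student ID.
labels = ["Pass", "Pass", "Fail", "Fail"]
predictive = ["A", "A", "D", "D"]
student_id = ["s1", "s2", "s3", "s4"]
print(gain_ratio(predictive, labels))  # 1.0
print(gain_ratio(student_id, labels))  # 0.5
```

Both attributes have an information gain of 1 bit, but the ID attribute splits the data into four singleton partitions, so its split information doubles and its gain ratio halves.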

B. REPTree
The reduced-error pruning tree (REPTree) is a fast decision tree learner based on the principles of computing information gain with entropy and minimizing the error arising from variance [17]. REPTree generates multiple trees, applying regression tree logic across iterations, and then selects the best of all generated trees. It builds a regression/decision tree using variance and information gain, and prunes it using reduced-error pruning with back-fitting. As in C4.5, the algorithm can also handle missing values by splitting the corresponding instances into pieces [18].
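The reduced-error pruning idea behind REPTree can be sketched as follows: working bottom-up, each internal node is replaced by a leaf predicting its majority training class whenever doing so does not increase the number of errors on a held-out pruning set. The dictionary-based tree format, the stored `majority` field, and all names are assumptions made for illustration; Weka's REPTree additionally uses variance reduction for numeric classes and back-fitting, which this sketch does not reproduce.

```python
def classify(node, row):
    """Leaves are plain labels; internal nodes are dicts with attr/branches/majority."""
    while isinstance(node, dict):
        child = node["branches"].get(row[node["attr"]])
        if child is None:            # answer never seen in training: use the majority
            return node["majority"]
        node = child
    return node


def reduced_error_prune(node, rows, labels):
    """Bottom-up reduced-error pruning against a held-out pruning set."""
    if not isinstance(node, dict):
        return node
    # First prune each subtree with the pruning instances that reach it.
    branches = {}
    for value, child in node["branches"].items():
        idx = [i for i, r in enumerate(rows) if r[node["attr"]] == value]
        branches[value] = reduced_error_prune(
            child, [rows[i] for i in idx], [labels[i] for i in idx])
    pruned = {"attr": node["attr"], "branches": branches, "majority": node["majority"]}
    # Then collapse this node into a leaf if that does not increase pruning-set errors.
    subtree_errors = sum(classify(pruned, r) != lab for r, lab in zip(rows, labels))
    leaf_errors = sum(lab != node["majority"] for lab in labels)
    return node["majority"] if leaf_errors <= subtree_errors else pruned


# Hypothetical unpruned tree; "majority" is the majority training class at each node.
tree = {"attr": "Q1", "majority": "Pass", "branches": {
    "A": "Pass",
    "D": {"attr": "Q2", "majority": "Fail", "branches": {"X": "Fail", "Y": "Pass"}},
}}
prune_rows = [{"Q1": "A", "Q2": "X"}, {"Q1": "D", "Q2": "X"}, {"Q1": "D", "Q2": "Y"}]
prune_labels = ["Pass", "Fail", "Fail"]
print(reduced_error_prune(tree, prune_rows, prune_labels))
```

In the example, the Q2 subtree misclassifies one pruning instance while a "Fail" leaf misclassifies none, so it is collapsed; the root is kept because replacing it with a "Pass" leaf would add two errors.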

C. Random Tree
The Random Tree algorithm chooses the test at each node from a specified number of randomly selected attributes, and it performs no pruning. The term "random tree" is also used more generally for randomly constructed trees that have nothing to do with machine learning. The merits of building a random tree are training efficiency and minimal memory requirements. To create a random decision tree, the Random Tree algorithm uses only one pass over the data [17,19].
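The distinguishing step of Random Tree can be sketched as a node-splitting function that scores only `k` randomly sampled attributes instead of all of them. Scoring the candidates by information gain, the parameter name `k`, and the function names are assumptions for illustration; the sketch shows a single node decision rather than a complete tree learner.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning on a nominal attribute."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def random_tree_split(rows, labels, attributes, k, rng):
    """Choose the node test from only k randomly sampled attributes."""
    candidates = rng.sample(attributes, min(k, len(attributes)))
    return max(candidates, key=lambda a: information_gain(rows, labels, a))


rows = [{"Q1": "A", "Q2": "X"}, {"Q1": "A", "Q2": "Y"},
        {"Q1": "D", "Q2": "X"}, {"Q1": "D", "Q2": "Y"}]
labels = ["Pass", "Pass", "Fail", "Fail"]
rng = random.Random(42)  # fixed seed for a reproducible illustration
print(random_tree_split(rows, labels, ["Q1", "Q2"], k=1, rng=rng))
```

With `k` equal to the number of attributes, the function always returns the best overall split; smaller values make each node cheaper to evaluate but allow a weaker attribute to be chosen, which is the source of the randomness in the resulting tree.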

IV. Proposed Model

A. Data Preprocessing
1) Data Collection
Questionnaires built with Google Forms and with the open-source application LimeSurvey were used to survey students at the College of Computer Science and Information Technology (CSIT), University of Basrah. The LimeSurvey questionnaire was used to collect data locally at CSIT, whereas Google Forms was used to collect data over the Internet. A total of 161 completed questionnaires were obtained after combining the CSV files from Google Forms and LimeSurvey. The research sample (161 answers) is an acceptable sample of the CSIT population with a 10% margin of error [20]. Table I describes all answers to the questionnaire. The possible answer values were shortened and converted from nominal to numeric form for ease of use and understanding. Responses to questions Q18-Q60 take values in {1, 2, 3, 4, 5}, where 1, 2, 3, 4, and 5 represent "Strongly Disagree", "Disagree", "Neutral", "Agree", and "Strongly Agree", respectively. The first step in data preprocessing is preparing the data by removing rows with empty values and converting the data for evaluation and processing. A total of 8 rows have empty values in one or more columns. After removing these rows, a total of 151 answers remain. The row values are then converted for processing in the Weka 3.8 tool with its built-in classifiers.
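The preprocessing steps described above (combining the two CSV exports, removing rows with empty values, and encoding Likert answers as 1 to 5) can be sketched in Python as follows. The column names and the inline CSV text are hypothetical stand-ins for the Google Forms and LimeSurvey exports, and the sketch assumes both exports share the same column layout; in the study itself, the cleaned data were processed in Weka 3.8.

```python
import csv
import io

# Likert mapping described in the text for questions Q18-Q60.
LIKERT = {"Strongly Disagree": 1, "Disagree": 2, "Neutral": 3,
          "Agree": 4, "Strongly Agree": 5}


def preprocess(responses, likert_columns):
    """Drop responses with any empty value, then encode Likert answers as 1..5."""
    cleaned = []
    for row in responses:
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue  # incomplete questionnaire: removed from the data set
        encoded = dict(row)
        for col in likert_columns:
            encoded[col] = LIKERT[row[col]]
        cleaned.append(encoded)
    return cleaned


# Hypothetical stand-ins for the two CSV exports.
google_forms_csv = "Q18,Q19\nAgree,Neutral\nStrongly Agree,\n"
limesurvey_csv = "Q18,Q19\nDisagree,Strongly Disagree\n"
combined = (list(csv.DictReader(io.StringIO(google_forms_csv)))
            + list(csv.DictReader(io.StringIO(limesurvey_csv))))
data = preprocess(combined, ["Q18", "Q19"])
print(len(combined), len(data))  # 3 2
```

`csv.DictReader` yields an empty string for a blank field and `None` for a missing trailing field, so the emptiness check covers both cases.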

2) Reliability
Reliability characterizes the overall consistency of a measure. A measure has a high level of reliability if it produces similar results under consistent conditions; for example, measurements of people's height and weight are generally highly reliable [21]. In statistics, coefficient alpha (Cronbach's alpha) is the most frequently used measure of internal consistency, and it serves here as an indicator of the reliability of the study's dependent variable. According to [22], a value of 0.7 indicates satisfactory internal consistency. Table II shows that the coefficient alpha is 0.85 for the scaled variables, which comprise 60 items and 161 respondents.
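For k items, coefficient alpha is alpha = k / (k - 1) * (1 - (sum of the item variances) / (variance of the respondents' total scores)). The sketch below implements this formula with sample variances; the function name and the toy scores are hypothetical and do not reproduce the value of 0.85 reported in Table II.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha; `items` holds k lists, one per item, of n respondents' scores."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]    # each respondent's total score
    sum_item_var = sum(variance(col) for col in items)  # sample variance of each item
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Hypothetical Likert scores: 2 items answered by 4 respondents.
items = [[1, 2, 3, 4],
         [2, 2, 3, 3]]
print(round(cronbach_alpha(items), 3))  # 0.8
```

When items move together, the variance of the totals grows faster than the sum of the item variances, pushing alpha toward 1; two identical items give exactly 1.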

B. Attribute Selection
It is important to find the attributes (questions) most correlated with the final class (Failed) and how strongly they affect it. This stage computes the average correlation of each attribute with the final class. These averages help identify questions with low correlation, which can be removed to improve the accuracy of the results, while questions with high correlation can serve as recommendation points for students and academic staff.
In this step, the attribute evaluator CorrelationAttributeEval is used as a filter to evaluate the correlation between the class and the other attributes. This step is crucial because the goal is to find the attributes most closely related to the class and to exclude the less related attributes from the model. CorrelationAttributeEval evaluates the worth of an attribute by measuring the Pearson correlation between the attribute and the class. Table III shows the average correlation between each question/attribute and the final class, evaluated with 10-fold cross-validation to obtain more reliable estimates. The questions are ranked by their average correlation with the final class; questions with high average rates are the most correlated with the final class. The twenty least correlated questions are then removed to increase the accuracy of the result.
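The ranking step can be approximated as follows: compute the Pearson correlation between each attribute and a 0/1 encoding of the class on the training part of each cross-validation fold, average the absolute values, sort, and drop the weakest attributes. Ranking by absolute correlation, the interleaved fold assignment, and all names are assumptions of this sketch; it is not a reimplementation of Weka's CorrelationAttributeEval, whose exact treatment of nominal classes may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def average_correlation(column, target, folds=10):
    """Mean |r| between an attribute and the class over the training part of each fold."""
    n = len(target)
    total = 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]  # simple interleaved folds
        total += abs(pearson([column[i] for i in train], [target[i] for i in train]))
    return total / folds


def rank_and_drop(columns, target, drop, folds=10):
    """Rank attributes by average correlation (strongest first) and drop the weakest."""
    ranked = sorted(columns, key=lambda a: average_correlation(columns[a], target, folds),
                    reverse=True)
    return ranked, ranked[:max(len(ranked) - drop, 0)]


# Hypothetical data: class 1 = Failed; Qa and Qc track the class, Qb only weakly.
target = [1, 1, 0, 0, 1, 0]
columns = {"Qa": [5, 5, 1, 1, 5, 1],
           "Qb": [3, 1, 3, 1, 2, 2],
           "Qc": [1, 1, 5, 5, 1, 5]}
ranked, kept = rank_and_drop(columns, target, drop=1, folds=2)
print(ranked[-1], sorted(kept))  # Qb ['Qa', 'Qc']
```

Qc is perfectly but negatively correlated with the class, so ranking by absolute value keeps it; this matches the idea that a question can affect success either positively or negatively.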

C. Applying Algorithms
The Weka tool provides built-in algorithms that make it easy and flexible to apply different classifiers and obtain results. Three algorithms (J48, Random Tree, and REPTree) are applied in this stage, both before and after the removal of the less correlated attributes. Removing attributes is very helpful in revealing how these attributes affect performance and whether they increase or decrease accuracy. Table IV details the performance of the three algorithms (J48, Random Tree, and REPTree) in terms of True Positive (TP) rate, False Positive (FP) rate, Precision, and Recall. The first row shows the performance of the algorithms without the attribute filter (CorrelationAttributeEval); the second shows the performance after the attribute filter is combined with the decision tree algorithm. A meta-classifier (AttributeSelectedClassifier) allows the classifier algorithm to be combined with the attribute evaluator and a search method to obtain highly accurate results.
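The reported measures can be computed from actual and predicted labels as in the sketch below, which evaluates each class one-vs-rest and then averages the per-class values weighted by class frequency, comparable to the weighted-average row in Weka's detailed accuracy output. The function names and labels are hypothetical. Note that the TP rate of a class is by definition its recall, so these two columns always coincide.

```python
from collections import Counter


def class_metrics(actual, predicted):
    """One-vs-rest TP rate, FP rate, precision, and recall for every class."""
    n = len(actual)
    pairs = list(zip(actual, predicted))
    metrics = {}
    for c in sorted(set(actual) | set(predicted)):
        tp = sum(a == c and p == c for a, p in pairs)
        fp = sum(a != c and p == c for a, p in pairs)
        fn = sum(a == c and p != c for a, p in pairs)
        tn = n - tp - fp - fn
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[c] = {"tp_rate": recall,  # TP rate is the same quantity as recall
                      "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
                      "precision": tp / (tp + fp) if tp + fp else 0.0,
                      "recall": recall}
    return metrics


def weighted_average(actual, metrics):
    """Average each measure over classes, weighted by how often the class occurs."""
    counts, n = Counter(actual), len(actual)
    return {m: sum(counts[c] / n * vals[m] for c, vals in metrics.items())
            for m in ("tp_rate", "fp_rate", "precision", "recall")}


# Hypothetical labels for five students (class "yes" = Failed).
actual = ["yes", "yes", "yes", "no", "no"]
predicted = ["yes", "no", "yes", "no", "yes"]
avg = weighted_average(actual, class_metrics(actual, predicted))
print({k: round(v, 3) for k, v in avg.items()})
# {'tp_rate': 0.6, 'fp_rate': 0.433, 'precision': 0.6, 'recall': 0.6}
```

With class-frequency weights, the weighted TP rate (and recall) reduces to the overall fraction of correctly classified instances, here 3 of 5.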
The effect of the attribute filter on Random Tree and REPTree is evident: for both algorithms, TP rate, Precision, and Recall increased and FP rate decreased after the attribute filter was applied, whereas the performance of J48 decreased.

D. Result Evaluation
Result evaluation is the final stage in the model construction process. Based on Tables IV and V, the accuracy of the J48 classifier after the removal of the less correlated attributes is higher than that of the REPTree and Random Tree classifiers. Its TP rate of 0.634 is the highest value for this measure, and its Precision (positive predictive value) is also the highest, at 0.629. Together with a Recall (equivalent to the TP rate) of 0.634 and an FP rate of 0.409, these results make J48 the nominated algorithm for the model (see Fig. 2). Fig. 3 shows the resulting J48 tree, which best shows the factors/questions most correlated with the final class (Failed). Each node in the tree is a question, and its branches correspond to the possible answers, so each node can be considered a decision. The tree can also be used to predict failure or success by answering the questions at its nodes.

V. Conclusion
This study explores and tests the application of decision tree algorithms to a student questionnaire in order to find the factors that affect student success or failure. Data mining algorithms, and decision tree algorithms in particular, can be an effective solution for predicting students' performance because they provide accurate results and give a road map to both academic staff and students. Based on the results of the model, several factors (attributes) affect the accuracy of the resulting tree and overall student academic performance. Attributes such as Age, Work, Gender, Stage, and Status had less effect on student success, whereas GPA, Credits, List Important Notes, Father Work, and Fresh Food had the most significant effect on the final class. Attribute evaluator algorithms can be used to find the questions that most closely relate to student success, whether positively or negatively. The data mining algorithms also revealed that the questionnaire contains many unimportant questions. The size of the data set and the number of attributes affect the accuracy of the decision tree. Students and academic staff can use the model to identify which questions/answers relate to better academic performance and thereby improve the success of institutions.