A Review on Predictive Modeling Technique for Student Academic Performance Monitoring

Although institutions strive to provide high-quality education, predicting student academic performance has become increasingly critical for improving quality and helping students perform well in their studies. The lack of an efficient and accurate prediction model is one of the major issues. Predictive analytics can support institutions with more intuitive and better decision making. The objective of this paper is to review current research activities related to academic analytics, focusing on predicting student academic performance. Previous researchers have proposed various methods to develop the best-performing model using a variety of student data, techniques, algorithms and tools. Predictive modeling for student performance involves several learning tasks such as classification, regression and clustering. To achieve the best prediction model, many variables have been chosen and tested to find the most influential attributes for prediction. Accurate performance prediction helps to guide the learning process and helps students avoid poor scores. A predictive model can furthermore help instructors forecast course completion, including students' final grades, which are directly correlated with student success. An effective predictive model requires good input data and variables, a suitable predictive method, and a powerful and robust prediction model.


Introduction
Nowadays, students' academic performance is becoming more crucial, especially in higher learning institutions. Improving student performance in learning is the main goal of all academic institutions [1]. The main purpose of this study is to review current research activities related to academic analytics, focusing on predicting student academic performance. Student academic performance provides valuable information for educational authorities, offering diverse opportunities for decision making and for assisting students to achieve strong performance in their studies [2].

Student Academic Performance
According to the definition provided by Tuckman [3], student performance is used to label the observation of skills, concepts, understanding, knowledge and ideas. The success of students in educational institutions is measured by academic performance, or how well students meet the standards set out by the educator and the institution itself. In addition, student performance can be assessed by the extent to which a student, teacher and institution have achieved their short- and long-term educational goals [30]. There has been a surge of interest in utilizing the large amount of information collected in educational systems [4]. Certain problems arise when huge volumes of data are stored in student databases, and the field that addresses them is known as Educational Data Mining (EDM). According to Abu Saa [49], EDM is the field of study concerned with mining educational data to find interesting patterns and knowledge in educational organizations. EDM is concerned with developing methods and analyzing educational content to enable a better understanding of student performance. It focuses on the collection, archiving, and analysis of data to enhance the teaching and learning process [40].
Such student data are typically too large to handle easily, highly unbalanced, and a source of complexity [5]. The main objective of EDM is to discover new knowledge and hidden patterns in student data [6]. This information can be used to predict student performance and assist educators in providing effective teaching methods [7]. Predicting student performance is a difficult task because most prediction methods do not deal with a dynamic environment [8]. Currently, the lack of an efficient and accurate prediction model for student performance monitoring is not well addressed. The lack of analysis to determine which variables or attributes have the greatest impact is also one of the major issues in student academic performance. For these reasons, many researchers have proposed various methods to develop the best student performance model using a variety of student data, techniques, algorithms and tools [6]. Such models could give educational organizations valuable information to improve the quality of services and to guide decision making [9]. For instructors, the information offers substantial opportunities to improve the quality of their teaching [2]. Accurate prediction of student performance helps to provide guidance in the learning process [10] and helps students take positive steps and avoid poor scores [8]. Student performance has been measured using a few selected variables as parameters. Most researchers used the student cumulative grade point average (CGPA) as a parameter because it carries substantial weight for future education as well as career mobility [7]. It has also been a benchmark for whether a student will graduate on time, without repeating courses in an extended semester, and seize career opportunities at the right time [8].

Predictive Analytics
The rise of analytics in recent years is understandable. Analytics is the process of using computational methods to discover influential patterns in data. The goal of analytics is to gain insight for decision making [11]. The idea of using analytics is not new and has been pursued in different fields, including data analysis, neural networks, pattern recognition, knowledge discovery, data mining and data science [11]. According to the definition provided by Abbott [11], predictive analytics is the process of discovering interesting and meaningful patterns in data. A further definition, given by Barneveld, Arnold and Campbell [12], describes predictive analytics as a process that serves all levels, including higher education, and that extracts information using diverse technologies to reveal patterns and relationships in data. Predictive analytics can disclose latent relationships that might not be clear with a descriptive model, such as one based on demographics and student completion rates [13]. Predictive analytics can provide institutions with intuitive and better decisions based on data. It can be used to monitor students early in the semester and to intervene to improve their performance. Predictive analytics can furthermore help instructors forecast course completion, including students' final grades, which are directly correlated with student success [13]. In educational data mining, predictive modeling is mainly used to predict student academic performance [7]. Predictive modeling is a process that involves running one or more algorithms on a dataset for which predictions are to be made. It is the process of creating, testing and validating a model to best predict the probability of an outcome [14]. A predictive model is born whenever data is used to train a predictive modeling technique [15].
In research on student performance, many predictive modeling techniques are available, including Bayesian Networks (BNs), Decision Trees (DTs), Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), clustering, association rules and others.

Academic Analytics (AA)
To reflect the role of predictive models at the institutional level, academic analytics is an essential component that encompasses all activities in higher education [16]. Academic analytics (AA) is an emerging data mining approach applied to monitor student performance by combining statistical analysis, institutional data and predictive modeling, which can be used to change academic behaviour [17]. Analytics goes beyond traditional reporting systems by providing decision-support capabilities [18], gathering and organizing information, and analyzing and manipulating data [13]. AA has several applications in student academic performance, such as performance prediction, course recommendation, career path planning, behaviour detection and more [19]. Student performance prediction is useful in many contexts. Accurate prediction has a significant influence at the faculty level [20], helping to detect undesirable student results and improving the forecast [8] of whether or not students will persist to finish their studies [20]. It is clear that academic analytics is gaining momentum as an approach to predict student performance, enable targeted student interventions and improve student success [21].

Student Performance Applications
Many researchers have demonstrated practical applications of student performance analysis, including performance prediction, course recommendation, career path planning and more.

Performance Prediction
A performance prediction model can be built by applying data mining techniques to available collected data such as student CGPA or GPA and other variables [22]. Previous work on student performance prediction used methods such as Bayesian Network [2], [10], [22]-[27], Decision Tree [6], [9], [22], [28], Support Vector Machine [1], [8], regression [29], [39] and others. Some of the models categorized students into a few categories such as below satisfactory, satisfactory, and above satisfactory. Student performance can also be predicted from students' interactions with other students, instructors or teachers [19]. With performance prediction, underperforming students can be identified and provided with relevant academic guidance [19] to improve their study progress as well as their final grades.

Course Recommendation
Previous studies have reported that student performance prediction benefits institutions by improving the learning process and course recommendation. Courses can be recommended to students by analysing their previous CGPA results or their results at entry. Course recommendation identifies courses based on a student's qualifications and interests [19]. Such recommendations help ensure that students are not misguided and choose a field that matches their results and interests [19]. Another researcher, however, looked at a recommendation system that could give feedback and benefit student performance [21]. Interest in subject or course recommendation has been inspired by Austin Peay State University, which developed a subject recommendation system called 'degree compass' that pairs current students with the courses that best fit their talents and upcoming study program [31]. The 'degree compass' algorithm is reported to achieve more than 90% accuracy in its predictions [31]. In future, students' responses to recommendations should be considered an important input for applications of student academic performance.

Career Path Planning
Other studies have acknowledged that student academic performance analysis has benefited higher educational institutions. Most researchers use CGPA as a parameter to measure student performance. CGPA is a prominent value for future education and can serve as a means of identifying potential candidates in job hunting as well as a factor in students' career mobility [7]. CGPA is usually considered an important criterion for students to complete their studies and start their careers on time as planned [8]. Student performance applications are also suitable for predicting student careers and for providing statistical information on relevant occupations and job-seeking opportunities [31].

Methodology and Performance Modelling Technique
To unveil hidden information and knowledge in student data, published studies have identified several elements, such as methods, techniques, models, variables, and tools, that need to be considered as key components in constructing the best predictive model for student academic performance.

Methodology on Student Academic Performance
Student academic performance has been studied by many researchers using numerous data mining techniques in supervised and unsupervised learning. This paper reviews the methodological approaches to predicting student performance in terms of their stages. Itoh, Nishiwaki and Funahashi [25] used a Bayesian Network technique to support academic guidance. Their method for identifying students who require guidance consists of five phases, which include defining the target variable, defining the explanatory variables and converting the original data through normalization. To improve accuracy, feature selection (FS) and principal component analysis (PCA) were used to synthesize useful information from the student database, with PCA reducing unnecessary variables. The method assessment phase calculates accuracy, recall, and precision as the evaluation criteria of the predictive model. On the other hand, Sugiharti, Firmansyah and Devi [32] described their method in three stages: data collection, technique of data collection and data analysis. The authors applied a Naïve Bayes classifier to predict whether students graduate on time (GOT) based on their performance from the first to the fourth semester. Christian and Ayub [20] used a hybrid technique known as Naïve Bayes Tree, or NBTree, classification to explore the prediction of student academic performance. Sorour et al. [1] predicted student performance from free-style comments, using the k-means clustering technique to improve a basic approach based on latent semantic analysis (LSA). Their process has about five initial stages: term weighting of comments, LSA, noisy data detection (test phase), a similarity measurement method and an overlap method. Adding the overlap method increased accuracy from 73.6% to 78.5%.
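The assessment criteria named above (accuracy, recall and precision) can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed studies: the function name, the label strings and the choice of "needs_guidance" as the positive class are assumptions made purely for illustration.

```python
def classification_metrics(actual, predicted, positive="needs_guidance"):
    """Return (accuracy, precision, recall) for a binary labelling.

    `positive` is the label treated as the class of interest; the default
    value is a hypothetical name, not one used in the reviewed papers.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy data: five students, two of whom actually need guidance.
actual = ["needs_guidance", "needs_guidance", "ok", "ok", "ok"]
predicted = ["needs_guidance", "ok", "ok", "needs_guidance", "ok"]
accuracy, precision, recall = classification_metrics(actual, predicted)
```

Reporting recall alongside accuracy matters when the data are unbalanced, because a model can reach high accuracy while missing most of the students who need guidance.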
In developing student academic performance prediction, Ahmad, Ismail and Aziz [6] used several techniques, namely Naïve Bayes, Rule-Based and Decision Tree, to discover the best classification model for prediction. Their method consists of seven steps, including data integration, data processing, training on target data and testing, with the last step being taking action. Ahmad, Ismail and Aziz [6] developed their student performance prediction process for the first semester of the study programme. They observed that historical data on study background, such as SPM grades, could be used to predict performance from very early in the semester. Xu, Moon, and Schaar [33] developed a novel method to predict student performance using an ensemble learning technique that combines different algorithms through an exponentially weighted average forecaster (EWAF). In different circumstances, Rebekah and Ramakrishna [34] predicted students' first-year results from internal marks by employing a Bayesian estimation technique. Their approach has three phases: the first is data collection for calculating the mean and standard deviation; the second uses Bayesian estimation to combine several pieces of direct sample evidence; and the last uses the posterior distribution to calculate the student's final marks. From the above studies, it can be concluded that researchers have used several different approaches to predict student performance as accurately as possible, such as early prediction [25], [6], [34], final prediction [2], [20] and a novel predictive method [33].
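The three-phase Bayesian estimation described above (mean and standard deviation, combination of sample evidence, posterior) can be sketched as follows. The source does not state which distributions were used, so this sketch assumes the textbook case of a normal prior on the final mark combined with normally distributed internal marks of known spread; the function name and all numbers are hypothetical.

```python
def normal_posterior(prior_mean, prior_sd, marks, obs_sd):
    """Combine a normal prior with observed marks (normal, known obs_sd).

    Returns the posterior mean and standard deviation of the student's
    underlying mark. This is the standard conjugate normal-normal update,
    used here as an assumed stand-in for the reviewed method.
    """
    n = len(marks)
    sample_mean = sum(marks) / n
    prior_precision = 1.0 / prior_sd ** 2   # weight of prior knowledge
    data_precision = n / obs_sd ** 2        # weight of the observed marks
    post_variance = 1.0 / (prior_precision + data_precision)
    post_mean = post_variance * (
        prior_precision * prior_mean + data_precision * sample_mean
    )
    return post_mean, post_variance ** 0.5


# Hypothetical example: cohort prior of 60 +/- 10, two internal marks.
post_mean, post_sd = normal_posterior(60.0, 10.0, [70.0, 80.0], 10.0)
```

The posterior mean lies between the prior mean and the sample mean, and moves toward the sample mean as more internal marks are observed.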

Predictive Student Performance Technique
Several tasks are used to build predictive models, namely classification, regression and clustering. The algorithms used include Bayesian Network (BN), Decision Tree (DT), Artificial Neural Network (ANN), Naive Bayes (NB), K-Nearest Neighbour (KNN), Support Vector Machine (SVM) and others.

Bayesian Network (BN)
Student academic performance has been studied by many researchers using machine learning methods, and Bayesian Network is one of the most popular of the proposed methods. A Bayesian Network is a graphical model that expresses the dependence relationships between variables. According to Ting, Cheah and Ho [35], constructing a BN model is not a straightforward task, and the prediction problem carries inherent uncertainty [22]. Sharabiani et al. [2] used a BN to forecast the academic performance of 300 engineering students by assuming a joint probability distribution (JPD) of the variables. Further research by Misiunas et al. [27] showed that BNs are advantageous for structured data and suitable for data analysis, pattern classification and modeling [10].

Decision Tree (DT)
Decision Tree is one of the most popular techniques for prediction because it is one of the simplest and easiest methods and works well on both small and large datasets [7]. Khobragade and Mahadik [9] implemented NB rules in a DT algorithm to classify students into different classes such as high or low performance. Comendador, Rabago and Tanguilig [36] applied DT in their research because it is a simple and powerful way of representing knowledge to predict final grades. In addition, Kumar and Radhika [37] applied DT using the C4.5 and ID3 algorithms and found C4.5 to be the better algorithm for prediction, with an accuracy of 83.66%.
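To make the splitting idea behind ID3 concrete, the following minimal Python sketch computes entropy and information gain, the criterion ID3 uses to pick the attribute to split on (C4.5 refines it into a gain ratio). The attribute names and the toy records are invented for illustration and do not come from the reviewed studies.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting `rows` on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(
        len(group) / len(labels) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder


# Hypothetical student records: attendance separates the classes, gender does not.
rows = [
    {"attendance": "high", "gender": "m"},
    {"attendance": "high", "gender": "f"},
    {"attendance": "low", "gender": "m"},
    {"attendance": "low", "gender": "f"},
]
labels = ["pass", "pass", "fail", "fail"]
best = max(["attendance", "gender"], key=lambda a: information_gain(rows, labels, a))
```

In this toy data the split on attendance removes all uncertainty about the outcome, so a tree builder following this criterion would place it at the root.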

Artificial Neural Network (ANN)
According to Chudasama and Joglekar [38], an ANN is able to forecast future results from past data with a complex relationship between input and output, and helps to find patterns in data relationships [21]. Marchandia and Navdeep [40] used an ANN for performance evaluation by classifying students as best, average or worst according to the input data. Furthermore, Dole and Rajurkar [22] compared ANN with Bayesian Network and Decision Tree, while Chudasama and Joglekar [38] used the ANN method to produce reasonable results, with a performance accuracy of 78.94%.

Support Vector Machine (SVM)
Kavipriya [8] reported that SVM is capable of delivering high accuracy compared with other data classification algorithms. Sembiring et al. [41] stated that SVM generalizes well and is faster than other methods. Apart from that, some researchers have used several methods, including SVM, to find an effective technique for predicting student academic performance [4], [42].

K-Nearest Neighbour (KNN)
K-Nearest Neighbour (KNN) is one of the simplest algorithms in machine learning. Gray, McGuinness and Owende [29] used six classification algorithms, including KNN, implemented in RapidMiner. Mousa and Maghari [43] applied KNN in a comparison with other methods, classifying academic future and social future. The classification results using social attributes showed that the KNN classifier is relevant for data with few changes, as it compares training cases with testing cases.
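The comparison of testing cases against training cases that KNN performs can be sketched in a few lines of Python. This is an illustrative sketch only: the two features (a GPA-like score and an attendance percentage), the labels and the value of k are assumptions, and in practice features on different scales would normally be normalized before distances are computed.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training cases: (GPA-like score, attendance %).
train_points = [(3.8, 95.0), (3.5, 90.0), (2.0, 60.0), (1.8, 55.0), (3.6, 85.0)]
train_labels = ["high", "high", "low", "low", "high"]

prediction = knn_predict(train_points, train_labels, (3.7, 92.0), k=3)
```

Because KNN stores the training cases rather than fitting parameters, all of the computational cost falls at prediction time, which is one reason it is usually described as simple to implement.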

Student Performance Model and Variable
Student academic performance predictions are developed using various methods and algorithms to generate the best and most accurate predictive model [45]. Predictive models help students, educators, and staff manage vast educational student records [46]. A predictive model is created from training data and an algorithm, and is used to make predictions [14]. To build a predictive model, an input dataset is used to train the algorithm so that it learns the relationship between input and output [15]. This relationship is captured in the resulting model, which allows an accurate student performance model to be constructed [15] and applied to new cases based on the predictions the model generates.

Student Performance Predictive Modeling
Student academic performance is very important for providing information about students to the instructor [46]. In educational institutions, predictive modeling is usually used to predict student performance and relates to several tasks such as classification, regression and clustering [7]. A predictive model is built by running an algorithm on the training set. After the data has been prepared, model construction is performed [20]. Singh and Singh [47] stated that the training dataset is used to build the model. Once a predictive model has been built using the training dataset, its performance must be validated using new data. Smith, Lange and Huston [4] mentioned that a predictive model was intended to produce an estimate of course success that could be translated into a warning level: low, moderate or high. Model accuracy was assessed using cross-validation, and the results were compared and subsequently applied to academic success [29].
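The train-then-validate cycle and the use of cross-validation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any reviewed study: the fold assignment is a simple interleaving, the function names are hypothetical, and the "model" is a majority-class baseline chosen only so that the example is self-contained.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k interleaved folds.

    Assumes n_samples >= k so that every test fold is non-empty.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(features, labels, fit, predict, k=5):
    """Mean accuracy over k folds; each fold is held out once as unseen data."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(train_features, train_labels):
    """Baseline 'model': remember the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, feature_row):
    """Baseline prediction: always return the remembered label."""
    return model


# Hypothetical data: ten students, eight passes and two fails.
features = list(range(10))
labels = ["pass"] * 8 + ["fail"] * 2
mean_accuracy = cross_validate(features, labels, fit_majority, predict_majority, k=5)
```

Any classifier exposing comparable fit and predict functions could be passed in place of the baseline, and a baseline score of this kind gives a reference point against which a real model's cross-validated accuracy can be judged.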

Attribute and Variable
In developing a predictive model, it is essential to transform the original data into variables [23]. Variables contain information about demographic and academic details such as gender, race, family background, medical information, CGPA and programme mode. In most recent studies, CGPA or GPA and student marks have been used by a considerable number of researchers to measure student academic performance in particular subjects and courses. Traditionally, student GPA is used as the predicted or dependent variable and assigned as the output or outcome [46]. An effective predictive model requires good input data and variables, a suitable predictive method, and a powerful and robust prediction model [45].

Challenges and Limitation
Providing better education requires many parameters to understand the process in terms of students' level of understanding [47]. Many researchers have used predictive techniques and tools to discover hidden characteristics [52] in order to minimize the failure rate among students. Ting, Cheah and Ho [35] discussed the challenges and limitations of the current model, which does not model engagement level over time. Meanwhile, some systems are rich in data but poor in information management. Researchers need to find useful indicators and parameters for recommendations so that analytics results can be evaluated in practice [53]. Another challenge in predicting student academic performance is selecting the right factors and relevant attributes together with a correct prediction method [7]. To choose an appropriate method, most researchers applied a mixed-method approach that integrates the best prediction techniques to increase the robustness of the model [53]. However, the selection of an appropriate method also depends on the availability of [52] for the model to perform the calculation for accurate prediction [14]. A further challenge is the small size of datasets due to incomplete and missing values [44]; researchers therefore need access to more comprehensive data that can offer more compelling results in future [39].

Trends and Future Direction
This paper has presented an overview of how data mining techniques have been used in educational systems and identified the major areas of research and trends. This will help educational systems monitor student performance efficiently. The research can be further extended to model and validate the variables [35], which helps students improve their performance and assists instructors in identifying students who need special attention and in taking appropriate action at the right time [30]. Researchers can focus on early intervention by providing the needed support and making sure that students have developed positive goals toward their studies [54]. Future research might explore student performance in its early stages and in real time, and develop methods that deal with a dynamic environment [35].

Conclusion
This paper reviewed recent research on predictive models of student academic performance. Predicting student performance is one of the most effective ways to help educators and learners improve the learning process and to help students graduate within the normal program duration. Previous studies have used various methods to build the best predictive model. To achieve the best model, many variables have been chosen and tested to find the most influential attributes for prediction. Most researchers have used CGPA, attendance, gender and assessment marks. A few of these attributes have the most significant impact on whether or not a student will persist to finish their studies. The prediction method is one of the critical components in analyzing student performance. Most researchers use classification, regression and clustering methods such as BN, DT, ANN, SVM, KNN and others. Some use mixed methods in order to provide a robust mechanism with better predictive accuracy.