Prediction Model of Student Achievement in Business Computer Disciplines Learning

An educational program that does not adapt to disruptive technology is likely to become obsolete. This research has two objectives: (1) to construct a reasonable student dropout prediction model for business computer disciplines, and (2) to evaluate the model's performance. The data consist of 2,017 records of students who enrolled in the business computer program at the School of Information and Communication Technology, University of Phayao. The research tools are divided into two parts: (1) modelling, which uses the Artificial Neural Network, Decision Tree, and Naïve Bayes algorithms; and (2) model testing, which uses the confusion matrix together with accuracy, precision, and recall measurements. The contribution of the research is that it applies data science knowledge to improve the academic achievement of students in higher education in Thailand. The analysis results show that the models developed with the Artificial Neural Network algorithm have the highest accuracy for the first three data sets (89.04%, 92.70%, and 93.71%), while the Naïve Bayes algorithm is the most suitable for the last data set (91.68%). Finally, it is necessary to conduct additional research and present the results to relevant parties and organizations.


Introduction
In the field of education, there is a significant need for change driven by what are called "Disruptive Education Technologies" [1], [2]. This innovation is likely to revolutionize the structure of learning. It will change not only the educational content and experience of students but also their interest in learning and their retention [3]. Digital technology is about to change the world of education and disrupt higher learning [4]. Therefore, using the right innovation will support and benefit its users.
University education programs have also been affected by this situation. According to UNESCO's 2017/8 Global Education Monitoring (GEM) Report, there are still many challenges in raising the quality of education in Thailand: 99% of students have completed primary education, but only 85% have completed lower secondary education. At the lower secondary level, only 50% of the students have passed the reading exam, and only 46% have passed the mathematics exam. Moreover, 3.9 million adults are unable to read a simple sentence [5]. This report is the starting point of our research, in which the researchers found that the preliminary quality of students entering the university is diverse. Some students have the potential and readiness for academic learning, while others do not have easy access to knowledge. The preliminary finding is a problem with students' academic performance at the university level: low performance leads students to drop out of the program [6], [7].
This research presents and reflects on the problems of student achievement and dropout among university students, limiting the scope of the study to one program at the University of Phayao. The expected result is an understanding of the causes and factors behind student dropout in the Business Computer Disciplines at the School of Information and Communication Technology, University of Phayao, which can then be developed further as shown in the conceptual framework in Figure 1. This paper has two main objectives: (1) to construct a reasonable student dropout prediction model for the Business Computer Disciplines at the School of Information and Communication Technology, University of Phayao, and (2) to evaluate the performance of the dropout prediction models for students in these disciplines who did not attain academic achievement, and to select the best one.

Research approaches
The researchers designed the work according to the data mining process model called CRISP-DM [8]. It consists of six steps: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. The data collected for the research are the records of 2,017 business computer students from the academic year 2001-2019 at the School of Information and Communication Technology, University of Phayao, Thailand.
The research tools are divided into two main parts. The first part is the development of predictive models [9]-[12], which uses the Artificial Neural Network Algorithm (A-NN), Decision Tree Algorithm (DT), and Naïve Bayes Algorithm (NB). The second part is the model performance testing described in the evaluation section, which uses the confusion matrix together with accuracy, precision, and recall measurements [10].

Literature Reviews and Related Works

Educational system in the university
The education system in the university is in accordance with the National Education Act of Thailand which includes the National Education Act B.E. 2542 (1999) and Amendments of the Second National Education Act B.E. 2545 (2002) [13], [14], and The National Qualifications Framework for Higher Education in Thailand [15].
The National Education Act requires three types of education: Formal education specifies the aims, methods, curricula, duration, assessment, and evaluation that are conditional to its completion. Non-formal education has flexibility in determining the aims, modalities, management procedures, duration, assessment and evaluation conditional to its completion. The contents and curricula for non-formal education shall be appropriate, responding to the requirements, and meeting the needs of individual groups of learners. Finally, informal education enables learners to learn by themselves according to their interests, potentialities, readiness and opportunities available from individuals, society, environment, media, or other sources of knowledge.
In addition, formal education is divided into two levels: basic education and higher education. The formal education structure in Thailand is shown in Figure 2. Figure 2 shows the structure and duration of the Thai education system, which takes almost two decades on average. The university education system must also follow the National Qualifications Framework for Higher Education in Thailand [13], which defines many qualifications and standards intended to help students complete their studies within a specific time frame. For example, it specifies five domains of learning for programs and their expected outcomes: ethical and moral development, knowledge, cognitive skills, interpersonal skills and responsibility, and analytical and communication skills. Moreover, according to numerous research reports, the Thai Qualifications Framework for Higher Education (TQF: HEd), which is modified from the National Qualifications Framework, specifies too many requirements [10][11][12].
Therefore, it can be concluded that the Thai education system was not truly designed in line with the learner's context.

University's competitiveness and dropout's impact
Currently, higher education in Thailand consists of 83 public higher education institutions, consisting of 26 autonomous universities, 10 public universities, 38 Rajabhat Universities (developed from teacher training colleges but now offering a broad range of programs beyond focusing on developing local communities), and 9 Rajamangala Universities of Technology (developed from institutes of technology, and focusing on science and technology). In addition, there are 72 private higher education institutions.
There are 155 universities and more than 59 campuses in Thailand. The number of students enrolled in universities has decreased steadily, as shown in Table 2. The decline is explained by the competition among new students trying to get a place in the university of their choice. In 2017, two educational institutions in Thailand closed down, namely Asian University and Srisophon College. The number of students who entered higher education institutions during 2013-2017 is shown in Table 1. Table 1 shows that the number of students has decreased continuously. In 2017, the total number of students decreased by 216,561. In addition, the bachelor's degree level decreased significantly, with the number of students reduced to 165,840. Moreover, the Ministry of Education in Thailand reported a decrease in the number of students enrolled in higher education in 2016: 81.26 percent, or 541,720 students, enrolled in the first year of higher education. In this situation, educational institutions need more learners while the population has been continuously decreasing, so it can be concluded that the competition for learners will be more intense in the future.
The Business Computer Disciplines at the School of Information and Communication Technology, University of Phayao, have clearly felt the impact [10], [12]. The institution has discussed the continuing decline in student numbers, which is examined in this research. This research is needed to address the dropout problem of university students, and the researchers therefore pay special attention to it.

Research Methodology
This research methodology was designed and implemented in accordance with the CRISP-DM methodology [8]. It has six phases to follow: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

Business understanding
This phase focuses on questioning and studying the needs of the Business Computer Disciplines at the School of Information and Communication Technology, University of Phayao, over the past twenty years. The objective is to use research to answer the following question and to develop the answer into a model for problem solving: how should the declining number of students in the business computer disciplines and the student dropout problem be solved?
In the past, the business computer disciplines had a large number of students, but the number has continued to decrease over the past twenty years, while the rising dropout rate has become a very serious threat to the discipline's existence. This matters from the viewpoint of organizational success, because an educational institution aims to promote and support human resource development, which is the country's strength.

Data understanding
The data understanding phase aims to define the scope of data collection in response to the research questions. Normally, it has four steps: collecting the initial data, describing the data, exploring the data, and verifying data quality [8].
The purpose of this phase is to gather comprehensive and complete information about the Business Computer Disciplines at the School of Information and Communication Technology, University of Phayao, Thailand. The information consists of statistics on students enrolled in the program over the twenty years (academic year 2001-2019), dropout statistics for the same period, current student statistics, and details of individual study results, including grade point average (GPA), registration history for each semester, academic results in each course, student status, and other related information. Figure 3 shows the working process of the data understanding phase: the process of data collection, the process of analyzing and determining the scope of the data, and the process of planning the data management. It also shows the relationship between the CRISP-DM [8] theory and the research operations. This phase takes a lot of time because its sub-processes and details must be handled carefully to obtain quality data. Figure 4 shows the procedure for obtaining data from the university, following the procedures in the research methodology. It consists of five steps: 1) requesting permission to access and use student data from the university, 2) the login process, 3) student data collection, 4) statistical summary, and 5) preparing data for developing models. The preparation and distribution of the data are described in the data preparation section, and the selection of tools for developing the model is described in the modeling section.

Data preparation
The data preparation phase covers all activities needed to construct the final data set. It is intended to prepare, from the initial raw data, the data that will be fed into the modeling tool. It consists of five main steps: selecting data, cleaning data, constructing data, integrating data, and formatting data [8].
The data collected are the records of 2,017 students in the Business Computer Disciplines at the School of Information and Communication Technology, University of Phayao, Thailand. The data are divided into five sets according to the scope of the research, as shown in Table 2 to Table 4.
Note that the scope of the research is divided according to the revised curriculum, which is updated every three to five years. Table 2 shows the data compiled for the Business Computer Disciplines during the past twenty years (academic year 2001-2019), divided into five data sets according to curriculum updates. In addition, the researchers present a graph showing the consistency of the research problem in Figure 5. In Figure 5, the dashed line shows that the overall number of students tends to decrease across the data sets, while the solid line shows the total number of students in each data set. Moreover, the bar graph showing the proportion of students who graduated and dropped out indicates that the two groups were nearly equal in size (176:163 in the fourth data set). To provide more insight, the researchers explain each data set in detail, as shown in Table 3 to Table 4.
Note that non-graduation (dropout) is classified into three main types: Type 1 is leaving the university due to timing and management problems, Type 2 is dropout due to academic results, and Type 3 is being retired from the program. From the data collected, it can be concluded that students of the Business Computer Disciplines at the School of Information and Communication Technology, University of Phayao, drop out most often during the first year (382 students or 60.25%). The most common reason for dropout is academic results (394 students or 19.85%). This is shown in Table 4. The data also show that admission to the business computer disciplines tends to decrease significantly, as shown in Figure 6. The solid line in Figure 6 shows an increasing trend of student enrollment in 2003 and 2007; after 2008, the trend clearly declined. In contrast, the dashed line shows that dropout remained stable or increased. It can therefore be concluded that the situation should be addressed on the basis of students' academic performance, and that preventive measures are needed to keep first-year students in the program.
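The grouping of student records into data sets by enrollment year can be illustrated with a minimal sketch. It assumes only the four enrollment-year ranges that are named explicitly in the results of this paper; the record layout, the status labels, and the toy records are hypothetical and are not taken from the research data.

```python
from collections import Counter

# Enrollment-year ranges of the data sets named in the results section.
# Years outside these ranges are left unassigned in this sketch.
COHORTS = [(2001, 2003), (2004, 2007), (2008, 2011), (2012, 2016)]


def cohort_of(year):
    """Return the 'start-end' label of the data set containing `year`, or None."""
    for start, end in COHORTS:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def summarize(records):
    """Count student statuses per data set.

    `records` is an iterable of (enrollment_year, status) pairs, where
    status is 'graduated' or 'dropout' (hypothetical labels).
    """
    counts = {}
    for year, status in records:
        label = cohort_of(year)
        if label is None:
            continue  # skip years that no listed data set covers
        counts.setdefault(label, Counter())[status] += 1
    return counts


def dropout_share(counter):
    """Fraction of students in one data set whose status is 'dropout'."""
    total = sum(counter.values())
    return counter["dropout"] / total if total else 0.0


# Hypothetical toy records, for illustration only.
toy_records = [
    (2001, "graduated"), (2002, "dropout"), (2005, "graduated"),
    (2013, "dropout"), (2013, "dropout"), (2013, "graduated"),
]
summary = summarize(toy_records)
for label, counter in sorted(summary.items()):
    print(label, dict(counter), round(dropout_share(counter), 2))
```

Keeping the year ranges in one list makes it straightforward to add a further data set when a new curculum revision is introduced.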

Modeling
The data gathered show that the number of students enrolled in the program tends to decrease continuously, as displayed in Figure 6. In addition, the increasing trend of dropout reflects a number of problems related to the number of learners. This can be seen from the gap between the number of admissions and the number of dropouts, for which a solution must be found urgently.
Therefore, this phase presents a process for choosing the appropriate theory and machine learning tools to develop the best solution model as a recommendation to the School of Information and Communication Technology, University of Phayao, Thailand.
The theory and machine learning tools chosen to develop the model are divided into two parts. The first part is the development of predictive models, which uses the Artificial Neural Network Algorithm (A-NN) [7], Decision Tree Algorithm (DT) [7], [12], and Naïve Bayes Algorithm (NB) [7]. The second part is the model performance testing described in the evaluation section, which uses the confusion matrix together with accuracy, precision, and recall measurements [8].
The model development process in this phase consists of three steps. In the first step, the researchers select and develop the models with machine learning techniques. In the second step, the researchers test the performance of the models from the first step. In the third step, the researchers compare the performance of the models and select the appropriate one. An overview of the modeling phase is shown in Figure 7.
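To make one of the three algorithm families concrete, the following is a minimal sketch of a Naïve Bayes classifier for categorical features with add-one (Laplace) smoothing. It is an illustration only and is not the implementation or tool used in this research; the class name `CategoricalNB`, the use of letter grades in two courses as features, and the toy records are all assumptions.

```python
import math
from collections import Counter


class CategoricalNB:
    """Naive Bayes for categorical features with add-one smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n_rows = len(y)
        n_features = len(X[0])
        # value_counts[c][j][v]: rows of class c whose feature j equals v
        self.value_counts = {
            c: [Counter() for _ in range(n_features)] for c in self.classes
        }
        # distinct values seen for each feature (used by the smoothing term)
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.value_counts[label][j][v] += 1
                self.values[j].add(v)
        return self

    def log_scores(self, row):
        """Unnormalized log posterior of each class for one row."""
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n_rows)  # log prior
            for j, v in enumerate(row):
                num = self.value_counts[c][j][v] + 1
                den = self.class_counts[c] + len(self.values[j])
                score += math.log(num / den)  # smoothed log likelihood
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical toy data: letter grades in two courses and the final status.
X = [("A", "B"), ("B", "B"), ("A", "C"), ("F", "D"), ("D", "F"), ("F", "F")]
y = ["graduated", "graduated", "graduated", "dropout", "dropout", "dropout"]

model = CategoricalNB().fit(X, y)
print(model.predict(("A", "B")))
print(model.predict(("F", "F")))
```

Working in log space avoids numerical underflow when many course grades are multiplied together, and the smoothing term keeps a grade that was never observed for a class from forcing its probability to zero.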
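The third step, comparing tested models and selecting one per data set, can be sketched as follows. The function names are hypothetical, and the accuracy values are placeholders chosen for illustration; they are not results of this research.

```python
def select_best(accuracy_by_model):
    """Return (model_name, accuracy) for the highest accuracy.

    If two models tie, the one listed first is returned.
    """
    if not accuracy_by_model:
        raise ValueError("no models to compare")
    name = max(accuracy_by_model, key=accuracy_by_model.get)
    return name, accuracy_by_model[name]


def select_per_dataset(results):
    """Apply select_best to every data set in a nested results table."""
    return {dataset: select_best(scores) for dataset, scores in results.items()}


# Placeholder accuracies, NOT values reported in this paper.
placeholder_results = {
    "data set A": {"A-NN": 0.90, "DT": 0.85, "NB": 0.88},
    "data set B": {"A-NN": 0.80, "DT": 0.82, "NB": 0.91},
}

best = select_per_dataset(placeholder_results)
for dataset, (name, accuracy) in best.items():
    print(dataset, name, accuracy)
```

Selecting on accuracy alone mirrors the comparison described above; precision and recall could be added as tie-breakers if dropout is much rarer than graduation in a data set.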

Evaluation
This stage is part of the modeling phase and corresponds to the second step of model development. In principle, the performance evaluation of the model uses the cross-validation method [8]. The cross-validation method divides the data into two parts: the first part is used for model development and the second part is used for testing that model, as shown in Figure 8. Figure 8 describes the division of data for testing the model. In addition, model testing requires a tool called a confusion matrix [6], [17], [18]. An important benefit of the confusion matrix is that it shows the model's ability to predict results through measures such as accuracy, precision, sensitivity (recall), and specificity. These values are used to determine the actual performance of the model.
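The division of data for cross-validation can be sketched in a few lines. The sketch below generates index splits for k-fold cross-validation and for leave-one-out cross-validation, treated here as the special case in which the number of folds equals the number of records. The function names are hypothetical, and the sketch uses contiguous folds without shuffling, which is an assumption made for readability.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def leave_one_out_indices(n_samples):
    """Leave-one-out: every record is the test set exactly once."""
    return k_fold_indices(n_samples, n_samples)


folds = list(k_fold_indices(10, 5))
loo = list(leave_one_out_indices(4))
print(folds[0])
print(loo[2])
```

Each record appears in exactly one test fold, so every record contributes once to the confusion matrix accumulated over all folds.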

Deployment
Rüdiger Wirth stated that, in CRISP-DM, "Creation of the model is generally not the end of the project" [6]. Therefore, the researchers plan to develop this research into a future recommendation application. An important goal is to test the real system with users in different universities.

Research Results
The research results and discussion are divided into two stages as planned: Stage 1 covers model performances and results, and Stage 2 covers the best solution model and model selection.

Model performances and results
Model performance is reported as the results of testing with the confusion matrix together with accuracy, precision, and recall measurements [8], presented separately for each research tool.
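The measurements named above follow directly from the four cells of a binary confusion matrix. The sketch below treats "dropout" as the positive class, which is an assumption; the function names and the toy labels are hypothetical and are not data from this research.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall; a zero denominator gives 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


# Hypothetical toy labels, for illustration only.
y_true = ["dropout", "dropout", "dropout", "graduated",
          "graduated", "graduated", "graduated", "graduated"]
y_pred = ["dropout", "dropout", "graduated", "dropout",
          "graduated", "graduated", "graduated", "graduated"]

counts = confusion_counts(y_true, y_pred, positive="dropout")
result = metrics(*counts)
print(counts, result)
```

Reporting precision and recall alongside accuracy matters when the classes are unbalanced, because a model that rarely predicts dropout can still reach a high accuracy.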

Modeling and results of the Artificial Neural Network Algorithm (A-NN):
These results are part of a summary analysis for the selection of the reasonable models based on artificial neural network algorithms [7]. This process tests various patterns of cross-validation methods to find reasonable models with the highest accuracy. Table 5 illustrates the four artificial neural network algorithm models that should be selected as follows: The suitable model for the student data enrolled in the academic year of 2001-2003 is the 5-Fold cross validation method. The performance details and model prediction results are shown on the website (https://bit.ly/36LU2d9). The suitable model for the student data enrolled in the academic year of 2008-2011 is the leave-one-out cross validation method. The performance details and model prediction results are shown on the website (https://bit.ly/2R8AB7L). The suitable model for the student data enrolled in the academic year of 2012-2016 is the leave-one-out cross validation method. The performance details and model prediction results are shown on the website (https://bit.ly/2TeJXl3).

Modeling and results of the Decision Tree Algorithm (DT):
These results are part of a summary analysis for the selection of reasonable models based on the Decision Tree Algorithm (DT) [15], [16], [18]. The four models obtained are summarized in Table 10 to Table 14, and the selection of the most effective model is concluded in Table 20.

Table 10. DT performance testing results

Table 10 illustrates the four decision tree algorithm models that should be selected as follows: The suitable model for the student data enrolled in the academic year of 2001-2003 is the 5-Fold cross validation method, with the 2nd level depth of the decision tree model. In addition, the courses that have a significant influence on prediction in this model consist of one course: 001245 Science in Everyday Life. Table 11 shows the performance of the model. Table 12 shows the performance of the model. The suitable model for the student data enrolled in the academic year of 2008-2011 is the leave-one-out cross validation method, with the 4th level depth of the decision tree model. In addition, the courses that have a significant influence on prediction in this model consist of six courses: 001103 Thai Language Skills, 001111 Foundations of English I, 001112 Foundations of English II, 001134 Civilization and Local Wisdom, 001173 Life Skills, and 231101 Business Statistics. Table 13 shows the performance of the model. The suitable model for the student data enrolled in the academic year of 2012-2016 is the leave-one-out cross validation method, with the 2nd level depth of the decision tree model. In addition, the courses that have a significant influence on prediction in this model consist of one course: 221110 Fundamental Information Technology. Table 14 shows the performance of the model.

Modeling and results of the Naïve Bayes Algorithm (NB):
These results are part of a summary analysis for the selection of reasonable models based on the Naïve Bayes Algorithm (NB) [7]. This process tests various patterns of cross-validation methods to find reasonable models with the highest accuracy. Table 15 illustrates the four Naïve Bayes algorithm models that should be selected as follows: The suitable model for the student data enrolled in the academic year of 2001-2003 is the 5-Fold cross validation method. The performance details and model prediction results are shown on the website (https://bit.ly/2QHS08k); they are included within the model in Table 14, and Table 15 shows the testing results of the model. The suitable model for the student data enrolled in the academic year of 2004-2007 is the 15-Fold cross validation method. The performance details and model prediction results are shown on the website (https://bit.ly/2sYSNJh); they are included within the model in Table 44, and Table 41 shows the testing results of the model. The suitable model for the student data enrolled in the academic year of 2008-2011 is the leave-one-out cross validation method. The performance details and model prediction results are shown on the website (https://bit.ly/2QH2Mfd); they are included within the model in Table 16, and Table 17 shows the testing results of the model. The suitable model for the student data enrolled in the academic year of 2012-2016 is the 15-Fold cross validation method. The performance details and model prediction results are shown on the website (https://bit.ly/2QKks9V); they are included within the model in Table 18, and Table 19 shows the testing results of the model.

Best solution model and model selection
This section discusses the previous analysis results, with the objective of selecting the most reasonable model to present to the relevant parties, the School of Information and Communication Technology, and the University of Phayao. The analysis results from the previous section are summarized in Table 20.

Research discussion
From Table 20, it can be concluded that the first model, for the students enrolled during the academic year of 2001-2003, is the Artificial Neural Network Algorithm (A-NN) with an accuracy of 89.04%; its performance is shown in Table 6. The second model, for the students enrolled during the academic year of 2004-2007, is the Artificial Neural Network Algorithm (A-NN) with an accuracy of 92.70%; its performance is shown in Table 7. The third model, for the students enrolled during the academic year of 2008-2011, is the Artificial Neural Network Algorithm (A-NN) with an accuracy of 93.71%; its performance is shown in Table 8. Finally, the last model, for the students enrolled during the academic year of 2012-2016, is the Naïve Bayes Algorithm (NB) with an accuracy of 91.68%; its performance is shown in Table 19.

Research conclusion
The research has two main objectives. The first objective is to construct a reasonable student dropout prediction model. The second objective is to evaluate the performance of the dropout prediction models and select the best one. The data set used in the research consists of the records of 2,017 students who enrolled over the past twenty years. In the future, it is necessary to conduct additional research and present the research results to relevant parties and organizations to support critical decision making and planning.