Sustainable Development of College and University Education by Use of Data Mining Methods

To improve students' learning outcomes, a student-centered education plan is explored. First, the Apriori association-rule algorithm is used to mine potential patterns in the score data of college students and to establish a reasonable teaching sequence. Second, a decision-tree model is used to study the factors affecting students' academic performance and the potential relationships between courses. Finally, the Apriori algorithm is combined with the decision-tree model to build an early-warning mechanism for students' achievement, and the course performance of college students is analyzed empirically. The results show that C language has a two-way dependence with many subjects, and that the mined rules support the sequence higher mathematics → linear algebra → mathematical statistics → computer composition principle → computer network. The teaching sequence C language → C++ → Java conforms better to the learning mechanism of college students. The empirical analysis shows that the early-warning mechanism built from the Apriori algorithm and the decision-tree model can effectively analyze students' courses and give early warning of their achievement. The proposed method can provide a theoretical basis for students, teachers, and university administrators to carry out education reform and education management decision-making, improve students' performance and education quality, and realize the "student-oriented" education concept, so it can be applied to actual education management.

Keywords: "student-oriented"; data mining; association rules; Apriori algorithm; sustainable development


Introduction
The student-oriented concept is a people-oriented idea that emphasizes respect and support for students. It is student-centered and applies to all students. In school teaching management, the student-oriented guiding principle must be adhered to, and corresponding reforms of the school system, policy, and environment should be implemented. As the "people-oriented" concept has been continuously implemented, it has deepened the reform of the higher education management system. To grasp the scientific connotation of "people-oriented" in higher education, its necessity and internal requirements should be thoroughly understood, which helps to further explore its value orientation. The people-oriented concept provides sustainable development strategies for implementing higher education in universities [1][2][3].
The school information management system stores a large amount of potentially valuable performance data. Association rules from data mining can be applied to these achievement data to obtain the associations between subjects and to find content neglected in learning, which can provide targeted help and academic early warning for each student as well as teaching guidance for teachers and administrators. Data mining is regarded as a set of techniques, such as clustering, classification, association, and regression, that allow automatic or semi-automatic extraction of useful information, models, and trends from large datasets. Algorithms such as the Apriori algorithm, Bayesian algorithms, and neural networks are used to extract patterns from data, and these patterns are used to interpret and predict behaviors [4][5][6][7][8]. Association rule mining is an important data mining task for discovering frequent patterns. It has been successfully used in computer networks, recommendation systems, and medical care [9][10]. Wang et al. (2020) studied an improved Apriori algorithm for frequent itemsets in time series, applied it to mining association rules under time constraints, and concluded that it was superior to the traditional algorithm in storage space [11]. Sun et al. (2020) built a 0-1 transaction matrix by scanning the transaction database to obtain weighted support and confidence. Their method can shorten the running time, reduce the memory demand and the number of operations, and effectively extract hidden and valuable items [12].
Because data mining analysis of students' scores and related application research are still lacking, a data warehouse of students' academic performance is established to store and process the data, which provides reliable experimental data for the subsequent analysis of students' academic performance. The Apriori association-rule algorithm is used to mine the associations between subjects and the valuable information hidden in them. The results serve as an early warning for students' subject scores and provide technical support for teachers reforming their teaching plans and for university administrators.

Materials
A total of 1286 students majoring in computer science from 2016 to 2019 are selected from the academic affairs system of Zhejiang Industry Polytechnic College as research samples. Advanced mathematics, computer network, computer composition principle, C++, and other core courses are selected as experimental samples. Table 1 and Table 2 show the original data of different data sources, and Figure 1 is the flow chart of this experiment.

iJET - Vol. 16, No. 05, 2021

1. Establishment of data warehouse: The data warehouse is subject-oriented. The experimental data warehouse takes the analysis of students' scores as its theme and adopts a top-down implementation mode and a centrally managed logical structure, so that it remains efficient as the data related to students' scores grow rapidly. Before the warehouse is established, the abstract subject is determined according to the needs, and the data are extracted, transformed, and loaded under that subject. Figure 2 is the data warehouse structure. In the actual design, the data warehouse is built with SQL Server 2019.

There may be some problems in the data. Before the data are loaded into the data warehouse, the original data must be processed to obtain more accurate data. The steps are as follows.
Error data processing: during data collection, transfer, and loading, various errors occur, such as the loss and dislocation of some data fields. These records need to be processed in advance. In the system design, missing or misplaced fields are re-associated and matched by other means. If no match can be found, the field is treated as a missing value.
Format correction: to ensure data normalization, data with the same attribute in different tables need to be corrected to a common format.
Separation and combination of fields: to make data mining analysis more efficient, some fields in the original data need to be separated in advance.
Data discretization: the student's score is a continuous measurement, which is divided into five bands and thus simply discretized.
Noise: this refers to the deviation or random error of a variable in the data. If it is not handled, it affects the accuracy and quality of the data. Smoothing is used to deal with the noise.

c) Loading and storing data: After transformation, the original data are loaded into the data warehouse and stored, and OLAP tools are used to extract decision-oriented data, such as reports and various views.
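The discretization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the band boundaries and the labels a to e follow the five score sections given later in the paper, while the function name `score_band`, the 0 to 100 range check, and the use of `None` for a missing value are hypothetical choices.

```python
from typing import Optional


def score_band(score: Optional[float]) -> Optional[str]:
    """Map a score to one of five bands (a-e); None stays a missing value."""
    if score is None:
        return None  # unmatched fields are treated as missing values
    if not 0 <= score <= 100:
        # Assumption: scores outside 0-100 are data errors, not valid grades.
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "a"
    if score >= 80:
        return "b"
    if score >= 70:
        return "c"
    if score >= 60:
        return "d"
    return "e"


# Hypothetical raw scores for one course, including a missing entry.
raw = [95, 80, 79.5, 60, 42, None]
print([score_band(s) for s in raw])
```

Keeping missing values as `None` rather than mapping them to band e avoids counting an unmatched record as a failed course.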
2. Data mining model:
a) Data mining: The experimental data mining model uses the Apriori algorithm to explore the relationship between students' courses and scores. The frequent itemsets that meet the minimum support are extracted from the data, and the strong association rules that meet the preset minimum confidence and support are generated from them. The goal definition, data collection, and preprocessing for the data mining model have already been handled in the data warehouse described above.
b) Data transformation: The acquired data are transformed into a form suitable for the data mining in this experiment. The Apriori algorithm operates on Boolean data, so continuous scores would complicate the algorithm and prolong its running time. The continuous score data are therefore transformed into Boolean data (0, 1). Because the evaluation standards differ between courses, the conversion threshold is the course average score: 1 means that the student's score is higher than the course average, and 0 means that it is below the course average. Table 3 shows part of the converted data.
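The Boolean transformation described under b) can be sketched as below. The function name `booleanize`, the nested-dictionary layout, and the sample scores are hypothetical. The text does not say how a score exactly equal to the course average is coded, so mapping it to 0 here is an assumption.

```python
from statistics import mean
from typing import Dict


def booleanize(scores: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, int]]:
    """Convert course -> {student: score} into student -> {course: 0 or 1}."""
    result: Dict[str, Dict[str, int]] = {}
    for course, by_student in scores.items():
        course_avg = mean(by_student.values())  # per-course threshold
        for student, score in by_student.items():
            # 1: above the course average; 0: otherwise (ties are an assumption).
            result.setdefault(student, {})[course] = 1 if score > course_avg else 0
    return result


# Hypothetical scores for three students in two courses.
sample = {
    "C language": {"s1": 85, "s2": 60, "s3": 71},
    "C++": {"s1": 78, "s2": 55, "s3": 80},
}
for student, flags in booleanize(sample).items():
    print(student, flags)
```

Using the course average as the threshold makes the flags comparable across courses that are graded more or less strictly.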
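c) Definition of association rule: Let $I = \{i_1, \ldots, i_m\}$ be the itemset in the data warehouse $DB$ (the transaction set of the data warehouse). An itemset containing $k$ items is called a $k$-itemset. An association rule has the form $X \rightarrow Y$, where $X$ and $Y$ are proper subsets of $I$ and $X \cap Y = \emptyset$. The strength of an association rule is expressed by its support and confidence, which are defined as follows.

$$\mathrm{sup}(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N}$$

$$\mathrm{conf}(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}$$

Where: $\sigma(\cdot)$ indicates the support count of the itemset, and $N$ indicates the total number of transactions in $DB$.
If $\mathrm{sup}(X \rightarrow Y) \geq \mathrm{min\_sup}$ and $\mathrm{conf}(X \rightarrow Y) \geq \mathrm{min\_conf}$, the association rule $X \rightarrow Y$ is a strong association rule, where min_conf and min_sup indicate the minimum confidence and the minimum support.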
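A compact sketch of how these definitions drive Apriori is given below. It is a textbook-style illustration, not the authors' code: the function names, the toy transactions (each one the set of courses in which a student scored above the course average), and the data layout are hypothetical. The thresholds 0.2 and 0.4 are the minimum support and confidence the paper states for this model.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_sup):
    """Level-wise search: keep k-itemsets meeting min_sup, then build (k+1)-candidates."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    frequent, k = {}, 1
    while level:
        kept = {}
        for candidate in level:
            sup = support(candidate, transactions)
            if sup >= min_sup:
                kept[candidate] = sup
        frequent.update(kept)
        k += 1
        # Join step: unions of frequent sets that have exactly k items.
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in joined
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


def strong_rules(frequent, min_conf):
    """Rules X -> Y with conf = sup(X u Y) / sup(X) >= min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), size)):
                conf = sup / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules


# Hypothetical Boolean transactions for five students.
transactions = [
    frozenset({"C", "C++", "Java"}),
    frozenset({"C", "C++"}),
    frozenset({"C", "Java"}),
    frozenset({"C++", "Java"}),
    frozenset({"C", "C++", "Java"}),
]
freq = frequent_itemsets(transactions, min_sup=0.2)
for lhs, rhs, sup, conf in strong_rules(freq, min_conf=0.4):
    print(sorted(lhs), "->", sorted(rhs), f"sup={sup:.2f} conf={conf:.2f}")
```

The confidence lookup `frequent[lhs]` relies on the downward-closure property: every subset of a frequent itemset is itself frequent, so its support has already been stored.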
According to the data analysis and actual needs, the minimum support and the minimum confidence are set to 0.2 and 0.4, respectively. After preprocessing, the data mining model compares the support and confidence of each frequent itemset with these thresholds to obtain the corresponding association rules.
d) Decision-tree model of association rule merging: Valuable information exists not only between different courses but also between students' course scores and course information, such as time attributes. Therefore, a decision tree is used to distinguish the course-related attributes of different years and semesters. For the early warning of students' scores, the correlations among courses, teachers, students, and other factors should be considered.
From the association rules above, the strong association rules with high credibility are taken as preselected new attributes and are screened by information gain. The new attributes that pass this screening are then combined with the original data attributes to construct the decision tree. Figure 3 is the flow chart of the association rule merging decision-tree model.
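The information-gain screening of a preselected attribute can be illustrated as follows. This is a sketch under assumptions: the paper does not name the exact decision-tree algorithm, so the entropy-based gain used here is the standard ID3-style definition, and the attribute names `rule_hit`, `category`, and `failed` as well as the four sample rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of `target` minus its weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical records: `rule_hit` marks students matched by a strong rule,
# `category` is an original attribute, `failed` is the early-warning target.
rows = [
    {"rule_hit": 1, "category": "basic", "failed": 1},
    {"rule_hit": 1, "category": "applied", "failed": 1},
    {"rule_hit": 0, "category": "basic", "failed": 0},
    {"rule_hit": 0, "category": "applied", "failed": 0},
]
for attribute in ("rule_hit", "category"):
    print(attribute, round(information_gain(rows, attribute, "failed"), 3))
```

In this toy data the rule-derived attribute separates the target perfectly while the original attribute carries no information, which is the kind of contrast the screening step is meant to expose.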

Fig. 3. Flow chart of association rule merging decision-tree model
According to the needs of mining, the scores of computer-major students are obtained from the educational administration system. After preprocessing, a score group (class) is added to the students' curriculum score table. All scores are divided into five sections, namely a ≥ 90, 90 > b ≥ 80, 80 > c ≥ 70, 70 > d ≥ 60, and e < 60. During the generation of the above association rules, the performance judgment of students by semester and academic year is added, and Figure 4 is the flow chart. To narrow the mining scope and improve the mining quality, after the original data of the different data sources are selected and processed, the analysis attributes set in this experiment are: course category, student source, gender, professional direction, absence from examination, and course attributes. The minimum confidence and support of the association rules are 0.5 and 0.2, respectively. Precise rules are generated, association rules that cannot fit the teaching plan are deleted, and the new attributes are evaluated. The attributes with high reliability are merged into the original attributes, and the decision tree is constructed.

Table 4 shows that learning advanced mathematics first is conducive to learning linear algebra, with a confidence of 0.92 and a support of 0.35; learning linear algebra first benefits the study of mathematical statistics, with a confidence of 0.86 and a support of 0.47; learning mathematical statistics first contributes to the study of computer composition principle, with a confidence of 0.87 and a support of 0.45; and learning computer composition principle first is conducive to learning computer network, with a confidence of 0.88 and a support of 0.52. Therefore, according to the experimental results, a scientific and reasonable teaching plan is obtained: higher mathematics → linear algebra → mathematical statistics → computer composition principle → computer network.
Additionally, learning C language first helps in learning C++ (confidence 0.85, support 0.48), and learning C++ first is conducive to learning Java (confidence 0.87, support 0.34). Therefore, as a basic course of the computer major, C language should be scheduled early. The teaching plan recommended according to the experimental results is: C language → C++ → Java.

Figure 5 shows that many courses have a two-way dependence on C language, which is consistent with the school's designation of this course as a basic professional course for computer-major students. Therefore, this course should be scheduled early in the studies of computer-major students, which can improve both students' understanding and teachers' teaching quality. Moreover, many computer professional courses depend on advanced mathematics and computer composition principle, which shows that these two courses form the basic theoretical foundation for computer-major students. They are therefore particularly important for the follow-up courses in this major. Students should learn and master them as early as possible, and teachers should focus on explaining and managing them.

Fig. 5. Course dependency network
The above analysis shows that the courses of computer-major students are interdependent and that the order of courses also affects students' scores. According to the analysis of the association rules, the results of earlier courses affect the results of the follow-up courses. Therefore, to improve students' scores and teachers' teaching quality, colleges and universities should follow a scientific and reasonable theoretical basis when setting up students' courses.
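The step from pairwise rules to a recommended course order can be sketched as a topological sort. The rule values below are the confidences and supports reported in the text. Treating each rule "X first helps Y" as a precedence edge, the name `course_order`, and the reuse of the 0.4 and 0.2 thresholds as a filter are assumptions made for illustration. The sketch uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# (earlier course, later course, confidence, support) as reported in the text.
RULES = [
    ("advanced mathematics", "linear algebra", 0.92, 0.35),
    ("linear algebra", "mathematical statistics", 0.86, 0.47),
    ("mathematical statistics", "computer composition principle", 0.87, 0.45),
    ("computer composition principle", "computer network", 0.88, 0.52),
    ("C language", "C++", 0.85, 0.48),
    ("C++", "Java", 0.87, 0.34),
]


def course_order(rules, min_conf=0.4, min_sup=0.2):
    """Order courses so that every rule's antecedent precedes its consequent."""
    predecessors = {}
    for earlier, later, conf, sup in rules:
        if conf >= min_conf and sup >= min_sup:
            predecessors.setdefault(later, set()).add(earlier)
            predecessors.setdefault(earlier, set())
    # static_order() raises graphlib.CycleError if the rules contradict each other.
    return list(TopologicalSorter(predecessors).static_order())


order = course_order(RULES)
print(order)
```

The two chains are independent, so several valid orders exist; only the relative order within each chain is determined by the rules.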

High reliability association rule results
Using the Apriori algorithm and the evaluation criteria, high-confidence rules are obtained and further processed. Strong association rules between students' course scores with time attributes are then obtained, as shown in Table 5. The rule linking computer application foundation, electrical and electronic technology, and computer assembly and maintenance has a confidence of 0.9782; the rule linking local area network establishment, enterprise network security advanced technology, and enterprise network integrated management has a confidence of 0.9571; the rule linking e-commerce process, e-commerce website design and production, and network marketing practice has a confidence of 0.8902; and the rule linking Linux server operating system, e-commerce process, and enterprise network integrated management has a confidence of 0.8634. The support of each of these rules is 1.0000. These courses are highly related, which is useful for analyzing the early-warning factors of students' scores.

Results of merging decision-tree of some association rules
Combining the strong association rules selected by the Apriori algorithm with decision-tree analysis makes the early warning of students' scores more complete and uncovers the hidden reasons why students fail subjects. Table 5 shows the more valuable hidden rules that were mined. The course category is one of the important factors in the early warning of students' scores, which is consistent with evaluations based on practical experience. Its branch attributes include basic courses, applied courses, and professional courses. Figure 6 shows that when a female student fails the computer application foundation and electrical and electronic technology courses, she is also likely to fail computer assembly and maintenance, which is consistent with the commonly held view that male students perform better than female students in engineering subjects. Through the analysis of the above results, the relevant factors influencing students' academic scores are obtained. Teachers and university administrators should pay attention to students' failures in different courses and put forward corresponding teaching reform programs and measures to prevent failures in related courses and to raise the level of graduates of colleges and universities. Moreover, this can prevent students who have already failed a course from failing repeatedly through carelessness, which would harm their studies.
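How a mined rule could drive an early warning is sketched below. The single rule and its confidence (0.9782) come from the text, and 0.5 is the minimum confidence stated for this stage. The function name `warnings_for`, the tuple layout, and the omission of the gender branch shown in Figure 6 are simplifying assumptions.

```python
# (antecedent courses failed, course at risk, confidence); values from the text.
RISK_RULES = [
    (
        frozenset({"computer application foundation",
                   "electrical and electronic technology"}),
        "computer assembly and maintenance",
        0.9782,
    ),
]


def warnings_for(failed_courses, rules, min_conf=0.5):
    """Return (course, confidence) pairs the student is at risk of failing."""
    failed = set(failed_courses)
    alerts = [
        (at_risk, conf)
        for antecedent, at_risk, conf in rules
        # Fire a rule only if all antecedent courses were failed and the
        # consequent course has not already been failed.
        if conf >= min_conf and antecedent <= failed and at_risk not in failed
    ]
    return sorted(alerts, key=lambda alert: alert[1], reverse=True)


# Hypothetical student record.
student_failed = ["computer application foundation",
                  "electrical and electronic technology"]
for course, conf in warnings_for(student_failed, RISK_RULES):
    print(f"warning: {course} (rule confidence {conf:.4f})")
```

Sorting alerts by confidence lets an advisor look at the most reliable warnings first.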

Discussion
The concept of "student-oriented" means that teachers and administrators regard students as the main participants in learning activities. Teaching objectives, the teaching environment, teaching materials, the organization of the teaching process, teaching methods, and other factors should be student-centered. All are designed to serve students' personal and all-round development, so as to promote students' active and full participation in teaching planning, continuously improve their scores, and promote their healthy physical and mental development [13][14][15]. The education sector is moving towards informatization and intelligence. Data mining technology has been applied in education to tasks such as teaching information management, evaluation of teachers' teaching, analysis of students' psychological characteristics, formulation of scientific and reasonable teaching programs, and analysis of students' examination results to find problems and strengthen teaching. Among these, the most commonly used data mining techniques for predicting and classifying the factors affecting students' scores are the decision tree, the Bayesian classifier, and the artificial neural network (ANN) [16][17][18]. Therefore, based on this theoretical basis and on previous experimental work, association rules are used to mine the valuable data hidden in students' score data, and the decision-tree model is combined with them to analyze the factors that affect students' scores.
According to the test results of the proposed model, the deep learning model can be combined with the potentially valuable information related to students to provide a basis [19,20]. Based on a deep learning prediction model, the big data in MOOCs (massive open online courses) and the information provided by counselors are used to predict and study students' scores, and the prediction ability of the model based on users' video viewing behavior is tested. Weekly video-viewing frequency is found to predict students' scores better than a single viewing feature. The model of association rules combined with a decision tree can effectively mine the dependence between students' courses and predict the factors influencing students' scores through the order of course scores, which is consistent with the research results of Preet et al. (2019) [21]. That study investigated the demographic, social, academic, and behavioral factors that affect students' performance and used three accuracy-based techniques to build an integrated model. Ten-fold cross-validation was applied to assess the suitability of the results obtained with the integrated model, and students' performance was accurately predicted. The results show that the academic performance of the previous semester and other factors have a significant impact on current academic performance; any serious accident in the past year also affects academic scores; and the model has high accuracy. Early identification of the factors influencing academic performance helps to detect high-risk students early, so that preventive and corrective measures can be put forward to improve students' overall academic performance.
A decision tree is used to analyze the influence of the new attributes on students' course scores, to achieve an effective early warning of students' scores, and to provide a basis for teachers' teaching plans. This result is consistent with the research results of Concepción et al. (2018) [22], who used a logistic regression model to classify experimental data on students in distance learning courses and effectively predicted the students' dropout risk.
To sum up, data mining technology based on the concept of "student-oriented" can be applied in the education field to the early warning of students' course performance, teaching plan reform, management decision-making, and other tasks, and it can provide effective supervision and early prevention during students' course learning.

Conclusion
Based on the concept of "student-oriented", association rules are used to mine the valuable data hidden in students' score data, and the decision-tree algorithm is combined with them to analyze the factors affecting students' scores, thus providing theoretical support for the early warning of students' scores, teaching planning guidance, and the teaching management mode. However, only decision trees and association rules are analyzed, so the range of analysis methods is limited. In future work, more methods such as ANN and cluster analysis can be included in the research on students' scores.