A Recommendation Method for Highly Sparse Dataset Based on Teaching Recommendation Factorization Machines

Abstract: There is no reasonable scientific basis for selecting excellent teachers for a school's courses. To solve this practical problem, we first give a series of normalization models for defining the key attributes: teachers' professional foundation, course difficulty coefficient, and comprehensive evaluation of teaching. Then, we define a partial weight function over the key attributes to obtain partial recommendation values. Next, we construct a highly sparse Teaching Recommendation Factorization Machines (TRFMs) model, which takes the 5-tuple relation of teacher, course, teachers' professional foundation, course difficulty, and teaching evaluation as the feature vector, and takes the partial recommendation value as the recommendation label. Finally, we design a novel Top-N excellent teacher recommendation algorithm based on TRFMs by course classification on the highly sparse dataset. Experimental results show that the proposed TRFMs and recommendation algorithm can accurately recommend excellent teachers on a highly sparse historical teaching dataset. The recommendation accuracy is superior to that of the three-dimensional tensor decomposition algorithm, which also handles sparse datasets. The proposed method can serve as a new recommendation method for teaching arrangements in all kinds of schools and can help improve teaching quality.


Introduction
The personalized recommendation system predicts the potential interests of users. Teaching arrangements, however, are mostly based on the teacher's wishes. When new courses are encountered, there is no reasonable scientific basis for teaching recommendation. This lack of a scientific basis leads to non-optimized arrangements and low teaching quality. In this paper, a series of normalized models are proposed to pre-process a large amount of teaching data. Based on the processed data, the key attributes of teacher professional foundation, course difficulty coefficient, teaching comprehensive evaluation, and overall recommendation value are defined. We use these key attributes as the feature vector X and the recommended observation value as the target vector Y to construct the Teaching Recommendation Factorization Machines (TRFMs) model. Then, the teaching recommendation algorithm is designed based on the TRFMs model. Finally, we compare it with the three-dimensional tensor recommendation algorithm in accuracy and time complexity. Experimental results show that the proposed TRFMs model and recommendation algorithm have advantages when recommending on a sparse teaching dataset. The proposed methods are promising for addressing the lack of a scientific recommendation basis.

Related works
Referring to Adomavicius' definition of a recommendation system [Adomavicius and Tuzhilin (2005)], we define the teaching recommendation system as follows. The course and teacher sets are defined as C and T, respectively. A utility function f() is used to calculate the recommendation degree of teacher t for course c; that is, $f: C \times T \to R$, where R is a totally ordered set of non-negative real numbers in a specific range. The problem is to find, for each course, the teachers $T^*$ whose recommendation degree is the largest; that is, $\forall c \in C$, $T^* = \arg\max_{t \in T} f(c, t)$.
Rendle proposed Factorization Machines (FMs). FMs draw on the latent factor model and on matrix factorization: the self-interaction (autocorrelation) terms are removed from polynomial regression, and only the interactions between categorical variables are factorized [Rendle, Gantner, Freudenthaler et al. (2011)]. FMs can be used for three prediction problems: regression, binary classification, and ranking. The second-order factorization model is commonly used, which is defined as follows:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j \quad (1)$$

The parameters are $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^{n}$, and $v \in \mathbb{R}^{n \times k}$. $\langle v_i, v_j \rangle$ denotes the dot product of two vectors $v_i$ and $v_j$ of size k, i.e., $\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} v_{j,f}$, where k is called the factorization dimension hyper-parameter. In Eq. (1), by requiring j>i, the self-interaction term is removed; thus, only the interaction between two mutually distinct feature components is considered. The dot product of two low-rank factor vectors is used in FMs to approximate the interaction weight of categorical variables. That is, $\hat{w}_{i,j} = \sum_{f=1}^{k} v_{i,f} v_{j,f}$, so that the interaction weights $\hat{w}_{i,j}$ and $\hat{w}_{i,j^*}$ share the factor vector $v_i$.
Accordingly, FMs can contain multiple categorical variables and are suitable for the case where the data is very sparse.
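The second-order model of Eq. (1) can be illustrated with a minimal, direct evaluation. The sketch below is only an illustration in plain Python: the function names and the toy parameter values are assumptions made here, not part of the original method. It computes the bias, the linear terms, and then every pairwise interaction with j>i, which is the O(kn^2) evaluation.

```python
def dot(a, b):
    """Dot product of two equally sized factor vectors."""
    return sum(p * q for p, q in zip(a, b))


def fm_predict_pairwise(x, w0, w, v):
    """Second-order FM prediction, evaluated pair by pair.

    x  : list of n feature values
    w0 : global bias
    w  : list of n linear weights
    v  : list of n factor vectors, each of length k
    """
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):  # j > i, so no self-interaction term
            y += dot(v[i], v[j]) * x[i] * x[j]
    return y


# Toy values (hypothetical), n = 4 features and k = 2 factors.
x = [1.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.4]
v = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.1]]
print(fm_predict_pairwise(x, w0, w, v))
```

Because the second feature is zero, only the three pairs among the non-zero features contribute to the interaction sum, which is the property that makes FMs attractive on sparse data.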
To learn the model parameters $\Theta = \{w_0, w_1, \ldots, w_n, v_{1,1}, \ldots, v_{n,k}\}$ from the training set, different loss functions need to be defined according to different tasks. The time complexity of evaluating Eq. (1) directly is $O(kn^2)$ because the interaction $\langle v_i, v_j \rangle$ is calculated for every pair of variables. However, the second-order model can be rewritten into the form shown in Eq. (2):

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right] \quad (2)$$

The time complexity of Eq. (2) is reduced to O(kn). Further, the time complexity is only O(kp) in applications where the dataset is extremely sparse (p denotes the number of non-zero elements). For the observation dataset S of pairs (x, y), Eq. (3) is used to find the parameter set Θ that minimizes the sum of squared errors between the observations y and the predicted values $\hat{y}(x|\Theta)$:

$$\Theta^* = \arg\min_{\Theta} \sum_{(x,y) \in S} \left( \hat{y}(x|\Theta) - y \right)^2 \quad (3)$$

When the factorization dimension k is large, an L2-norm regularization term prevents the model from overfitting. Thus, employing the L2-norm regularization term, Eq. (3) becomes:

$$\Theta^* = \arg\min_{\Theta} \left( \sum_{(x,y) \in S} \left( \hat{y}(x|\Theta) - y \right)^2 + \sum_{\theta \in \Theta} \lambda_\theta \theta^2 \right)$$

where $\lambda_\theta$ denotes the regularization coefficient and $\sum_{\theta \in \Theta} \theta^2$ denotes the L2-norm of the parameter set Θ. Moreover, according to Rendle et al. [Rendle, Gantner, Freudenthaler et al. (2011)], for $\forall \theta \in \Theta$, the factorization machine can be expressed as a linear combination of two functions:

$$\hat{y}(x|\Theta) = g_\theta(x) + \theta \, h_\theta(x)$$

where $g_\theta$ and $h_\theta$ are independent of the value of parameter θ. Then, according to Rendle et al. [Rendle, Gantner, Freudenthaler et al. (2011)], the optimization learning method is used to find the optimal parameters. The main idea is that each parameter is learned iteratively while the other parameters are fixed, until all parameters converge to the optimal solution.
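The linear-time form of Eq. (2) and the regularized squared-error objective can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and a single shared coefficient `lam` is assumed in place of a separate regularization coefficient per parameter. Only the non-zero features are visited, which gives the O(kp) cost on sparse inputs.

```python
def fm_predict_fast(x, w0, w, v):
    """FM prediction via the rewritten second-order term.

    For each factor f the pairwise sum equals
    0.5 * ((sum_i v[i][f] * x[i])**2 - sum_i (v[i][f] * x[i])**2).
    """
    k = len(v[0])
    nonzero = [i for i, xi in enumerate(x) if xi != 0.0]
    y = w0 + sum(w[i] * x[i] for i in nonzero)
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in nonzero)
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in nonzero)
        y += 0.5 * (s * s - s_sq)
    return y


def regularized_loss(data, w0, w, v, lam):
    """Sum of squared errors over (x, y) pairs plus an L2 penalty.

    Assumption: one shared coefficient lam for all parameters.
    """
    err = sum((fm_predict_fast(x, w0, w, v) - y) ** 2 for x, y in data)
    l2 = w0 ** 2 + sum(wi ** 2 for wi in w)
    l2 += sum(vif ** 2 for vi in v for vif in vi)
    return err + lam * l2


# Toy values (hypothetical), n = 4 features and k = 2 factors.
x = [1.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.4]
v = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.1]]
print(fm_predict_fast(x, w0, w, v))
print(regularized_loss([(x, 1.0)], w0, w, v, lam=0.1))
```

In an alternating scheme, one parameter at a time would be updated to reduce this objective while the others stay fixed; the update rule itself is not shown here.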

Data pre-processing methods
The collected data is pre-processed to meet the requirements of the factorization model and decomposition algorithm for teaching recommendation. First, a teaching information data warehouse is constructed based on the fact constellation model by extract-transform-load (ETL) from several database tables, such as the teacher information table, course information table, and teaching evaluation table. The structure of the data warehouse is shown in Fig. 1. Then, we construct the following definitions to normalize the related attributes in the data warehouse.
Definition 1: School factor vector ($v_{sf}$, $v_{sf} = [0.4, 0.3, 0.2, 0.1]^T$) is used to quantify the teacher's graduation school. The vector elements represent the values for graduates from "985 Project" universities, "211 Project" universities, other first-tier universities, and second-tier universities and below, respectively.
Definition 2: Degree coefficient ($v_{Dc}$, $v_{Dc} = [0.4, 0.3, 0.2, 0.1]^T$) is used to quantify the degree earned by teachers. The elements represent the values of Ph.D., master, bachelor, and non-degree, respectively.
Definition 3: Graduation years (Gy, Gy≥0) represents the difference between the current year and the year of graduation (this paper takes the year of graduation of the final academic degree as the teacher's graduation year). That is, Gy = the current year - the year of graduation.
Definition 4: Attenuation function Af(Gy) ($G_y \in \mathbb{N}^+$, $0 \le A_f(G_y) \le 0.1$) means that with the growth of the teacher's graduation years (Gy), the professional foundation of teachers has a small decline. The function is defined as follows: where a is the constant coefficient of the convergence rate. With a larger value of a, the function converges to s more quickly. In this paper, a=1.3 and s=0.1 are used (the maximum attenuation value converges to 0.1).
Definition 5: Teacher professional foundation (Tpf) is used to quantify the professional foundation of a teacher, defined as: where $\lambda_i$ ($i = 1, 2$) represents the proportion of the i-th graduation major of a teacher (generally the first and last graduation majors are considered), which is defined as follows: where Cvi represents the correlation coefficient between the teacher's i-th graduation major and the major he or she is engaged in. A complete correlation takes the value 1 and a partial correlation takes the value r; the concrete definition is as follows: In Eq. (7), Dw (0<Dw≤1) is the teacher's degree factor. If the degree is obtained full-time, Dw is 1, while if the degree is obtained part-time, Dw is a number in (0, 1) according to the total time.
To verify the validity of the definition, we select 40 teachers as sample data from a school in a second-tier university. The sample data are normalized and sorted by professional foundation. As shown in Fig. 2, the obtained Tpf values are clearly differentiated across teachers, which indicates that the definition is reasonable.

Figure 2: Tpf example
Definition 6: Course difficulty coefficient (Cdc, 0.1≤Cdc≤1) indicates the difficulty of the course. A higher value means a more difficult course. We adopt an online survey system of course difficulty classified by major, in which difficulty is rated on a scale from 1 to 10. The respondents are graduates and expert teachers of relevant majors in second-tier or equivalent universities. The total number of returned questionnaires for each major shall not be less than the minimum threshold m (including k teachers' questionnaires and l students' questionnaires, m=k+l). Finally, we use min-max normalization to map the difficulty value to the interval [Emin, Emax] (Emin and Emax represent the lowest and highest difficulty coefficients of the course, respectively; Emin=0.1 and Emax=1.0 are used in this paper), which is given by: where Qsmin and Qsmax are the minimum and maximum difficulty values of the courses investigated in a particular major. Qs is defined as: where w (0<w<1) is the weight of the expert teacher questionnaire survey and Cdi is the difficulty coefficient value given in the questionnaire for the i-th course.
The teaching comprehensive evaluation (Eva) is obtained from Aver, Avermin, and Avermax, where Avermin and Avermax are the lowest and highest evaluation scores of all courses in the major. Aver represents the average score of student evaluations for the same course taught by a teacher over M semesters, defined as follows: where Qtym represents the number of students participating in the evaluation of the course in semester m (1≤m≤M), and Averm (0<Averm≤100) represents the average score of the course in semester m.
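The two normalization steps described above can be sketched in a few lines. This is a sketch under stated assumptions rather than the paper's exact formulas: it assumes the usual min-max mapping onto [Emin, Emax] for the course difficulty coefficient, and it assumes that Aver is the mean of the per-semester averages weighted by the number of participating students. The function names are hypothetical.

```python
def min_max_scale(q, q_min, q_max, e_min=0.1, e_max=1.0):
    """Map q from [q_min, q_max] onto [e_min, e_max] (standard min-max form)."""
    if q_max == q_min:
        # Assumption: a degenerate range maps to the lower bound.
        return e_min
    return e_min + (q - q_min) * (e_max - e_min) / (q_max - q_min)


def weighted_average(scores, counts):
    """Mean of per-semester scores, weighted by the number of students.

    Assumption: this is how the M semester averages are combined.
    """
    total = sum(counts)
    return sum(s * c for s, c in zip(scores, counts)) / total


# A course rated 5.5 in a major whose survey values span 1 to 10.
print(min_max_scale(5.5, 1.0, 10.0))
# Two semesters: average 80 from 30 students, average 90 from 10 students.
print(weighted_average([80.0, 90.0], [30, 10]))
```

Weighting by student count keeps a semester with few respondents from dominating the combined score.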

TRFMs model and recommendation algorithm
Five variables are used to construct each $x^{(i)}$ in the FMs feature set $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$: Teacher, Course, Teacher Professional Foundation, Course Difficulty Coefficient, and Teaching Comprehensive Evaluation, where n is the total number of feature vectors. We use Tpf, Cdc, and Eva to represent the corresponding feature values of teachers' professional foundation, course difficulty coefficient, and teaching comprehensive evaluation, respectively. Then, we use the partial recommendation value $y^{(i)}$ calculated by Eq. (14) to construct the recommendation label in the target vector $Y = \{y^{(1)}, y^{(2)}, \ldots, y^{(n)}\}$ of FMs: where ρ1, ρ2, and ρ3 are partial weights. If ρ1>ρ2 and ρ1>ρ3, the recommendation places particular stress on teachers' professional foundation. We name this Factorization Machine model Teaching Recommendation Factorization Machines (TRFMs). Then, we build the TRFMs dataset from the feature vector X and the target vector Y.
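One way to picture a single TRFMs training example is sketched below. The sketch rests on two assumptions that are not confirmed by the text: that Eq. (14) is a plain weighted sum of Tpf, Cdc, and Eva with the partial weights ρ1, ρ2, and ρ3, and that Teacher and Course are one-hot encoded ahead of the three numeric attributes. The function names and the column layout are hypothetical.

```python
def partial_recommendation(tpf, cdc, eva, rho=(0.2, 0.2, 0.6)):
    """Recommendation label as a weighted sum (assumed form of the label)."""
    rho1, rho2, rho3 = rho
    return rho1 * tpf + rho2 * cdc + rho3 * eva


def build_feature_row(teacher_idx, course_idx, n_teachers, n_courses,
                      tpf, cdc, eva):
    """Sparse feature row stored as {column index: value}.

    Assumed layout: one-hot teacher block, one-hot course block,
    then the three numeric attributes Tpf, Cdc, Eva.
    """
    base = n_teachers + n_courses
    return {
        teacher_idx: 1.0,
        n_teachers + course_idx: 1.0,
        base: tpf,
        base + 1: cdc,
        base + 2: eva,
    }


# Toy example with 2 teachers and 2 courses, so 7 columns in total.
row = build_feature_row(teacher_idx=1, course_idx=0, n_teachers=2,
                        n_courses=2, tpf=0.8, cdc=0.55, eva=0.9)
label = partial_recommendation(0.8, 0.55, 0.9)
print(row, label)
```

With hundreds of teachers and thousands of courses, each row would still hold only five non-zero entries, which is why a dictionary (or any sparse format) is a natural representation.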
An example of this dataset is shown in Fig. 3.

Figure 3: TRFMs dataset example (feature columns x(1) to x(7); column groups include Teacher, Course, and Tpf)

In practical applications, both the course set and the teacher set are large. However, the courses taught by each teacher account for only a small part of the course set, so the majority of elements in the TRFMs dataset are 0. Thus, the TRFMs dataset is highly sparse. We propose an improved recommendation algorithm based on the alternating least squares method [Rendle, Gantner, Freudenthaler et al. (2011)]. After learning the optimal parameters of TRFMs from the training set, we calculate the recommendation accuracy of the model under different test sets and the TOP_N teacher recommendation accuracy classified by the specified course. The TRFMs recommendation algorithm is described in Algorithm 1.

Algorithm 1: TRFMs recommendation algorithm
Input: historical data in teaching information data warehouse O, degree obtaining factor Dw, professional correlation coefficient r, expert teacher questionnaire weight w, partial weights ρ1, ρ2, and ρ3, regularization parameter λ, initialization standard deviation parameter σ.

Experimental evaluations

Experimental data
The experimental data is collected from the teaching data of a second-tier university in recent years. After desensitization, the data includes 928 teachers, 2983 courses, and 1,956,632 valid evaluation scores obtained by averaging the student evaluation scores of courses taught by each teacher. During the normalization process of these data, we set different specific weight coefficients r, Dw, w, ρ1, ρ2, and ρ3 to obtain different experimental data. In this paper, according to the practical situation of the second-tier university, we select weight coefficients r=0.7, Dw=0.4, w=0.4, ρ1=0.2, ρ2=0.2, and ρ3=0.6 for normalization processing to obtain the experimental dataset. The data are summarized in Tab. 1.

Evaluation index
The Root Mean Squared Error (RMSE) evaluation index proposed in Zhu et al. [Zhu and Lu (2012)] is adopted to measure the accuracy of the recommendation experiments. The RMSE is defined as follows:

$$RMSE = \sqrt{\frac{1}{|TE|} \sum_{(t,c) \in TE} \left( y_{tc} - y^*_{tc} \right)^2}$$

where $|TE|$ indicates the size of the test set TE, and $y_{tc}$ and $y^*_{tc}$ indicate the ground-truth and predicted recommendation label values of course c taught by teacher t in the test set. Besides, we use P@N [Wang, Meng, Zhang et al. (2010)] to evaluate the relevance of the first N recommended teachers for each course. Because the experimental dataset is very sparse, only N = 3, 4, and 5 are considered in this experiment. P@N is suitable for evaluating TOP_N recommendation and is defined as follows:
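Both evaluation indices are short to compute. The sketch below is a minimal illustration with hypothetical function names. RMSE follows the usual root-mean-squared-error definition over (ground truth, prediction) pairs. For P@N, the sketch assumes the common definition (the share of the first N recommended teachers that belong to a given set of relevant teachers); how relevance is decided is left to the caller.

```python
import math


def rmse(pairs):
    """Root mean squared error over (y_true, y_pred) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((yt - yp) ** 2 for yt, yp in pairs) / len(pairs))


def precision_at_n(ranked, relevant, n):
    """Share of the first n ranked items that are in the relevant set.

    Assumption: the common P@N definition, with relevance supplied
    by the caller as a set.
    """
    top = ranked[:n]
    return sum(1 for item in top if item in relevant) / n


print(rmse([(3.0, 1.0), (1.0, 3.0)]))
print(precision_at_n(["t1", "t2", "t3", "t4"], {"t1", "t3"}, 3))
```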

Experiment results and analysis
To ensure that each course has data in both the training set and test set, in the TRFMs model experiment dataset (E), we choose 20% of the data of each course as the test set (TE), and the remaining 80% of the data (E-TE) as the training set.
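The per-course 80/20 split can be sketched as a stratified split keyed on the course. This is an illustrative sketch only: the record layout `(teacher, course, label)`, the function name, the fixed seed, and the handling of courses with a single record (kept in the training set) are all assumptions made here.

```python
import random
from collections import defaultdict


def split_per_course(records, test_ratio=0.2, seed=0):
    """Split (teacher, course, label) records so each course is split separately.

    Assumptions: a course with one record stays in the training set; any
    other course contributes at least one record to the test set.
    """
    by_course = defaultdict(list)
    for rec in records:
        by_course[rec[1]].append(rec)

    rng = random.Random(seed)
    train, test = [], []
    for recs in by_course.values():
        recs = recs[:]            # do not reorder the caller's data
        rng.shuffle(recs)
        n_test = 0
        if len(recs) > 1:
            n_test = max(1, round(len(recs) * test_ratio))
        test.extend(recs[:n_test])
        train.extend(recs[n_test:])
    return train, test


records = [("t%d" % i, "c1", 0.1 * i) for i in range(5)]
records += [("t%d" % i, "c2", 0.2 * i) for i in range(5)]
train, test = split_per_course(records)
print(len(train), len(test))
```

Splitting inside each course, rather than over the whole dataset, is what guarantees that every course with enough records appears on both sides.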

Algorithm accuracy and sorting accuracy
In the experiment, the TRFMs parameters Θ are first trained according to Algorithm 1. Then, teachers are recommended according to the classification of courses. Here, the initialization standard deviation parameter σ=0.01 is used following previous work, and the regularization parameter λ is obtained by an adaptive selection method from the literature. The accuracy of the teaching recommendation algorithm is compared for the factorization dimension hyper-parameter k=8, 12, 16, 20, 24, 28, 32, 36. Theoretically, the parameter k is required to be large enough. A small value of k is used in the experiment because the experimental data is highly sparse, so there are not enough samples to estimate a complex interaction matrix. We conduct ten experiments for each k value and use their mean as the result to obtain the precision comparison chart shown in Fig. 4.

Figure 4: RMSE under different k

Fig. 4 shows that the algorithm can make recommendations on the highly sparse experimental dataset E, and that k strongly affects the recommendation accuracy. If k is small, RMSE is high (i.e., low accuracy). As k increases, RMSE gradually decreases correspondingly. We further sort out the experimental results and obtain the comparison of the sorting accuracy of the algorithm at P@N when the dimension hyper-parameter k is 8, 16, and 32, and N is 3, 4, and 5. The results are shown in Fig. 5. As shown in Fig. 5, the larger the value of k is, the higher the ranking accuracy is, while the smaller the value of N is, the higher the sorting accuracy is within a reasonable range of the dimension hyper-parameter k.
Through the above experiment, we verified that the TRFMs algorithm provides a reasonable range of RMSE and P@N values for highly sparse teaching recommendation datasets. This indicates that the TRFMs algorithm predicts the recommendation label $y^*_{tc}$ of each teacher for a given course with reasonable accuracy, and recommends the TOP_N teachers with the highest predicted values $y^*_{tc}$ for a given course. Consequently, the teaching effect is expected to improve.
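Once predicted labels are available, the Top-N step for a given course reduces to a filter and a sort. The sketch below is a minimal illustration; the function name and the dictionary layout `(teacher, course) -> predicted label` are assumptions made here, and the prediction values are toy numbers.

```python
def top_n_teachers(predictions, course, n):
    """Return the n teachers with the highest predicted label for a course.

    predictions: dict mapping (teacher, course) to a predicted label.
    """
    scored = [(t, y) for (t, c), y in predictions.items() if c == course]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [t for t, _ in scored[:n]]


predictions = {
    ("t1", "c1"): 0.7,
    ("t2", "c1"): 0.9,
    ("t3", "c1"): 0.4,
    ("t1", "c2"): 0.8,
}
print(top_n_teachers(predictions, "c1", 2))
```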

Recommended accuracy comparisons between TRFMs and HOSVD
Tensor decomposition is also known as high order singular value decomposition (HOSVD). The Tucker decomposition model [Tucker (1966)] decomposes the three-dimensional tensor X into the product of a low-rank factor matrix on each of the three dimensions and a core tensor C. C is the compressed tensor of X, which is much smaller than the original tensor and has a significant effect on a sparse dataset. The three-dimensional tensor for teacher recommendation is constructed from the experimental data described in Tab. 1. The tensor $X \in \mathbb{R}^{I_t \times I_c \times I_e}$ is constructed with dimensions T, C, and E according to the four-tuple relationship $(t_i, c_j, e_k, y^{(i)})$ of "Teacher(T)-Course(C)-Evaluation(E)-Recommendation Label (Y)". The corresponding element index is $(t_i, c_j, e_k)$, and the corresponding element value is $y^{(i)}$, the partial recommendation value calculated by Eq. (14). If a teacher ($t_i$) with a teacher professional foundation of Tpfi teaches a course ($c_j$) with a difficulty coefficient of Cdcj and gets an evaluation score ($e_k$) of Evak, then the element value of the tensor at the subscript $(t_i, c_j, e_k)$ is the weighted $y^{(i)}$ value. Otherwise, the corresponding element value is 0. Then, we adopt the Tucker tensor decomposition method [Tucker (1966)] to obtain the approximate tensor after the dimension reduction and generate the Top-N recommendations. The cost of this procedure consists of the complexity of performing the SVD calculation on X and the modular multiplication complexity of solving the approximate tensor X (the same as solving the core tensor). In the algorithm, the dimension In of tensor X is much larger than the dimension Rn of the decomposition factor; thus, the complexity of the algorithm can be reduced accordingly.
In the experiment, we used the same parameters as in the previous experiment for the TRFMs algorithm but fixed the factorization dimension hyper-parameter k=24. For the HOSVD, we fixed the iteration threshold at ε=0.0005 [Kolda and Bader (2009)].
Then we took 60%, 70%, 80%, 90%, and 100% of E-TE as training sets (the sparsity of the training sets varies with their size) to compare the recommendation accuracy of the two models. Through the experiment, we obtained a comparison of their recommendation accuracy and the running time of each iteration, as shown in Figs. 6 and 7. For N = 3, 4, and 5, we obtained the comparison of P@N sorting accuracy, as shown in Fig. 8. The results show that the recommendation accuracy of both models improves as the training set grows. However, the recommendation accuracy obtained by the TRFMs algorithm is slightly higher than that of the HOSVD algorithm. Its P@N sorting accuracy is also slightly higher than that of the HOSVD algorithm, and its running time is much lower. These results indicate that the recommendation performance of TRFMs is superior to that of HOSVD for a highly sparse dataset.

Recommended differences comparisons under different specific weight factors
To compare the recommendation differences of experimental datasets under different specific weight coefficients, we designed a series of experiments. In each experiment, we fixed the dimension hyper-parameter k=24 of the TRFMs algorithm and compared the Top_5 recommendation differences for the same course. The specific design is as follows:
(1) Comparison of the effects of different r and Dw on the recommendation results: We fixed w=0.4 and chose r=0.7 and Dw=0.4 to construct the experimental dataset E1, and r=0.5 and Dw=0.2 to construct the experimental dataset E2. We completed TRFMs-based teaching recommendation experiments for E1 and E2, respectively, and the comparison results are shown in Tab. 2. Tab. 2 shows that the same course under different proportion coefficients r and Dw (i.e., under different Tpf) yields different predicted recommendation label values $y^*_{tc}$; that is, the recommended order of teachers is different. It can be seen from Eqs. (6)-(8) that teachers differ in professional relevancy and in how their degrees were obtained. That is, different values of Cvi and Dw lead to different values of Tpf. In this way, the order of teachers recommended for the same course will also be different. Therefore, different values of r and Dw have a significant impact on the recommendation results.
(2) Comparison of the effects of different specific weight coefficients w on the recommendation results: We fixed r=0.7 and Dw=0.4, and chose w=0.3 to construct the experimental dataset E3 and w=0.7 to construct the experimental dataset E4. We completed TRFMs-based teaching recommendation experiments for E3 and E4, respectively, and the comparison results are shown in Tab. 3. Tab. 3 shows that the predicted value of the recommendation label $y^*_{tc}$ changes with w (that is, with the Cdc value) while the other data and specific weight coefficients remain unchanged. However, there is no change in the Top_5 teachers recommended for the same course. As can be seen from Eqs. (9) and (10), a change of w alone changes the Cdc value of the course and hence the predicted value. However, when recommending teachers for the same course, a change of the Cdc value does not affect the order of recommendation. All the experimental results show that the TRFMs and the recommendation algorithm can accurately recommend teachers for a course according to different focuses.

Conclusion
This paper proposes TRFMs and a recommendation algorithm to address the lack of a scientific basis for teaching arrangement. Several normalized factors are defined: teachers' professional foundation, course difficulty coefficient, and teaching comprehensive evaluation value. Based on these factors, a comprehensive recommendation value is computed from historical teaching data using partial weights. A highly sparse teaching recommendation dataset is constructed in which Teacher, Course, Tpf, Cdc, and Eva are used as the attributes of feature vector X, and the comprehensive recommendation value is used as target vector Y. Then, the TRFMs model and the recommendation algorithm are proposed for accurate teacher recommendation. The experimental results show that the proposed methods can be a new solution for schools to realize intelligent and accurate recommendation of teachers for courses. Also, the proportion coefficients of the proposed methods can be adjusted to fit the situation of the target school, which leads to suitable recommendation results and can improve teaching quality.