A Knowledge-Fusion Ranking System with an Attention Network for Making Assignment Recommendations

In recent decades, more teachers have been using question generators to provide students with online homework. Learning-to-rank (LTR) methods can partially rank questions to address the needs of individual students and reduce their study burden. Unfortunately, ranking questions for students is not trivial because of three main challenges: (1) discovering students' latent knowledge and cognitive level is difficult, (2) the content of quizzes can be totally different while their knowledge points may be inherently related, and (3) ranking models based on supervised, semisupervised, or reinforcement learning focus on the current assignment without considering past performance. In this work, we propose KFRank, a knowledge-fusion ranking model based on reinforcement learning, which considers both a student's assignment history and the relevance of quizzes to their knowledge points. First, we load students' assignment history, reorganize it by knowledge point, and calculate effective ranking features from the relation between a student's knowledge cognitive level and the question. Then, a similarity estimator is built to choose historical questions, and an attention neural network is used to calculate the attention value and update the current study state with knowledge fusion. Finally, a ranking algorithm based on a Markov decision process is used to optimize the parameters. Extensive experiments were conducted on a real-life dataset spanning a year, and we compared our model with state-of-the-art ranking models (e.g., ListNet and LambdaMART) and reinforcement-learning methods (such as MDPRank). Based on top-k nDCG values, our model outperforms other methods for groups of average and weak students, whose study abilities are relatively poor and whose behaviors are thus more difficult to predict.


Introduction
Educational data mining is an emerging discipline concerned with developing methods for exploring the unique and increasingly large-scale data that come from educational settings and with using those methods to better understand students and the settings in which they learn. In recent years, physical brick-and-mortar classrooms have started to lose their monopoly as a place of learning. The Internet has made online learning possible, and many researchers and educators are interested in online learning to enhance and improve students' learning outcomes while mitigating the reduction in resources [1]. Online learning platforms include Coursera, MOOC, and Udacity. Online assignments, such as quizzes, practice exercises, virtual labs, online literature searches, and simulations, play a critical role in online learning [2].
One of the most important tasks of an online assignment system in educational data mining is to find suitable questions for students according to their ability. An online assignment system can make learning more efficient. It can evaluate study performance and identify at-risk students, who can be given further help. For example, from a log of assignment results, we can identify topics that a student has poorly mastered and recommend related questions to improve their knowledge. Thus, the task is to rank a large number of questions and recommend only relevant questions to students.
An automated process for producing an online assignment works as follows: (i) Several questions, which are organized into an assignment, are assigned by a teacher weekly following the syllabus. Students are expected to finish them on time. (ii) The system can check the answers automatically, calculate the students' marks, and generate a report for each assignment. (iii) Since the number of candidate questions is too large to finish in one assignment, the recommendation system should be able to choose suitable questions and discard those that are too easy or too difficult.
Note that the goal of an assignment recommendation task is not to predict whether a question can be answered correctly; that is, it is not a classification problem. Rather, questions should be ranked according to the importance of the topic in improving the student's ability. The aim is to find the questions with the highest benefit for students.
In previous studies, this problem has been tackled by supervised and semisupervised ranking methods [3][4][5]. In particular, state-of-the-art reinforcement learning has been used, which considers the problem as a process of sequential decision-making and learns model parameters by maximizing the rewards accumulated from all decisions [6]. MDPRank is a ranking model based on a Markov decision process (MDP). It treats documents as states and ranks the position of documents at each iteration. However, the current MDPRank is imperfect if we apply it to our assignment recommendation system directly. A student's performance in an assignment depends not only on the questions, such as the marks for each question, the types of question, and the difficulty of each question, but also on the student's current knowledge, especially for questions with similar knowledge points. Other studies [7,8] are close to our recommendation target, but both are based on study cognition and the semantic content of questions, which is more complex than our setting. In this article, we illustrate our motivation in Figure 1, which shows the relations between questions from different assignments. Our intuition is that the ranking is influenced by two dimensions: how questions have been answered in the same assignment and how questions were answered in previous assignments.
Based on the above intuition, we propose a knowledge-fusion ranking system using an attention network, KFRank, which can improve the reliability and accuracy of ranking using the relations between knowledge points. Compared with the state-of-the-art learning-to-rank (LTR) and reinforcement-learning methods, our KFRank method has the following advantages: (i) KFRank considers the ranking problem as having multilevel dimensions and generates effective cognitive features for learning models. (ii) KFRank utilizes knowledge points and integrates them during training. We build a cluster of questions using the knowledge points and generate an attention network to pretrain the terms in questions. These are represented in vectors of questions in the classification phase. (iii) KFRank is based on an MDP but rebuilds the environment by considering multiple factors: questions in the current assignment and results from previous similar assignments.
The rest of the paper is organized as follows: first, we formulate the problem and introduce the concepts used in the assignment ranking problem in Section 2. Section 3 gives the architecture of KFRank and proposes the attention network for training. Next, in Section 4, we describe our experiments with real-life datasets and compare the performance of the proposed model with other methods. Related work is discussed in Section 5, and Section 6 concludes the work.

Preliminaries
In this section, we first formally formulate the assignment ranking problem and then briefly introduce the LTR and reinforcement-learning methods for solving this problem.

Problem Definition
Definition 1. An assignment $\omega = (Q, p, t, \lambda)$, where $Q = \{q_1, q_2, \ldots, q_{|Q|}\}$ is a set of questions, $p$ is a unique student, $t$ is the assignment time, and $\lambda$ is the score for $q$.

Definition 2. The knowledge points $O$ of a question are the set of knowledge points $o_q$ that belong to question $q$.

Definition 3. Our assignment ranking problem is to rank questions in an assignment by predicting the performance of each student based on the difficulty of the questions. Questions higher in the list are relatively easier for a particular student than other questions.

Ranking Using Supervised Learning.
LTR is a sorting method based on supervised learning. The user-item scoring matrix is produced by a recommendation algorithm after learning from a training set. Here, different sorting techniques, such as pointwise [9], pairwise [10], and listwise [11], can be used to obtain a sorting model. In the test phase, the system generates an ordered list of items for target users using the trained model. LTR can be online or offline. In offline approaches, the training set is produced by human assessors, which is time-consuming and expensive. In contrast, an online LTR system collects data when users interact with the system, such as by clicking, moving a mouse, and entering a query string.

Ranking Using Reinforcement Learning.
Reinforcement learning is a branch of artificial intelligence. It is good at controlling an agent who can act autonomously in a specific environment and continuously improve their behavior. Suitable problems for reinforcement learning involve learning how to do tasks and how to map the environment into actions that maximize the rewards. In reinforcement learning, the learner is a decision-making agent who is not told what to do. Instead, they attempt a task repeatedly to find the behavior that gives the greatest reward. By giving each question a reward, reinforcement learning can learn how to rank them in an assignment as a closed-loop control problem.

The KFRank Model
Three aspects of the KFRank model are presented in this section: (1) an MDP for ranking, (2) a knowledge-fusion model for updating the environment, and (3) the architecture and algorithm (Algorithm 1) of KFRank.

Ranking Using an MDP.
Analyzing study performance can be formalized as an MDP, in which the construction of a list of ranked questions can be considered as a sequential decision-making process in which each time step corresponds to selecting a question for a corresponding position. We propose a tuple 〈S, A, T, R, π, L, χ〉 to illustrate KFRank by states, actions, transition, reward, policy, history, and rebuilder, which are defined as follows.

States.
$S$ is a set of states that represent the environment of the current assignments. In ranking, the agent should know the current position as well as the remaining questions. Thus, the state $s_t$ for step $t$ is $[t, X_t]$, where in our model $X$ is initially treated as a given assignment $\omega$, and $X_t$ are the questions still to be ranked, $Q_t$.

Actions.
$A$ is a discrete set of actions that an agent can take; the available actions can depend on the state $s_t$ and are denoted as $A(s_t)$. At step $t$, $a_t \in A(s_t)$ is used to calculate the value of each $q$ in $\omega$ and to select a question $q_{m(a_t)}$ for the ranking position $t + 1$, where $m(a_t)$ is the index of the question selected by action $a_t$.

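Transition.
$T(S, A)$ is a function that maps a state $s_t$ and an action $a_t$ to a new state $s_{t+1}$, that is, $T: S \times A \to S$. At step $t$, action $a_t$ selects $q_{m(a_t)}$ and removes it from $Q_t$ as follows:
$$s_{t+1} = T([t, Q_t], a_t) = [t + 1, Q_t \setminus \{q_{m(a_t)}\}] \quad (1)$$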

Reward.
The state value function $V: S \to \mathbb{R}$ is a scalar evaluation, estimating the quality of the entire list of ranked questions (an assignment) based on the input state $S$. Here, we define the value function as DCG: where $y_{m(a_t)}$ is a relevance label for the currently selected question $q_{m(a_t)}$. In our model, we calculate $y_{m(a_t)}$ according to a student's performance and the difficulty of a question. The difficulty $\theta$ of $q$ is defined as where $r = 0$ means the result is wrong and $r = 1$ means the result is right. Thus, $y_{m(a_t)}$ is defined as

Policy.
$\pi$ is a function that takes a state as input and outputs a distribution over all possible actions $a \in A(s)$. KFRank calculates the probability of selecting each question based on its current rank: where $w \in \mathbb{R}^K$ are the model parameters, whose dimension is the same as that of the ranking feature. In our case, the policy is an agent's strategy to rank the assignment by predicting the study results and measuring the rewards by nDCG. Obviously, some policies are better than others, and there are multiple ways to assess them.

History.
$L$ is a set of historical assignment results for a group of students. Here, $L = \{l_q^p\}$, where $l$ is a previous result, $q$ is a question, and $p$ is a student. $L$ is designed as an information retrieval system and contains student IDs, questions, knowledge points, results, and operating time. Several metrics related to information retrieval can be used to compare the similarity of given questions and archived questions as follows: where $q$ is a question in $\omega$, $M$ is an information retrieval search function, and $k$ is the number of output results.

Rebuilder.
$\chi$ is a module that updates the current state $s$ to $s_t$ with $D$ as input. We extract a student's performance on specific knowledge points $o$ from $L$ and predict their current ability using a recurrent neural network.
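As a rough illustration of the Policy component described in this section, the sketch below assumes a linear softmax policy over the remaining questions, the form commonly used in MDP-based rankers. The source text does not show the exact expression, so this form, the feature matrix `features`, and the function name `policy_probs` are all assumptions made for illustration.

```python
import numpy as np

def policy_probs(w, features):
    """Return a selection probability for each remaining question.

    Assumption: a linear softmax policy, pi(a | s; w) proportional to
    exp(w . x_a), where x_a is the K-dimensional ranking feature of
    candidate question a.
    """
    scores = features @ w            # one score per candidate question
    scores = scores - scores.max()   # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

# Three candidate questions with K = 2 hypothetical ranking features each.
features = np.array([[0.2, 0.9], [0.5, 0.1], [0.8, 0.4]])
w = np.array([1.0, -0.5])
probs = policy_probs(w, features)
```

Sampling from `probs` rather than always taking the largest entry is what lets a policy-gradient learner explore alternative rankings during training.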

Basic Study State.
We calculate the effective features for ranking in terms of the relation between a student's knowledge cognitive level and the question. In this article, we first estimate the difficulty of each question using a correctness ratio. Then, for each student, the knowledge cognitive level is measured as the student's average score over all completed assignments, grouped by knowledge point. Here, a student's knowledge cognitive level is dynamic and is updated during the study process. Thus, if a student does very well on an assignment, all the related cognitive levels increase rapidly. On the other hand, the difficulty of a question is fixed or not easy to change, because it depends on the performance of all students.
Based on the student's knowledge cognitive level and the difficulty of the questions, we utilize several similarity functions to construct the representation of the student's states for ranking according to the traditional learning-to-rank method, including Euclidean distance, Pearson's similarity, Manhattan distance, cosine similarity, and so on.
As shown in Figures 2 and 3, one question can contain several knowledge points and one knowledge point may have many related questions. Thus, for a given question with several knowledge points, we first calculate the state for each knowledge point and then merge all the related states. According to the relationship of knowledge points in the question, we can obtain many basic triples in the knowledge graph. We will introduce how to build a reasonable representation of knowledge points in Section 3.3.
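The similarity-based features mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the student's cognitive level and the question's difficulty are both represented as vectors indexed by the same knowledge points, and the names `state_features`, `cognitive`, and `difficulty` are hypothetical, not taken from the paper.

```python
import numpy as np

def state_features(cognitive, difficulty):
    """Compare a student with a question using four similarity measures.

    Both inputs are hypothetical vectors over the same knowledge points:
    `cognitive` is the student's average score per knowledge point and
    `difficulty` is the question's difficulty on each of those points.
    Zero or constant vectors would make cosine or Pearson undefined.
    """
    c = np.asarray(cognitive, dtype=float)
    d = np.asarray(difficulty, dtype=float)
    euclidean = np.linalg.norm(c - d)
    manhattan = np.abs(c - d).sum()
    cosine = c @ d / (np.linalg.norm(c) * np.linalg.norm(d))
    pearson = np.corrcoef(c, d)[0, 1]
    return np.array([euclidean, manhattan, cosine, pearson])

features = state_features([0.8, 0.6, 0.9], [0.4, 0.7, 0.2])
```

Stacking several measures into one vector gives the ranker more than one view of how well a question matches a student's current level.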

Knowledge Representation with TransR.
Since the content of math questions can vary considerably, we use knowledge points to illustrate the relations between questions, as Figure 3 shows. From the knowledge points and the manually marked relations between them, we can construct the triples in the knowledge graph. In this project, the relations between knowledge points are classed as contains, belongs, and equals. The following is an example.
In triangle A, angle B is equal to 90° and angle B is a right angle.
Here, triangle is a knowledge point, the relation between A and B is contains, the relation between B and A is belongs, and the relation between right angle and 90° is equals. A triple is defined as $(h, r, t)$, where $h$ and $t$ represent the embeddings of knowledge points and $r$ is a relation from the set of relations. From the knowledge graph, we can obtain the vectors of knowledge points using TransR [12].
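In TransR, the score function is defined as
$$f_r(h, t) = \lVert h M_r + r - t M_r \rVert_2^2,$$
where $M_r$ is the projection matrix that maps knowledge-point embeddings into the space of relation $r$, and the model convergence is based on minimizing
$$\mathcal{L} = \sum_{(h, r, t) \in S} \sum_{(h', r, t') \in S'} \max\left(0, f_r(h, t) + \gamma - f_r(h', t')\right),$$
where $\gamma$ is the margin, $S$ is the set of correct triples, and $S'$ is the set of incorrect triples.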
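To make the TransR objective concrete, the sketch below computes the standard TransR score and the margin-based hinge loss for one correct triple and one corrupted triple. The embeddings, the dimensions, and the relation vector are randomly generated placeholders rather than values from the paper, and the function names are hypothetical.

```python
import numpy as np

def transr_score(h, r, t, M_r):
    """Squared L2 distance after projecting both entities into relation space."""
    h_r = h @ M_r
    t_r = t @ M_r
    return float(np.sum((h_r + r - t_r) ** 2))

def margin_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss for one (correct, corrupted) pair of triples."""
    return max(0.0, pos_score + margin - neg_score)

rng = np.random.default_rng(0)
entity_dim, relation_dim = 4, 3
M_contains = rng.normal(size=(entity_dim, relation_dim))  # projection for "contains"
triangle = rng.normal(size=entity_dim)                    # placeholder embedding
angle = rng.normal(size=entity_dim)                       # placeholder embedding
# Pick the relation vector so that (triangle, contains, angle) fits exactly.
contains = angle @ M_contains - triangle @ M_contains

pos = transr_score(triangle, contains, angle, M_contains)
neg = transr_score(angle, contains, triangle, M_contains)  # head and tail swapped
loss = margin_loss(pos, neg)
```

A low score means the triple is plausible, so training pushes correct triples at least one margin below corrupted ones.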

Attention-Based Knowledge-Fusion Model.
To tackle the various knowledge points, we design an attention-based model using knowledge fusion. As Figure 4 shows, Q is the representation of a student's current knowledge, as noted in Section 3.2, and k is the embedding of concepts trained with TransR. In the (query, key, value) triple of the attention mechanism, we set Q as k, the knowledge point as q, and the reward corresponding to the question as v. The model is trained in the same way as the encoder part of the transformer (Figure 4): the vectors of multiple knowledge points are incorporated into the basic study state through attention. The output is the latest performance. This is equivalent to integrating the information of different knowledge points into the original state and obtaining a new vector to represent the current state. R in the figure represents the student's current knowledge state for each question. The status update process is shown in Algorithm 2.
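The fusion step can be pictured with a small scaled dot-product attention sketch. This is an assumption-laden illustration, not the authors' implementation: the query is taken to be a knowledge-point embedding, the keys are historical study-state vectors, the values are reward-related vectors, and a single dense layer with a tanh activation (both hypothetical choices) produces the updated state. All shapes and names are made up for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_state(query, keys, values, W, b):
    """Attention-weighted fusion followed by one dense layer.

    query:  (d,)    embedding of the knowledge point being asked about
    keys:   (n, d)  representations of n historical study states
    values: (n, m)  reward-related vectors attached to those states
    W, b:   parameters of a hypothetical dense layer, (m, d_out) and (d_out,)
    """
    att = softmax(keys @ query / np.sqrt(query.shape[0]))  # (n,) attention weights
    context = att @ values                                 # (m,) weighted sum of values
    return np.tanh(context @ W + b), att

rng = np.random.default_rng(0)
d, n, m, d_out = 4, 3, 2, 5
state, att = fuse_state(rng.normal(size=d), rng.normal(size=(n, d)),
                        rng.normal(size=(n, m)), rng.normal(size=(m, d_out)),
                        np.zeros(d_out))
```

The attention weights sum to one, so the fused state is a convex combination of the historical values before the dense layer is applied.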
We show the study state vectors before and after the attention model in Figure 5. We select ten different knowledge points and related study states and then use the attention model to pretrain input vectors. Then, we use t-SNE to show the latent space representations of the two states. Note that the study state after attention training is simple and is closely surrounded by knowledge points with the same color.

Overview of KFRank.
Figures 6 and 7 illustrate the construction of the question ranking. For each question in an assignment, first, we extract related performance records. In each episode, the environment is updated with the current status from the knowledge-fusion model. Based on the policy and value function, the agent chooses the optimal action that gives the greatest long-term return. After taking the action, the environment is updated. The sorting construct for a given training document can be formalized as follows. A student's assignment is a query $\omega$, which is a set of questions $Q$ with length $M$. The initial state is $s_0 = [0, Q]$. At each step $t = 0, \ldots, M - 1$, the agent chooses the optimal action $a_t$ to select $q_{m(a_t)}$ from the set of questions $Q$ as rank $t$ (lines 7 and 8 in Algorithm 1). The selected question is removed from $Q_t$, as in equation (1) (lines 9 and 10 in Algorithm 1). We calculate $y_{m(a_t)}$ using equation (3) and calculate the long-term return. The process is repeated until all of the $M$ questions have been selected.
We propose to learn the parameters $w$ in KFRank using a policy-gradient algorithm based on reinforcement learning [13]. The goal of this algorithm is to maximize the long-term return $G_t$:
$$G_t = \sum_{k=1}^{M-t} \gamma^{k-1} r_{t+k},$$
where $\gamma$ is the discount factor. In the algorithm, the gradient $\nabla_w J(w)$ is calculated as
$$\nabla_w J(w) = \gamma^t G_t \nabla_w \log \pi(a_t \mid s_t; w).$$
At each iteration, an episode is sampled with the current policy. At each step $t$, the parameters $w$ are adjusted according to $\nabla_w \log \pi(a_t \mid s_t; w)$, the direction that most increases the probability of repeating the action $a_t$ in state $s_t$. In this way, $G_t$ moves the parameters most in the directions that give the greatest return for the action.
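The training loop can be sketched as a Monte-Carlo policy-gradient (REINFORCE) update over one sampled ranking. The example below is a simplified sketch under several assumptions: a linear softmax policy, a DCG-style per-step reward with a base-2 logarithm, and toy features and labels. The names `sample_episode` and `reinforce_update` are hypothetical, and the knowledge-fusion state update is omitted.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_episode(w, features, labels, rng):
    """Rank all questions once by sampling from the softmax policy."""
    remaining = list(range(len(labels)))
    steps = []  # (chosen index, candidate indices, reward)
    for t in range(len(labels)):
        probs = softmax(features[remaining] @ w)
        pick = rng.choice(len(remaining), p=probs)
        chosen = remaining[pick]
        reward = labels[chosen] / np.log2(t + 2)  # assumed DCG-style reward
        steps.append((chosen, list(remaining), reward))
        remaining.pop(pick)  # transition: drop the question just ranked
    return steps

def reinforce_update(w, features, steps, eta=0.1, gamma=1.0):
    """Apply one policy-gradient update per step of the episode."""
    rewards = [r for _, _, r in steps]
    for t, (chosen, candidates, _) in enumerate(steps):
        # Discounted return from step t to the end of the episode.
        G_t = sum(gamma ** k * rewards[t + k] for k in range(len(rewards) - t))
        probs = softmax(features[candidates] @ w)
        # Gradient of log softmax: chosen feature minus expected feature.
        grad_log_pi = features[chosen] - probs @ features[candidates]
        w = w + eta * gamma ** t * G_t * grad_log_pi
    return w

rng = np.random.default_rng(0)
features = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # toy ranking features
labels = np.array([2.0, 0.4, 1.0])                          # toy relevance labels
w = np.zeros(2)
for _ in range(200):
    w = reinforce_update(w, features, sample_episode(w, features, labels, rng))
```

Each episode visits every question exactly once, so the episode length equals the number of questions in the assignment.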

Datasets.
We conducted experiments to validate the performance of our method using a real-life dataset (Table 1) collected from two tablet (pad) applications: a teacher client and a student client. The data spanned the period from September 2017 to June 2018. There were nearly 70 million records for 40,000 students. Based on the students' historical activity and correctness rate, we selected the records of 300 students with relatively high-quality records. Then, we split the data into training and test sets with a 70-30 split. After that, excellent teachers created a middle-school mathematics knowledge graph, which contains more than 700 nodes and 2000 relationship edges. Finally, we used the TransR approach to embed the knowledge topics for learning.
To evaluate the effectiveness of the ranking, we split the students into three groups: merit students, average students, and weak students. In the dataset, almost 72 percent of the questions covered three or more knowledge points and less than 14 percent of the questions had only one knowledge point. The distributions of rewards for the three groups of students are shown in Figure 8.

Evaluation Criteria.
We first calculate the score of each question for ranking from the students' performance and the difficulty of the question. For example, if only 60 percent of students choose the right answer to a given question, the difficulty of this question is 0.4. Then, we weight correct and wrong answers with scores of 5 and 1, respectively. Finally, for this question, a student receives 2 points (5 × 0.4) for a correct answer and only 0.4 points (1 × 0.4) for a wrong one. For each assignment, the rank score is calculated from the student's answer to each question and used as the true value.
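The worked example above can be reproduced in a few lines. The multiplicative form (answer weight times difficulty) is inferred from the numbers in the text rather than from a stated formula, and the function names are hypothetical.

```python
def question_difficulty(results):
    """Difficulty = 1 - correctness ratio; each result is 1 (right) or 0 (wrong)."""
    return 1.0 - sum(results) / len(results)

def rank_score(correct, difficulty, right_weight=5.0, wrong_weight=1.0):
    """Ground-truth rank score: the answer weight scaled by question difficulty."""
    return (right_weight if correct else wrong_weight) * difficulty

# Six of ten students answer correctly, as in the 60 percent example.
theta = question_difficulty([1, 1, 1, 0, 0, 1, 0, 1, 1, 0])
score_right = rank_score(True, theta)
score_wrong = rank_score(False, theta)
```

Under this reading, a correct answer on a hard question earns more than a correct answer on an easy one, which is what makes the score useful as a ranking target.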
We use nDCG@k to measure the performance. To get nDCG@k, we first calculate DCG@k:
$$\mathrm{DCG}_f@k = \sum_{r=1}^{k} \frac{v_r}{\log(1 + r)},$$
where $r$ is the rank of items in the recommendation list, $k$ is the length of the recommendation list, $f$ is the ranking function or algorithm, $v_r$ is the value of the $r$th item, and $1/\log(1 + r)$ is the discount. The ideal discounted cumulative gain, iDCG@k, is also needed and is calculated in a similar way. In iDCG@k, the questions in the recommendation list are ranked by their original values instead of by the ranking algorithm. nDCG@k is the ratio of DCG@k to iDCG@k.
Table 2 compares the performance of our model with other methods using nDCG@5 and nDCG@10. The higher the score, the better the performance. Of the LTR methods, CoordAscent, LambdaMART, and ListNet perform relatively well and better than the original reinforcement-learning method, MDPRank. KFRank with updated environments has the best nDCG value for nearly all groups of students. From the results, it can be seen that the different methods have similar performance trends for the three groups. For example, the nDCG scores for weak students are always higher than those for merit students, which means it is easier to make predictions for weak students. Students whose performance varies depending on the difficulty of the questions will have a better score. Then, we ran the experiment again and evaluated the stability of our method for various top-k results. As shown in Figure 9, the performance of KFRank is generally better and more stable than that of Random Forest and LambdaMART among the traditional learning-to-rank methods. KFRank is superior for average and weak students, indicating that KFRank is of greater help to students with poor knowledge.

Input: assignment records $\{\omega_n = (Q, p, t, \lambda)\}_{n=1}^{N}$, learning rate $\eta$, discount factor $\gamma$, and reward function $R$
Output: $w$
(1) Initialize $w$ with random values
(2) $s$ ⟵ updated state in an episode ($\omega$, $G$), Algorithm 2
…
($s_0, a_0, r_1, \ldots, s_{M-1}, a_{M-1}, r_M$) ⟵ sample ranking in an episode
(6) for $t = 0$ to $M - 1$ do
(7) Sample an action $a_t \in A(s_t) \sim \pi(a_t \mid s_t; w)$
…
ALGORITHM 1
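The nDCG@k metric used in this evaluation can be sketched as follows. The example assumes a linear gain and a base-2 logarithm in the discount; the text gives the discount as 1/log(1 + r) without stating the base, so the base is an assumption, and the function names and sample values are hypothetical.

```python
import math

def dcg_at_k(values, k):
    """DCG@k with linear gain and a 1/log2(1 + r) discount for ranks r = 1..k."""
    return sum(v / math.log2(1 + r) for r, v in enumerate(values[:k], start=1))

def ndcg_at_k(values_in_predicted_order, k):
    """Normalise DCG@k by the DCG@k of the ideal (descending-value) ordering."""
    ideal = dcg_at_k(sorted(values_in_predicted_order, reverse=True), k)
    if ideal <= 0:
        return 0.0
    return dcg_at_k(values_in_predicted_order, k) / ideal

# True rank scores listed in the order a ranker placed the questions.
predicted_order = [0.4, 2.0, 1.0]
perfect_order = [2.0, 1.0, 0.4]
ndcg_predicted = ndcg_at_k(predicted_order, 3)
ndcg_perfect = ndcg_at_k(perfect_order, 3)
```

Because the score is normalised by the ideal ordering, values are comparable across assignments with different numbers of questions and different score ranges.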

Related Work
Related work can be classified into two categories, those based on LTR methods and those based on reinforcement learning.
s = dense(att * v)
(6) end for
ALGORITHM 2: State updating in an episode.

The work in [5] applied the LTR method for item recommendation and integrated social information between users in the training of the listwise model to improve the quality of a sorted list of items. Canuto et al. [22] applied the LTR method to learn automatically how to sort labels. They compared the performance of eight different methods of recommending labels. Ifada et al. [23] developed a novel LTR method, Go-Rank, for a label-based project recommendation system. They directly optimized the graded average precision, resulting in an optimized list of recommended items. Huang et al. [24] reviewed recent research into recommendation algorithms based on LTR. They generalized, compared, and analyzed problem definitions, key technologies, utility evaluations, and progress. Finally, they discussed and forecast the trends for recommendation algorithms based on LTR.

Making Recommendations with Advanced Reinforcement Learning.
Shani et al. [25] proposed an MDP-based collaborative filtering model, which uses a finite window for history, instead of an unbounded one, to define the current state. It can be regarded as approximating a partially observable MDP (POMDP). Since a POMDP has high computational and representational complexity, various strategies have been suggested for simplifying it, such as policy-based optimization [26], value function approximation [27], and stochastic sampling [28]. Regarding sequential decision problems, Sunehag et al. [29] designed a reinforcement-learning agent using high-dimensional combinatorial slate-action spaces and achieved remarkable results. As ranking is a key issue in practical recommendation problems, any improvements in ranking contribute significantly to reinforcement recommendation systems. Zhang et al. [30] used log-based document reranking modeled as a POMDP. Wei et al. [6] proposed a novel LTR model based on an MDP, referred to as MDPRank, which directly optimizes a ranking using an MDP. FAIR-PG-Rank recommends items via a policy-gradient approach that can satisfy fairness-of-exposure constraints with respect to items [31]. A similar idea is to generate unified term impacts (UTI) at indexing time and combine them into a hybrid model to improve accuracy [32]. Since the relationship between study performance and exam results is very complicated, the article [33] observes that the correspondence between input values and prediction targets is not a one-to-one relation, treats the classification task as a fuzzy geometrical problem, and proposes a fuzzy similarity approach to solve the problem [34,35].

Conclusions
Assignment recommendation is an essential and tricky task in online study research. In this paper, we investigated how to predict the performance of students by using assignment ranking mechanisms. Based on traditional learning-to-rank models, we proposed a knowledge-fusion model with an attention network named KFRank, which employs two novel features compared to previous methods: (1) an attention network for capturing multiple knowledge factors in human behavior and (2) a reinforcement-learning module for ranking questions by their predictable reward. Our model can capture both the current study status and the previous study performance on similar math topics. Extensive experiments on a real-world dataset with three different levels of students showed that KFRank significantly outperforms other methods in most cases.
In the future, there are still some directions for further study. First, besides the historical log, we would like to measure study performance in more aspects, for example, by studying cognitive models. Second, for the policy strategy, some network optimization [36] and fuzzy theories [34] could be introduced into our model. Finally, as our KFRank is a general framework, we will test its performance on other disciplines (e.g., click-through rate prediction) and on similar applications in other domains, such as the user behavior of customers in e-commerce.

Computational Intelligence and Neuroscience

Data Availability
The homework record data used to support the findings of this study were supplied by Xuehai Education Technology Co., Ltd., in China. Since the data would reveal personal activities and the dataset is very large, the data cannot be made freely available. We are glad to supply part of the data for research after removing personal information and unique IDs. Requests for access to these data and the project code should be made to Canghong Jin at jinch@zucc.edu.cn.

Conflicts of Interest
The authors declare that they have no conflicts of interest.