A Q-matrix Model Based on Binarized Neural Network

Q-matrix theory plays an important role in the field of cognitive diagnosis assessment, but it is time-consuming and laborious for experts to define a Q-matrix from a large amount of data. To address this problem, this article proposes a Q-matrix generation model based on a binarized neural network. We combine the neural network with the Boolean operation relations among the Q-matrix, A-matrix, and R-matrix of cognitive diagnosis theory, so that the model can mine the Q-matrix more effectively.


Introduction
In recent years, many data mining methods have been applied to the field of education to mine students' knowledge states from their practice records, such as classical test theory (CTT), item response theory (IRT) [1], and cognitive diagnosis assessment.
Classical test theory (CTT) and item response theory (IRT) focus only on evaluating students' overall level from exam scores and cannot reflect students' knowledge status. Cognitive diagnosis assessment [2,3] judges students' mastery of specific attributes by analyzing their responses to items, which makes up for this shortcoming of CTT and IRT.
Let us suppose that there are n students and m items, and a matrix R ∈ {0,1}^{n×m} recording the students' answers: R_ij = 1 indicates that student i answered item j correctly, and R_ij = 0 otherwise. As shown in Tables 1, 2, and 3, Q-matrix theory in cognitive diagnosis connects the knowledge status of students with their answering performance. For example, to solve the item 2+3-1 one must master the addition and subtraction attributes. In addition, it is necessary to determine the knowledge status of the students, namely the A-matrix. For example, student 2 has mastered only the two attributes of subtraction and division, so he cannot solve the item 2+3-1.
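The relation among the three matrices can be illustrated with a small sketch. The attribute names, items, and values below are assumed for illustration (mirroring the 2+3-1 example), and the answer rule is taken to be conjunctive, i.e. a student answers an item correctly only if they have mastered every attribute the item requires:

```python
import numpy as np

# Hypothetical toy data: 3 attributes (addition, subtraction, division).
# Q[j, k] = 1 if item j requires attribute k.
Q = np.array([
    [1, 1, 0],   # item "2+3-1" needs addition and subtraction
    [0, 1, 1],   # a hypothetical item needing subtraction and division
])
# A[i, k] = 1 if student i has mastered attribute k.
A = np.array([
    [1, 1, 1],   # student 1 has mastered all three attributes
    [0, 1, 1],   # student 2 has mastered only subtraction and division
])

# Conjunctive rule: R_ij = 1 iff student i masters every attribute of item j,
# i.e. the count of mastered required attributes equals the item's requirement count.
R = (A @ Q.T == Q.sum(axis=1)).astype(int)
print(R)  # student 2 misses the first item because addition is not mastered
```

Running the sketch reproduces the example in the text: student 2's row of R is [0, 1], failing the item that requires addition.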
Generally speaking, the main contributions of this paper are twofold. First, we propose a neural network-based model to automatically extract the Q-matrix from students' practice records; the model builds the Boolean operation relations among the Q-matrix, R-matrix, and A-matrix of cognitive diagnosis theory into a binarized neural network. Second, the proposed binarized network model can not only extract the Q-matrix but also extract the knowledge state (A-matrix) from the hidden layer of the network.

Binarized Neural Network
A Binarized Neural Network (BNN) [19] is a neural network whose weights and activation values are binarized. Specifically, the weights W and the hidden-layer activation values are reduced from 32-bit floating point to a single bit (+1 or -1), so that they occupy far less storage. At the same time, a binarized neural network replaces the multiply-and-add operations of the original network with bit operations; this does not change the structure of the network, only its parameters and activation values.
There are mainly two methods to binarize a neural network. One is the stochastic method of formula (1): a weight or activation value x is mapped to a probability p = σ(x) and then set to +1 with probability p and to -1 with probability 1 - p, where σ is the "hard sigmoid" of formula (2), σ(x) = clip((x + 1)/2, 0, 1). The other is the deterministic method: the binary value is determined directly by the sign of the floating-point weight or activation value, x^b = sign(x), which binarizes the weights W and the hidden-layer activations to +1 or -1 as shown in formula (3).
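The two binarization methods can be sketched as follows (a minimal numpy version, assuming the standard hard-sigmoid and sign definitions given above):

```python
import numpy as np

def hard_sigmoid(x):
    """The "hard sigmoid" of formula (2): sigma(x) = clip((x + 1) / 2, 0, 1)."""
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def binarize_stochastic(x, rng=None):
    """Stochastic binarization (formula (1)): +1 with probability sigma(x), else -1."""
    rng = rng or np.random.default_rng()
    p = hard_sigmoid(np.asarray(x, dtype=float))
    return np.where(rng.random(p.shape) < p, 1.0, -1.0)

def binarize_deterministic(x):
    """Deterministic binarization (formula (3)): the sign of x."""
    return np.where(np.asarray(x, dtype=float) >= 0, 1.0, -1.0)

w = np.array([-1.7, -0.2, 0.4, 2.1])
print(binarize_deterministic(w))  # [-1. -1.  1.  1.]
```

Note that the stochastic variant returns different ±1 patterns across calls, which is exactly why the deterministic sign function is preferred in practice.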
The stochastic method is difficult to implement efficiently, so most experiments use the deterministic method. In BNN forward propagation, we first binarize the weights with W^b = sign(W). The binarized parameters are then used to compute a real-valued intermediate vector, a batch normalization operation yields the real-valued hidden-layer activation vector, and, if the layer is not the output layer, this vector is binarized in turn.
In BNN backward propagation, we need the derivative of the sign function, which is zero almost everywhere. Hence we relax the sign function to the hard tanh Htanh(x) = clip(x, -1, 1), as shown in formulas (4) and (5); its derivative, formula (6), is 1 when |x| ≤ 1 and 0 otherwise. Suppose the loss function is C and the binarization function is q = sign(x); then the derivative of C with respect to x is given by formula (7), where g_x denotes the derivative of C with respect to x and g_q the derivative of C with respect to q. When the absolute value of x is less than 1, the gradient of x is equal to the gradient of q; otherwise the gradient of x is 0. Figure 1 shows the relaxed sign function.

Matrix Factorization
As shown in formula (8), matrix factorization aims to decompose the matrix R into two matrices Q and A. In cognitive diagnosis assessment, the Q-matrix reflects the connection between items and knowledge concepts, while the A-matrix indicates the knowledge state of students.
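The backward-pass relaxation described above (the straight-through estimator) can be sketched in a few lines, assuming the clip-based Htanh form given in the text:

```python
import numpy as np

def htanh(x):
    """Relaxed sign function (formulas (4)-(5)): Htanh(x) = clip(x, -1, 1)."""
    return np.clip(x, -1.0, 1.0)

def ste_grad(x, grad_q):
    """Straight-through estimator (formula (7)): pass the upstream gradient
    g_q through wherever |x| <= 1, and block it (gradient 0) elsewhere."""
    x = np.asarray(x, dtype=float)
    return grad_q * (np.abs(x) <= 1.0)

x = np.array([-2.0, -0.5, 0.3, 1.5])
g_q = np.ones_like(x)       # pretend upstream gradient of the loss w.r.t. q
print(ste_grad(x, g_q))     # [0. 1. 1. 0.]
```

Only the two pre-activations inside [-1, 1] receive a gradient; the saturated ones are cut off, exactly as formula (6) prescribes.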
The objective function of the matrix decomposition is shown in formula (9).
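As a minimal sketch of such a decomposition, the snippet below fits R ≈ A Q^T by plain gradient descent on the squared reconstruction error; the synthetic response matrix and the learning rate are assumptions made purely for illustration, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_items, n_attrs = 30, 10, 4

# Synthetic 0/1 response matrix (assumed data, just for this sketch).
R = rng.integers(0, 2, size=(n_students, n_items)).astype(float)

# Factor R ~= A @ Q.T by gradient descent on the squared error of formula (9).
A = rng.normal(scale=0.1, size=(n_students, n_attrs))
Q = rng.normal(scale=0.1, size=(n_items, n_attrs))
mse0 = np.mean((R - A @ Q.T) ** 2)

lr = 0.01
for _ in range(2000):
    E = R - A @ Q.T          # residual matrix
    A += lr * E @ Q          # gradient step on A
    Q += lr * E.T @ A        # gradient step on Q

mse1 = np.mean((R - A @ Q.T) ** 2)
print(mse0, mse1)            # reconstruction error drops during training
```

The factors found this way are real-valued; the point of the binarized model in the next section is to force Q and A toward the interpretable binary form that cognitive diagnosis requires.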

Q-Matrix generation model of binarized neural network
Mining students' knowledge states from their practice records is a long-standing topic in educational data mining. Matrix decomposition and neural networks have both been proved effective for this problem.
This part presents the proposed model, which combines matrix decomposition with a binarized neural network to mine students' practice records and generate the Q-matrix.

Task Overview
Given the students' practice records, containing each student's answer to each item, the data can be abstracted as the R-matrix: R_ij = 1 if student i answered item j correctly and R_ij = 0 otherwise. The Q-matrix defined by experts is also given: Q_ij = 1 if item i relates to knowledge concept k_j and Q_ij = 0 otherwise. Our model aims to mine the Q-matrix from the datasets and to reconstruct the R-matrix based on the cognitive diagnosis model (CDM).

Binarized Neural Network model
The R-matrix is obtained from the A-matrix and the Q-matrix by the matrix operation of formula (11), where ∘ means Boolean multiplication, defined as in formula (12).
After the above formulation, R_ij can be obtained as in formula (15), with the vector form written in formula (16), where q_j* means the j-th row vector of the Q-matrix, a_i* means the i-th row vector of the A-matrix, and e means a unit vector whose components are all one.
The simplest three-layer version of the model is shown in Figure 3 (Fig. 3: three-layer neural network). The input to the network is a row vector of the R-matrix (one student's practice record); the input layer to the hidden layer is a fully connected neural network, as is the hidden layer to the output layer. We set the number of neurons in the input layer equal to the number of items, the number of neurons in the hidden layer equal to the number of attributes, and the number of neurons in the output layer equal to the number of items.
The activation functions in the network model are all sign functions, and the Q-matrix can be extracted from the weights of the middle layer: the weight matrix from the hidden layer to the output layer is the transpose of the Q-matrix mined from the data, and the activation values of the neurons in the last hidden layer form the A-matrix.
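A forward pass through this three-layer architecture can be sketched as below. The weights here are random (untrained) stand-ins, the ±1 coding of the record is an assumption of the sketch, and batch normalization is omitted; the point is only to show where the A-matrix row and the mined Q-matrix live in the network:

```python
import numpy as np

def sign(x):
    """Sign activation, mapping to +1 / -1 (0 counted as +1)."""
    return np.where(x >= 0, 1.0, -1.0)

n_items, n_attrs = 20, 8
rng = np.random.default_rng(1)

# Real-valued weights; their binarized versions are used in the forward pass.
W1 = rng.normal(size=(n_items, n_attrs))   # input layer -> hidden layer
W2 = rng.normal(size=(n_attrs, n_items))   # hidden layer -> output layer

r = sign(rng.normal(size=n_items))         # one student's (+-1 coded) record

a = sign(r @ sign(W1))                     # hidden activations: one A-matrix row
r_hat = sign(a @ sign(W2))                 # reconstructed practice record

# After training, the mined Q-matrix is read off the hidden->output weights
# (transposed back to items x attributes, and mapped from +-1 to 0/1):
Q_mined = (sign(W2).T + 1) / 2
print(Q_mined.shape)                       # (20, 8): items x attributes
```

With real training, r_hat would be driven toward r, making the extracted W2 a meaningful Q-matrix rather than noise.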
The objective function for model optimization is formula (17), where CE means the cross-entropy loss function, Q means the real Q-matrix defined by experts, and β means a weighting parameter. We input the row vectors of the R-matrix to train the model. By continually decreasing the value of the objective function, we extract the mined Q-matrix from the weights between the output layer and the hidden layer.
Generally speaking, a deep neural network has a stronger ability to express and extract features. On the basis of Figure 3, we can build a deeper binarized neural network model to achieve a better fit, as shown in Figure 4. In a sense, our model is very similar in spirit to an autoencoder: the output of the model has the same form as its input.

Datasets
Simulated Data: This dataset is artificially generated by the IDR method. It contains 1000 students and 50 items, with 8 or more attributes per item. After the dataset is expanded, we extract training data in different proportions and use the rest as test data. FrcSub: This dataset comes from fraction-subtraction tests and first appeared in Tatsuoka [8]; it includes 536 students, 20 items, and 8 implicit attributes.
Table 4 shows the differences between the Simulated and FrcSub datasets. When using the IDR method to generate the artificial data, we add slipping and guessing parameters to better simulate real data. The composition of the datasets is shown in Table 4.

Evaluation Criteria
We first introduce two existing models and then compare our model with them. DINA: The DINA model is one of the most commonly used and important models in cognitive diagnosis and is often used as a benchmark.
NMF: Nonnegative matrix factorization is an effective matrix decomposition technique in which all matrix entries are nonnegative. In the field of personalized learning, this method can also be used to predict students' performance.
Two evaluation indexes are adopted: accuracy (Acc) and root-mean-square error (RMSE). The higher the accuracy and the smaller the RMSE, the better the model. As shown in formula (19), R_ij means the real response of student i on item j, R̂_ij means the predicted response of student i on item j, and |T| is the total number of ratings in the test set.

Acc = (TP + TN) / (TP + TN + FP + FN)
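Both metrics reduce to a few lines of numpy; the sketch below assumes 0/1 responses and element-wise comparison over the test set:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Acc = (TP + TN) / (TP + TN + FP + FN), i.e. the fraction of
    entries where the prediction matches the ground truth."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def rmse(y_true, y_pred):
    """RMSE of formula (19): sqrt of the mean squared difference
    between real and predicted responses over the |T| test ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
print(rmse([1, 0, 1, 1], [1, 0, 0, 1]))      # 0.5
```

For 0/1 data the two are linked (RMSE is the square root of the error rate), but RMSE generalizes to the real-valued reconstructions the model produces before thresholding.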
In the process of model training, we use the five-layer model. To reduce the variance of the results, we train the model five times to obtain five groups of parameters. From the five extracted groups of Q-matrix values, the final predicted Q-matrix (Q_pre) is obtained by voting.
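The voting step can be sketched as an element-wise majority over the five runs; the toy matrices below are assumed values, not the paper's results:

```python
import numpy as np

def vote_q(q_runs):
    """Element-wise majority vote over the Q-matrices mined in several
    training runs. q_runs: list of 0/1 arrays of identical shape."""
    stacked = np.stack(q_runs)
    return (stacked.sum(axis=0) > len(q_runs) / 2).astype(int)

# Toy example: five hypothetical runs over a 2x3 Q-matrix,
# four agreeing runs and one dissenting run.
runs = [np.array([[1, 0, 1], [0, 1, 0]]) for _ in range(4)]
runs.append(np.array([[0, 0, 1], [1, 1, 0]]))

print(vote_q(runs))  # majority restores [[1 0 1] [0 1 0]]
```

An odd number of runs (here five) guarantees that each entry of the vote is decided without ties.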
For the test set, we only need to extract the A-matrix from the last hidden layer of the model. By combining the Q-matrix (Q_pre) obtained in training with the A-matrix extracted on the test set, we can reconstruct the students' R-matrix R̂.
To evaluate the performance of the model, we built training sets based on different numbers of knowledge concepts. We randomly chose the training set from the simulated dataset and used the remaining data as the test set. Tables 5 and 6 compare the RMSE of the DINA model, the NMF algorithm, and the proposed Q-matrix generation model based on a binarized neural network, which generates the Q-matrix and reconstructs the R-matrix on four datasets.
Tab. 5. The Acc of Q-matrix generation
On the whole, the performance of our model is better than that of the traditional cognitive diagnostic models. When the items involve few knowledge points, our model is not especially prominent compared with the traditional models; however, as the number of knowledge points increases, the advantage of our model gradually becomes obvious. Table 7 compares the Acc of the DINA model, the NMF algorithm, and the proposed Q-matrix generation model in generating the Q-matrix on the four datasets. On the whole, the performance of our model is superior to both the DINA model and the NMF algorithm.

Conclusion
This paper has proposed a Q-matrix generation model that combines matrix decomposition and a binarized neural network in the field of cognitive diagnosis. Our model can automatically extract the Q-matrix from the student response matrix, and the experimental results indicate that it is superior to the other two models. In future work, we will improve the performance of the model and apply it to more datasets.