Towards Student Learning Ability Estimation and Truth Discovery in Japanese Online Course

This paper focuses on an important task in online courses: estimating student learning ability from students' answering records. The challenge is to automatically estimate each student's learning ability and infer the true answer to each question without any supervision. Most existing methods address these challenges by designing an optimization objective function. However, these approaches ignore the characteristics of students from different groups. Intuitively, outstanding students usually provide correct answers and should be assigned higher weights than ordinary students. Based on this intuition, this paper proposes a new optimization framework that divides students into two groups: an authoritative group and an ordinary group. The objective combines losses from both authoritative students and ordinary students; by optimizing both losses simultaneously, the proposed model can automatically estimate the learning ability of authoritative and ordinary students and infer the right answer to each question. Experiments conducted on two datasets show that the proposed model outperforms state-of-the-art baselines.


Introduction
In recent decades, online courses have become increasingly popular, and more and more students study them on their own, such as courses on deep learning techniques [1] in computer science. It is hard for teaching faculty to grade all the online students' homework or exams and to evaluate each student's learning ability. Fortunately, a large number of answering records can be collected from students. Although students are not experts and errors are inevitable in their answers, these records enable teaching faculty to estimate student learning ability and infer the correct information (i.e., the truths) from the conflicting data provided by students.
To estimate the truths, the most naïve approach is majority voting, which selects the most frequent answer from all the students as the final output. However, this simple approach treats all the students equally and ignores the differences in students' learning abilities. To address this issue, more advanced techniques have been proposed [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16], which try to estimate the right answer for each question as well as automatically learn a learning-ability weight for each student. Despite their differences, these models all follow the same principle: the more reliable a student is, the more likely this student is to provide a trustworthy answer, and vice versa. Following this principle, existing models assign larger weights to more reliable students. However, these approaches only assign weights to individual students and do not distinguish the importance of their groups. Intuitively, outstanding or authoritative students usually provide correct answers and should be assigned higher weights than ordinary ones. In other words, even using only the data provided by these authoritative students, we can still obtain good performance on the correct-answer estimation task. Thus, it is crucial to consider the group information of each student.
To tackle the challenges mentioned above, in this paper we design a novel authoritative truth discovery (ATD) framework for estimating student learning ability and inferring correct answers to questions by incorporating the group information of students. In particular, we divide students into two groups based on their daily performance when taking the online course, i.e., authoritative students and ordinary students. We separately consider the answering records from these two groups and design a loss function for each student group. The two losses work together and enhance each other. An iterative optimization approach is used to learn the weights for students and infer the truths simultaneously.
We conduct experiments on two real-world datasets: one is collected from an exam of the Japanese online course at Dalian University of Technology, and the other is from a question-answering TV game show, which is similar to the exam dataset. Experimental results demonstrate the effectiveness of the proposed ATD framework compared with state-of-the-art baselines.

Problem Formulation
In this section, we start by introducing some basic terminologies used in this paper and then define our problem formally. We use some examples to illustrate these concepts.
Definition 1. An object is a question that students answer in an exam or homework; a source is a student who contributes information about the objects; and an observation is the answer provided by a source for a particular object.
Definition 2. A source's weight is defined as the reliability degree of the information provided by that source. A higher weight indicates that the source is more reliable and that its observations are more accurate.
Definition 3. An authoritative source is defined as one who almost always provides correct information for objects. In general, authoritative sources should have greater weights. All remaining sources are ordinary sources.
Definition 4. An estimated truth is defined as the value of an object learned by an algorithm, and a truth is the real value of the object.
Input: The inputs of the proposed method are the objects and the observations provided by ordinary sources and authoritative sources.
Definition 5. The observation for the $n$-th object provided by the $m$-th ordinary source is denoted as $x_m^n$; the observation for the $n$-th object provided by the $k$-th authoritative source is $y_k^n$.
Output: The goal is to estimate the true values, the ordinary sources' weights, and the authoritative sources' weights.
Definition 6. The weight of the $m$-th ordinary source is denoted as $w_m$; the weight of the $k$-th authoritative source is $v_k$.
Definition 7. The estimated truth $\hat{x}^n$ for the $n$-th object is the most trustworthy value inferred from the sources' observations. Note that the real truth of each object (denoted as $x^{n*}$) is used only for evaluation, and the estimated truths and sources' weights are usually unknown a priori.
Based on these definitions, we can formally define our problem as follows: given the observations $\{x_m^n\}$ from ordinary sources and $\{y_k^n\}$ from authoritative sources, jointly estimate the truths $\{\hat{x}^n\}$, the ordinary sources' weights $\{w_m\}$, and the authoritative sources' weights $\{v_k\}$.

Methodology
In this section, we introduce the proposed Authoritative Truth Discovery (ATD) model, which estimates the truth for each object and learns both ordinary and authoritative sources' weights from multi-source data. We first give a general overview of the proposed model and then present its details for categorical data.

ATD Model
Different from existing truth discovery methods, we distinguish observations contributed by authoritative sources from those of ordinary sources when inferring truths. Since authoritative sources usually provide correct information, their weights should be greater than those of ordinary sources, which helps the proposed method estimate the truths more accurately. However, authoritative sources cannot guarantee that all of their observations are of high quality, and ordinary sources can also provide trustworthy information for some objects. Therefore, it is more reasonable to model observations from both kinds of sources simultaneously when inferring the true information. The benefit of this design is that jointly modeling observations from ordinary and authoritative sources helps estimate an accurate truth for each object; in turn, accurately estimated truths lead to reasonable source weights. Based on the above analysis, we propose the following optimization model that jointly models observations contributed by authoritative and ordinary sources:
$$\min_{\{w_m\},\{v_k\},\{\hat{x}^n\}} \; \sum_{m}\sum_{n} w_m\, d(x_m^n, \hat{x}^n) \;+\; \lambda \sum_{k}\sum_{n} v_k\, d(y_k^n, \hat{x}^n),$$
where $d(\cdot,\cdot)$ is the distance function and $\lambda \in [0, \infty)$ is a trade-off parameter that balances the two terms in the objective function. If $\lambda = 0$, the proposed method only models observations provided by ordinary sources; if $\lambda \to \infty$, only the authoritative sources' observations affect the final estimated truths.
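The joint objective described above can be sketched in code. The following is a minimal illustration under a 0-1 distance, assuming a hypothetical data layout in which answering records are stored as dicts mapping (source, question) pairs to choices; all names here are illustrative, not taken from the paper.

```python
def atd_objective(ordinary, authoritative, w, v, truths, lam):
    """Weighted 0-1 loss over both source groups, combined by lambda.

    ordinary / authoritative: dicts {(source, question): answer}
    w / v: per-source weights; truths: {question: estimated answer}
    """
    loss = 0.0
    for (m, n), ans in ordinary.items():
        loss += w[m] * (ans != truths[n])        # ordinary-group term
    for (k, n), ans in authoritative.items():
        loss += lam * v[k] * (ans != truths[n])  # authoritative term, scaled by lambda
    return loss
```

With lambda = 0 only the first loop contributes, and as lambda grows the authoritative term dominates, matching the two limiting cases discussed above.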
Since the proposed method is a joint model, it estimates the truths $\{\hat{x}^n\}$, the authoritative sources' weights $\{v_k\}$, and the ordinary sources' weights $\{w_m\}$ simultaneously. To achieve this goal, a block coordinate descent approach is used to iteratively update one set of variables while fixing the other two until convergence. In the following, we introduce the details of iteratively solving the optimization model.

Source Weight Computation
To learn the sources' weights, we fix the estimated truth of each object. The sources' weights can then be computed from the differences between the estimated truths and the observations provided by the authoritative and ordinary sources. Minimizing the objective with respect to the weights yields the following update rules for each ordinary source's weight and each authoritative source's weight:
$$w_m = -\log\left(\frac{\sum_n d(x_m^n, \hat{x}^n)}{\sum_{m'}\sum_n d(x_{m'}^n, \hat{x}^n)}\right), \qquad v_k = -\log\left(\frac{\sum_n d(y_k^n, \hat{x}^n)}{\sum_{k'}\sum_n d(y_{k'}^n, \hat{x}^n)}\right).$$
These update rules show that a source obtains a higher weight when its observations are closer to the estimated truths.
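The weight-update step can be illustrated as follows. This sketch uses a CRH-style log-ratio update, which is an assumption for illustration (the function and data-layout names are hypothetical): a source's weight is the negative log of its share of the total deviation from the estimated truths, so lower deviation means higher weight.

```python
import math

def update_weights(answers, truths, sources):
    """CRH-style weight update (illustrative assumption):
    w_s = -log(loss_s / total_loss), under a 0-1 loss.

    answers: dict {(source, question): answer}; truths: {question: answer}
    """
    losses = {s: 0.0 for s in sources}
    for (s, n), ans in answers.items():
        losses[s] += ans != truths[n]          # 0-1 loss per source
    total = sum(losses.values()) or 1.0        # avoid division by zero
    # Smooth to avoid log(0) for sources with zero deviation.
    return {s: -math.log((losses[s] + 1e-9) / total) for s in sources}
```

The same routine is applied separately to the ordinary group and the authoritative group, since each group's weights are normalized within that group.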

Truth Estimation
The final goal of the proposed model is to infer each object's true information based on the observations contributed by sources. In this paper, we focus on estimating the truths for categorical data, for which the most commonly used distance function is the 0-1 loss: if an observation is the same as the estimated truth, the loss is 0; otherwise, the loss is 1. The loss function is formally defined as follows:
$$d(x_m^n, \hat{x}^n) = \mathbb{1}(x_m^n \neq \hat{x}^n),$$
where $\mathbb{1}(\cdot)$ equals 1 if its argument holds and 0 otherwise. When the sources' weights are fixed, minimizing the objective function under the 0-1 loss means the estimated truth of each object should be the value that receives the greatest weighted votes among all answer candidates:
$$\hat{x}^n = \arg\max_{c} \left[\sum_{m} w_m\, \mathbb{1}(x_m^n = c) + \lambda \sum_{k} v_k\, \mathbb{1}(y_k^n = c)\right].$$
From the above truth estimation equation, we can observe that as $\lambda$ increases, the importance of observations from authoritative sources grows and performance improves until it peaks; after that, performance may drop as $\lambda$ increases further. This fits our definition of authoritative sources, which may still provide incorrect observations.
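The weighted-vote truth estimation step can be sketched as follows, again assuming the hypothetical (source, question) -> answer layout used throughout these sketches: each candidate answer accumulates the weights of its supporters, with authoritative votes scaled by lambda, and the highest-scoring candidate becomes the estimated truth.

```python
from collections import defaultdict

def estimate_truths(ordinary, authoritative, w, v, lam, questions):
    """Weighted vote per question; authoritative votes scaled by lambda."""
    truths = {}
    for n in questions:
        votes = defaultdict(float)
        for (m, q), ans in ordinary.items():
            if q == n:
                votes[ans] += w[m]             # ordinary source's vote
        for (k, q), ans in authoritative.items():
            if q == n:
                votes[ans] += lam * v[k]       # authoritative vote, scaled
        truths[n] = max(votes, key=votes.get)  # highest weighted vote wins
    return truths
```

With lam = 0 this reduces to weighted voting over ordinary sources only; as lam grows, a single well-weighted authoritative answer can outvote the ordinary majority.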
The proposed ATD algorithm works as follows. Its inputs are the observations from both ordinary sources and authoritative sources as well as the trade-off parameter $\lambda$. It starts by initializing the estimated truths with simple majority voting. The iterative process then begins: first, the estimated truth for each object is updated; then, the ordinary sources' weights and the authoritative sources' weights are updated respectively. Finally, the algorithm returns the estimated truths, the ordinary sources' weights, and the authoritative sources' weights.
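The full iteration can be sketched end to end as follows. This is a self-contained toy implementation under stated assumptions: a hypothetical (source, question) -> answer data layout, a 0-1 loss, and a CRH-style log-ratio weight update standing in for the paper's exact rule.

```python
import math
from collections import defaultdict

def atd(ordinary, authoritative, lam, n_iter=20):
    """Iterative ATD sketch: majority-vote init, then alternate
    weight updates and weighted-vote truth estimation."""
    questions = {n for _, n in list(ordinary) + list(authoritative)}

    # Step 0: initialize estimated truths by simple majority voting.
    votes = defaultdict(lambda: defaultdict(int))
    for (_, n), a in list(ordinary.items()) + list(authoritative.items()):
        votes[n][a] += 1
    truths = {n: max(votes[n], key=votes[n].get) for n in questions}

    def weights(answers):
        # CRH-style update (assumed form): lower deviation -> higher weight.
        loss = defaultdict(float)
        for (s, n), a in answers.items():
            loss[s] += a != truths[n]
        total = sum(loss.values()) or 1.0
        return {s: -math.log((l + 1e-9) / total) for s, l in loss.items()}

    for _ in range(n_iter):
        w, v = weights(ordinary), weights(authoritative)
        # Truth estimation: weighted vote, authoritative votes scaled by lambda.
        for n in questions:
            score = defaultdict(float)
            for (m, q), a in ordinary.items():
                if q == n:
                    score[a] += w[m]
            for (k, q), a in authoritative.items():
                if q == n:
                    score[a] += lam * v[k]
            truths[n] = max(score, key=score.get)
    return truths, w, v
```

On a toy input where the ordinary majority is wrong on one question but the authoritative group agrees on the right answer, a sufficiently large lambda lets the authoritative votes flip that question's estimate.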

Experiments
In this section, we conduct experiments to validate the performance of the proposed ATD model. We first introduce two crowdsourced datasets, then describe the baselines for comparison, and finally, analyze the performance of all the models.

Datasets
In our experiments, we use two datasets: the Exam and Game datasets. (1) The Exam dataset is collected from a final exam of the Japanese online course in the School of International Information and Software at Dalian University of Technology. Each question has multiple choices but only one correct answer. The Exam dataset contains 43 questions, 199 students, and 8,544 answering records. (2) The Game dataset is collected from a crowdsourcing platform via an Android app based on the TV game show "Who Wants to Be a Millionaire". The Game dataset contains 2,103 questions, 37,029 users, and 214,849 answering records. For each dataset, we divide the students/users into two groups. In the Exam dataset, 31 students belong to the authoritative group, and the remaining 168 students are in the ordinary group. In the Game dataset, 3,636 users are labeled as authoritative sources, and the others are ordinary ones.

Baselines
We choose the following approaches as baselines: Majority voting (MV) estimates the true answer as the one given by the most users, where each user is given equal weight. TruthFinder [3] computes the probability of each answering link being correct given the estimated user reliability degrees; it also considers the influences between answering links. AccuPr [4] applies Bayesian analysis and also adopts the idea of influence among answering links; moreover, it uses the idea of a complement vote. Investment [5] adopts the idea that users "invest" their reliability in the answering links they provide; the weight of an answering link grows non-linearly with respect to the sum of the invested reliability from its users. 3-Estimates [6] considers the difficulty of getting the truths when calculating the user reliability degree; it also adopts the idea of a complement vote. CRH [7] is an optimization framework that minimizes the weighted deviation between the answering links and the truths. CATD [8] considers the long-tail distribution of answering records to discover the truths.
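The MV baseline is simple enough to state in a few lines; this sketch assumes the same hypothetical (source, question) -> answer layout used in the earlier sketches, with ties broken arbitrarily.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """MV baseline: every source has equal weight; the most frequent
    answer per question wins (ties broken arbitrarily)."""
    per_question = defaultdict(list)
    for (_, n), ans in answers.items():
        per_question[n].append(ans)
    return {n: Counter(a).most_common(1)[0][0] for n, a in per_question.items()}
```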

Performance Comparison
To evaluate the performance of the proposed ATD model and all the baselines, we use the error rate as the evaluation metric, defined as the number of incorrectly estimated questions divided by the total number of questions. A lower error rate means that the estimation is closer to the ground truth and the method is better than others with larger error rates. Table 1 lists the error rates on both datasets. We can observe that the proposed ATD achieves the best performance compared with all the baselines. The designed ATD model has a parameter $\lambda$ that controls the contribution of authoritative users when inferring the truth for each question. To investigate how performance changes with the value of $\lambda$, we conduct the experiments shown in Figure 1. From Figure 1, we can observe that as $\lambda$ increases, the error rate drops on both datasets. When $\lambda = 0$, only the records provided by ordinary sources are used to estimate the truths; in this scenario, the error rate is much larger, which demonstrates that the authoritative users are more important for the truth discovery task. When $\lambda$ is large enough, the authoritative users dominate the objective function, and even using only their data, the proposed ATD still achieves better performance than the baselines, as shown in Table 1. In our experiments, we set $\lambda = 5$ and $\lambda = 25$ for the Exam dataset and the Game dataset, respectively.
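The error-rate metric itself is straightforward; a one-function sketch (names are illustrative):

```python
def error_rate(estimated, ground_truth):
    """Fraction of questions whose estimated answer differs from the truth."""
    wrong = sum(estimated[n] != t for n, t in ground_truth.items())
    return wrong / len(ground_truth)
```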

Conclusion
Estimating student learning ability and inferring true answers to questions from crowd data is an important and practical task in online courses. The key challenges are to automatically (1) learn the weights for students and (2) infer the truths without any supervision. Existing methods only assign a weight to each student but ignore the importance of the categories of students. To address this issue, in this paper we propose a new method to infer the truths of objects by considering different groups of students. The experimental results demonstrate the effectiveness of the proposed ATD method on two datasets. In the future, we will investigate how to automatically adjust the value of $\lambda$.