An Application of Collaborative Filtering in Student Grade Prediction

This research presents a process for predicting student performance using collaborative filtering (CF). The benefits of this research include assisting instructors in identifying student performance, personalized advising, and student degree planning. The CF technique consists of similarity calculation and prediction. In our experiments, prior course clustering with heuristic knowledge is adopted, and different similarity calculation techniques are compared. The performance of each student is predicted using the grades available at that time.


Introduction
Predicting student performance in future courses is important because it provides valuable information to facilitate student success. In this paper, we present a process for predicting student performance using collaborative filtering (CF) [1], one of the most popular techniques widely used for student performance prediction. The performance that students achieved in earlier courses is used to predict the grades that they will obtain in future courses. The algorithm is based on the idea of finding the most similar students. We applied several methods to calculate students' similarity, i.e., Pearson correlation, cosine similarity, and Euclidean distance. The performance of each method is experimentally evaluated on a dataset obtained from Dhurakij Pundit University with enrollments of 200 undergraduate students between 2012 and 2016 from the Faculty of Information Technology. Our experiments show that finding students' similarity with Pearson correlation achieves the lowest prediction error and that prior course clustering with heuristic knowledge can enhance predictability.
The rest of this paper is organized as follows: related work and fundamental concepts are given in Section 2. The proposed method is described in Section 3. Experiments are conducted in Section 4. Conclusions are summarized in Section 5.

Preliminaries
This section summarizes related work and briefly defines the fundamental concepts needed to present the proposed algorithm.

Related work
Different models have been developed to predict students' performance, and many approaches rely on collaborative filtering methods. The similarities between students are calculated from their study results, represented by the grades of their previously passed courses. A recommendation tool called the personalized Grade Prediction Advisor (pGPA) was proposed in [3]. The system allows users to set parameters such as the number of similar students used for prediction. Another course recommender system, for University College Dublin's on-line enrolment application, was proposed in [2]. The system recommends elective modules to students based on the core modules that they have selected, using item-based collaborative filtering.
On the other hand, [5] presented future-course grade prediction methods based on linear regression and matrix factorization. Hybrid methods and content features are also used in [6].

User-based collaborative filtering
The collaborative filtering algorithm is based on the idea that people have similar preferences and interests. One user's behavior is compared with other users' behavior to find his or her nearest neighbors, and the neighbors' preferences or interests are then used to predict those of the user. Suppose that U = {u1, u2, ..., um} is a list of m users and I = {i1, i2, ..., in} is a list of n items. Each user Ui gives rating scores for a list of items Iui. The prediction problem is to predict the rating that the active user Ua will give to an item Iua from the set of all items that Ua has not yet rated. The CF technique consists of three steps: 1) user similarity calculation, 2) top-N nearest neighbor selection, and 3) prediction.

Similarity and distance
Various methods can be used to find the similarity between users, such as Pearson correlation and cosine similarity. Alternatively, a dissimilarity measure such as Euclidean distance can be converted to a similarity.

Pearson correlation
Let the set of items rated by both users u and v be denoted by I. The similarity coefficient $sim(u,v)$ between them is calculated as

$$sim(u,v) = \frac{\sum_{i \in I} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_v)^2}}$$

Here $r_{u,i}$ denotes the rating of user u for item i, and $\bar{r}_u$ is the average rating of all items given by user u.
Similarly, $r_{v,i}$ denotes the rating of user v for item i, and $\bar{r}_v$ is the average rating of all items given by user v.
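The Pearson coefficient described here can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name, the dictionary-based rating representation, and the choice to return 0.0 when the coefficient is undefined are all assumptions made for the example.

```python
from math import sqrt


def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation between two users.

    ratings_u, ratings_v: dicts mapping item id -> numeric rating
    (hypothetical representation). Sums run over co-rated items;
    each mean is taken over all items rated by that user.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0  # assumption: no co-rated items means no evidence of similarity
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # assumption: constant ratings leave the correlation undefined
    return num / (den_u * den_v)
```

Two students whose grades rise and fall together across the same courses obtain a coefficient close to 1, while opposite grade patterns give a value close to -1.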

Cosine similarity
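The similarity $sim(u,v)$ between users u and v is calculated as

$$sim(u,v) = \frac{\sum_{i \in I} r_{u,i}\, r_{v,i}}{\sqrt{\sum_{i \in I} r_{u,i}^2}\,\sqrt{\sum_{i \in I} r_{v,i}^2}}$$

where I is the set of items rated by both users, $r_{u,i}$ denotes the rating of user u for item i, and $r_{v,i}$ denotes the rating of user v for item i.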
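As a rough sketch, cosine similarity over the items rated by both users could be computed as below. The function name and the dictionary representation of ratings are assumptions for illustration, as is the convention of returning 0.0 when there is no overlap or a zero-length vector.

```python
from math import sqrt


def cosine_similarity(ratings_u, ratings_v):
    """Cosine of the angle between two users' rating vectors.

    ratings_u, ratings_v: dicts mapping item id -> numeric rating
    (hypothetical representation). Only co-rated items are used.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0  # assumption: no co-rated items means similarity 0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector has no direction
    return dot / (norm_u * norm_v)
```

Unlike the Pearson coefficient, this measure does not subtract each user's mean, so two students with proportional grades (for example 4, 2 and 2, 1) are treated as identical in direction.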

Euclidean distance
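The Euclidean distance $d(u,v)$ between two users u and v is calculated by

$$d(u,v) = \sqrt{\sum_{i \in I} (r_{u,i} - r_{v,i})^2}$$

where I is the set of items rated by both users, $r_{u,i}$ denotes the rating of user u for item i, and $r_{v,i}$ denotes the rating of user v for item i. The obtained distance scores are then converted to similarities.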
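The sketch below illustrates the distance computation and one possible conversion to a similarity. The conversion formula used by the authors is not recoverable from the text, so the mapping 1 / (1 + d) is purely an assumption chosen because it is a common convention; the function names and the dictionary representation are likewise hypothetical.

```python
from math import sqrt


def euclidean_distance(ratings_u, ratings_v):
    """Euclidean distance over the items rated by both users.

    Returns infinity when the users share no items (an assumption),
    so that such pairs end up with similarity 0 after conversion.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return float("inf")
    return sqrt(sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common))


def distance_to_similarity(distance):
    """ASSUMED conversion: 1 / (1 + d), which maps d = 0 to 1 and large d toward 0."""
    return 1.0 / (1.0 + distance)
```

With this assumed mapping, identical grade vectors obtain similarity 1, and the similarity decreases monotonically as the distance grows.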

Prediction
Once similarities are calculated, the top-k users most similar to the active user u are selected, and their rating scores are used to compute the prediction $P_{u,i}$ of the specific item i for user u.
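One common way to combine the neighbors' ratings is a similarity-weighted average, sketched below. This is an assumption for illustration only: the exact prediction formula of the paper is not recoverable from the text, and the function name and input layout are hypothetical.

```python
def predict_rating(neighbors, item):
    """Similarity-weighted average of the neighbors' ratings for one item.

    neighbors: list of (similarity, ratings) pairs for the top-k users,
    where ratings is a dict mapping item id -> numeric rating
    (hypothetical layout). Neighbors who have not rated the item are skipped.
    Returns None when no neighbor has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for similarity, ratings in neighbors:
        if item in ratings:
            weighted_sum += similarity * ratings[item]
            weight_total += abs(similarity)
    if weight_total == 0:
        return None  # assumption: no usable neighbor, so no prediction
    return weighted_sum / weight_total
```

For example, two neighbors with similarities 0.9 and 0.6 and grades 4 and 2 for the same course yield a predicted grade of (0.9 × 4 + 0.6 × 2) / 1.5 = 3.2 under this scheme.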

Dataset and Method
For each student whose grade needs to be predicted, a set of similar students is identified using their grades from courses that they have already taken. The data used for this study were obtained from Dhurakij Pundit University, with enrollments of 200 undergraduate students between 2012 and 2015 from the Faculty of Information Technology. The dataset comprises 200 students and their 12,000 grades. The A-F letter grades were converted to the 4-0 scale. The performance of each student who enrolled in semester 2, 2015 was predicted using the grades available at that time.

© The 2015 International Conference on Artificial Life and Robotics (ICAROB 2015), Jan. 10-12, Oita, Japan

Prediction without prior courses clustering
We follow the three steps of user-based CF to make a prediction for each student:
Step 1: Calculate the similarity between the active student $S_a$ and every other student using Pearson correlation, cosine similarity, and Euclidean distance.
Step 2: Based on the similarity scores, select the set of k students most similar to the active student $S_a$, for various values of k.
Step 3: Generate the prediction of the grade that student $S_a$ will receive in course i from the grades of the k similar neighbors who have already taken course i.
Table 1 shows an example of the similarity scores for student S1 obtained by the various approaches. The results differ slightly between the Pearson correlation and cosine similarity methods. For the Pearson correlation method, the top 3 most similar students are S25, S13, and S10, with similarity scores 0.9466, 0.9249 and 0.9427, respectively. For the cosine similarity method, the top 3 most similar students are S25, S10 and S13, respectively. On the other hand, the top 3 most similar students from Euclidean distance are S23, S41, and S55.
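The neighbor selection in Step 2 amounts to ranking students by similarity and keeping the first k. A minimal sketch follows; the function name and the student ids and scores in the usage note are hypothetical and are not taken from Table 1.

```python
def top_k_neighbors(similarities, k):
    """Return the k (student id, similarity) pairs with the highest similarity.

    similarities: dict mapping student id -> similarity to the active
    student (hypothetical layout). Results are in descending order.
    """
    ranked = sorted(similarities.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For instance, `top_k_neighbors({"S2": 0.5, "S3": 0.9, "S4": 0.7}, 2)` keeps S3 and S4. Because each similarity measure produces its own scores, the selected neighbor set can differ from one measure to another.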
Once the similarities for each student are obtained, the performance of each student who enrolled in semester 2, 2015 is predicted, as shown in Table 2.

Performance evaluation
The performance evaluations were conducted using accuracy and root mean square error (RMSE), obtained by Eq. (6) and Eq. (7), respectively, where $\hat{Y}_i$ is the predicted value of subject i and $Y_i$ is the real value of subject i.
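Both measures can be sketched in a few lines of Python. RMSE follows its standard definition; the accuracy function assumes that accuracy means the fraction of predictions that exactly match the real value, which is an assumption made for this example. The function names are hypothetical.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean square error between predicted and real values."""
    n = len(actual)
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def accuracy(predicted, actual):
    """ASSUMED definition: fraction of predictions equal to the real value."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

With predicted grades 4, 3, 2, 1 and real grades 4, 2, 2, 3, the squared errors are 0, 1, 0, and 4, so the RMSE is the square root of 1.25 and the exact-match accuracy is 0.5.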

Experiment
We compared the predicted grades with the actual grades of students who enrolled in semester 2, 2015. The performance of each approach is compared in Table 3. In our experiment, Pearson correlation achieves the best accuracy. There is hardly any difference between the Pearson correlation and Euclidean distance methods. The A-F letter grades were converted to the 4-0 scale, which is discrete. The predicted results were then compared with the real performance of the students, which had been removed from the experiment data. However, letter grades come from mark ranges, such as 90-100 for A and 85-89 for B+. This means that with a difference of only 1 mark, a student with a mark of 89 gets grade B+ while a student with a mark of 90 gets grade A. Therefore, the comparison was relaxed within two error deviation boundaries, 0 and 0.5: (i) error deviation 0 means that the prediction has to match the student's real performance exactly, and (ii) error deviation 0.5 means that the prediction is counted as correct if it differs from the student's real performance by less than or equal to 0.5, as shown in Table 4.
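The relaxed comparison can be illustrated with a tolerance parameter, as in the sketch below. The function name and the sample grades in the usage note are hypothetical; only the rule itself (a prediction counts as correct when it differs from the real grade by at most the error deviation) comes from the description above.

```python
def accuracy_with_tolerance(predicted, actual, tolerance):
    """Fraction of predictions within `tolerance` grade points of the real grade.

    tolerance = 0 requires an exact match; tolerance = 0.5 also accepts
    a prediction that is off by up to half a grade point on the 4-0 scale.
    """
    correct = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return correct / len(actual)
```

For predicted grades 3.5, 3.0, 2.0 against real grades 4.0, 3.0, 1.0, the accuracy is 1/3 with error deviation 0 and 2/3 with error deviation 0.5, since the first prediction is accepted only under the relaxed rule.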

Table 1. Example of similarity scores for student S1

Table 2. Example of grade prediction for student S1 with different similarity approaches

Table 3. Comparison of accuracy for each method

Table 4. Grade conversion scale with tolerance 0.5

The correct answers were then used to calculate the average accuracy and root mean square error, as shown in Tables 5 and 6.

Table 5. Comparison of accuracy with tolerance ±0.5

Table 6. Comparison of RMSE with tolerance ±0.5