A Novel Fuzzy Rough Sets Theory Based CF Recommendation System

Collaborative Filtering (CF) is one of the popular methodologies in recommender systems. It suffers from the data sparsity problem, recommendation inaccuracy and large prediction errors. In this paper, an efficient advisory tool is implemented to help the younger generation choose the right career based on their knowledge. It adopts the notion of the indiscernibility relation from Fuzzy Rough Sets Theory (FRST) and proposes a novel algorithm named the Fuzzy Rough Set Theory Based Collaborative Filtering Algorithm (FRSTBCF). To evaluate the model, the data are prepared using the cross validation method. The predicted ratings are then evaluated by calculating the MAE (mean absolute error), MSE (mean squared error) and RMSE (root mean squared error) values. Further, the correctness of the model is measured through rates such as Accuracy, Specificity, Sensitivity, Precision and False Positive Rate. The proposed FRSTBCF algorithm is compared experimentally with traditional algorithms such as Item Based Collaborative Filtering using the cosine similarity (IBCF-COS), IBCF using the Pearson correlation (IBCF-COR), IBCF using the Jaccard similarity (IBCF-JAC) and Singular Value Decomposition approximation (SVD). The proposed algorithm gives a lower error rate, and its precision is comparable to that of the existing systems.


INTRODUCTION
The recommendation system (RS) is a stimulating field in this Big Data epoch. CF is an approach widely used in recommendation systems [1]. It is used to provide the most compatible and suitable suggestions in a decision-making process. For example, when a student wants assistance in choosing the right career, the RS predicts and suggests courses related to a career path for further study. That suggestion is based on the student's academic performance as well as co-curricular activities. This kind of knowledge base, which recommends a career path, is a boon for the multidisciplinary young generation. As a rule of thumb, the set of functional and technical competences describes the student's caliber. From that, the CF method carries out its reasoning to arrive at the best recommendation.
A wide variety of industries and businesses need an application that is capable of providing cogent recommendations. Such a recommendation application helps end users find necessary information that they might not have thought of. Some of the domains that require such a recommendation system are listed below.
• Banking, Financial services and Insurance (BFSI)
• Public Broadcasting
• Online disclosures such as publications
• Retail merchandise
A user's traditional patterns and consumption patterns are the basis for recommending new items to that user. This type of recommendation system is called a collaborative filtering system [1], [2], [3]; it gives suggestions to users based on their past behavior and on similar users. A number of setbacks, such as data sparsity, recommendation inaccuracy and large prediction errors, may reduce users' trust in the conventional collaborative filtering system.
To address the issue of large prediction errors, the FRST concept of indiscernibility is exploited in the proposed model. FRST, described in [4], [5], [6] and [7], is used for approximation, feature selection, rule extraction and many other purposes. FRST is derived from the classical rough sets model proposed in [8]. The former is based on a fuzzy relation, whereas the latter is based on a Boolean equivalence relation.
This paper is organized as follows. Section 2 summarizes the background and related work. Section 3 introduces the Fuzzy Rough Sets Theory based CF recommendation in detail. In Section 4, the proposed Fuzzy Rough Sets Theory based CF recommendation method is evaluated using the data set of stakeholders' functional and technical competence levels and is compared with the legacy recommendation methods. The paper is concluded in Section 5.

BACKGROUND AND RELATED WORK

Recommender Systems Basic Concepts
There have been many studies on recommender systems, and most of them propose a system that recommends items to users [2] [3]. Recommender systems are used to discover the items that would be significant for the users. The significant items can be of any type, such as movies, jokes, restaurants, books, news articles and so on. Recommendation methods are broadly classified into collaborative filtering (CF), content based (CB) and hybrid methods [2]. Here, the focus is on proposing a novel CF method based on FRST to reduce large prediction errors.

Fuzzy Rough Sets Theory Basic Concepts
Fuzzy Rough Sets Theory (FRST), proposed in [4], [5], [6] and [7], is an extended mathematical tool for representing knowledge. It is used to analyze vague descriptions of objects. It derives uncertain knowledge from a given data set without requiring any additional details about the objects. It is a popular framework for applications based on pattern recognition, feature selection, rule induction, etc. The fuzzy indiscernibility relation is used to determine the degree of similarity between two objects. FRST analysis starts from information tables, which contain data about the objects of interest. The knowledge about these objects is described by their attributes and attribute values. It is often interesting to discover similarity relationships. As in the classical rough sets theory, the information system in FRST is defined as $IS = (U, A)$, where the universe $U = \{O_1, O_2, O_3, \ldots, O_n\}$ is a finite non-empty set of objects and $A$ is a non-empty finite set of attributes. Each attribute $a \in A$ is an information function $f_a : U \to V_a$, where $V_a$ is the set of allowed values of $a$. Information systems may include a decision attribute, which contains the decision value of each object.
In the formal representation of such an information system, one attribute in $A$ is the decision attribute. FRST analyzes information systems that involve uncertainty and indiscernibility. Indiscernibility arises when the differences between objects cannot be identified from their attribute values. FRST expresses the indiscernibility relation in many different ways. Approaches such as fuzzy tolerance, equivalence and T-equivalence relations have been implemented to determine the indiscernibility relation.
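As an illustration of how indiscernibility arises, the following minimal sketch groups the objects of a small, hypothetical information table into classes of objects that cannot be told apart on a chosen set of attributes. It shows only the crisp (Boolean) case of classical rough sets; in the fuzzy case, this all-or-nothing grouping is replaced by a degree of similarity. The table contents and the function name are assumptions made for the example.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group objects that share identical values on the chosen attributes."""
    classes = defaultdict(list)
    for obj, values in table.items():
        key = tuple(values[a] for a in attributes)
        classes[key].append(obj)
    return sorted(classes.values())

# Hypothetical information table: objects O1..O4 described by attributes a1, a2.
table = {
    "O1": {"a1": 1, "a2": 0},
    "O2": {"a1": 1, "a2": 0},
    "O3": {"a1": 0, "a2": 1},
    "O4": {"a1": 1, "a2": 1},
}

# O1 and O2 have the same values on both attributes, so they are indiscernible.
classes = indiscernibility_classes(table, ["a1", "a2"])
# Using fewer attributes makes more objects indiscernible.
coarser = indiscernibility_classes(table, ["a1"])
```

Dropping an attribute can only merge classes, never split them, which is why the choice of attributes matters for how finely objects can be distinguished.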

FUZZY ROUGH SETS THEORY BASED CF RECOMMENDATION
In this section, the proposed FRST based CF approach is introduced, in which the nearest competences are found based on the competences rated by stakeholders. We first introduce the formal definitions of the concepts in Section 3.1. The actual mechanism and its technical details are described in Section 3.2.

Preliminaries
Assume that in an FRST based CF recommender system, there are a set S of stakeholders and a set C of competences. To identify the similarity between those competences, the FRST indiscernibility relation is calculated. It determines the degree to which those competences are indiscernible. The indiscernibility relation is the basic concept of rough sets theory, which is extended to a fuzzy indiscernibility relation [8].
In the literature, methods such as fuzzy tolerance [9], equivalence, and T-equivalence (or T-similarity) [10] are commonly used to find fuzzy indiscernibility relations. In this paper, the T-equivalence method is used to find the fuzzy indiscernibility relations. A traditional similarity-based learning method [1] uses the distance similarity matrix to differentiate the competences.

Mechanism
To calculate the fuzzy indiscernibility relation, five different $T_{\cos}$-transitive kernel functions are proposed in [10] [11] [12]. They are employed to determine the degree of similarity between two objects. Here, the fuzzy T-equivalence relation with the Gaussian kernel function is used to estimate the indiscernibility relation. From that, further processing takes place to obtain the recommendation. Consider a data table IS whose rows represent a set of stakeholders in an organization and whose columns represent a set of functional and technical competences that describe those stakeholders' skill levels. Formally, $S = \{S_1, S_2, S_3, \ldots, S_n\}$ is the set of stakeholders and $C = \{C_1, C_2, C_3, \ldots, C_n\}$ is the set of functional and technical competences.
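The use of a Gaussian kernel as a fuzzy similarity degree between two competences can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the exact kernel expression or its width parameter, so the common form exp(-||x - y||^2 / (2 sigma^2)), the parameter name sigma, the function names and the small rating matrix are all hypothetical.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Similarity that is 1 for identical vectors and decays towards 0 with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def competence_similarity(ratings, sigma=1.0):
    """Pairwise similarity between the columns (competences) of a rating matrix."""
    columns = list(zip(*ratings))  # one tuple of ratings per competence
    return [[gaussian_kernel(ci, cj, sigma) for cj in columns] for ci in columns]

# Hypothetical ratings: 3 stakeholders (rows) x 3 competences (columns), scale 1-5.
ratings = [
    [5, 4, 1],
    [4, 4, 2],
    [5, 5, 1],
]
sim = competence_similarity(ratings, sigma=2.0)
```

In this toy matrix the first two competences receive nearly identical ratings, so their similarity is close to 1, while the third competence is rated very differently and obtains a similarity close to 0. The resulting matrix is symmetric with ones on the diagonal, as expected of a similarity relation.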
The set of competences rated by the stakeholders is examined to find the competences that are similar to the target competence. In conventional recommendation systems, similarities between competences are identified using the cosine similarity measure or the Pearson similarity measure. In this paper, however, the fuzzy indiscernibility relation computes the similarity. Once similar competences are found, the rating for the new competence is predicted by taking the weighted average of the stakeholder's ratings on these similar competences.

Prediction
The constructed FRST based CF recommendation system predicts the probable rating of any stakeholder S for a functional and technical competence C by using the traditional weighted sum technique [3]. The system retrieves all the competences similar to the target competence and, from those similar competences, picks the ones that the active stakeholder has rated. The system weights the stakeholder's rating for each of these competences by its similarity to the target competence. Finally, the system scales the prediction by the sum of the similarities to obtain a reasonable value for the predicted rating.
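The weighted sum technique described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of 0 as the marker for an unrated competence, and the example similarity values are assumptions. For simplicity, the sketch uses every rated competence as a neighbour instead of first selecting the most similar ones.

```python
def predict_rating(stakeholder_ratings, similarities, target, unrated=0):
    """Weighted-sum prediction for one stakeholder and one target competence.

    stakeholder_ratings: the active stakeholder's ratings, one per competence.
    similarities: similarities[i][j] is the similarity of competences i and j.
    """
    weighted, total = 0.0, 0.0
    for j, rating in enumerate(stakeholder_ratings):
        if j == target or rating == unrated:
            continue  # use only other competences that the stakeholder has rated
        weighted += similarities[target][j] * rating
        total += abs(similarities[target][j])
    # Scale by the sum of similarities; no prediction if nothing was rated.
    return weighted / total if total > 0 else None

# Hypothetical similarity matrix for three competences.
similarities = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
# The stakeholder has not rated competence 0 and rated the others 4 and 2.
prediction = predict_rating([0, 4, 2], similarities, target=0)
```

Because competence 1 is far more similar to the target than competence 2, the prediction lies much closer to the rating 4 than to the rating 2. Dividing by the sum of similarities keeps the predicted value within the range of the observed ratings.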

FRSTIBCF MODEL EXPERIMENTAL RESULT
A recommender system needs a dataset and pre-processing steps. To build the dataset, real-world data are collected and pre-processed so that they can be partitioned. In this experiment, 90% of the dataset is used as the training set and the remaining 10% as the test set, for training and testing the proposed FRSTIBCF model and the existing recommender models.

Data Set and Pre-processing Step
The sample data set used to evaluate the proposed model is shown in Figure 1. It contains the students' functional and technical competence levels in the form of a rating matrix (rating scores from 1 to 5). The data set has 40 rows and 26 columns. Rows represent the students and columns represent the competence levels in particular technical courses. An element $e_{ij}$ of the data set represents the knowledge level of student i in technical course j. The competence levels of technical courses that have not yet been studied or registered are set to 0 by default. The dataset is divided into a training set and a test set to conduct the experiment, as illustrated in Figure 2. For dividing the dataset, popular approaches such as split, bootstrapping and cross validation are used [13] [14] [15].
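The cross validation approach to dividing the 40 students into training and test sets can be sketched as follows. The number of folds is not stated in the text; 10 folds are assumed here because each fold then holds out 10% of the rows, matching the 90%/10% proportion used in the experiment. The function name and the random seed are likewise hypothetical.

```python
import random

def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train, test) row-index lists for k-fold cross validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatable folds
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 40 students (rows of the rating matrix), 10 folds: 36 training and 4 test rows.
splits = list(k_fold_indices(40, k=10))
```

Each student appears in exactly one test fold, so every row of the rating matrix is used for testing once and for training in the remaining folds.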

Results and discussion
The dataset preparation approaches (split, bootstrapping and cross validation) are used to divide the dataset into a training set and a test set. The training set is used to train the proposed model and existing models such as Item Based Collaborative Filtering using the cosine similarity (IBCF-COS), IBCF using the Pearson correlation (IBCF-COR), IBCF using the Jaccard similarity (IBCF-JAC) [18] and Singular Value Decomposition approximation (SVD) [19]. After training, the test data are given to the proposed and existing models. The Root Mean Square Error (RMSE), Mean Square Error (MSE) and Mean Absolute Error (MAE) values of the predicted ratings are calculated [16] and tabulated in Table 1.
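The three error measures can be computed from pairs of actual and predicted ratings as in the following minimal sketch. MAE is the mean of the absolute errors, MSE is the mean of the squared errors, and RMSE is the square root of MSE. The function name and the example ratings are hypothetical and are not taken from Table 1.

```python
import math

def error_metrics(actual, predicted):
    """Return MAE, MSE and RMSE for paired actual and predicted ratings."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}

# Hypothetical test ratings and the corresponding model predictions.
metrics = error_metrics([4, 3, 5, 2], [3, 3, 4, 4])
```

Because the errors are squared before averaging, MSE and RMSE penalize a single large error more heavily than MAE does, which is useful when large prediction errors are the main concern.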
Table 1 shows that the proposed FRSTIBCF model has a much lower error rate and a smaller MAE value, which indicates that the proposed model fits better than the other existing models. The cross validation approach is the most accurate of the three approaches [17]. Therefore, the cross validation approach is used to divide the dataset and to validate the recommendation evaluation experiment.
To compare the recommendations of the proposed FRSTIBCF model with those of the existing models, the positive ratings among the recommendations are compared. To learn more about the models' sensitivity and precision, the experiments are conducted with different numbers of top-n recommended items [20]. For example, the models are evaluated while recommending one course, 5 courses and 10 courses.
The performance of the proposed and existing models is validated using metrics such as precision, recall, True Positive Rate (TPR) and False Positive Rate (FPR), and the results are listed in Table 2. The measured precision is the percentage of recommended courses that have been registered, and the measured sensitivity/recall is the percentage of registered courses that have been recommended. The specificity measure shows the ability of the proposed model to identify the negative cases.
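These rates can be derived from a confusion matrix that compares the recommended courses with the registered ones, as in the following minimal sketch. The function name, the course identifiers and the convention of returning 0.0 for an empty denominator are assumptions made for illustration; the values do not come from Table 2.

```python
def confusion_metrics(recommended, registered, all_courses):
    """Precision, recall (TPR), FPR, specificity and accuracy for one top-n list."""
    recommended, registered = set(recommended), set(registered)
    all_courses = set(all_courses)
    tp = len(recommended & registered)                 # recommended and registered
    fp = len(recommended - registered)                 # recommended, not registered
    fn = len(registered - recommended)                 # registered, not recommended
    tn = len(all_courses - recommended - registered)   # neither

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),  # identical to TPR / sensitivity
        "fpr": ratio(fp, fp + tn),
        "specificity": ratio(tn, fp + tn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }

# Hypothetical case: 10 courses, 5 recommended, 3 registered by the student.
result = confusion_metrics(
    recommended=[0, 1, 2, 3, 4], registered=[0, 1, 7], all_courses=range(10)
)
```

Specificity and FPR always sum to 1, since both are computed over the courses that the student has not registered.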
The ROC curve (FPR vs. TPR) and the precision vs. recall curve are shown in Figure 3 and Figure 4, respectively. Figures 3 and 4 show that the proposed model works much better than the other models, especially if the goal is high recall; the ROC curves plotted in Figure 3 are the basis for measuring the models' performance. Usually, the area under the ROC curve (AUC) tells how well a model separates the positive cases from the negative cases. Figure 3 clearly shows that the proposed FRSTIBCF model performs comparatively well in the lower range.
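The area under an ROC curve can be approximated from a handful of (FPR, TPR) points, as in the following minimal sketch. The text does not report how the AUC was computed, so the trapezoidal rule used here is an assumption, and the function name and the sample points are hypothetical rather than read from Figure 3.

```python
def auc_trapezoid(fpr, tpr):
    """Approximate the area under a curve given matching FPR and TPR points."""
    points = sorted(zip(fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # area of one trapezoid
    return area

# Hypothetical ROC points, including the end points (0, 0) and (1, 1).
auc = auc_trapezoid([0.0, 0.1, 0.4, 1.0], [0.0, 0.5, 0.8, 1.0])
```

A model that cannot separate positive from negative cases follows the diagonal and obtains an area of 0.5, whereas a perfect model obtains 1.0, so values closer to 1.0 indicate better separation.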
The proposed FRSTIBCF model has one numeric parameter, k, which configures the number of courses that the system recommends to the stakeholders. For different values of k, the relationship between FPR and TPR and the relationship between recall and precision are shown in Figure 5 and Figure 6, respectively. It is observed that the system performs better when k, the number of courses returned by the system, is small.

CONCLUSION AND FUTURE WORKS
In this paper, a novel FRST based approach is proposed to compute the similarity measure in a collaborative filtering recommendation system, and the model is named FRSTIBCF. The proposed model is tested on student data from the career selection process. To assess the predicted ratings, the Root Mean Square Error (RMSE), Mean Square Error (MSE) and Mean Absolute Error (MAE) values are calculated. Based on the ratings, the performance of the proposed model is evaluated using precision, recall, sensitivity, specificity, TPR and FPR. In future work, the recommendation system can be extended by using traditional prediction techniques.