Hybrid Personalized Recommender System Using Fast K-medoids Clustering Algorithm

Abstract: Recommender systems attempt to predict items in which a user might be interested, given some information about the user's and the items' profiles. This paper proposes a fast k-medoids clustering algorithm, which is used in a Hybrid Personalized Recommender System (FKMHPRS). The proposed system works in two phases. In the first phase, opinions from the users are collected in the form of a user-item rating matrix. They are clustered offline using fast k-medoids into a predetermined number of clusters and stored in a database for future recommendation. In the second phase, the clusters are used as neighborhoods, and the predicted ratings of the active user on items are computed by either a weighted sum or a simple weighted average. This helps to obtain more effective, higher-quality recommendations for the active user. The experimental results on the Iris dataset show that the proposed fast k-medoids performs better than the k-medoids and k-means algorithms. The performance of FKMHPRS is evaluated using the Jester database, available on the website of the University of California, Berkeley, and compared with the web personalized recommender system (WPRS). The empirical results demonstrate that the proposed FKMHPRS outperforms WPRS.


I. INTRODUCTION
A recommender system arises from the need to provide users with relevant and personalized information. Such systems help the user make choices when personal experience of the available options is insufficient. They can assist the consumer in various ways: they can simplify the information search process and facilitate the comparison of products, report the reviews of other users, or exploit the consumer's history to suggest products similar to those purchased in the past or previously selected by users with similar buying behaviour [1].
The advance of Internet and Web technologies has continuously boosted the prosperity of e-commerce. Through the Internet, merchants and customers can easily interact with each other and complete their transactions within a specified time. However, the Internet infrastructure is not the only decisive factor in guaranteeing a successful business in the electronic market. With the continuous development of electronic commerce, it is not easy for customers to select merchants and find the most suitable products when they are confronted with the massive amount of product information on the Internet. Throughout the shopping process, customers still spend much time visiting a flood of retail Web sites and gathering valuable information by themselves. This process is unnecessarily time-consuming, and sometimes the contents of the Web documents that customers browse have nothing to do with what they actually need. This inevitably reduces customers' confidence and interest in shopping on the Internet.
One way to overcome this problem and provide decision support for customers is to develop intelligent recommendation systems that provide personalized information services. A recommendation system is a valid mechanism for solving the problem of information overload in Internet shopping. On shopping websites, the system can help customers find the most suitable products by providing a list of recommended products. For products that customers buy frequently, such as groceries, books and clothes, the system can infer the customers' personal preferences by analyzing their personal information and shopping records, and thereby produce sensible recommendations for them. Therefore, it is important to develop a highly efficient learning algorithm that captures what customers need in order to help them buy. The WPRS framework is intended for hybrid recommender systems, which combine collaborative recommendation with other types of recommendation components [2]. It alleviates challenges such as data sparsity and scalability.
The remainder of this paper is organized as follows. Section II summarizes the different strategies for recommender systems and their drawbacks. The proposed clustering-based hybrid personalized recommender system is described in Section III. Section IV illustrates the experimental setup of the proposed recommendation system and evaluates its performance against existing algorithms. Finally, Section V concludes the paper.

II. RECOMMENDER SYSTEM STRATEGIES
In recent years, web personalization has undergone tremendous changes. Content-based [3,4], collaborative [5,6] and hybrid [7] filtering are the three basic approaches used to design recommendation systems.
Content-based filtering [8] relies on the content of items that the user has experienced before. Content-based information filtering has proven effective in locating text items that are relevant to a topic, using techniques such as Boolean queries and vector space queries. However, content-based filtering has some limitations. It is difficult to provide appropriate recommendations because all the information is selected and recommended based on content alone. Moreover, content-based filtering leads to overspecialization, i.e., it recommends all the related items instead of the particular item liked by the user.
Collaborative filtering [9] aims to identify users who have relevant interests and preferences by calculating similarities and dissimilarities between their profiles. The idea behind this method is that it may be beneficial to inform one's search with information collected from the behavior of other users who share similar interests and whose opinions can be trusted. Different techniques have been proposed for collaborative recommendation, such as correlation-based methods and semantic indexing. Collaborative filtering overcomes some of the limitations of content-based filtering. The system can suggest items to the user based on the ratings of items instead of their content, which can improve the quality of recommendations. However, collaborative filtering has some drawbacks. First, the coverage of ratings can be very sparse, resulting in poor-quality recommendations. When new items are added to the database, the system cannot recommend them until they have been served to a substantial number of users; this is known as the cold-start problem. Second, when new users are added, the system must learn their preferences from the ratings of users in order to make accurate recommendations. Moreover, these recommendation algorithms tend to be computationally expensive, and their cost grows non-linearly as the number of users and items in the database increases. Hybrid recommendation systems [10,11] combine content-based and collaborative filtering to overcome these limitations. As listed below, there are different ways of combining content-based and collaborative filtering [12].
I. Implementing these approaches separately and combining them for prediction.
II. Incorporating some content-based characteristics into the collaborative approach and vice versa.
III. Constructing a general unified model that incorporates both content-based and collaborative characteristics.
The hybrid approach proposed in this paper extracts the user's current browsing patterns using web usage mining, and forms clusters of items with similar psychology to obtain implicit user ratings for the recommended item.

III. PROPOSED FKMHPRS
We have developed and tested FKMHPRS on the Jester dataset, available on the website of the University of California, Berkeley. The system architecture is partitioned into two main phases: offline and online. Fig. 1 depicts the architecture of FKMHPRS with its essential components.
Phase I is offline; it preprocesses the data and forms clusters. In this phase, background data in the form of a user-item rating matrix is collected and clustered using the proposed approach, which is described in Sections A and B. Once the clusters are obtained, the cluster data, along with their centroids, is stored for future recommendations. Phase II is online, where recommendation takes place for the active user. Here, the similarity between the active user and the clusters is calculated to choose the best clusters for making recommendations. The rating quality of each item unrated by the active user is computed in the chosen clusters. To generate the recommendations, clusters are further selected based on the rating quality of an item. The recommendations are then made by computing the weighted average of the ratings of items in the selected clusters. The working of FKMHPRS is described below in detail using the Jester dataset.

A. Preprocessing phase
The Jester dataset is available online at www.ieor.berkeley.edu/~goldberg/jester-data. Jester is a WWW-based joke recommender system developed at the University of California, Berkeley. The data contains numeric ratings entered by 73,421 users for 100 jokes, on a real-valued scale from -10 to +10. The user-item ratings taken from the Jester dataset are normalized from the scale of -10 to +10 into the scale of 0 to 1, where 0 indicates that the item is not rated by the corresponding user. To facilitate the discussion, the running example shown in Table I is used, where U1 to U10 are the users and J1 to J10 are the items (jokes) rated or unrated by the users. The last row of Table II gives the ratings of the active user (U1).
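The normalization step can be illustrated with a minimal sketch. The paper does not state the exact mapping, so everything here is an assumption: a linear rescaling is used, unrated entries are assumed to be marked with 99 in the raw data, and a small lower bound (the hypothetical constant `EPS`) keeps a rated item from mapping to 0, which is reserved for "not rated". The function name `normalize_ratings` is also hypothetical.

```python
import numpy as np

UNRATED = 99.0  # assumption: marker used for "not rated" in the raw data
EPS = 0.01      # assumption: lower bound so a rated item never maps to 0


def normalize_ratings(raw):
    """Map ratings in [-10, 10] linearly into [EPS, 1]; unrated entries become 0."""
    raw = np.asarray(raw, dtype=float)
    rated = raw != UNRATED
    # Linear rescaling: -10 -> EPS, +10 -> 1
    scaled = EPS + (1.0 - EPS) * (raw + 10.0) / 20.0
    return np.where(rated, scaled, 0.0)


# Example: one user, four jokes (the third one is unrated)
print(normalize_ratings([[-10.0, 10.0, 99.0, 0.0]]))
```

A plain min-max rescaling without the lower bound would map a rating of -10 to 0 and make it indistinguishable from an unrated item, which is why the sketch shifts the range slightly.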

B. Proposed fast K-medoids clustering
K-medoids is a clustering algorithm related to the k-means algorithm [13]. It is a partitioning algorithm that divides the data set into separate clusters. The algorithm attempts to minimize the squared error, which is the distance between the points in a cluster and a point designated as the center (medoid) of the cluster. A medoid is the object of a cluster whose average dissimilarity to all the objects in the cluster is minimal [14]. However, k-medoids has some limitations. The initial centroids of the k-medoids clustering algorithm are selected by the user, so the performance of the algorithm depends on this manual selection. It also works inefficiently for large data sets due to its complexity [15]. This is the major motivation behind the work presented in this paper. The proposed clustering algorithm calculates the initial centroids appropriately, which results in the proper creation of the clusters.
Let p patterns, each having n variables, be grouped into k clusters, where k is assumed to be given, and let $x_{ia}$ denote the value of variable a of pattern i. The Euclidean distance is used as the dissimilarity measure. The distance between patterns i and j is given as
$$d_{ij} = \sqrt{\sum_{a=1}^{n} (x_{ia} - x_{ja})^2}$$
The proposed algorithm is composed of the following steps.
Step 1: Initial selection of medoids. To determine the medoids of the clusters, every pattern is compared with every other pattern, and for each pattern i the patterns whose Euclidean distance to it is less than or equal to a user-defined threshold are counted, as described by equation 1: if $d_{ij}$ is less than or equal to the threshold, then the count $D_i$ of pattern i is incremented by one.
The top k patterns with the maximum counts are selected as the initial medoids.
The pattern with the maximum count is selected as the medoid of the cluster, as illustrated in equation 2:
$$D_{\max} = \max_i D_i, \qquad D_{ind} = \arg\max_i D_i$$
where $D_{\max}$ is the maximum value in the row vector D and $D_{ind}$ is the index of the maximum value. Therefore the most appropriate pattern, the one with index $D_{ind}$, is chosen as the initial medoid.
Step 2: Finding the nearest neighbour. The process is as follows:
I. If the distance between a given pattern and the medoid is greater than a user-defined parameter, the pattern is treated as an outlier; otherwise it is assigned to the nearest medoid.
II. Calculate the sum of the distances from all patterns to their medoids, as described by equation 3.
Step 3: Update medoids.
I. To update the medoids, select the pattern whose count is less than that of the previous medoids. Assign each object to the nearest medoid using Step 2 and obtain the clustering result.
II. Calculate the sum of the distances from all patterns to their medoids using equation 3.
Step 4: If the sum is greater than or equal to the previous one, stop the algorithm; otherwise go to Step 2.
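Steps 1 and 2 above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' MATLAB implementation: the function names, the parameter names `radius` and `outlier_radius` (standing in for the two user-defined values), the label -1 for outliers, and the tie-breaking rule (lowest index first) are all hypothetical. The update rule of Step 3 is not specified precisely enough in the text to reproduce, so it is left out.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance d_ij between every pair of patterns (rows of X)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def initial_medoids(dist, k, radius):
    """Step 1: count the patterns within `radius` of each pattern, keep the top k."""
    counts = (dist <= radius).sum(axis=1)  # each pattern also counts itself
    order = np.argsort(-counts, kind="stable")  # ties: lowest index first
    return order[:k], counts


def assign(dist, medoids, outlier_radius):
    """Step 2: assign each pattern to its nearest medoid, or mark it as an outlier."""
    d_to_medoids = dist[:, medoids]  # shape (p, k)
    nearest = d_to_medoids.argmin(axis=1)
    nearest_dist = d_to_medoids.min(axis=1)
    labels = np.where(nearest_dist > outlier_radius, -1, nearest)
    total = nearest_dist[labels >= 0].sum()  # sum of distances to medoids
    return labels, float(total)


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11]]
dist = pairwise_distances(X)
medoids, counts = initial_medoids(dist, k=2, radius=1.5)
labels, total = assign(dist, medoids, outlier_radius=2.0)
print(counts, medoids, labels, total)
```

In this toy data both initial medoids come from the same dense group, because selection is purely by count; the two distant patterns are then flagged as outliers. Whether the subsequent update step corrects this depends on details that the text does not give.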
The proposed fast K-medoids algorithm is used to cluster the Jester data set shown in Table I. The clustering results in three clusters, as shown in Table II. After clustering with fast K-medoids, knowing the members of each cluster, we recompute new centroids as the average of all corresponding coordinates of the members of each cluster.

C. Recommendation Phase
This phase consists of two steps: (I) Similarity Computation and (II) Generating Recommendation.

I. Similarity Computation:
To find the nearest neighbors of the active user, the system must measure similarity between users. The similarity between the clusters' centroids and the active user is calculated, and the clusters with the highest similarity are selected. We use the Pearson correlation similarity to measure the similarity between the active user ($u_1$) and the clusters ($c_u$).
A user's ratings can be treated as a vector in an n-dimensional item space, and the similarity between $u_1$ and $c_u$ is computed as the Pearson correlation of the two rating vectors. For the running example, the similarity values between the active user and the three clusters are shown in Table III. The clusters with the highest similarity values are chosen.
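The similarity computation can be sketched with the standard Pearson correlation coefficient: the sum of the products of the mean-centered ratings, divided by the product of the square roots of the sums of their squares. The sketch makes two assumptions that the text does not confirm: the correlation is taken only over items that both vectors have rated (entries greater than 0, since 0 marks "not rated" after normalization), and a similarity of 0 is returned when the correlation is undefined. The function name `pearson_similarity` is hypothetical.

```python
import numpy as np


def pearson_similarity(user, centroid):
    """Pearson correlation between a user's ratings and a cluster centroid."""
    user = np.asarray(user, dtype=float)
    centroid = np.asarray(centroid, dtype=float)
    # Assumption: 0 means "not rated", so only co-rated items are compared
    mask = (user > 0) & (centroid > 0)
    if mask.sum() < 2:
        return 0.0
    u = user[mask] - user[mask].mean()
    c = centroid[mask] - centroid[mask].mean()
    denom = np.sqrt((u ** 2).sum() * (c ** 2).sum())
    # Assumption: undefined correlation (constant ratings) is treated as 0
    return float((u * c).sum() / denom) if denom > 0 else 0.0


# The fourth item is unrated by the user, so it is ignored
print(pearson_similarity([0.2, 0.4, 0.6, 0.0], [0.1, 0.2, 0.3, 0.9]))
```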

II. Generating Recommendation
When a cluster of nearest neighbors (the neighborhood) of the active user has been chosen, predictions are generated based on a weighted aggregate of their ratings. The most commonly used aggregation functions are the similarity-weighted sum and the simple similarity-weighted average. The predicted rating of item i for the active user $u_1$ is obtained from the ratings given to item i by the nearest-neighbor set $c_u$, weighted by their similarity to $u_1$. The predicted ratings of the active user for the running example are shown in Table IV. Once the predicted rating of each item has been calculated, recommendations are provided to the active user; e.g., joke 3, with a predicted rating of 0.63, will be recommended, and so on.
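The prediction step can be sketched with the widely used mean-centered weighted sum. The paper's own formula did not survive extraction, so this form is an assumption: the prediction is the active user's average rating plus the similarity-weighted average of the neighbors' deviations from their own average ratings. The function name `predict_rating` and its parameters are hypothetical, and 0 is again taken to mean "not rated".

```python
import numpy as np


def predict_rating(active, neighbours, sims, item):
    """Mean-centered weighted-sum prediction of `item` for the active user."""
    active = np.asarray(active, dtype=float)
    neighbours = np.asarray(neighbours, dtype=float)
    rated = active[active > 0]
    active_mean = float(rated.mean()) if rated.size else 0.0
    num, den = 0.0, 0.0
    for row, s in zip(neighbours, sims):
        if row[item] <= 0:  # this neighbour has not rated the item
            continue
        num += s * (row[item] - row[row > 0].mean())
        den += abs(s)
    # Fall back to the active user's mean when no neighbour rated the item
    return active_mean if den == 0 else active_mean + num / den


active = [0.5, 0.7, 0.0]                           # item 2 is unrated
neighbours = [[0.4, 0.6, 0.8], [0.2, 0.2, 0.0]]    # second neighbour skipped
print(predict_rating(active, neighbours, sims=[1.0, 0.5], item=2))
```

Items can then be ranked by predicted rating, and the highest-ranked unrated items recommended to the active user.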

IV. EXPERIMENTS
We have conducted a set of experiments to examine the effectiveness of our proposed recommender system in terms of accuracy of neighbor-selection and recommendation quality.
The proposed FKMHPRS is implemented in MATLAB version 7.2.The experiments are conducted on a 2.0 GHz, Intel Pentium-IV PC with 512 MB memory, running Microsoft Windows XP Professional.

A. Performance evaluation of clustering
To check the performance of the proposed clustering algorithm, we first applied it to a real data set, the "Iris" data, whose true classes are known. The Iris data set is available in the UCI repository (ftp://www.ics.uci.edu/pub/machinelearningdatabases/) and includes 150 objects (50 in each of three classes: "Setosa", "Versicolor", and "Virginica") with four variables ("sepal length", "sepal width", "petal length", and "petal width").
The performance was measured by the accuracy, which is the proportion of objects that are correctly grouped together against the true classes.To investigate the performance more objectively, a simulation study was carried out by generating artificial data sets repetitively and calculating the average performance of the method.
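One way to compute such an accuracy is sketched below. The text does not say how clusters are matched to true classes, so the sketch assumes that each cluster is assigned its majority class and that an object counts as correctly grouped when it belongs to that class (this measure is often called purity). The function name `clustering_accuracy` is hypothetical.

```python
from collections import Counter


def clustering_accuracy(true_labels, cluster_labels):
    """Proportion of objects that belong to the majority class of their cluster."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, []).append(t)
    # For each cluster, count the members of its most common true class
    correct = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return correct / len(true_labels)


true_labels = ["a", "a", "a", "b", "b", "c"]
cluster_labels = [0, 0, 1, 1, 1, 2]
print(clustering_accuracy(true_labels, cluster_labels))
```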
We applied the proposed fast K-medoids, K-medoids, K-means [16], and fuzzy c-means (FCM) clustering [17] to create three clusters from this data without the class information. Table V shows the results obtained using the existing and proposed clustering methods. It shows that the proposed fast K-medoids clustering algorithm performs better than the traditional algorithms because it calculates the initial centroids appropriately instead of selecting them randomly.

B. Performance evaluation of recommender system
The experiments are performed on a small Jester dataset consisting of a user-item rating matrix of size 10 (users) × 10 (jokes), as shown in Table I. Methods for evaluating the recommendation quality of a recommendation system mainly include statistical precision measurement and decision-support precision measurement [18,19]. Statistical precision measurement adopts the MAE (Mean Absolute Error) to measure recommendation quality [20]. Since MAE is a commonly used measure of recommendation quality, we use it as the measurement criterion. MAE measures the deviation between the rating predicted by the system and the actual rating given by the user. We represent each pair of predicted and actual ratings as <p_i, q_i>, where p_i is the value predicted by the system and q_i is the value given by the user. Over all <p_i, q_i> pairs, MAE computes the absolute error |p_i - q_i|, sums all the absolute errors, and then takes their average. A small MAE value indicates good recommendation quality.
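The MAE computation described above can be written in a few lines. This is a minimal sketch; the function name `mean_absolute_error` and the choice to raise an error on empty or mismatched inputs are assumptions made for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average of |p_i - q_i| over all (predicted, actual) rating pairs."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)


predicted = [0.5, 0.7, 0.2]
actual = [0.6, 0.4, 0.2]
print(mean_absolute_error(predicted, actual))
```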
The predicted user rating set can be represented as $\{p_1, p_2, \ldots, p_N\}$ and the corresponding actual user rating set as $\{q_1, q_2, \ldots, q_N\}$; the MAE can then be defined as follows [21]:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |p_i - q_i|$$
The performance of the proposed FKMHPRS is compared with WPRS [10]. WPRS proposed an overall framework for hybrid recommender systems which combines collaborative recommendation with other types of recommendation components. We examine the influence of nearest-neighbor sets of various sizes on predictive validity by gradually increasing the number of neighbors; the experimental results are shown in Table VI. As Fig. 2 shows, FKMHPRS has a smaller MAE value than WPRS in most cases, which means that sparseness has less impact on our proposed algorithm.

V. CONCLUSIONS
This paper describes a novel hybrid personalized recommender system that clusters the user-item rating matrix using the proposed fast K-medoids algorithm and provides good-quality recommendations for the active user using similarity measures. The results of various simulations using the Iris data set show that the proposed fast K-medoids clustering algorithm performs better than K-medoids, K-means, and FCM, which helps to improve the quality of rating. The experimental analysis shows that the proposed FKMHPRS performs better than WPRS and that sparseness has less impact on the proposed system.

Figure 1: The architecture of FKMHPRS

The details of the clusters created and the users in each cluster are shown in Table II. The prediction uses the average rating of the items rated by the active user $u_1$ and the average rating of item i over all users.

Figure 2: MAE of each algorithm (a smaller value means better performance)

TABLE I. RUNNING EXAMPLE OF RATING MATRIX FROM JESTER DATA SET AFTER NORMALIZATION IN THE RANGE OF 0 TO 1

TABLE II. USERS IN EACH CLUSTER USING FAST K-MEDOIDS

TABLE V