User profile correlation-based similarity algorithm in movie recommendation system

A recommendation system is software used in the e-commerce field that provides recommendations to help customers choose the items they like. Several recommendation approaches have been proposed; however, collaborative filtering is the most widely used. The main issue in collaborative filtering is how to implement a similarity algorithm that improves the performance of the recommendation system. Several similarity algorithms based on the user rating value have been developed, and recently a similarity algorithm has been developed that combines the user rating value and the user behavior value. However, the existing research is still based on only a single user behavior value, namely the genre data. Therefore, we propose a new similarity algorithm that considers not only the genre data but also the user profile data (namely age, gender, occupation, and location). The proposed similarity is called User Profile Correlation-based Similarity (UPCSim). The user profile correlation similarity is obtained by calculating the correlation coefficient between the user profile data and the user rating or behavior values. An experiment was conducted to compare the accuracy of the UPCSim algorithm with that of the previous algorithm. The results show that the UPCSim algorithm improves recommendation performance, reducing MAE by 1.64% and RMSE by 1.4% compared to the previous algorithm.

Introduction
The exponential growth of information on the internet gives users more information resources to explore and collect. However, users can get lost in this sea of information and have difficulty processing it [1,2]. Users have to spend more time and energy finding the information they want, and may still not get satisfactory results. Fortunately, user behavior on e-commerce sites and other social networks can be recorded and tracked, making it easier to analyze user interests [3,4]. One of the tools used to analyze user interests is known as a recommendation system. A recommendation system is software that helps users find relevant items among millions of items in a database [5,6]. Its main task is to offer users personalized item recommendations through information filtering. Such systems have become commercial platforms that assist users by suggesting items to select. The suggestions support users in various decision-making processes, such as what books to read, which locations to visit, what news to read, and more [7].
Based on the data source and computation method used, recommendation systems are divided into three approaches, namely collaborative filtering, content-based filtering, and hybrid filtering [6,8]. The collaborative filtering approach uses the collaborative power of ratings given by all users to make recommendations. The content-based filtering approach uses descriptive attributes of items to make recommendations. Meanwhile, the hybrid filtering approach combines several filtering methods to get a list of items according to user preferences [9].
Of the three recommendation system approaches, collaborative filtering is one of the most popular, successful, and widely used methods in recommendation systems [5,10,11]. This recommendation method is widely used because it is simple, efficient, and has an acceptable accuracy level. Based on the advantages of this collaborative filtering, several real-world systems have used this method, such as Amazon, MovieLens, and Netflix [8,10,12].
The collaborative filtering approach is categorized into two approaches, viz ranking-oriented collaborative filtering and rating-oriented collaborative filtering [13,14]. Ranking-oriented collaborative filtering directly provides a preference order for items from users without predicting ratings for items that have not been rated. In contrast, rating-oriented collaborative filtering predicts ratings for items that have not been rated by users based on rating information from other users [14]. The rating-oriented collaborative filtering approach is more widely used because it is faster in generating recommendations than the ranking-oriented collaborative filtering approach.
The rating-oriented collaborative filtering approach is categorized into two methods, viz the model-based method and the memory-based method [8,15,16]. The model-based method uses a rating database to build a model and uses that model to predict ratings for unrated items [17]. Some of the techniques that are often used to build models include clustering [18,19], Bayesian network [20], Markovian factorization [21], and Singular Value Decomposition (SVD) [22,23]. This model-based method has a drawback; namely, the computational complexity is very dependent on the model being built [3]. Therefore, the algorithm complexity value cannot be ascertained.
Meanwhile, the memory-based method uses the rating database to calculate the similarity between users or similarity between items [24]. In its implementation, this method is divided into two techniques, namely User-Based Collaborative Filtering (UBCF) and Item-Based Collaborative Filtering (IBCF) [1,2,25]. The UBCF predicts ratings for all items that have not been rated based on the similarity of users, while the IBCF predicts ratings based on the similarity of items [26]. Some of the frequently used traditional similarities are the Cosine (COS), Pearson Correlation Coefficient (PCC), and Jaccard [2,8].
In recent years, the majority of researchers have focused on developing similarity algorithms between users or items because of their computational simplicity. For example, Patra et al. [16] proposed a similarity algorithm known as Bhattacharyya similarity. Polatidis et al. [27] proposed multi-level collaborative filtering, an improvement on PCC similarity that takes into account the number of co-rated items at several levels. Sun et al. [28] proposed a similarity algorithm that integrates the Triangle and Jaccard similarities, known as TMJ similarity. Feng et al. [2] proposed a new similarity algorithm that integrates three similarity impact factors, $S_1$, $S_2$, and $S_3$: $S_1$ expresses the similarity between users, $S_2$ is used to account for the number of co-rated items being less than the specified threshold, and $S_3$ expresses the weight of each user's rating preference. These four similarity algorithms perform similarity calculations based only on the user's rating value data. Wu et al. [29] then proposed a new similarity that combines similarity based on the user's rating value with similarity based on the user behavior value. In their view, traditional recommendation algorithms pay attention only to the rating values given explicitly by users and ignore the implicit information in how users rate each item type (genre), which also affects accuracy. In their research, similarity based on the user's rating value ($S_r$) was calculated using the UBCF method, while similarity based on the user behavior value ($S_b$) was calculated from the probability of the user's score in giving a rating to the genre. The proposed similarity algorithm is known as User Score Probability Collaborative Filtering (UPCF).
By combining the two similarities, the similarity calculation between users requires not only user rating value data but also genre data to calculate the user behavior value. Their results show an increase in accuracy compared with traditional similarity algorithms, which consider only the user rating value. However, there is still a need to improve the accuracy of the recommendations by exploring other user behavior that affects user interests.
Therefore, we propose a new similarity algorithm, called User Profile Correlation-based Similarity (UPCSim), which not only pays attention to genre data but also adds other user behavior data, in the form of user profile data (namely age, gender, occupation, and location), when weighting the similarities. In the UPCSim algorithm, the weight of $S_r$ is obtained by calculating the correlation coefficient between the user profile data and the user rating value, while the weight of $S_b$ is obtained by calculating the correlation coefficient between the user profile data and the user behavior value.
The structure of this paper is as follows. The "Similarity Algorithm" section describes several similarity algorithms developed for the collaborative filtering approach. The "Research Method" section then explains the proposed similarity algorithm in detail. Subsequently, the "Experiment" section presents the experiment results on the MovieLens dataset and discusses them. Finally, the "Conclusion and Future Work" section provides conclusions and suggestions for further research.

Similarity Algorithm
As explained in the previous section, the traditional similarity algorithms most frequently used in recommendation systems are the Cosine similarity (COS) and the Pearson Correlation Coefficient (PCC) [30]. To explain these similarity calculations, we define the user and item sets as $U = \{u_1, u_2, \ldots, u_m\}$ and $I = \{i_1, i_2, \ldots, i_n\}$. The user rating matrix for the items is denoted as $R = [r_{ui}]_{m \times n}$, where $m$ and $n$ are the number of users and the number of items, respectively, and $r_{ui}$ is the rating given by user $u$ on item $i$.
Cosine similarity measures the cosine of the angle between two rating vectors (user or item) [2,3,16]. The Cosine similarity formula between user $u_1$ and user $u_2$ is stated in (1).
$$S(u_1, u_2)^{COS} = \frac{\sum_{i \in I} r_{u_1 i}\, r_{u_2 i}}{\sqrt{\sum_{i \in I} r_{u_1 i}^2}\,\sqrt{\sum_{i \in I} r_{u_2 i}^2}} \tag{1}$$
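As a minimal sketch of the Cosine similarity between users, the following computes all pairwise user similarities from a small rating matrix. It assumes that unrated items are stored as 0 and that NumPy is available; the function name `cosine_similarity_matrix` and the toy matrix are illustrative only.

```python
import numpy as np

def cosine_similarity_matrix(ratings):
    """Return the user-user cosine similarity for an m x n rating matrix.

    Assumption: a rating of 0 means "not rated".
    """
    ratings = np.asarray(ratings, dtype=float)
    norms = np.linalg.norm(ratings, axis=1)
    # Users with no ratings have a zero norm; divide by 1 instead to avoid NaN.
    safe_norms = np.where(norms == 0.0, 1.0, norms)
    unit_rows = ratings / safe_norms[:, None]
    return unit_rows @ unit_rows.T

# Toy example: 3 users, 4 items (hypothetical values).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5]])
S = cosine_similarity_matrix(R)
```

The result is a symmetric matrix with ones on the diagonal for every user who has rated at least one item.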
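Pearson similarity measures how strongly two users or items are linearly related to each other [3,16]. After identifying the items rated jointly by user $u_1$ and user $u_2$, Pearson similarity calculates the linear correlation between the two users using the formula specified in (2).
$$S(u_1, u_2)^{PCC} = \frac{\sum_{i \in I'} (r_{u_1 i} - \bar{r}_{u_1})(r_{u_2 i} - \bar{r}_{u_2})}{\sqrt{\sum_{i \in I'} (r_{u_1 i} - \bar{r}_{u_1})^2}\,\sqrt{\sum_{i \in I'} (r_{u_2 i} - \bar{r}_{u_2})^2}} \tag{2}$$
Here $I'$ denotes the set of items rated by both users, and $\bar{r}_{u_1}$ and $\bar{r}_{u_2}$ are the average ratings of users $u_1$ and $u_2$. Pearson similarity values lie in the range [-1, +1]. A value of +1 indicates a perfect positive correlation, and a value of -1 indicates a perfect negative correlation.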
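The Pearson similarity between two users can be sketched in the same spirit. This is an illustration under stated assumptions: unrated items are stored as 0, the user means are taken over the co-rated items only (other variants use each user's overall mean), and pairs with fewer than two co-rated items are given a similarity of 0. The name `pearson_similarity` is hypothetical.

```python
import numpy as np

def pearson_similarity(r_u, r_v):
    """Pearson correlation between two users over their co-rated items."""
    r_u = np.asarray(r_u, dtype=float)
    r_v = np.asarray(r_v, dtype=float)
    co_rated = (r_u > 0) & (r_v > 0)          # items rated by both users
    if co_rated.sum() < 2:
        return 0.0                            # assumption: too little overlap
    d_u = r_u[co_rated] - r_u[co_rated].mean()
    d_v = r_v[co_rated] - r_v[co_rated].mean()
    denom = np.sqrt((d_u ** 2).sum()) * np.sqrt((d_v ** 2).sum())
    if denom == 0.0:
        return 0.0                            # a user rated everything equally
    return float((d_u * d_v).sum() / denom)

same_trend = pearson_similarity([1, 2, 3, 0], [2, 4, 5, 5])
opposite_trend = pearson_similarity([1, 2, 3], [3, 2, 1])
```

Only the first three items are co-rated in the first call, so the fourth item does not influence the result.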
Several similarity improvements are continuously being developed to increase the accuracy of recommendations. Among them are the Bhattacharyya similarity [16], the multi-level collaborative filtering similarity [27], the TMJ similarity [28], and the similarity integrating three impact factors, namely $S_1$, $S_2$, and $S_3$ [2]. These four similarity algorithms consider only explicit user rating value data to calculate the similarity between users. They assume that users who give a high rating to an item like the item, while users who give a low rating do not like it.
In the current era of online shopping, users assess items based on their quality, delivery speed, customer service, and other factors. If the user is not interested in an item, the user will not select or buy the item. An illustration of user behavior in assessing items is shown in Figure 1. User u likes to watch the animation genre and does not like to watch the drama and adventure genres. User u selects the movie "Toy Story" from a collection of animated films. After watching the movie, user u gives "Toy Story" a low rating because the sound quality of the movie was unclear.
In the similarity algorithm, which is only based on the user rating value, if a user gives a low rating, it means that the user does not like the movie title as well as the movie genre. This indicates that the user's preference for similar items will decrease and will affect the resulting recommendations. Therefore, it is necessary to involve user behavior in rating items to calculate similarity as a form of invisible user preferences.
To adopt this user behavior, Wu et al. [29] developed a new similarity algorithm that involves not only explicit rating data but also user behavior data that give an implicit rating. Wu et al. [29] assumed that users who gave a low rating to an item did not necessarily dislike the item. Their similarity algorithm combines the similarity based on the user rating value, $S_r$, with the similarity based on the user behavior value, $S_b$. Here $S_r$ is the similarity between user $u_1$ and user $u_2$ calculated by the UBCF method, $\beta$ is the threshold (between 0 and 1), which can be set to the average similarity of all the users who are similar to the active user $u_1$, and $S_b$ is defined in (4).
$G_{u_1}$ and $G_{u_2}$ are the sets of item types (genres) rated by users $u_1$ and $u_2$, respectively. $P_{u_1 g}$ and $P_{u_2 g}$ are the probability scores for item type $g$ from users $u_1$ and $u_2$, respectively. $\bar{P}_{u_1}$ and $\bar{P}_{u_2}$ are the average probability scores over all item types for users $u_1$ and $u_2$, respectively, and $g$ is an item type rated by both users.
The combination of the two similarities in the research conducted by Wu et al. [29] has a limitation: the similarity based on the user behavior value considers only the genre data of the item, which does not guarantee the accuracy of the resulting recommendations. To address this problem, this paper focuses on increasing accuracy by considering other user behavior data, namely the user profile data (age, gender, occupation, and location), which influence user behavior in determining the selected item.
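The behavior-based similarity $S_b$ can be illustrated with a short sketch. The definitions of the genre probability scores and their averages suggest a Pearson-like correlation over the genres rated by both users; that functional form is an assumption made here for illustration, not a formula confirmed by the text. The function name `behavior_similarity` and the probability values are hypothetical.

```python
import numpy as np

def behavior_similarity(p_u1, p_u2):
    """Pearson-like similarity over genre probability scores (assumed form).

    p_u1, p_u2: dicts mapping genre name -> probability score of that user.
    """
    shared = sorted(set(p_u1) & set(p_u2))      # genres rated by both users
    if len(shared) < 2:
        return 0.0                              # assumption: too little overlap
    mean_1 = np.mean(list(p_u1.values()))       # average over all of user 1's genres
    mean_2 = np.mean(list(p_u2.values()))       # average over all of user 2's genres
    d_1 = np.array([p_u1[g] - mean_1 for g in shared])
    d_2 = np.array([p_u2[g] - mean_2 for g in shared])
    denom = np.sqrt((d_1 ** 2).sum()) * np.sqrt((d_2 ** 2).sum())
    if denom == 0.0:
        return 0.0
    return float(d_1 @ d_2 / denom)

# Hypothetical probability scores for two users.
p_a = {"animation": 0.5, "comedy": 0.3, "drama": 0.2}
p_b = {"animation": 0.6, "comedy": 0.3, "drama": 0.1}
s_b = behavior_similarity(p_a, p_b)
```

Two users with a similar distribution of interest across genres obtain a value close to 1, regardless of the ratings they gave to individual movies.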
We assumed that age, gender, occupation, and location would influence the user's interest in an item. As an illustration, young users would have different preferences from older users. Female users would have different preferences from male users. Users working as technicians would have different preferences from users working as lawyers, and users living on the coast would have different preferences from users living in cities. Based on this assumption, the main contribution of our study is a new similarity algorithm that considers not only genre but also age, gender, occupation, and location data to improve accuracy in dealing with items that have not been rated by users (data sparsity). The details of this algorithm are described in the next section.

Research Method
The conducted research focused on the proposed UPCSim algorithm. The UPCSim algorithm was compared with the UPCF algorithm [29], because UPCF already accommodates one kind of user behavior, viz the genre data, while the UPCSim algorithm adds other user behavior data, namely the user profile data. For this comparison, an MBCF system applying the UPCSim algorithm was created. All the pre-processing and processing steps of our MBCF system were the same as those of Wu et al. [29], so that the comparison between the two studies is fair.
The MBCF system that we have developed extends the MBCF system previously developed by Wu et al. [29]. A detailed illustration of the proposed MBCF system is shown in Figure 2.
The system is divided into four blocks, namely input, data preparation, the MBCF process, and output. The input block is the input dataset used in the MBCF system. The data preparation block consists of the data pre-processing stage, which produces a data source ready for use in the MBCF process. The data source includes the rating data and the behavior data. While the behavior data used by Wu et al. [29] consist only of the genre data (the green component in the data preparation block), our research also accommodates the user profile data (the red component in the data preparation block). The MBCF process block develops the MBCF method using weighted similarity. The similarity weighting carried out by Wu et al. used a threshold value ranging from 0 to 1 (the green component in the MBCF process), whereas the similarity weighting in our research used the correlation coefficient between the user profile data and the user rating or behavior values (the red component in the MBCF process). The final block of the developed system is the output block, which evaluates the application of the UPCSim algorithm in the MBCF method.
The detail of our proposed UPCSim algorithm is explained as a component of similarity calculation, which is presented in "Similarity Calculation" section. Meanwhile the detail of the developed MBCF system is explained in "Developed MBCF System" section.

Similarity Calculation
In this study, the similarity calculation between users was divided into three components. The first is the $S_r$ similarity calculation component (shown in the dashed blue box). The second is the $S_b$ similarity calculation component (shown in the dashed green box). Finally, the UPCSim component (shown in the dashed red box) gives weight to both similarities. Details of the three components of our similarity calculation can be seen in Figure 3.
Each component in Figure 3 is described as follows.
For the $S_r$ similarity calculation component, the entries $R_{1,1}$ to $R_{943,1682}$ of the user rating matrix range from 0 to 5, with a value of 0 indicating that the user has not rated the item. After the rating matrix was formed, the next step was to calculate the $S_r$ similarity using the Cosine similarity formula in (1). The final result of this calculation was the $S_r$ similarity matrix of order 943 × 943, in which each entry is the rating-based similarity between a pair of users.
The $S_b$ similarity calculation component uses the rating and item data. In the MovieLens 100K dataset, the rating data give the user's rating value for each watched movie, and the item data give the movie titles together with the genre information of each movie. Each movie title can belong to several genres. For example, the movie "Toy Story" has the genres animation, children, and comedy. The rating and item data are used to calculate the user behavior value. The user behavior value was calculated by joining the rating data and the item data, removing unused attributes from the joined result, and aggregating the data using the sum function grouped by user. The aggregation result is the user behavior value matrix of order 943 × 19, where 943 is the number of users and 19 is the number of genres. For example, the entry in row 943 and column 19 is the 943rd user's behavior value for the 19th genre, representing the total number of movies of the 19th genre watched by the 943rd user. After the user behavior value matrix was formed, the next stage was to calculate the probability of genre occurrence from the user behavior value matrix, using (5), to produce a probability matrix of user behavior values.
In (5), $B(g)$ is the value of user behavior for the target genre $g$, and $N$ is the total number of users who rated the target genre $g$. In the resulting probability matrix, the entry in row 943 and column 19 is the probability value of the 943rd user's behavior for the 19th genre. The probability matrix of user behavior values was used as the basis for calculating the $S_b$ similarity according to (4). The results of the $S_b$ similarity calculation form a matrix of order 943 × 943, in which, for example, $S_{1,943}$ is the similarity based on the user behavior value between the 1st user and the 943rd user.
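The aggregation that produces the user behavior value matrix can be sketched as follows. This is a minimal illustration that assumes zero-based user and item indices and a 0/1 genre flag matrix for the items; the function name `behavior_matrix` and the toy data are hypothetical, and the subsequent probability step is not covered here.

```python
import numpy as np

def behavior_matrix(ratings, item_genres, n_users):
    """Count, per user, how many rated movies belong to each genre.

    ratings:     iterable of (user_index, item_index, rating) triples
    item_genres: n_items x n_genres array of 0/1 genre flags
    """
    item_genres = np.asarray(item_genres)
    behavior = np.zeros((n_users, item_genres.shape[1]), dtype=float)
    for user, item, _rating in ratings:
        # Joining a rating with its item adds that item's genre flags
        # to the user's row (sum aggregation grouped by user).
        behavior[user] += item_genres[item]
    return behavior

# Toy data: 3 items with 3 genres, rated by 2 users.
toy_genres = [[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0]]
toy_ratings = [(0, 0, 5), (0, 2, 3), (1, 1, 4)]
B = behavior_matrix(toy_ratings, toy_genres, n_users=2)
```

With the full dataset, the same procedure would yield a matrix with one row per user and one column per genre.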

UPCSim
The UPCSim component performs the similarity calculation using the UPCSim algorithm, which calculates the weights of both similarities ($S_r$ and $S_b$) based on the user profile. The initial stage in calculating the weights was to read the user data. In the MovieLens 100K dataset, the user data contain the user profile data, which consist of age, gender, occupation, and location. The user profile data were used to calculate the weights of the $S_r$ and $S_b$ similarities. Both weights were calculated as the correlation coefficient ($R$) obtained from multiple linear regression. The weight of the $S_r$ similarity, denoted by $\alpha$, was obtained by calculating the correlation coefficient between the user profile data (age, gender, occupation, and location) and the user rating value. The weight of the $S_b$ similarity, denoted by $\beta$, was obtained by calculating the correlation coefficient between the user profile data and the user behavior value.
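The correlation coefficient $R$ of a multiple linear regression can be computed as the square root of the coefficient of determination. The sketch below fits an ordinary least-squares model with NumPy and returns $R$. Several points are assumptions made for illustration: the profile attributes are taken to be numerically encoded already (the text does not state how gender, occupation, or location are encoded), and the target is taken to be one number per user. The function name `multiple_correlation` and the data are hypothetical.

```python
import numpy as np

def multiple_correlation(profile, target):
    """Multiple correlation coefficient R of target regressed on profile.

    profile: n_users x n_attributes array of numerically encoded attributes
    target:  length n_users array (e.g. a per-user rating or behavior value)
    """
    profile = np.asarray(profile, dtype=float)
    target = np.asarray(target, dtype=float)
    design = np.column_stack([np.ones(len(target)), profile])  # add intercept
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((target - target.mean()) ** 2).sum())
    if ss_tot == 0.0:
        return 0.0                       # constant target: no variance to explain
    r_squared = max(0.0, 1.0 - ss_res / ss_tot)
    return float(np.sqrt(r_squared))

# Hypothetical example with a single encoded attribute.
weight = multiple_correlation([[1], [2], [3], [4]], [1, 3, 2, 4])
```

With a single predictor, $R$ reduces to the absolute value of the Pearson correlation, which is 0.8 for the toy data above.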
After weighting the two similarities, the next stage was to calculate the final similarity matrix by combining the weighted $S_r$ and $S_b$ similarities. This yields the final similarity matrix $S$ of order 943 × 943. The formula for the final similarity between user $u$ and user $v$ is defined in (6).
$$S(u, v) = \alpha\, S_r(u, v) + \beta\, S_b(u, v) \tag{6}$$
$S(u, v)$ is the final similarity between user $u$ and user $v$. $S_r(u, v)$ is the similarity based on the user rating value between user $u$ and user $v$. $S_b(u, v)$ is the similarity based on the user behavior value between user $u$ and user $v$. $\alpha$ is the weight of the similarity $S_r$, and $\beta$ is the weight of the similarity $S_b$.
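A short sketch of the final combination step follows. It assumes that the two weighted similarities are combined by addition, and the weights and matrices used are hypothetical values chosen only to show the mechanics. The function name `upcsim_similarity` is illustrative.

```python
import numpy as np

def upcsim_similarity(s_r, s_b, alpha, beta):
    """Combine rating-based and behavior-based similarity matrices.

    Assumption: the weighted similarities are added element-wise.
    """
    s_r = np.asarray(s_r, dtype=float)
    s_b = np.asarray(s_b, dtype=float)
    if s_r.shape != s_b.shape:
        raise ValueError("similarity matrices must have the same shape")
    return alpha * s_r + beta * s_b

# Hypothetical 2 x 2 similarity matrices and weights.
S_r_toy = [[1.0, 0.5], [0.5, 1.0]]
S_b_toy = [[1.0, 0.2], [0.2, 1.0]]
S_final = upcsim_similarity(S_r_toy, S_b_toy, alpha=0.6, beta=0.4)
```

Because both inputs are symmetric, the combined matrix is symmetric as well, so it can be used directly for neighbor selection.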

Developed MBCF System
Based on the illustration shown in Figure 2, this section describes each block of the developed MBCF system.

Input
The first block of the developed MBCF system was the input dataset. In this paper, we used the MovieLens dataset (https://grouplens.org/datasets/movielens/). This dataset was collected by the "GroupLens Study Group of the University of Minnesota" [31]. Several versions of the dataset exist, including ml-100K, ml-1M, ml-10M, and ml-20M. In this experiment, we chose the dataset used in the previous study [29], namely ml-100K. The ml-100K dataset contains several data files, of which our study used three: the rating data, the item data, and the user data.
The rating data consists of 100,000 ratings given by 943 users to 1682 movies. Each user has rated at least 20 movies. The rating values range from 1 to 5, where a score of 1 means that the user did not like the watched movie and a score of 5 means that the user liked it. The rating data has a sparsity of 93.7% and a density of 6.3%. The rating data structure consists of user-id, movie-id, rating, and timestamp.
The item data contains information about the items (movies). The item data structure consists of 24 attributes, namely movie-id, movie title, release date, video release date, IMDb URL, and 19 movie type (genre) attributes. Each item (movie) can have several genres.
The user data contains information about the user's profile. The user data structure consists of 5 attributes, namely user-id, age, gender, occupation, and zip code (which states the user's location).
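The reported density and sparsity follow directly from the dataset dimensions. As a quick check, using only the counts stated in the text:

```python
# Dataset dimensions as stated for ml-100K.
n_users, n_items, n_ratings = 943, 1682, 100_000

# Density: share of user-item pairs that carry a rating.
density = n_ratings / (n_users * n_items)
# Sparsity: share of user-item pairs without a rating.
sparsity = 1.0 - density

print(f"density={density:.1%}, sparsity={sparsity:.1%}")
# prints "density=6.3%, sparsity=93.7%"
```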

Data Preparation
The second block was data preparation. In this section, data pre-processing was carried out. The purpose of this process was to prepare data to obtain a quality dataset. In this study, data pre-processing was done by reducing attributes that were not relevant to data processing.
Based on the existing file data structure, several attributes were not needed in data processing. These attributes included the timestamp on the rating data, movie title, release date, video release date, and IMDb URL on the item data. At the dataset preparation stage, these attributes would be deleted so that the data processing stage ran more effectively.
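The attribute reduction can be sketched as simple line parsers that keep only the fields needed later. The sketch assumes the usual ml-100K text layout, with tab-separated rating records and pipe-separated item and user records; the function names, the sample lines, and the placeholder URL are illustrative and not taken from the paper.

```python
def parse_rating(line):
    """Keep user-id, movie-id, rating; drop the timestamp."""
    user_id, movie_id, rating, _timestamp = line.rstrip("\n").split("\t")
    return int(user_id), int(movie_id), int(rating)

def parse_item(line):
    """Keep movie-id and the 19 genre flags; drop title, dates, and URL."""
    fields = line.rstrip("\n").split("|")
    movie_id = int(fields[0])
    genre_flags = [int(flag) for flag in fields[5:24]]
    return movie_id, genre_flags

def parse_user(line):
    """Keep all five user attributes (zip code stays a string)."""
    user_id, age, gender, occupation, zip_code = line.rstrip("\n").split("|")
    return int(user_id), int(age), gender, occupation, zip_code

# Illustrative sample records (the URL is a placeholder).
flags = ["0", "0", "0", "1", "1", "1"] + ["0"] * 13
item_line = "|".join(["1", "Toy Story (1995)", "01-Jan-1995", "",
                      "http://example.invalid/toy-story"] + flags)
rating_record = parse_rating("196\t242\t3\t881250949\n")
item_record = parse_item(item_line)
user_record = parse_user("1|24|M|technician|85711\n")
```

Dropping the unused fields at parse time keeps the later join between rating data and item data small.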

MBCF Process
The third block of the MBCF system was the MBCF process. The MBCF process was divided into two sub blocks, namely the similarity calculation and the prediction.
The similarity calculation was the initial step of information filtering in the MBCF approach. In this study, the similarity calculation was divided into three components, namely the $S_r$ similarity calculation, the $S_b$ similarity calculation, and the UPCSim algorithm. These three components are described in detail in the "Similarity Calculation" section.
The prediction step provided a predicted rating for items that had not been rated by the active user. The first stage of the prediction was to determine the number (k) of nearest neighbors of the active user, where k is an integer ranging from 10 to 100 [2,27-29]. After the k value was determined, the next stage was to determine the rating prediction for the unrated items.
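The formula used to determine the predicted rating for an item ($i$) not yet rated by the active user ($u$) is shown in (7) [2,16].
$$p_{ui} = \bar{r}_u + \frac{\sum_{v \in NN_u} S(u, v)\,(r_{vi} - \bar{r}_v)}{\sum_{v \in NN_u} |S(u, v)|} \tag{7}$$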
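As a minimal sketch of this prediction step, the following implements a mean-centered, similarity-weighted average over the k most similar users. It assumes that unrated items are stored as 0, that user means are taken over rated items only, and that neighbors are drawn from users who have rated the target item; the last point is one common choice and is not stated in the text. The function name `predict_rating` and the toy data are hypothetical.

```python
import numpy as np

def predict_rating(u, i, ratings, similarity, k):
    """Predict the rating of user u for item i from the k nearest neighbors."""
    ratings = np.asarray(ratings, dtype=float)
    similarity = np.asarray(similarity, dtype=float)
    rated = ratings > 0
    counts = rated.sum(axis=1)
    # Mean rating per user over rated items only (0 for users with no ratings).
    means = np.where(counts > 0, ratings.sum(axis=1) / np.maximum(counts, 1), 0.0)
    # Candidate neighbors: other users who rated item i (assumption).
    candidates = [v for v in range(ratings.shape[0]) if v != u and rated[v, i]]
    neighbors = sorted(candidates, key=lambda v: similarity[u, v], reverse=True)[:k]
    numerator = sum(similarity[u, v] * (ratings[v, i] - means[v]) for v in neighbors)
    denominator = sum(abs(similarity[u, v]) for v in neighbors)
    if denominator == 0:
        return float(means[u])           # fall back to the user's own mean
    return float(means[u] + numerator / denominator)

toy_R = [[5, 3, 0],
         [4, 2, 4],
         [1, 5, 2]]
toy_S = [[1.0, 0.9, 0.1],
         [0.9, 1.0, 0.2],
         [0.1, 0.2, 1.0]]
p_k2 = predict_rating(0, 2, toy_R, toy_S, k=2)
p_k1 = predict_rating(0, 2, toy_R, toy_S, k=1)
```

Centering each neighbor's rating on that neighbor's mean compensates for users who rate systematically high or low.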
$p_{ui}$ represents the predicted rating of user $u$ for item $i$. $\bar{r}_u$ and $\bar{r}_v$ are the average ratings of user $u$ and user $v$, respectively. $r_{vi}$ is the rating given by user $v$ to item $i$. $S(u, v)$ is the final similarity between user $u$ and user $v$. $NN_u$ is the set of nearest neighbors of user $u$.

Output
The fourth block of the MBCF system was the output block. This block was used to evaluate the performance of the UPCSim algorithm in predicting ratings for items that had not been rated by an active user.
To measure the performance of a recommendation system, mean absolute error (MAE), root mean squared error (RMSE), precision, and recall are the most popular measures. According to Jalili et al. [32], the metrics for evaluating recommendation systems can be classified into two categories, namely prediction metrics and classification metrics. MAE and RMSE are primarily used as prediction metrics [33,34], whereas precision and recall are used as classification metrics, that is, to evaluate the quality of top-N recommendations [3].
In this study, we adopted the MAE and RMSE metrics to measure the prediction performance of the UPCSim algorithm. MAE is the most widely used metric in recommendation systems with a collaborative filtering approach. It estimates the average absolute deviation between the actual and the predicted rating values. A lower MAE indicates better recommendation quality [35]. The formula for calculating MAE is defined in (8).
$$MAE = \frac{1}{TN} \sum_{u,i} |p_{ui} - r_{ui}| \tag{8}$$
Meanwhile, RMSE reflects the degree of deviation between the predicted rating and the actual rating. A lower RMSE indicates better prediction performance [36]. The RMSE formula is defined in (9).
$$RMSE = \sqrt{\frac{1}{TN} \sum_{u,i} (p_{ui} - r_{ui})^2} \tag{9}$$
$TN$ is the total number of predicted items. $p_{ui}$ and $r_{ui}$ represent the predicted rating and the actual rating of user $u$ for item $i$, respectively.
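Both metrics are straightforward to compute from paired predicted and actual ratings. The following sketch uses NumPy; the function names and the four rating pairs are hypothetical.

```python
import numpy as np

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Hypothetical predictions and true ratings for four user-item pairs.
toy_pred = [4, 3, 5, 2]
toy_true = [5, 3, 4, 4]
toy_mae = mae(toy_pred, toy_true)
toy_rmse = rmse(toy_pred, toy_true)
```

RMSE is never smaller than MAE on the same data, because squaring gives larger errors more weight.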

Experiment
This section begins with the experiment design, described in the "Experiment Design" section. Next, the "Experiment Results and Analysis" section compares the proposed UPCSim algorithm with the previous similarity algorithms, namely the Cosine similarity algorithm and the UPCF similarity algorithm. The comparison uses the MAE and RMSE values of the MBCF system that was built. Finally, the "Discussion" section provides conclusions from the experiment results.

Experiment Design
To evaluate the performance of the proposed UPCSim algorithm, the experiment design in this study comprised the following four steps.
The first step was to divide the dataset into two parts, namely training data and testing data, using the k-fold cross-validation method. In this experiment, the chosen k was 5, which meant that in each fold 80% of the dataset was used as training data and the remaining 20% as testing data. The training sets were called train1, train2, train3, train4, and train5, and the testing sets were called test1, test2, test3, test4, and test5.
The second step was to calculate the similarity matrix between users. The $S_r$ similarity was obtained from the user rating value matrix, and the $S_b$ similarity was obtained from the user behavior value matrix. We then calculated the weights of the two similarities using the correlation coefficient ($R$) from multiple linear regression analysis. The final similarity matrix was obtained by combining the two weighted similarities.
The third step was to calculate the predicted ratings for the testing data. The k nearest neighbors were selected based on the final similarity matrix. In this experiment, the k values varied from 10 to 100 in increments of 10.
The fourth step was to measure the proposed UPCSim algorithm's performance using the MAE and RMSE prediction metrics.
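The data split and the neighbor sweep of this design can be sketched with the standard library alone. The sketch assumes a random 5-fold partition of the rating indices with a fixed seed; how the train and test files of the original experiment were produced is not stated in the text, so this is only one possible realization. The function name `five_fold_splits` is hypothetical.

```python
import random

def five_fold_splits(n_ratings, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for each cross-validation fold."""
    indices = list(range(n_ratings))
    random.Random(seed).shuffle(indices)      # fixed seed for repeatability
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = folds[f]
        train = [idx for g, fold in enumerate(folds) if g != f for idx in fold]
        yield train, test

# Numbers of nearest neighbors evaluated: 10, 20, ..., 100.
neighbor_counts = list(range(10, 101, 10))

# Small illustration with 100 rating indices.
splits = list(five_fold_splits(100))
```

Each fold serves as the testing data exactly once, and MAE and RMSE are then averaged over the five folds for every value in `neighbor_counts`.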

Experiment Results and Analysis
This section compares the performance of the proposed UPCSim algorithm with that of the traditional Cosine similarity and the UPCF similarity. The MAE and RMSE values of the three algorithms were compared for different numbers of neighbors. Experiments were carried out in five iterations. The first iteration used the train1 and test1 datasets, the second used train2 and test2, and so on. The number of nearest neighbors ranged from 10 to 100 for each of the train1 to train5 datasets. The three similarity algorithms (Cosine, UPCF, and UPCSim) were applied to the five training sets. The average MAE performance of the three algorithms is shown in Table 1.
In Table 1, $MAE_c$ is the average MAE value of the UBCF experiment using Cosine similarity, $MAE_p$ is the average MAE value of the UBCF experiment using UPCF similarity, and $MAE_{ps}$ is the average MAE value of the UBCF experiment using the proposed UPCSim similarity. $MAE_{ps-c}$ is the difference between the MAE value using the proposed UPCSim similarity and the MAE value using the Cosine similarity, and $MAE_{ps-p}$ is the difference between the MAE value using the proposed UPCSim similarity and the MAE value using the UPCF similarity.
Based on the data in Table 1, the average MAE values of the three algorithms can be illustrated graphically, as in Figure 4. Figure 4 shows that the MAE value of all three algorithms decreases as the number of nearest neighbors increases. At the beginning of the curve, the MAE value declines sharply as the number of nearest neighbors increases, whereas at the end of the curve the MAE value tends to stabilize. The number of nearest neighbors therefore affects the MAE value: the greater the number of nearest neighbors, the smaller the MAE value. For the same number of nearest neighbors, the MAE value of the UPCSim algorithm is always smaller than that of the other recommendation algorithms. In other words, the proposed UPCSim algorithm has the smallest error between the actual and predicted ratings and therefore gives the most accurate rating prediction.
Furthermore, the comparison of the average RMSE values of the three recommendation algorithms is shown in Table 2. $RMSE_c$ is the average RMSE value from the UBCF experiment using Cosine similarity, $RMSE_p$ is the average RMSE value from the UBCF experiment using UPCF similarity, and the remaining columns are defined analogously to those of Table 1.
As can be seen in Table 2, for the same number of nearest neighbors, the average RMSE value of the UPCSim similarity is always smaller than that of the other similarities. Meanwhile, an increase in the number of nearest neighbors reduces the RMSE value of all three algorithms, which shows that the number of nearest neighbors influences the RMSE value. The UPCSim algorithm produces the smallest RMSE value, which means that its prediction error is the smallest. Compared with the Cosine similarity, the UPCSim algorithm improves (reduces) the RMSE value by 6.24% to 8.17%, with an average improvement of 7.53% over all k nearest neighbors. Compared with the UPCF similarity, the UPCSim algorithm improves the RMSE value by 0.42% to 1.79%, with an average improvement of 1.4% over all k nearest neighbors.
Based on the data in Table 2, the average RMSE values of the three algorithms can be illustrated graphically, as shown in Figure 5. Figure 5 illustrates the effect of changes in the number of nearest neighbors on the RMSE value. For all three algorithms, the RMSE value first decreases and then tends to stabilize when the number of neighbors is greater than 50. The RMSE value of the UPCSim algorithm is always the smallest for each number of neighbors, which shows that the UPCSim algorithm has the lowest error rate of the three algorithms. It can therefore be said that the UPCSim algorithm is superior.
The superiority of the UPCSim similarity algorithm arises because its similarity weighting involves more complete user behavior (genre and user profile) than that used by Wu et al. [29], which involved only genre. As a result, the predicted ratings are closer to the actual values. However, the UPCSim algorithm is computationally more complex than the previous recommendation algorithms, so it will take longer to produce recommendations if it is applied to a larger dataset.

Conclusion and Future Work
Our experiment results on the MovieLens 100K dataset show that the UPCSim algorithm can reduce MAE and RMSE values by 1.64% and 1.4%, respectively, compared to the UPCF algorithm. The strength of our algorithm is that it accommodates the user profile when calculating the similarity weighting, in order to capture user interest more accurately. Although the UPCSim similarity algorithm improves prediction accuracy, this study still has some limitations. The UPCSim algorithm is more complex than the previous algorithms and will therefore be time consuming if it is applied to a larger dataset. In addition, there is still room to improve the prediction accuracy of the UPCSim similarity algorithm. Therefore, in future studies, a clustering method can be considered to overcome the scalability problems caused by larger datasets and to reduce computation time.