Collaborative filtering recommendation algorithm integrating user similarity and trust

The rapid development of the Internet has brought convenience to people but has also produced the problem of “information overload”. To address several bottlenecks of traditional collaborative filtering, this study proposes a collaborative filtering algorithm that combines user similarity and trust. First, because traditional similarity measures lead to large deviations in predicted user ratings, an optimized Pearson correlation coefficient is proposed. Second, direct trust relationships are established from users' ratings of common items, and trust relationships between users without a direct relationship are established through trust propagation. Third, the nearest-neighbor set of the target user is found by fusing user similarity and trust. Finally, item ratings are predicted to generate a recommendation list. Experimental results show that the proposed algorithm effectively improves recommendation accuracy.


Introduction
As of December 2020, the number of Internet users in China was 989 million, and the Internet penetration rate had reached 70.4% [1]. With the rapid development of the Internet, the amount of information online has increased dramatically, and the number and types of products have grown rapidly. Users need a lot of time to find the products they want, which is the "information overload" problem [2][3][4]. To solve this problem, personalized recommendation systems emerged and are now widely used in e-commerce, movies and videos, music, reading, personalized mail, and advertising.
Collaborative filtering is one of the most widely used technologies in recommendation systems. A collaborative filtering recommendation algorithm discovers user preferences by mining historical behavior data, divides users into groups according to their preferences, and recommends products to users with similar tastes. Collaborative filtering techniques fall into three main categories: user-based collaborative filtering [5], item-based collaborative filtering [6], and model-based collaborative filtering [7][8]. However, collaborative filtering algorithms suffer from problems such as cold start, data sparsity, and limited scalability. In the era of social networks, it is easy for users to build their own trust network, and adding user trust information to traditional recommendation has been studied in recent work. The authors of [9] proposed a new trust measurement model that integrates user interaction information, preference, and trust, and used an improved structural hole algorithm to identify influential users in the network and improve the performance of top-N item recommendation. Tong et al. [10] proposed an extended Pawlak conflict model based on the trust mechanism to solve the consensus process problem, and verified the effectiveness and superiority of the method through examples. Zou et al. [11] proposed a user trust recommendation algorithm based on topic-based tensor decomposition, which mines the level of trust a user places in different friends when selecting different items.
This study first proposes an optimized similarity calculation method to measure users' preferences for items, and then calculates the direct trust value between users from their commonly rated items. Users who have no commonly rated items can establish an indirect trust relationship through trust propagation. Finally, the similarity and trust between users are combined to find the nearest-neighbor set of the target user, and ratings are predicted to generate recommendations. The experimental results show that, compared with traditional collaborative filtering recommendation algorithms, the proposed method effectively improves recommendation accuracy.

Build user-item rating matrix
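Assuming there are m users and n items, the user-item rating matrix is shown in Table 1.
Table 1. User-item rating matrix
$$R = \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,n} \\ r_{2,1} & r_{2,2} & \cdots & r_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m,1} & r_{m,2} & \cdots & r_{m,n} \end{pmatrix}$$
The element $r_{i,j}$ in the i-th row and the j-th column represents the rating of the i-th user on the j-th item.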
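As a minimal sketch of how such a matrix might be held in memory, the following Python snippet stores it sparsely as a nested dictionary. The function name `build_rating_matrix` and the toy data are illustrative assumptions and do not come from the study.

```python
from typing import Dict, Iterable, Tuple

# user -> {item -> rating}; unrated (user, item) pairs are simply absent
Ratings = Dict[str, Dict[str, float]]


def build_rating_matrix(triples: Iterable[Tuple[str, str, float]]) -> Ratings:
    """Group (user, item, rating) triples into a sparse user-item matrix."""
    matrix: Ratings = {}
    for user, item, rating in triples:
        matrix.setdefault(user, {})[item] = rating
    return matrix


# Hypothetical toy data, not taken from the data set used in the study.
toy_triples = [
    ("u1", "i1", 5.0),
    ("u1", "i2", 3.0),
    ("u2", "i1", 4.0),
    ("u3", "i3", 2.0),
]
R = build_rating_matrix(toy_triples)
```

A sparse layout is a natural choice here because most entries of the matrix are empty when users rate only a small fraction of the items.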

Calculate user similarity
User similarity is calculated from the user-item rating matrix. Many methods exist for calculating user similarity, but related studies mostly use cosine similarity or the Pearson correlation coefficient. This study uses the Pearson correlation coefficient, which is calculated as shown in formula (1):
$$Sim(i,j) = \frac{\sum_{k \in I_{i,j}} (R_{i,k} - \bar{R}_i)(R_{j,k} - \bar{R}_j)}{\sqrt{\sum_{k \in I_{i,j}} (R_{i,k} - \bar{R}_i)^2}\,\sqrt{\sum_{k \in I_{i,j}} (R_{j,k} - \bar{R}_j)^2}} \qquad (1)$$
where $R_{i,k}$ is the rating of user i on item k, $\bar{R}_i$ is the average of user i's ratings on all items, $R_{j,k}$ is the rating of user j on item k, $\bar{R}_j$ is the average of user j's ratings on all items, and $I_{i,j}$ is the set of items rated by both user i and user j.
Although the Pearson correlation coefficient compensates for differences in users' rating scales, it does not consider how the number of co-rated items affects the similarity. It can be seen from Table 2 that $U_1$ rated 8 items, $U_2$ rated 3 items, and $U_3$ rated 5 items. Using formula (1), we get $Sim(U_1,U_2)=1$ and $Sim(U_1,U_3)=1$; that is, the similarity between $U_1$ and $U_2$ and the similarity between $U_1$ and $U_3$ are both 1. Although the calculation is correct, $U_2$ shares only 3 rated items with $U_1$, whereas $U_3$ shares 5. We therefore consider $U_3$ to be more similar to $U_1$ than $U_2$ is, so the number of items that two users have rated in common is added to the Pearson correlation coefficient to better measure the similarity between users. The improved Pearson correlation coefficient is shown in formula (2), in which $I_{i,j}$ denotes the number of items rated by both user i and user j, $I_i$ the total number of items rated by user i, and $I_j$ the total number of items rated by user j.
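The Pearson correlation described above can be sketched in Python as follows. This is an illustrative implementation under the stated definitions (user means taken over all of a user's ratings, sums taken over co-rated items); the function name and the convention of returning 0 when the similarity is undefined are assumptions.

```python
from math import sqrt
from typing import Dict


def pearson_similarity(ratings_i: Dict[str, float],
                       ratings_j: Dict[str, float]) -> float:
    """Pearson correlation between two users over their co-rated items."""
    common = set(ratings_i) & set(ratings_j)
    if not common:
        return 0.0  # assumption: no co-rated items means no measurable similarity
    # Means are taken over ALL items each user rated, as in the text.
    mean_i = sum(ratings_i.values()) / len(ratings_i)
    mean_j = sum(ratings_j.values()) / len(ratings_j)
    numerator = sum((ratings_i[k] - mean_i) * (ratings_j[k] - mean_j)
                    for k in common)
    norm_i = sqrt(sum((ratings_i[k] - mean_i) ** 2 for k in common))
    norm_j = sqrt(sum((ratings_j[k] - mean_j) ** 2 for k in common))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # assumption: constant ratings give an undefined correlation
    return numerator / (norm_i * norm_j)


# Hypothetical users with perfectly aligned and perfectly opposed tastes.
u_a = {"i1": 1.0, "i2": 2.0, "i3": 3.0}
u_b = {"i1": 2.0, "i2": 4.0, "i3": 6.0}
u_c = {"i1": 3.0, "i2": 2.0, "i3": 1.0}
```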
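To illustrate why an overlap term separates the two cases in the example, here is a small Python sketch. The exact weighting used in formula (2) is not reproduced here; the snippet assumes a Jaccard-style factor (co-rated items divided by items rated by either user), which is only one plausible choice and may differ from the study's formula. The item identifiers are hypothetical.

```python
from typing import Set


def overlap_weighted_similarity(base_sim: float,
                                items_i: Set[str],
                                items_j: Set[str]) -> float:
    """Scale a similarity by the share of co-rated items (assumed Jaccard factor)."""
    union = items_i | items_j
    if not union:
        return 0.0
    return base_sim * len(items_i & items_j) / len(union)


# Mirrors the example: U1 rated 8 items, U2 rated 3 of them, U3 rated 5 of them.
items_u1 = {f"m{k}" for k in range(8)}
items_u2 = {f"m{k}" for k in range(3)}
items_u3 = {f"m{k}" for k in range(5)}

sim_u1_u2 = overlap_weighted_similarity(1.0, items_u1, items_u2)
sim_u1_u3 = overlap_weighted_similarity(1.0, items_u1, items_u3)
```

Under this assumed weighting, the two unweighted similarities of 1 become 3/8 and 5/8, so the user who shares more rated items with the target ranks higher.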

Direct trust
The data set contains no data that directly expresses the degree of trust, so direct trust is calculated from the items that users have rated in common. When two users rate the same item, they are considered to have had an exchange. The initial direct trust between users is calculated from their commonly rated items, as shown in formula (3), in which $I_{a,b}$ is the number of items rated by both user a and user b, and $I_a$ is the number of items rated by user a.
The more items two users have rated in common, the more exchanges they have had. In real life, the direct trust relationship between people changes with the outcome of their exchanges: if the outcomes are good, the trust relationship is strengthened; if the outcomes are poor, it is weakened. The user's average rating is calculated from the user's ratings of items. If the user's rating for an item is greater than the user's average rating, it is a positive rating; conversely, if it is less than the user's average rating, it is a negative rating. If the ratings of user a and user b for item i are both positive or both negative, the exchange between user a and user b is successful, that is, the users hold the same opinion. If one of the two ratings for item i is positive and the other is negative, the exchange fails, that is, the users' opinions are inconsistent. The expression for determining the success or failure of an exchange is shown in formula (4).
The rating deviation between the two parties of an exchange is calculated to measure the influence of user preference deviation on direct trust. A trust impact factor between users is added to the exchange information, and the direct trust between users is then calculated as shown in formula (5), in which $r_{a,i}$ is the rating of user a on item i, $r_{b,i}$ is the rating of user b on item i, $I_s$ is the set of items on which the exchange between the users succeeded, and $I_f$ is the set of items on which it failed.
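As a rough illustration of the initial direct trust, the following Python sketch assumes that formula (3) is the ratio of co-rated items to the items rated by user a, which is what the symbol definitions suggest but is not confirmed by the text. The function name and item sets are hypothetical.

```python
from typing import Set


def initial_direct_trust(items_a: Set[str], items_b: Set[str]) -> float:
    """Assumed form of the initial trust of a in b: |I_ab| / |I_a|."""
    if not items_a:
        return 0.0
    return len(items_a & items_b) / len(items_a)


# Hypothetical item sets: b rated two of the four items that a rated.
items_a = {"i1", "i2", "i3", "i4"}
items_b = {"i3", "i4"}
```

Under this assumed form the measure is asymmetric: the trust of a in b is 0.5 here, while the trust of b in a is 1.0, because all of b's items were also rated by a.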
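The success and failure rule for exchanges can be sketched in Python as below. The sketch only covers the classification step described in the text, not the trust value of formula (5). Leaving a rating that exactly equals the user's average unclassified is an assumption, since the text defines only "greater than" and "less than".

```python
from typing import Dict, Set, Tuple


def split_exchanges(ratings_a: Dict[str, float],
                    ratings_b: Dict[str, float]) -> Tuple[Set[str], Set[str]]:
    """Return (successful items, failed items) among the co-rated items."""
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    success: Set[str] = set()
    failure: Set[str] = set()
    for item in set(ratings_a) & set(ratings_b):
        dev_a = ratings_a[item] - mean_a
        dev_b = ratings_b[item] - mean_b
        if dev_a * dev_b > 0:      # both positive or both negative
            success.add(item)
        elif dev_a * dev_b < 0:    # one positive, one negative
            failure.add(item)
        # A rating equal to the mean is neither positive nor negative;
        # such items are left unclassified here (assumption).
    return success, failure


# Hypothetical ratings; both users have an average rating of 3.
user_a = {"i1": 5.0, "i2": 1.0, "i3": 4.0, "i4": 2.0}
user_b = {"i1": 4.0, "i2": 2.0, "i3": 1.0, "i5": 5.0}
succeeded, failed = split_exchanges(user_a, user_b)
```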

Indirect trust
In social networks, a connection between two users means that they have a direct trust relationship. When two users have no direct trust relationship, the indirect trust relationship between them must be calculated through trust propagation. If user a and user b have a direct trust relationship, and user b and user c have a direct trust relationship, then the indirect trust relationship between user a and user c can be inferred through trust propagation. If each intermediate user is regarded as user b, the indirect trust between source user a and target user c is calculated as shown in formula (6). In this formula, $\beta_d$ is a weight parameter used to determine the optimal trust propagation distance, $d_{max}$ is the maximum trust propagation distance, and d is the distance from the source user to the target user. According to the six degrees of separation theory, the maximum value of $d_{max}$ is 6. The two direct-trust terms in formula (6) represent the direct trust between user a and user b, and between user b and user c, respectively. The parameter $\alpha \in (0,1]$ is an adjustable trust threshold: only users whose direct trust exceeds this threshold can participate in the trust propagation process.
In summary, the trust between user a and user b is denoted $Trust_{a,b}$: it equals the direct trust when the two users have commonly rated items, and the indirect trust obtained through trust propagation otherwise.
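The propagation step can be sketched in Python as follows. Because formula (6) is not reproduced here, the sketch makes several loudly labelled assumptions: trust along a path is the product of the direct trusts on it, the distance weight is a linear decay `(d_max - d + 1) / d_max`, and the best path is kept. The graph, function names, and default parameter values are illustrative only.

```python
from typing import Dict

TrustGraph = Dict[str, Dict[str, float]]  # truster -> {trustee -> direct trust}


def indirect_trust(graph: TrustGraph, source: str, target: str,
                   alpha: float = 0.15, d_max: int = 3) -> float:
    """Best distance-weighted path trust from source to target (assumed form)."""
    best = 0.0
    # Each stack entry: (current user, product of trusts so far, visited, distance)
    stack = [(source, 1.0, frozenset([source]), 0)]
    while stack:
        user, product, visited, dist = stack.pop()
        if dist == d_max:
            continue  # do not propagate beyond the maximum distance
        for neighbour, trust in graph.get(user, {}).items():
            # Only edges above the threshold alpha take part in propagation.
            if trust <= alpha or neighbour in visited:
                continue
            new_product = product * trust
            new_dist = dist + 1
            if neighbour == target:
                if new_dist >= 2:  # a one-hop path is direct, not indirect, trust
                    beta = (d_max - new_dist + 1) / d_max  # assumed distance weight
                    best = max(best, beta * new_product)
                continue
            stack.append((neighbour, new_product, visited | {neighbour}, new_dist))
    return best


def combined_trust(graph: TrustGraph, a: str, b: str,
                   alpha: float = 0.15, d_max: int = 3) -> float:
    """Direct trust when it exists, otherwise the propagated indirect trust."""
    direct = graph.get(a, {}).get(b)
    if direct is not None:
        return direct
    return indirect_trust(graph, a, b, alpha, d_max)


# Hypothetical trust network: a trusts b, b trusts c, a and c are not connected.
toy_graph: TrustGraph = {"a": {"b": 0.8}, "b": {"c": 0.5}, "c": {}}
```

The depth-first search enumerates simple paths only, which stays tractable because the propagation distance is capped at a small `d_max`.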

Integration of similarity and trust
With user similarity and user trust obtained as above, the two are combined by weighting to obtain the user's comprehensive relevance, as shown in formula (8), in which the similarity is weighted by a factor $\lambda$ and the trust by $1-\lambda$. The comprehensive relevance values between the target user and the other users are sorted in descending order, and the N users with the largest values are selected to form the nearest-neighbor set $N_u$. The weighted average of these N nearest neighbors' ratings of an item is then used to predict the target user's rating of that item.
To minimize the error of the predicted rating, a weighted-average method is used to predict the target user's rating, as shown in formula (9), in which $P_{a,i}$ is the predicted rating of user a on item i, $R_{b,i}$ is the rating of user b on item i, $\bar{R}_a$ is the average rating of user a, and $\bar{R}_b$ is the average rating of user b. The predicted ratings of the items are sorted in descending order, and the items whose predicted rating exceeds a specified threshold, or the top N items with the highest predicted ratings, are recommended to the target user.
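The fusion and neighbor selection can be sketched in Python as follows. The linear combination with weight λ for similarity and 1 − λ for trust matches the weighting reported in the experiments; the function names, the dictionary layout, and the treatment of a missing trust value as 0 are assumptions made for illustration.

```python
from typing import Dict, List


def fused_relevance(similarity: float, trust: float, lam: float = 0.4) -> float:
    """Weighted combination: lam * similarity + (1 - lam) * trust."""
    return lam * similarity + (1.0 - lam) * trust


def nearest_neighbours(similarities: Dict[str, float],
                       trusts: Dict[str, float],
                       n: int, lam: float = 0.4) -> List[str]:
    """Return the n candidates with the highest fused relevance to the target."""
    scores = {
        user: fused_relevance(sim, trusts.get(user, 0.0), lam)  # missing trust -> 0
        for user, sim in similarities.items()
    }
    return sorted(scores, key=lambda user: scores[user], reverse=True)[:n]


# Hypothetical similarity and trust values of three candidates to one target user.
toy_sims = {"u2": 0.9, "u3": 0.2, "u4": 0.5}
toy_trusts = {"u2": 0.1, "u3": 0.9, "u4": 0.5}
top_two = nearest_neighbours(toy_sims, toy_trusts, n=2)
```

With λ = 0.4, a candidate with low similarity but high trust (u3) can outrank a highly similar but barely trusted one (u2), which is the intended effect of the fusion.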
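The prediction step can be illustrated with a short Python sketch. It assumes the common mean-centered weighted average (the target user's mean plus the relevance-weighted mean of the neighbors' deviations from their own means), which is consistent with the symbols defined for formula (9) but is not confirmed to be its exact form. The names and toy ratings are hypothetical.

```python
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item -> rating}


def user_mean(ratings: Dict[str, float]) -> float:
    return sum(ratings.values()) / len(ratings)


def predict_rating(ratings: Ratings, weights: Dict[str, float],
                   target: str, item: str) -> float:
    """Mean-centered weighted average over neighbours who rated the item."""
    base = user_mean(ratings[target])
    numerator = 0.0
    denominator = 0.0
    for neighbour, weight in weights.items():
        neighbour_ratings = ratings.get(neighbour, {})
        if item not in neighbour_ratings:
            continue  # neighbours who did not rate the item contribute nothing
        numerator += weight * (neighbour_ratings[item] - user_mean(neighbour_ratings))
        denominator += abs(weight)
    if denominator == 0:
        return base  # assumption: fall back to the target user's mean
    return base + numerator / denominator


# Hypothetical ratings and fused relevance weights of two neighbours of user "a".
toy_ratings: Ratings = {
    "a": {"i1": 4.0, "i2": 2.0},
    "b": {"i1": 5.0, "i3": 3.0},
    "c": {"i2": 1.0, "i3": 5.0},
}
toy_weights = {"b": 0.6, "c": 0.4}
predicted = predict_rating(toy_ratings, toy_weights, "a", "i3")
```

Centering on each user's mean compensates for users who systematically rate higher or lower than others.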

Experimental data
This study uses the MovieLens data set provided by GroupLens [12]. The data set contains 1 million ratings of 3,952 movies from 6,040 users. Users rate movies on a five-level scale (1 to 5). Each user has rated at least 20 movies, and the data sparsity is 95.81%.
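The reported sparsity can be checked with a few lines of Python, taking sparsity as the share of empty cells in the user-item matrix. The function name is illustrative, and the rating count is rounded to 1 million as stated in the text.

```python
def sparsity(num_ratings: int, num_users: int, num_items: int) -> float:
    """Fraction of user-item pairs that have no rating."""
    return 1.0 - num_ratings / (num_users * num_items)


# Figures quoted in the text: about 1 million ratings, 6,040 users, 3,952 movies.
sparsity_percent = round(sparsity(1_000_000, 6040, 3952) * 100, 2)
```

With these figures the result rounds to 95.81%, matching the value given above.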

Evaluation criteria
The mean absolute error (MAE) is used as the evaluation standard. MAE is the average absolute deviation between the predicted rating and the actual rating. The smaller the MAE, the higher the recommendation accuracy and the better the recommendation effect. It is calculated as shown in formula (10):
$$MAE = \frac{1}{n} \sum \left| R_{a,i} - P_{a,i} \right| \qquad (10)$$
where $R_{a,i}$ is the actual rating of user a on item i, $P_{a,i}$ is the predicted rating produced by the recommendation algorithm, and n is the number of items in the data set over which the error is averaged.
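A minimal Python sketch of the MAE computation is given below. The function name and the sample values are illustrative assumptions; the two lists are assumed to hold actual and predicted ratings for the same user-item pairs in the same order.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one rating is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted ratings for three user-item pairs.
toy_mae = mean_absolute_error([4.0, 3.0, 5.0], [3.5, 3.0, 4.0])
```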

Experimental results
When calculating the trust between users, the trust value is affected by the trust threshold $\alpha$ and the maximum trust propagation distance $d_{max}$. Therefore, experiments are performed on these two parameters to find the best combination.
First, an experiment is performed on the trust threshold $\alpha$, and the results are shown in Figure 3. In the figure, Sim&Trust denotes the recommendation method that combines user similarity with trust, and Trust denotes the recommendation method that uses trust only. The results are obtained with $d_{max}=3$ and 30 nearest neighbors. It can be seen from Figure 3 that MAE decreases as $\alpha$ increases from 0 to 0.15. When $\alpha$ is 0.15, the MAE obtained by the method that combines similarity and trust is the smallest. When $\alpha$ exceeds 0.15, MAE increases. Therefore, $\alpha$ is set to 0.15.
After the trust threshold $\alpha$ and the maximum trust propagation distance $d_{max}$ are determined, the weight factor $\lambda$ in formula (8) is tested with 30 nearest neighbors. Figure 5 compares the MAE values obtained when $\lambda$ varies between 0 and 1. It can be seen from Figure 5 that MAE is smallest, and the recommendation effect is best, when $\lambda$ is 0.4, that is, when similarity has a weight of 0.4 and trust has a weight of 0.6.
After the above parameters are determined, the collaborative filtering recommendation algorithm that combines similarity and trust is verified by comparing its MAE with that of traditional collaborative filtering using three similarity measures: cosine similarity (Cosine), Jaccard similarity (Jaccard), and Pearson similarity (Pearson). The comparison results are shown in Figure 6. It can be seen from Figure 6 that the MAE of all these algorithms gradually decreases as the number of nearest neighbors increases and stabilizes when the number of nearest neighbors reaches 100, indicating that a relatively good recommendation result is achieved with 100 nearest neighbors. Figure 6 also shows that the MAE of the algorithm proposed in this study is significantly lower than that of the traditional methods, indicating that the proposed algorithm effectively improves prediction accuracy and produces more accurate recommendations.

Conclusion
This study uses an improved Pearson correlation coefficient to measure user preference, and divides the trust relationship between users into direct trust and indirect trust according to whether the users have commonly rated items. User similarity and user trust are then weighted and fused to obtain the user's comprehensive relevance, and the neighbor set with the highest comprehensive relevance to the target user is found. The experimental results show that the proposed method achieves a lower MAE than the traditional methods and thus improves the accuracy of the recommender system. This study uses only users' rating information and does not consider contextual information such as time or other user attributes, so the next step is to introduce contextual information to improve the recommendation results.