Novel Multidimensional Collaborative Filtering Algorithm Based on Improved Item Rating Prediction

Current data is characterized by complexity and low information density, and can be called information sparse data. The large volume of such data makes it difficult for traditional collaborative filtering recommendation algorithms to analyse sparse data, which may lead to low accuracy. Meanwhile, the complexity of the data means that the recommendation environment is affected by multiple dimensional factors. In order to solve these problems efficiently, this paper proposes a multidimensional collaborative filtering algorithm based on improved item rating prediction. The algorithm considers a variety of factors that affect user ratings. It penalizes users' popularity when calculating the degree of similarity between users, applies cross-iterative bi-clustering to the user scoring matrix to account for changes in users' preferences, and improves the traditional item rating prediction algorithm by considering user ratings according to multidimensional factors. In this algorithm, the introduction of systematic error factors based on statistical learning improves the accuracy of rating prediction, and the multidimensional method can solve data sparsity problems, enabling the most relevant dimensional influencing factors to be found with association rules. The experiment results show that the proposed algorithm has the advantages of smaller recommendation error and higher recommendation accuracy.


Introduction
Recommendation algorithms are mainly divided into six categories: content-based filtering, collaborative filtering, recommendation based on association rules, recommendation based on utility, recommendation based on knowledge, and mixed recommendation [1,2]. Collaborative filtering (CF) algorithms are the most widely used and classic because of their easy implementation, high accuracy, and high recommendation efficiency. However, in the era of big data, one typical feature is that the amount of data is huge but the information density is low, which can also be called information sparse data. Collaborative filtering algorithms are often ineffective when dealing with large amounts of sparse data. Furthermore, the complex data environment results in many factors affecting the recommendation. With the development of the mobile Internet, mobile devices can easily obtain information about more dimensions, such as location, weather, and social relationships. Under different external influences, the recommendation results change greatly. However, most current collaborative filtering algorithms make recommendations based on a single dimension.
In order to improve the performance of collaborative filtering recommendation algorithms, researchers have approached these problems from different perspectives and proposed a variety of recommendation algorithms. Some researchers optimized the user scoring matrix using different methods [3][4][5][6], and others used fuzzy sets to efficiently represent user features [7,8].
These methods all effectively alleviate the problem of sparse data. In order to find a neighbor set that is more similar to the target user's interests and improve the accuracy of recommendations, some researchers improved the similarity calculation method [7,9,10], and others used location information and trust relationship information, such as [11,12]. Mining the potential relationships between users from such information provides new ideas for finding neighbors. Researchers have also used demographic knowledge [13,14] to achieve major breakthroughs, while some scholars used score ranking prediction methods to enhance recommendation performance, such as [15][16][17], and others applied genetic algorithms in the prediction process to improve recommendation performance, such as [18,19]. Context, as a dynamic description of an item and a user's situation, affects the user's decision-making process; hence, it is essential for any recommendation system in a big data environment [20][21][22]. These algorithms alleviate the problems caused by data sparsity to some extent and improve the accuracy of similarity calculation and the recommendation quality with different methods, but their implementation depends on a large amount of user information and computation, which leads to high complexity and difficult implementation.
In this paper, we focus on the recommendation of data in complex environments. Firstly, we study the traditional item rating prediction algorithm and improve it by weighting the user's score. We introduce a system error factor based on statistical learning for each user to develop a personalized rating prediction algorithm. Then, we combine this personalized rating prediction method with the classical collaborative filtering algorithm User-Inverse Item Frequency (User-IIF) [23] to develop a novel collaborative filtering algorithm based on both User-IIF and personalized rating prediction. Secondly, we focus on the impact of multidimensional factors and propose a novel multidimensional method, which can separate user groups based on context-aware dimensions combined with both user clustering and item clustering. Finally, we conduct a series of experiments to prove that our algorithms and methods are effective and efficient. The experiment results prove that our algorithms are easy to implement with low computational overhead. In addition, our algorithms can also process sparse data and improve the accuracy of recommendations. The rest of the paper is organized as follows. The works related to our research and our novel methods for multidimensional recommendation are presented in Section 2, the experiment results and discussion are presented in Section 3, and the conclusion and suggestions for future work are given in Section 4.

Proposed Item Rating Prediction Method.
Traditional item rating prediction algorithms take the target user's average item historical score as the reference center and then use the similarity between similar neighbors to perform item rating prediction. When user data is sparse, the error rate of traditional item rating prediction algorithms increases and the accuracy rate decreases.
The proposed method weights user ratings and introduces a systematic error factor based on statistical learning to improve the traditional item rating prediction algorithm.

User Rating Weighting Factor.
Traditional item rating prediction algorithms take the historical average score of the target user as the central value and rely on the neighbors' scores to correct it. Traditional algorithms rely too heavily on the user's own scores, and their anti-interference ability is weak in the face of data sparseness. For example, when a user has scored few items, using the user's historical average score as the central value may produce inaccurate recommendation results even if that average is close to 0. The item scoring prediction method proposed in this work considers the factor of public scoring: it introduces a weighting factor α for the user's average score and assigns the weight (1 − α) to the item's average score, so that the central value of the prediction becomes α·r̄_u + (1 − α)·r̄_i, as shown in (1).
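As a minimal sketch (with illustrative names, not the paper's code), the weighted center value and the standard neighborhood correction can be combined as follows:

```python
def predict_rating(alpha, user_mean, item_mean, neighbors):
    """Improved item rating prediction sketch.

    alpha     : user score weighting factor in [0, 1]
    user_mean : target user's historical average score
    item_mean : item's average (public) score
    neighbors : list of (similarity, neighbor_rating, neighbor_mean)
    """
    # Weighted center value: alpha * user mean + (1 - alpha) * item mean.
    center = alpha * user_mean + (1 - alpha) * item_mean
    # Standard neighborhood correction term.
    num = sum(s * (r - m) for s, r, m in neighbors)
    den = sum(abs(s) for s, _, _ in neighbors)
    return center if den == 0 else center + num / den
```

With α = 1 this degenerates to the traditional algorithm, which centers on the user's own average; smaller α shifts weight toward the public score.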

Systematic Error Factor Based on Statistical Learning.
A large number of studies show that there are errors in item rating prediction, which only a few algorithms have addressed by performing a statistical analysis of each recommendation result. In order to achieve more personalized recommendations, it is necessary to establish a system error factor for each user from the recommendations the system generates. The system error ε_u generated by the recommendations for target user u is calculated by (2), i.e., ε_u = (1/N_{I(u)}) Σ_{i∈I(u)} (R_{ui} − r_{ui}), where the actual score of user u on item i is represented as R_{ui}, r_{ui} describes the predicted rating for user u generated by the system, and N_{I(u)} is the number of items in the itemset I(u) that target user u adopts from the recommendation results. Through statistical learning, the system sets the error factor for each user, and this is then applied in the collaborative filtering algorithm to correct the item rating prediction, as shown in (3), yielding a more accurate personalized recommendation for the target user.
Based on the above calculations, this paper proposes a novel algorithm, namely, improved item-rating prediction (IIP) for user scoring. The main steps of IIP are shown in Figure 1(a). The basic idea is to form a set of error factors for each user through statistical learning and then apply them in the collaborative filtering algorithm to correct the item rating prediction.
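A minimal sketch of the error factor in (2) and the corrected prediction in (3), assuming the error factor is the mean signed error over the adopted items (function names are illustrative):

```python
def system_error(actual, predicted):
    """epsilon_u: mean signed error over the items the user adopted.

    actual    : list of true ratings R_ui on adopted items I(u)
    predicted : list of system-predicted ratings r_ui for the same items
    """
    assert actual and len(actual) == len(predicted)
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)

def corrected_prediction(raw_prediction, epsilon_u):
    """Shift the raw prediction by the user's personal error factor."""
    return raw_prediction + epsilon_u
```

If the system consistently under-predicts a user's ratings, ε_u is positive and the correction raises future predictions for that user.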

User Scoring Based on Multidimensional Context
2.2.1. User Similarity Calculation. The first step of our method is to obtain the user's neighbor cluster, which is derived from the user's scoring matrix. Users in the user group whose interests are similar to the target user's can be selected as neighbor users.
This paper utilizes Pearson's similarity [24] to measure the distance between users, as shown in (4). Pearson's similarity is similar in form to cosine similarity.
The average evaluation value of each user is subtracted during the calculation, which normalizes the cosine similarity and unifies the users' scoring standards. The range of Pearson's similarity is [−1, 1], which is more accurate than Jaccard's correlation coefficient and cosine similarity:

s_{u,v} = Σ_{i∈I_u∩I_v} (r_{u,i} − r̄_u)(r_{v,i} − r̄_v) / [ √(Σ_{i∈I_u∩I_v} (r_{u,i} − r̄_u)²) · √(Σ_{i∈I_u∩I_v} (r_{v,i} − r̄_v)²) ],   (4)
where s_{u,v} represents the similarity between target user u and neighbor cluster user v, I_u represents the set of items that target user u has scored, and I_v indicates the set of items scored by neighbor cluster user v. i represents an item that target user u and neighbor cluster user v have scored together. r_{u,i} indicates the rating of item i by target user u, and r̄_u indicates the average rating of target user u. Following the same principle, r_{v,i} is the score of neighbor cluster user v for item i, and r̄_v indicates the average rating of neighbor cluster user v. The traditional collaborative filtering algorithm uses the above formula to calculate the similarity between users. However, a user will have different scoring standards in different contexts; for example, a user's rating criteria for a hotel when traveling on business differ from those when traveling privately. Therefore, once the context is considered, the context-free average is replaced by a new symbol: we use r̄_{u,c} instead of r̄_u and r̄_{v,c} instead of r̄_v, where r̄_{u,c} represents the average rating of user u under context condition c.
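Pearson's similarity in (4) can be sketched as follows; ratings are assumed to be per-user dictionaries mapping items to scores, and each user's mean is taken over all of that user's ratings:

```python
from math import sqrt

def pearson_similarity(ratings_u, ratings_v):
    """s_{u,v}: Pearson similarity over co-rated items.

    ratings_u, ratings_v : dicts mapping item id -> rating
    """
    common = set(ratings_u) & set(ratings_v)  # items rated by both users
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den = (sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
           * sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common)))
    return 0.0 if den == 0 else num / den
```

To obtain the context-aware variant (5), `mean_u` and `mean_v` would simply be replaced by the averages computed over ratings made under context c.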
The range of c can be appropriately generalized or filtered as needed. The improved method can be described as follows: the improved user neighbor cluster similarity calculation takes into account the influence of context factors on the basic rating, making the formula closer to the contextual recommendation environment. After considering the context, the user similarity calculation (4) is improved to (5), in which the context-dependent averages are used when calibrating the scores by their means.

User and Item Cross-Iterative Bi-Clustering.
The cross-iterative bi-clustering method is used to cluster users and items separately. Due to the sparsity of the user-item matrix, the initial clustering is not accurate enough. Therefore, we use the cross-iterative method to adjust both user clustering and item clustering.
User clustering adjustment is calculated by (6), and item clustering adjustment is calculated by (7), where r_{tk} is the score of user u_t for item k and r_{ck} is the score of user u_c for item k. I(u_t, u_c) is the collection of items that u_t and u_c have scored together, and sim(i_i, i_j) is the similarity between items i_i and i_j; here, we also use Pearson's similarity. If many items have been rated in common, the users can be considered to have similar interests. If the obtained S_u(u_t, u_c) is greater than a certain threshold ε, the user is kept in the cluster; otherwise, it is separated from the current cluster. Then, we calculate the similarity between u_t and the other cluster centers, and u_t is added to the most similar cluster to complete the adjustment of the user clusters.
where r_{tk} is the score of user k on item i_t and r_{ck} is the score of user k on item i_c. U(i_t, i_c) is the set of users who have scored both i_t and i_c. sim(u_i, u_j) is the similarity between users u_i and u_j; here, we again use Pearson's similarity. If the obtained S_I(i_t, i_c) is greater than a certain threshold η, the item is kept in the cluster; otherwise, it is separated from the current cluster. Then, we calculate the similarity between i_t and the other cluster centers, and i_t is added to the most similar cluster to complete the adjustment of the item clusters. Algorithm 2, shown in Figure 1(b), is proposed for cross-iterative bi-clustering.
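The cluster membership adjustment step can be sketched as below. This is a simplified illustration, assuming a generic similarity function and a single threshold; the paper applies the same step to both user clusters (threshold ε) and item clusters (threshold η):

```python
def adjust_cluster(member, cluster_centers, current, similarity, threshold):
    """Keep `member` in its current cluster if it is still similar enough;
    otherwise reassign it to the most similar other cluster center.

    cluster_centers : dict mapping cluster label -> center representation
    similarity      : function (member, center) -> similarity score
    """
    if similarity(member, cluster_centers[current]) >= threshold:
        return current  # member stays in its current cluster
    # Reassign to the most similar of the remaining cluster centers.
    return max((c for c in cluster_centers if c != current),
               key=lambda c: similarity(member, cluster_centers[c]))
```

Cross-iteration alternates this adjustment between the user clusters and the item clusters until membership stabilizes.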

Context Similarity Calculation.
When the scope of the context is very large, there are many different dimensions, such as time, place, surrounding people, and so on. According to the characteristics of the dataset and the environment's collection ability, the context dimensions selected by the recommendation system will differ. As far as the time dimension is concerned, it can be further subdivided into seasons, weeks, moments, holidays, and so on. Assume that we select a system with z different dimensions, written as c = (c_1, c_2, ..., c_z), where c_t (t = 1, ..., z) is a contextual dimension (such as time, location, or weather). The similarity of the context between two score records x, y on dimension t can be recorded as sim_t(x, y). We use the degree of influence of the context dimension on the score to measure the similarity between the two context variables as follows: where u is the user and r_{u,i,x_t} describes the rating of item i by user u under the context value x_t. r̄_i is the average score of item i. Similarly, r_{u,i,y_t} is user u's rating of item i under context y_t, σ_{x_t} is the standard deviation of context dimension x_t, and σ_{y_t} is the standard deviation of context dimension y_t. This paper proposes a novel method to measure the similarity of contexts x and y according to the degree of influence that different contexts have on the score of the same commodity i in the t dimension. Algorithm 3, shown in Figure 1(c), calculates context similarity efficiently.
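A hypothetical sketch of the context similarity measure, assuming it is computed as the normalized inner product (a correlation) of the rating deviations from the item mean observed under the two context values; the exact normalization in the paper's formula may differ:

```python
from math import sqrt

def context_similarity(dev_x, dev_y):
    """sim_t(x, y) sketch: correlate rating deviations from the item mean.

    dev_x : deviations r_{u,i,x_t} - mean(r_i) observed under context value x_t
    dev_y : deviations r_{u,i,y_t} - mean(r_i) observed under context value y_t
    (aligned by user/item record)
    """
    num = sum(a * b for a, b in zip(dev_x, dev_y))
    sx = sqrt(sum(a * a for a in dev_x))  # spread under x_t
    sy = sqrt(sum(b * b for b in dev_y))  # spread under y_t
    return 0.0 if sx == 0 or sy == 0 else num / (sx * sy)
```

Two context values that push ratings of the same items in the same direction get a similarity near 1; opposite influences give a similarity near −1.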

The Proposed Multidimensional Context-Aware Method.
In multidimensional recommendation, the addition of context produces many interesting rules, and mining high-frequency patterns between contexts and items can help discover the impact of different contexts on user decisions. In this paper, we select the multidimensional context from strong association rules mined with the FP-growth algorithm.
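As a toy stand-in for FP-growth (the real algorithm builds an FP-tree and mines it recursively; this sketch just counts pairs), frequent context/item co-occurrences above a minimum support can be found as follows:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Return context/item pairs whose support reaches min_support.

    transactions : list of sets, each mixing context tags and item tags
    min_support  : minimum fraction of transactions containing the pair
    """
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):  # every unordered pair
            counts[pair] += 1
    n = len(transactions)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}
```

Strong association rules (e.g., "weekend ⇒ cinema") can then be derived from the frequent pairs by checking their confidence; a production system would use a real FP-growth implementation for efficiency.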
Generally, when determining the neighbor user group, the N users with the largest similarity can be selected as the cluster neighbors of the target user according to the similarity calculation formula. The context can help filter out user score records whose context differs greatly from the current recommendation environment. Because some commodity decisions are closely related to a certain context factor, such a context is called a hard context and must be considered and satisfied in the recommendation. Score records that do not satisfy the current hard context can be filtered out first and excluded when calculating the similarity of neighbor clusters.
Due to the influence of context, every rating record in a user's history carries its own context background, which generally differs from the target user's current context, so rating records made under different contexts have different relevance to the current recommendation. In order to distinguish the relevance of a rating record under the current context, we use the contextual similarity calculation method to compute sim_t(x, y, i), which describes the similarity between context x and context y in the t dimension.
The user rating prediction in a multidimensional context can be described as follows: where c is the context in which the target user is located and ε_u is the system error (the other symbols are described in the previous formulas). It is well known that contexts can have many specific dimensions, depending on the data collection, such as time, location, and related personnel. The time dimension can be further divided into seasons, weeks, moments, holidays, and so on.
After comprehensively considering the influence of context on the recommendation system, (10) can be replaced with (11), and the basic clustering rating prediction formula is modified accordingly. Algorithm 4, shown in Figure 1(d), is proposed as the multidimensional context-aware method. Using this algorithm, item scores can be obtained under multidimensional conditions.
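A hedged sketch of the multidimensional prediction, assuming neighbor contributions are weighted by the product of user similarity and context similarity and then shifted by the user's error factor ε_u (names and the combination rule are illustrative, not the paper's exact formula (11)):

```python
def predict_with_context(center, epsilon_u, neighbors):
    """Multidimensional context-aware prediction sketch.

    center    : weighted center value (e.g., alpha*user_mean + (1-alpha)*item_mean)
    epsilon_u : the user's system error factor from statistical learning
    neighbors : list of (user_similarity, context_similarity,
                         neighbor_rating, neighbor_mean)
    """
    # Each neighbor's deviation is weighted by user AND context similarity.
    num = sum(s_u * s_c * (r - m) for s_u, s_c, r, m in neighbors)
    den = sum(abs(s_u * s_c) for s_u, s_c, _, _ in neighbors)
    correction = 0.0 if den == 0 else num / den
    return center + epsilon_u + correction
```

Neighbors rated under contexts dissimilar to the current one (small context similarity) thus contribute little to the prediction.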

Experimental Datasets and Environment.
In order to verify the impact of the user's scoring weight on the recommendation results and to prove that the collaborative filtering algorithm based on the user and improved item scoring is more accurate, it is necessary to compare our proposed algorithm with traditional algorithms that are based on classical item scoring prediction methods. These experiments were conducted under the following conditions: (1) CPU: dual-core i7-8750H at 2.5 GHz; (2) ... The MovieLens 100K dataset comprises 100,000 ratings from 943 users for 1,682 movies, rated between 1 and 5. The sparsity of the set is about 93.7%, so the data sparsity problem is evident. In the experiment, in order to simulate datasets of different scales and sparsity levels, the existing datasets were processed to generate four datasets, as shown in Table 1. From this table, we can see that the datasets were randomly divided into training sets and test sets, where the training set accounted for 80% of the entire dataset and the test set for the remaining 20%. When the Jester dataset was processed, the scores were mapped to the range 0 to 5 by r* = (r + 10)/4. The source of the experimental datasets for testing the multidimensional collaborative filtering algorithm is CARSKit (https://github.com/irecsys/CARSKit/), an open-source Java-based context-aware recommendation engine. We used two datasets: DePaulMovie [25] and TripAdvisor_v1 [26]. In the experiments, DePaulMovie kept its original shape, and TripAdvisor_v1 was filtered and adjusted. We used 70% of the dataset as the training set and 30% as the test set.
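The Jester rescaling r* = (r + 10)/4 maps the dataset's original [−10, 10] rating scale to [0, 5]:

```python
def rescale_jester(r):
    """Map a Jester rating in [-10, 10] to the [0, 5] scale via r* = (r + 10) / 4."""
    return (r + 10) / 4
```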

Evaluation Indicators of the Experiment Results.
In order to study the performance of the improved recommendation algorithm, the experiment used four indicators, namely, precision, recall, mean absolute error (MAE), and root mean square error (RMSE).
Precision is an important indicator for evaluating the performance of a recommendation algorithm. It describes the proportion of the items recommended to the user that the user is actually interested in. The larger its value, the higher the accuracy of the system and the better the system's recommendations. Precision is computed as shown in the following formula:

Precision = Σ_u |R(u) ∩ T(u)| / Σ_u |R(u)|,   (12)

where u is a target user of the system, R(u) is the set of items recommended to the user, and T(u) is the set of items in which the user is actually interested.
Recall describes how many of the products the user is interested in are actually recommended to him by the system. Recall is computed as shown in the following formula:

Recall = Σ_u |R(u) ∩ T(u)| / Σ_u |T(u)|.   (13)

The numerator of recall is the same as that of precision, namely the intersection of R(u) and T(u); however, their denominators differ. The denominator of precision is R(u), the set of all items recommended to the user, while the denominator of recall is T(u), the collection of all items of interest to the user. A larger recall corresponds to better performance.
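Precision and recall over all users can be computed as below; R(u) and T(u) are represented as per-user sets:

```python
def precision_recall(recommended, relevant):
    """Micro-averaged precision (12) and recall (13).

    recommended : list of per-user sets R(u) of recommended items
    relevant    : list of per-user sets T(u) of items the user likes
    """
    hits = sum(len(r & t) for r, t in zip(recommended, relevant))
    precision = hits / sum(len(r) for r in recommended)  # hits / |R(u)|
    recall = hits / sum(len(t) for t in relevant)        # hits / |T(u)|
    return precision, recall
```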
MAE avoids the problem of errors canceling each other out and accurately reflects the actual prediction error. The calculation method is shown in formula (14), which averages the absolute value of the difference between the actual score and the predicted score:

MAE = (1/m) Σ_{i=1}^{m} |Y_i − y_i|,   (14)

where Y_i and y_i denote the original data and predicted data, respectively, and m is the number of observations. RMSE is used to measure the deviation between the predicted value and the true value. The calculation method is shown in formula (15), that is, the square root of the mean of the squared differences between the predicted and actual scores:

RMSE = √( (1/m) Σ_{i=1}^{m} (Y_i − y_i)² ).   (15)

The smaller the MAE and RMSE values, the better the recommendation performance of the algorithm.
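MAE (14) and RMSE (15) can be computed as follows:

```python
from math import sqrt

def mae(actual, predicted):
    """Mean absolute error: average |Y_i - y_i|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: sqrt of the average (Y_i - y_i)^2."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Because RMSE squares the differences before averaging, it penalizes large prediction errors more heavily than MAE.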

Choosing the Best Value for the User Score Weighting Factor.
Firstly, the optimal value of the user score weighting factor α is determined for the collaborative filtering algorithm based on the user and the improved item rating prediction (U&IPRP-CF). To ensure the accuracy of the experiment, the number of recommended items N for the Jester-500-100, Jester-1000-100, and Jester-1000-200 datasets is set to a constant N = 10, meaning that 10 items are recommended to each target user. However, the number of items in the MovieLens dataset is large, and the number of recommended items is set to N = 30, meaning that 30 items are recommended to each target user. The number K of most similar neighbors selected for each target user is a variable, taken from 50 to 100 in increments of 10, that is, K ∈ {50, 60, 70, 80, 90, 100}.
Each dataset was tested with α taking the values {1, 0.9, 0.8, 0.7, 0.6, 0.5}. The experiment results are shown in Tables 8 and 9; the best value of α is around 0.7. In summary, as the size and sparsity of the dataset increase, the optimal value of the user score weighting factor α decreases.
After determining the best value of the user's score weighting factor, a comparative experiment is carried out, and the algorithms are first sorted and named, as shown in Table 10.
The algorithms are compared in Table 10. It can be seen that for the different datasets, the error generated by the U&IPRP-CF algorithm is smaller than that of the traditional algorithms, and it has obvious advantages in terms of recommendation accuracy.
In the three datasets Jester-500-100, Jester-1000-100, and Jester-1000-200, the U&URWFRP-CF algorithm does not reduce the recommendation error, whereas in the MovieLens dataset it does. This shows that when the dataset is small and less sparse, the user's own score is very reliable; conversely, when the dataset is large and information sparse, the user's score is a weaker reference against such a large base. In this situation, the weights of the user ratings need to be considered.
From these figures, it can be seen that as the scale and sparsity of the dataset gradually increase, the recommendation error of the U&IPRP-CF algorithm decreases and its performance becomes increasingly better, which alleviates the data sparsity problem to some extent.
It can be seen from the experiment results that using the average public score to weight the user's historical average score, rather than using the user's average alone, enables the system to predict the user's scores on unrated items with less error, thus making more accurate recommendations. To reduce the error of score prediction, a systematic error factor is established for each user and refined by statistical learning on each recommendation, which can effectively correct the error and improve the accuracy of recommendations.
In addition, combined with experiment 3.3.1, it can be seen that choosing the correct parameters is also key to improving the accuracy of recommendations. In practical applications, the recommendation system needs to compare and select appropriate values for the number of recommended items N, the number of neighbors K of the target user, and the user score weighting factor α.

Performance Comparison Experiments Using Different Algorithms.
The experimental results were verified on the DePaulMovie and TripAdvisor_v1 datasets. Our paper compares six recommendation algorithms: CF, CF-AR, Multi-CF, Multi-CF-AR, CF-AR-IIP, and Multi-CF-AR-IIP, the latter two being the algorithms proposed in this paper. The algorithms are explained as follows:
(a) CF: classical user-based collaborative filtering recommendation algorithm
(b) CF-AR: user-based collaborative filtering recommendation algorithm combined with association rules mining
(c) CF-AR-IIP: user-based collaborative filtering recommendation algorithm combined with association rules mining and improved item-rating prediction, which is proposed in this paper
(d) Multi-CF: classical multidimensional context-aware user-based collaborative filtering algorithm
(e) Multi-CF-AR: multidimensional context-aware user-based collaborative filtering algorithm combined with association rules mining
(f) Multi-CF-AR-IIP: multidimensional context-aware user-based collaborative filtering algorithm combined with association rules mining and improved item-rating prediction, which is proposed in this paper
In the experiments based on the DePaulMovie dataset, the top-K (K = 10, 15, 20, 25) neighbors with the highest similarity for each user and the top-N (N = 10, 15, 20) recommended items with the highest predicted rating for the target users are selected. Table 11 shows the related context dimensions selected for the datasets. Figures 3(a) and 3(b) show the precision of the algorithms. Multi-CF-AR-IIP achieves the best precision, and Multi-CF-AR is the second best. Particularly when N is small, Multi-CF-AR-IIP has obvious advantages. As N increases, the precision of all the algorithms decreases, and the difference between the algorithms becomes increasingly smaller.
This shows that increasing the number of recommendations reduces the accuracy of the recommendation. Different algorithms exhibit different characteristics on different datasets: Multi-CF has an advantage on DePaulMovie, but it does not work well on TripAdvisor_v1. Figures 3(c) and 3(d) show the recall of the algorithms. The result is the same as for precision: Multi-CF-AR-IIP achieves the best recall, and Multi-CF-AR is the second best. However, recall increases significantly as N increases.
This is because the denominator of recall is the number of items in which the user is actually interested, and its value is small. On DePaulMovie, the errors of the algorithms are almost similar, but the error gap on TripAdvisor_v1 is obvious, and the multidimensional algorithms produce higher errors. MAE and RMSE calculate the difference between the predicted rating and the user's true rating, which measures the difference between the recommended result and the user's true preference. The smaller the MAE and RMSE, the more users like the recommended items. Association rule mining (ARM) improves the precision and recall of the recommendation, which means that it improves click volume and purchase amount. At the same time, it also makes it harder for the recommender system to predict the rating, and the probability of the system recommending items that the user does not like increases. The experiment results show that the improved item-rating prediction (IIP) method can reduce MAE and RMSE. In general, the fusion algorithm Multi-CF-AR-IIP has better recommendation performance than the others.