Three‐way recommendation integrating global and local information

The matrix factorisation approach computes a low-rank approximation of the incomplete user-item rating matrix. Existing approaches suffer from under-fitting because they use only global information for all users and items. In this study, the authors propose a three-way recommendation model that integrates global and local information. The new model has three main aspects. The first is rating prediction with global and local information, for which a clustering algorithm and two matrix factorisation algorithms are employed. The second is the computation of recommendation thresholds based on the decision-theoretic rough set model, where misclassification and promotion costs are considered simultaneously to build the cost matrix. The last is the determination of the recommender actions based on the predictions and thresholds. Experimental results on well-known datasets show that the authors' proposed model improves recommendation quality in terms of average cost.


Nomenclature
U  the set of all users
V  the set of all items
R  the rating matrix
p  the predicted rating
m  the number of users
n  the number of items
r_i,j  the rating of u_i to v_j
c  the number of rating levels
d(.)  feature vector
D  feature vector set
r_max  the maximum rating level
r_min  the minimum rating level
g  the user group number
q  the item group number
G_g  the set of the gth user group
Q_q  the set of the qth item group
NU  the total number of user groups
NI  the total number of item groups
a  MF latent factor
A_P  the positive domain
A_N  the negative domain
A_B  the boundary domain
α  the three-way recommended threshold
β  the three-way non-recommended threshold
PP  the domain for the recommended item that the user also likes
PN  the domain for the recommended item that the user does not like
BP  the domain for the promoted item that the user also likes
BN  the domain for the promoted item that the user does not like
NP  the domain for the non-recommended item that the user likes
NN  the domain for the non-recommended item that the user does not like
λ_PP  the cost for PP
λ_PN  the cost for PN
λ_BP  the cost for BP
λ_BN  the cost for BN
λ_NP  the cost for NP
λ_NN  the cost for NN
x  random sample point

Introduction
The matrix factorisation (MF) approach commonly produces a low-rank approximation based on the rating matrix of recommender systems [1,2]. Its advantages include high speed and the ability to effectively alleviate the data sparsity issue. The probabilistic MF (PMF) [3] algorithm establishes a model from the perspective of probability by observing a large amount of linear data. Singular value decomposition (SVD) [4,5] uses the characteristics of users and items to give an optimal low-rank approximation to the real-valued matrix based on the squared error. Non-negative MF (NMF) uses non-negative constraints to prevent mutual cancellation between basis functions and generates part-based representations [6,7]. However, existing MF-based collaborative filtering (CF) methods use global information about all users and items, which can suffer from an under-fitting problem.
In this paper, we propose to build a three-way recommendation model that integrates global and local information (3WGL). Threeway recommender systems partition a set of items into positive, negative and boundary domains [8][9][10]. Items in these domains are divided into recommended, not recommended and promoted [9,11]. Misclassification costs [11,12] result from wrong recommendations, including recommending items to users who dislike them, and not recommending items to users who like them. From the view of cost sensitivity [13], the cost of false-positive errors is different from the cost of false-negative errors. In order to recommend items with fewer errors, the systems introduce promotion cost, e.g. distributing coupons to users. We construct an improved MF approach that integrates global and local information to predict ratings. According to the three-way decision and prediction ratings, the recommendation behaviours of items can be obtained.
First, we predict the ratings with MF considering both global and local information. For this purpose, we divide users and items into a number of groups using the k-means clustering [14] method coupled with the Kullback-Leibler divergence [15]. In this way, we obtain different sub-matrices of the original rating matrix. Sub-matrices of users and items carry user local information and item local information. Global information [16] is used to obtain users' preferences based on all users and items involved. By integrating global and local information, we build two kinds of MF algorithms: local-user-global-item (LUGI) and global-user-local-item (GULI). The results of these two algorithms are integrated as the predicted rating p. Second, we calculate the threshold pair (α, β) based on the three-way decision. We construct a cost matrix according to the misclassification cost and promotion cost mentioned above. The cost matrix is a 3 × 2 matrix that stores the cost of recommendation actions in different states [17]. We calculate the threshold pair (α, β) based on the cost matrix and the rating levels. α and β represent certain tolerance levels for making a decision of recommendation, non-recommendation or promotion [8,17].
Third, recommendation actions are determined based on p, α and β. If p ≥ α, the item is recommended. A recommended item that the user does not like incurs the wrong acceptance cost (λ_PN). If p ≤ β, the item is not recommended. Not recommending a user's favourite item incurs the wrong rejection cost (λ_NP). If β < p < α, the item is promoted. In the boundary-positive and boundary-negative regions, items incur the promotion cost (λ_BP and λ_BN), i.e. the businesses distribute coupons to the consumers. For correct recommendations, recommending a user's favourite item incurs the correct acceptance cost (λ_PP), and not recommending an item that the user does not like incurs the correct rejection cost (λ_NN). Finally, the average cost is used as a measure of recommendation quality.
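The decision rule above can be sketched as a small function; the function name and action labels are our own, and the thresholds are treated as given inputs:

```python
def three_way_action(p, alpha, beta):
    """Map a predicted rating p to one of the three recommender actions,
    given the threshold pair (alpha, beta) with alpha > beta."""
    if p >= alpha:
        return "recommend"       # positive region: incurs lambda_PP or lambda_PN
    if p <= beta:
        return "not recommend"   # negative region: incurs lambda_NP or lambda_NN
    return "promote"             # boundary region: incurs lambda_BP or lambda_BN
```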
Experiments on well-known datasets show that (a) the integration of global and local information can effectively improve the quality of recommendation; and (b) our three-way method outperforms the existing three-way approaches in terms of average cost.
The rest of the paper is arranged as follows: Section 2 briefly reviews some of the studies related to our paper, as well as the definition of the rating system. Section 3 discusses how to build a three-way recommendation system model based on global and local information. Section 4 shows experimental results on real-world datasets. Section 5 concludes the paper.

Matrix factorisation
MF is one of the most commonly used CF methods. It was initially designed to resolve matrix sparseness problems in machine learning [16]. In 1998, it was first introduced to the CF domain [18] and widely adopted. Srebro et al. [19] provided a simple and efficient algorithm for solving weighted low-rank approximation problems. Lee et al. [20] proposed a new MF model where the observation matrix is represented by a weighted sum of low-rank matrices.
Salakhutdinov and Mnih [1,2] presented a fully Bayesian treatment of the PMF model in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters. SVD [21] is used to establish an implicit high-order structure associated with a document to detect related documents based on the terms found in queries. Billsus et al. [18] proposed an algorithm based on the SVD of a user rating matrix, which uses the latent structure to substantially eliminate the limitations of previous approaches. Ma et al. [22] proposed a semi-NMF method with global statistical consistency, emphasising the consistency between the predicted values and the statistics given by the data.

Three-way recommender systems
A three-way decision focuses on ternary problems, that is using three regions instead of two to represent a concept [11,23]. Yang and Yao [24] proposed a multi-agent decision-theoretic rough set model and expressed it in the form of three-way decisions. Li et al. [25] proposed a sequential three-way decision method based on granular computing for cost-sensitive face recognition. Li et al. [26] presented a cost-sensitive sequential three-way decision model that simulates a gradual decision process from rough granules to precise granules.
In order to reduce recommendation errors, three-way decisions have been introduced into recommender systems. A three-way recommendation classifies items into three parts, namely recommended items, promoted items and rejected items. Schaffer et al. [27] combined the implicit profile and the feedback profile, with suitably greater weight given to the feedback information, to generate three-way media recommendations. Liu et al. [28] applied a three-way decision approach based on the decision-theoretic rough set model to government risk decision-making. However, they did not consider cost-sensitive issues. Zhang and Min [9] proposed a framework that integrates three-way decision and random forests to build cost-sensitive recommender systems, but they did not consider pair-wise preference of items. Huang et al. [8] introduced pair-wise preference into cost-sensitive three-way recommendation as the basis of a ranking strategy, using a user's preference to estimate the user-item relevance probability.

Rating system
In this subsection, the rating system [10] is defined as follows. Let U = {u_1, u_2, …, u_m} be the set of users and V = {v_1, v_2, …, v_n} be the set of all possible items that can be recommended to users. The rating function is given as R: U × V → R_c, where R_c = {r_1, r_2, …, r_c} is the rating domain employed by users to evaluate items. For convenience, we denote the rating system as an m × n rating matrix R = (r_i,j)_m×n, where r_i,j = R(u_i, v_j), 1 ≤ i ≤ m and 1 ≤ j ≤ n. Table 1 is a simple rating system of size 4 × 4, where R_c = {1, 2, 3, 4, 5}. The numbers 1-5 represent the five rating levels. Some elements are zero, indicating that the corresponding items are not rated by the users.
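As a concrete illustration, such a rating system can be stored as an integer matrix with 0 marking unrated (user, item) pairs; the specific entries below are our own toy values, not Table 1:

```python
import numpy as np

# Toy 4x4 rating system with R_c = {1, ..., 5}; 0 means "not rated".
R = np.array([
    [3, 2, 2, 4],
    [4, 2, 3, 0],
    [0, 1, 4, 2],
    [0, 1, 5, 4],
])
m, n = R.shape                  # m users, n items
rated = R > 0                   # mask of observed ratings
sparsity = 1 - rated.mean()     # fraction of unrated (user, item) pairs
```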

Proposed approach
In this section, we illustrate how to build a three-way recommendation model integrating global and local information. For the sake of clarity, the Nomenclature introduces the symbols used in this paper.

Three-way framework
In order to obtain the recommendation behaviours of items, we calculate the prediction rating and the three-way threshold pair (α, β). Fig. 1 depicts the framework of our approach, which is divided into three parts. Fig. 1a constructs LUGI and GULI, two MF algorithm models built from the rating matrix; two types of prediction ratings can be obtained from these models, and by integrating them we get the prediction rating p. Fig. 1b constructs a cost matrix to calculate the threshold pair (α, β), where α is the threshold for recommended items and β is the threshold for not recommended items. Fig. 1c determines the recommender actions of items according to p, α and β: items can be recommended, promoted, or not recommended to users. After obtaining the recommender actions, we can evaluate the recommendation performance of our algorithm.

Local information extraction through clustering
In this section, we describe how to extract local information. Local information corresponds to user subsets and item subsets clustered from the rating matrix. We employ the k-means clustering algorithm for this purpose. The clustering includes two steps: feature vector extraction and user/item clustering.
First, we introduce feature vector extraction. Fig. 2 depicts an example of item-based feature vector extraction corresponding to Table 1; user feature vectors are extracted in the same way. We extract the item-level connections to obtain a feature vector of length c, where each element counts the users' ratings at the corresponding level. In this example, u_1 rates v_2 as 2, u_2 rates v_2 as 2, u_3 rates v_2 as 1, u_4 rates v_2 as 1, and nobody rates v_2 as 3, 4 or 5. Hence, the rating distribution of v_2 is d(v_2) = (2, 2, 0, 0, 0). Similarly, we have d(v_1) = (0, 0, 1, 1, 0), d(v_3) = (0, 1, 1, 1, 1) and d(v_4) = (0, 1, 0, 2, 0). Second, we use the feature vectors to represent the users/items in k-means clustering, a prototype-based algorithm where the constant k indicates the number of clusters. Here, we employ the Kullback-Leibler divergence method to calculate the distance between two items; user-based clustering proceeds in the same way.
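A minimal sketch of the feature-vector extraction and the KL distance follows. The rating matrix below is one possible assignment consistent with the distributions d(v_1)…d(v_4) above (only column v_2 is fully fixed by the text: u_1 = 2, u_2 = 2, u_3 = 1, u_4 = 1), and the smoothing constant is our assumption:

```python
import numpy as np

# One toy matrix consistent with d(v1)...d(v4); 0 means "not rated".
R = np.array([
    [3, 2, 2, 4],
    [4, 2, 3, 0],
    [0, 1, 4, 2],
    [0, 1, 5, 4],
])
c = 5  # number of rating levels

def item_feature(R, j, c):
    """d(v_j): how many users rated item j at each level 1..c."""
    col = R[:, j]
    return np.bincount(col[col > 0], minlength=c + 1)[1:]

def kl_distance(d1, d2, eps=1e-9):
    """Kullback-Leibler divergence between two smoothed, normalised
    rating distributions (used as the item-item distance)."""
    p = (d1 + eps) / (d1 + eps).sum()
    q = (d2 + eps) / (d2 + eps).sum()
    return float(np.sum(p * np.log(p / q)))

print(item_feature(R, 1, c))   # d(v2) -> [2 2 0 0 0]
```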

Rating prediction and ensemble through MF
In this section, we introduce the sub-matrix construction, reconstruct the MF optimisation objective function based on these sub-matrices, and design two MF algorithms integrating global and local information.
Firstly, we construct two kinds of sub-matrices using the user/item clusters and the rating matrix R. Assume we have NU user clusters and NI item clusters. Let g ∈ [1, NU] be the user cluster number and q ∈ [1, NI] be the item cluster number. R_g,* is a sub-matrix with the historical ratings of the users in G_g for all items. R_*,q is a sub-matrix with the historical ratings of all users for the items in Q_q.
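Given cluster assignments, the two kinds of sub-matrices reduce to row and column selections; the rating matrix and the group memberships below are illustrative assumptions:

```python
import numpy as np

# Toy rating matrix; 0 means "not rated".
R = np.array([
    [3, 2, 2, 4],
    [4, 2, 3, 0],
    [0, 1, 4, 2],
    [0, 1, 5, 4],
])
G = {0: [0, 1], 1: [2, 3]}   # NU = 2 user groups (assumed memberships)
Q = {0: [0, 3], 1: [1, 2]}   # NI = 2 item groups (assumed memberships)

def user_submatrix(R, users):
    """R_{g,*}: ratings of the users in group g for ALL items."""
    return R[np.array(users), :]

def item_submatrix(R, items):
    """R_{*,q}: ratings of ALL users for the items in group q."""
    return R[:, np.array(items)]

R_g = user_submatrix(R, G[0])   # shape (2, 4)
R_q = item_submatrix(R, Q[1])   # shape (4, 2)
```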
Secondly, we reconstruct the MF optimisation objective functions based on R_g,* and R_*,q. Following the idea of MF, we decompose each sub-matrix into the product of two low-rank matrices and adopt the squared-error principle to design the optimisation functions

L(R_g,*, G_gV) = Σ_(i,j) (r_i,j − G_g(i, ·)V(·, j))² and L(R_*,q, UQ_q) = Σ_(i,j) (r_i,j − U(i, ·)Q_q(·, j))²,

where the sums run over the observed entries of the corresponding sub-matrix. The learning goal can be viewed as minimising the integration of these two functions, where L(R_g,*, G_gV) ≤ ε_1 and L(R_*,q, UQ_q) ≤ ε_2 are constraints, and ε_1 and ε_2 are close to 0. Using Lagrange multipliers [29], the objective function with the L_2 regulariser [30,31] of our algorithm can be presented as

min π_1 L(R_g,*, G_gV) + π_2 L(R_*,q, UQ_q) + λ_U ||U||² + λ_V ||V||² + λ_1 ||G_g||² + λ_2 ||Q_q||²,

where λ_U, λ_V, λ_1 and λ_2 are the regularisation parameters, and π_1 and π_2 are hyperparameters that control the contribution of user and item features in the embedded model.

Finally, we design the two MF algorithms integrating global and local information. We use the root mean square error (RMSE) [32] as the loss function. By computing the partial derivatives with respect to the parameters, we obtain the update formulas

G_g(i, a) ← G_g(i, a) + η(π_1 Δ_1 V(a, j) − λ_1 G_g(i, a)), V(a, j) ← V(a, j) + η(π_1 Δ_1 G_g(i, a) − λ_V V(a, j)),
U(i, a) ← U(i, a) + η(π_2 Δ_2 Q_q(a, j) − λ_U U(i, a)), Q_q(a, j) ← Q_q(a, j) + η(π_2 Δ_2 U(i, a) − λ_2 Q_q(a, j)),

where η is the learning rate. We set σ_1 and σ_2 as the RMSE on R_g,* and R_*,q, respectively, and Δ_1 = R(i, j) − G_g(i, a)V(a, j) and Δ_2 = R(i, j) − U(i, a)Q_q(a, j) as the prediction errors. Based on these update formulas, we obtain two prediction ratings, R_LUGI and R_GULI. Their average, taken as p, determines the recommendation behaviours of the items.
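The gradient-step training can be sketched as below for one of the two factorisations (e.g. R_g,* ≈ G_gV); the hyperparameter values, initialisation, and the folding of the π weight into the learning rate are our simplifying assumptions, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def mf_sgd(R, a=2, eta=0.02, lam=0.02, epochs=1500):
    """Squared-error MF with an L2 regulariser: fit R ~ P @ V on the
    observed (non-zero) entries by stochastic gradient steps."""
    m, n = R.shape
    P = rng.uniform(0.4, 0.8, (m, a))   # local user factors (plays G_g)
    V = rng.uniform(0.4, 0.8, (a, n))   # global item factors
    obs = np.argwhere(R > 0)
    for _ in range(epochs):
        for i, j in obs:
            delta = R[i, j] - P[i] @ V[:, j]           # prediction error
            P[i]    += eta * (delta * V[:, j] - lam * P[i])
            V[:, j] += eta * (delta * P[i]   - lam * V[:, j])
    return P, V

R = np.array([[3, 2, 2, 4],
              [4, 2, 3, 0],
              [0, 1, 4, 2],
              [0, 1, 5, 4]])
P, V = mf_sgd(R)
pred = P @ V   # dense predictions; p would average two such models
```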

Three-way recommendation
The traditional decision model is the two-way decision model, with only two options: accept and reject. However, this model can easily lead to erroneous decisions, and in some cases making wrong decisions can be costly. On this basis, Yao et al. put forward the three-way decision theory and introduced Bayesian risk decision [33] into the three-way decision method, which gives it a risk-cost-sensitive character [34].
We define the entire domain as A. The three-way decision partitions A into {A_P, A_N, A_B}, indicating that an object belongs to the positive domain, the negative domain or the boundary domain, respectively. Different domains correspond to different decisions, and different decisions result in different costs. The decision costs are divided into six situations: correct acceptance cost λ_PP, false rejection cost λ_NP, boundary-domain acceptance cost λ_BP, correct rejection cost λ_NN, false acceptance cost λ_PN and boundary-domain rejection cost λ_BN. Table 2 is the cost matrix constructed from these six decision costs. To simplify the rules, some constraints should be added so that each item can be put into only one domain. The first condition is λ_PP ≤ λ_BP < λ_NP. The second condition is λ_NN ≤ λ_BN < λ_PN. The final condition is (λ_PN − λ_BN)(λ_NP − λ_BP) > (λ_BP − λ_PP)(λ_BN − λ_NN), which guarantees α* > β*. α* is the positive domain threshold and β* is the negative domain threshold. Here, α* and β* are computed as

α* = (λ_PN − λ_BN) / ((λ_PN − λ_BN) + (λ_BP − λ_PP)),
β* = (λ_BN − λ_NN) / ((λ_BN − λ_NN) + (λ_NP − λ_BP)).

We map α* and β* to the rating interval to obtain the recommendation behaviour demarcation threshold pair (α, β). The mapping scales the thresholds from [0, 1] to the rating interval, where r_min is the user's minimum rating of the item and r_max is the maximum rating. By comparing (α, β) with the prediction rating p, we obtain the recommendation behaviours of items: (a) if p > α, the item is recommended; (b) if β ≤ p ≤ α, the item is promoted; (c) if p < β, the item is not recommended.
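The threshold computation can be checked numerically. Assuming the standard decision-theoretic rough set threshold formulas, the cost values quoted in the experiments section reproduce α* = 0.6 and β* = 0.5:

```python
def three_way_thresholds(l_PP, l_BP, l_NP, l_NN, l_BN, l_PN):
    """Decision-theoretic rough set thresholds on [0, 1] derived from
    the six-entry cost matrix (alpha* for acceptance, beta* for rejection)."""
    alpha = (l_PN - l_BN) / ((l_PN - l_BN) + (l_BP - l_PP))
    beta = (l_BN - l_NN) / ((l_BN - l_NN) + (l_NP - l_BP))
    return alpha, beta

# Cost matrix from the experiments section:
alpha_star, beta_star = three_way_thresholds(
    l_PP=0, l_BP=20, l_NP=40, l_NN=0, l_BN=20, l_PN=50)
print(alpha_star, beta_star)   # -> 0.6 0.5
```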

Experiments
In this section, we conduct several experiments on real-world datasets and evaluate the performance of our proposed algorithm based on the average cost [9]. For each dataset, we randomly divided it into a training set and a testing set in a ratio of 8:2. First, we use the training data to build our three-way recommendation model. Then, we use the model to recommend items to each user in the testing data and evaluate the recommendation performance. The average cost is used as the evaluation measure: the lower the average cost, the better the recommendation. Our experiments are performed on two MovieLens [35] datasets and one DouBan dataset, shown in Table 3. For the parameters, we set the learning rate η = 0.0008 as the gradient descent iteration step; λ_U = λ_V = λ_1 = λ_2 = 0.05 are the penalty factors; and the hyperparameters are π_1 = 0.5 and π_2 = 0.3. The cost matrix is set as λ_PP = λ_NN = 0, λ_BP = λ_BN = 20, λ_NP = 40 and λ_PN = 50. Then, we obtain α* = 0.6 and β* = 0.5. Based on (13), the recommendation behaviour demarcation thresholds are α = 3.8 and β = 3.4. We design two sets of comparative experiments to verify the recommendation performance of our algorithm, observing the average cost on the three datasets.
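Under one plausible reading of the evaluation protocol (our assumptions: a user "likes" an item when the true rating is at least 4, and each test pair incurs the cost of the region its prediction falls into), the average cost can be computed as:

```python
def average_cost(pairs, alpha, beta, costs, like_threshold=4):
    """Average decision cost over (predicted, actual) rating pairs.
    The 'likes' criterion and the per-region cost lookup are our
    assumptions about the evaluation protocol, not the paper's code."""
    total = 0.0
    for p, actual in pairs:
        likes = actual >= like_threshold
        if p >= alpha:            # recommended
            total += costs["PP"] if likes else costs["PN"]
        elif p <= beta:           # not recommended
            total += costs["NP"] if likes else costs["NN"]
        else:                     # promoted
            total += costs["BP"] if likes else costs["BN"]
    return total / len(pairs)

# Cost matrix and thresholds from this section:
costs = {"PP": 0, "PN": 50, "BP": 20, "BN": 20, "NP": 40, "NN": 0}
print(average_cost([(4.2, 5), (4.0, 2), (3.6, 4), (3.0, 2)],
                   3.8, 3.4, costs))   # -> 17.5
```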
Exp 1: We use the two MF algorithms, LUGI and GULI, to predict ratings, and compare different strategies for integrating their predictions. The average cost is used as the evaluation metric to observe which strategy obtains better predictions on the different datasets.
Exp 2: Our algorithm is compared with the classical three-way recommendations based on random forests and kNN to judge how much the algorithm improves on them.

Table 4 shows the recommendation performance using different strategies on different datasets. For the Ml-100k and DouBan datasets, we clustered users and items into three clusters each to extract local information; for the Ml-1M dataset, we used five user clusters and five item clusters. Through different combinations of the two MF algorithms, we obtain five strategies. From the comparison of the results in the table, we find that the strategy of integrating LUGI and GULI is the best. This is because this strategy captures both global and local information for users and items.

Table 5 compares the performance of our algorithm 3WGL with the traditional three-way recommendation based on random forests. For the Ml-100k and Ml-1M datasets, our algorithm reduces the average cost, improving performance by 19.65 and 16.63%, respectively. The random forests approach uses user, item and rating information to build multiple subtrees. The DouBan dataset does not contain user and item information, so it is not possible to build random forests for recommendation on it. In comparison, our approach uses only rating information to build the MF algorithms, which reduces the recommendation error caused by unreliable user and item information.

Table 6 contrasts the performance of the 3WGL and kNN approaches. Comparing the results on the three datasets, we see that our algorithm's performance is greatly improved compared with kNN.

Conclusion
We construct a new three-way recommendation approach integrating global and local information. The recommendation strategy of integrating LUGI and GULI is better than the other strategies, and our approach outperforms the existing three-way recommender systems.