Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

E-commerce is developing rapidly. Learning from the vast number of online customer reviews and making good use of them has become crucial to commercial success, which calls for ever more accurate sentiment classification of these reviews. Finer-grained review rating prediction is therefore preferred over coarse binary sentiment classification. Current review rating prediction methods fall into two main types. The first comprises methods based on review text content, which focus almost exclusively on the text and seldom consider the reviewers and items that appear in other relevant reviews. The second comprises methods based on collaborative filtering, which extract information from previous records in the reviewer-item rating matrix but ignore the review text. Here we propose a framework for review rating prediction that effectively combines the two, and we further propose three specific methods under this framework. Experiments on two movie review datasets demonstrate that our review rating prediction framework performs better than the previous methods.


Introduction
Web 2.0 and e-commerce have given rise to an explosion of online reviews. In turn, intelligently learning the sentiment orientation and opinions expressed in these reviews is key to success in the current wave of e-commerce. Binary (positive-negative) classification of reviews is common, but it increasingly fails to meet accuracy requirements [1]. For instance, when several items all fall into the positive category, it is hard to predict which one a customer will select. Yet even small differences in review ratings can lead to large differences in sales volume. The review rating prediction research of [2] shows that consumers are often willing to pay 20% to 99% more for a product with a 5-star rating than for one with a 4-star rating.
Current review rating prediction (RRP) methods fall into two main types. The first, based on review text content, adopts the perspective of natural language processing. Researchers transform review text into feature vectors and then employ a multiclass classifier or a regression model to predict review ratings [3][4][5][6][7]. This type of method ignores the relationship between the customers and the items. The second, based on collaborative filtering (CF), adopts the standpoint of recommender systems [8,9]. Researchers employ $k$-nearest neighbor methods [10,11] or matrix factorization methods [12][13][14][15][16][17] to extract information from the previous reviewer-item rating matrix for review rating prediction. This type of method exploits no information from the review text.
To include more information and achieve finer-grained review rating prediction, we propose a framework that combines review text content with the previous reviewer-item rating matrix, and we devise three specific methods under this framework. We then run experiments on two movie review datasets to examine the effectiveness of the framework and the three methods. The results show that our methods under this framework by and large improve the performance of RRP.
The outline of the paper is as follows. Section 2 introduces related research on RRP. RRP based on review text content and RRP based on CF are described in Section 3. Section 4 presents our framework and the three methods under it. Experimental results on two movie review datasets are reported in Section 5. Finally, Section 6 concludes the paper and points out our future research direction.
At present, there are two main ways of achieving finer-grained RRP. The first comprises methods based on review text content (MBRTC), which mine information from the review text by identifying and quantifying a variety of text features and then employ a regression model to predict the review rating [7,22]. For example, Qu et al. [7] treat RRP as a feature engineering problem and extract various features from the review text, such as words, patterns, syntactic structures, and semantic topics, to improve performance. Wang et al. [22] proposed a type of method that predicts review ratings from review content while weighting the strong social relations of reviewers. Specifically, they incorporate the characteristics of reviewers' social relations into content-based methods as regularization constraints. The main problem of MBRTC is that it relies mainly on review text content and does not use the information in the reviewer-item rating matrix.
Some research based on review text content also takes into account the characteristics of the items or the reviewers [5,23]. Wang et al. [23] observed that the score associated with a review cannot be fully determined by the review content itself, since review content is not an absolute measure of sentiment orientation. A tough reviewer may use harsh words for all items, even items that he or she rates highly, and different items have to meet different basic requirements, so analyzing the review content alone is not enough. Li et al. [5] proposed a method that incorporates reviewer and item information into the review text content. They consider the personal characteristics of the reviewers when mining review content and use tensor factorization techniques to learn the parameters of the regression model and predict review ratings. This method considers only the effect of the reviewer and the item on the review text content and then uses the review text content to predict the rating, so it remains a method based on review text content.
The second comprises methods based on CF (MBCF), which can be further divided into two categories. The first category uses similarity calculation to find the $k$-nearest neighbor reviewers or items on which to base the prediction [11,12]. The second category uses a latent factor model to perform matrix factorization. Several low-dimensional matrix factorization techniques are presented in [24][25][26]. Koren et al. [12][13][14] proposed several enhanced matrix factorization methods that produce promising results by adding heterogeneous information to the objective function. Koren [12] built a combined model by merging the matrix factorization and neighborhood models and improved recommendation accuracy by extending the models to exploit both explicit and implicit feedback from users. Koren [14] proposed a method that models time-changing behavior throughout the life span of the data and improved recommendation performance. In [27], researchers extended the matrix factorization objective function with the social network information of reviewers. In [28], Shi et al. proposed a context-aware movie recommendation algorithm based on joint matrix factorization (JMF). They jointly factorize the user-item matrix containing general movie ratings and other contextual movie similarity matrices to integrate contextual information into the recommendation process.
Some research combining ratings and text reviews has been applied to recommender systems [29,30]. For example, Cremonesi et al. proposed a hotel recommender algorithm (Interleave) that provides recommendations based on text reviews and ratings [29], and Levi et al. proposed a recommender system that combines reviews and ratings to recommend hotels [30]. As far as we know, however, methods that combine ratings and text reviews have not been applied to review rating prediction. Unlike the existing methods, which focus on recommender systems, we focus on review rating prediction and propose a general framework that combines MBRTC and MBCF, drawing on both the review text content and the reviewer-item matrix to improve prediction accuracy. We also present three specific methods under this framework. Finally, the experimental results verify the effectiveness of the proposed framework and methods.

RRP Based on Review Text Content.
Review text content is a very important information source for RRP. Current review text content-based RRP methods mainly use the vector space model (VSM) to represent review text content and then use a linear regression model to predict the review rating. Specifically, there are four steps. First, the online review text is preprocessed, which includes term segmentation, part-of-speech tagging, and frequency statistics. Second, treating words, phrases, and $n$-grams as features, a feature selection method is used to choose the features that best express the review text content, which compose the feature set $F = \{f_1, f_2, \ldots, f_n\}$. Third, each online review $d_j$ in $D = \{d_1, d_2, \ldots, d_m\}$ is expressed as an $n$-dimensional vector $X_j$, which is an instantiated value of $F$. Fourth, a linear regression model over these review vectors is adopted to predict the review rating. The linear regression model is described in (1):

$$\hat{V}_j = W^{T} X_j. \tag{1}$$

To work out the parameter vector $W$, given the training reviews $D_1 = \{d_1, d_2, \ldots, d_{m_1}\}$ and their ratings $R_1 = \{V_1, V_2, \ldots, V_{m_1}\}$, the least squares error loss is used and the following objective function is minimized:

$$\min_{W} \; \sum_{j=1}^{m_1} \left[ \frac{1}{2}\left(V_j - W^{T} X_j\right)^2 + \frac{\lambda}{2}\,\|W\|^2 \right].$$

Here $\|W\|^2$, the regularization term of the parameter vector $W$, is employed to avoid overfitting, and $\lambda$ is the regularization coefficient. To estimate $W$, a simple stochastic gradient descent algorithm is adopted to solve the optimization problem. For each observed rating $V_j \in R_1$, we apply the following updating rule:

$$W \leftarrow W + \gamma \left(e_j X_j - \lambda W\right).$$

Here $e_j = V_j - W^{T} X_j$, and $\gamma$ is the learning rate. After obtaining $W$, given $D_2 = \{d_{m_1+1}, d_{m_1+2}, \ldots, d_m\}$, we can apply $\hat{V}_j = W^{T} X_j$ to predict the rating of each review in $D_2$.
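As an illustration only, the regression step and its stochastic gradient descent update can be sketched in Python. The sketch assumes a toy bag-of-words representation (counts of two hypothetical sentiment words plus a constant bias feature) in place of the preprocessing and feature selection steps. The function names, toy data, and hyperparameter values are assumptions made for this example and are not the settings used in our experiments.

```python
import random

def predict_linear(W, x):
    """Predicted rating: inner product of the weight vector and the review vector."""
    return sum(w * xi for w, xi in zip(W, x))

def train_linear_sgd(X, V, lam=0.001, gamma=0.02, epochs=1000, seed=0):
    """Minimise squared error plus an L2 penalty with stochastic gradient descent."""
    rng = random.Random(seed)
    W = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the observed ratings in random order
        for j in order:
            e = V[j] - predict_linear(W, X[j])  # prediction error on review j
            # W <- W + gamma * (e * X_j - lam * W)
            W = [w + gamma * (e * xi - lam * w) for w, xi in zip(W, X[j])]
    return W

# Toy data (assumed): columns are count("good"), count("bad"), bias.
X_train = [[2, 0, 1], [1, 0, 1], [0, 1, 1], [0, 2, 1]]
V_train = [5.0, 4.0, 2.0, 1.0]

W = train_linear_sgd(X_train, V_train)
rating = predict_linear(W, [1, 1, 1])  # unseen review: one "good", one "bad"
print(round(rating, 2))
```

The toy ratings follow an exact linear rule, so the learned weights should reproduce it closely; real review data would of course be noisier and far higher dimensional.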

RRP Based on Collaborative Filtering.
RRP now plays an essential role in recommender systems. At present, RRP for recommender systems is mainly based on collaborative filtering and involves two kinds of methods. One uses the $k$-nearest neighbors to estimate the rating of the current object. The other uses matrix factorization.

RRP Based on the $k$-Nearest Neighbor Model.
RRP based on the $k$-nearest neighbor model includes the reviewer-based method and the item-based method. With the reviewer-item rating matrix available, a typical reviewer-based approach predicts a reviewer's rating on a target item by aggregating the previous ratings given to that item by the reviewer's $k$ nearest reviewers. The predicted rating of reviewer $u$ on item $i$ can be formulated as follows:

$$\hat{V}_{ui} = \sum_{v \in N(u)} s_{uv} V_{vi}.$$

Here $N(u)$ is the set of nearest neighbors of reviewer $u$, $s_{uv}$ represents the similarity between reviewer $u$ and reviewer $v$, and $V_{vi}$ is the rating given by reviewer $v$ to item $i$.
To obtain the parameter $S_u = (s_{uv})_{v \in N(u)}$, given the training reviews $D_1 = \{d_1, d_2, \ldots, d_{m_1}\}$ and their ratings $R_1 = \{V_1, V_2, \ldots, V_{m_1}\}$, we solve the optimization problem of minimizing the squared error loss function below:

$$\min_{S} \; \sum_{V_{ui} \in R_1} \left[ \frac{1}{2}\Big(V_{ui} - \sum_{v \in N(u)} s_{uv} V_{vi}\Big)^2 + \frac{\lambda}{2}\,\|S_u\|^2 \right].$$

Here $\|S_u\|^2$ is a regularization term aimed at avoiding overfitting, and $\lambda$ is the regularization coefficient. A simple stochastic gradient descent algorithm is then adopted to solve the optimization problem. For each observed rating $V_{ui} \in R_1$, we apply the following updating rule to learn the parameter:

$$s_{uv} \leftarrow s_{uv} + \gamma \left(e_{ui} V_{vi} - \lambda s_{uv}\right), \quad v \in N(u).$$

Here $e_{ui} = V_{ui} - \sum_{v \in N(u)} s_{uv} V_{vi}$, and $\gamma$ is the learning rate. After obtaining $S_u$, we can apply $\hat{V}_{ui} = \sum_{v \in N(u)} s_{uv} V_{vi}$ to predict the review rating.
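The reviewer-based prediction and the learning of the similarity weights can be sketched as follows. This is a minimal sketch under stated assumptions: the neighbor sets are supplied by hand rather than selected by a similarity search, the weights start at zero, and the names `knn_predict` and `train_knn_sgd`, the toy ratings, and the hyperparameters are all hypothetical.

```python
from collections import defaultdict

def knn_predict(S, item_ratings, u, i, neighbors):
    """Weighted sum of the ratings that u's neighbours gave to item i."""
    return sum(S[(u, v)] * item_ratings[i][v]
               for v in neighbors[u] if v in item_ratings[i])

def train_knn_sgd(train, neighbors, lam=0.001, gamma=0.01, epochs=300):
    """Learn the similarity weights S[(u, v)] by SGD on regularised squared error."""
    item_ratings = defaultdict(dict)
    for u, i, r in train:
        item_ratings[i][u] = r
    S = defaultdict(float)  # assumed initialisation: all weights start at zero
    for _ in range(epochs):
        for u, i, r in train:
            e = r - knn_predict(S, item_ratings, u, i, neighbors)
            for v in neighbors[u]:
                if v in item_ratings[i]:
                    # s_uv <- s_uv + gamma * (e * V_vi - lam * s_uv)
                    S[(u, v)] += gamma * (e * item_ratings[i][v] - lam * S[(u, v)])
    return S, item_ratings

# Toy ratings (assumed): reviewers "a" and "b" agree on items m1 and m2.
train = [("a", "m1", 5.0), ("b", "m1", 5.0),
         ("a", "m2", 2.0), ("b", "m2", 2.0),
         ("b", "m3", 4.0)]
neighbors = {"a": ["b"], "b": ["a"]}  # hand-picked neighbour sets

S, item_ratings = train_knn_sgd(train, neighbors)
pred = knn_predict(S, item_ratings, "a", "m3", neighbors)  # a has not rated m3
print(round(pred, 2))
```

Because the two toy reviewers agree on every co-rated item, the learned weight approaches 1 and the prediction for the unseen item approaches the neighbor's rating.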

RRP Based on Matrix Factorization.
Matrix factorization (MF) is one of the most popular methods in recommender systems. The core of MF is to find a small number of latent features that relate to the preferences of reviewers and to use them to match the observed ratings. A typical model associates each reviewer $u$ with a vector of reviewer factors $p_u$ and each item $i$ with a vector of item factors $q_i$. The prediction is made through an inner product:

$$\hat{V}_{ui} = p_u^{T} q_i.$$

To compute the two parameters $p_u$ and $q_i$, we follow the least squares error loss principle and minimize the objective function:

$$\min_{P, Q} \; \sum_{V_{ui} \in R_1} \left[ \frac{1}{2}\left(V_{ui} - p_u^{T} q_i\right)^2 + \frac{\lambda}{2}\left(\|p_u\|^2 + \|q_i\|^2\right) \right].$$

Here $\|p_u\|^2$ and $\|q_i\|^2$ are the regularization terms of the parameters $p_u$ and $q_i$, which serve to avoid overfitting, and $\lambda$ is the regularization coefficient. To estimate $p_u$ and $q_i$, a simple stochastic gradient descent algorithm is applied to solve the optimization problem. For each observed rating $V_{ui} \in R_1 = \{V_1, V_2, \ldots, V_{m_1}\}$, we use the following updating rules:

$$p_u \leftarrow p_u + \gamma \left(e_{ui}\, q_i - \lambda p_u\right), \qquad q_i \leftarrow q_i + \gamma \left(e_{ui}\, p_u - \lambda q_i\right).$$

Here $e_{ui} = V_{ui} - p_u^{T} q_i$, and $\gamma$ is the learning rate. After obtaining $p_u$ and $q_i$, we can apply $\hat{V}_{ui} = p_u^{T} q_i$ to predict the review rating.
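A minimal Python sketch of these updating rules is given below. It assumes a single latent factor and a toy rating matrix that is exactly rank one, so that the one missing entry is determined by the three observed ones. The function names, the initialisation range, and the hyperparameter values are illustrative assumptions.

```python
import random

def mf_predict(P, Q, u, i):
    """Inner product of the reviewer factors and the item factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def train_mf_sgd(train, k=1, lam=0.001, gamma=0.02, epochs=1000, seed=0):
    """Learn reviewer factors P and item factors Q by SGD."""
    rng = random.Random(seed)
    P, Q = {}, {}
    for u, i, _ in train:
        if u not in P:
            P[u] = [rng.uniform(0.1, 1.0) for _ in range(k)]
        if i not in Q:
            Q[i] = [rng.uniform(0.1, 1.0) for _ in range(k)]
    for _ in range(epochs):
        for u, i, r in train:
            e = r - mf_predict(P, Q, u, i)
            for f in range(k):
                pf, qf = P[u][f], Q[i][f]  # use the old values in both updates
                P[u][f] += gamma * (e * qf - lam * pf)
                Q[i][f] += gamma * (e * pf - lam * qf)
    return P, Q

# Toy ratings (assumed): the full matrix [[4, 2], [2, 1]] has rank one,
# and the entry for reviewer "b" on item "m2" is held out.
train = [("a", "m1", 4.0), ("a", "m2", 2.0), ("b", "m1", 2.0)]

P, Q = train_mf_sgd(train)
pred = mf_predict(P, Q, "b", "m2")
print(round(pred, 2))
```

On real data the number of factors would be larger and chosen by validation; a single factor is used here only to keep the held-out value predictable.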

Problem Description.
To illustrate the problem studied in this paper, a toy example of reviewers, items, review text content, and review ratings is shown in Table 1. The toy example provides three types of information: the user-item rating matrix, review text content with a corresponding rating, and review text content without a corresponding rating. The problem we study is how to effectively predict the missing rating of each review in the user-item rating matrix. In this section, we propose a new RRP framework that combines the reviewer-item rating matrix (RIRM) with review text content (RTC). That is, we want to find a function $f$: (RIRM, RTC) → (rating).

General Framework.
Existing RRP methods fall into two main types. The first comprises the methods based on review text content, which can be described as a function $f_1$: (RTC) → (rating). The second comprises the methods based on collaborative filtering, which can be described as a function $f_2$: (RIRM) → (rating).
The above two types of methods use either the review text content or the reviewer-item rating matrix and therefore do not make full use of all the information available. We therefore propose an RRP framework that combines a text-based predictor $f_1(\mathrm{RTC})$ with a CF-based predictor $f_2(\mathrm{RIRM})$ as a weighted sum:

$$f(\mathrm{RIRM}, \mathrm{RTC}) = (1 - \theta)\, f_1(\mathrm{RTC}) + \theta\, f_2(\mathrm{RIRM}), \quad 0 \le \theta \le 1.$$

Based on this general framework, we propose three specific RRP methods. To improve their performance, we also devise a way to compute their parameters.

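RRP Combining Linear Regression Model and $k$-Nearest Neighbor.
We choose the linear regression model as $f_1(\mathrm{RTC})$ and the $k$-nearest neighbor model as $f_2(\mathrm{RIRM})$. The method is described by

$$\hat{V}_{ui} = (1 - \alpha)\, W^{T} X_{ui} + \alpha \sum_{v \in N(u)} s_{uv} V_{vi}.$$

Here $X_{ui}$ is the feature vector of the review written by reviewer $u$ on item $i$, $W$ is the parameter vector of the linear regression model, $N(u)$ is the set of nearest neighbors of reviewer $u$, $s_{uv}$ is the similarity between reviewers $u$ and $v$, and $\alpha \in [0, 1]$ weights the two parts. To obtain the optimum parameters $\alpha$, $W$, and $S_u = (s_{uv})_{v \in N(u)}$, given training datasets with reviewers $U = \{u_1, u_2, \ldots, u_{|U|}\}$ who have written $m_1$ review ratings $R_1 = \{V_1, V_2, \ldots, V_{m_1}\}$ corresponding to the $m_1$ reviews $D_1 = \{d_1, d_2, \ldots, d_{m_1}\}$, we follow the least squares error loss principle and minimize the objective function:

$$\min_{W, S} \; \sum_{V_{ui} \in R_1} \left[ \frac{1}{2}\, e_{ui}^2 + \frac{\lambda}{2}\left(\|W\|^2 + \|S_u\|^2\right) \right], \qquad e_{ui} = V_{ui} - (1 - \alpha)\, W^{T} X_{ui} - \alpha \sum_{v \in N(u)} s_{uv} V_{vi}.$$

Here $\|W\|^2$ and $\|S_u\|^2$ are the regularization terms of the parameters, aimed at avoiding overfitting, and $\lambda$ is the regularization coefficient. To estimate the parameters $\alpha$, $W$, and $S_u$, we first traverse $\alpha$ from 0 to 1. Second, for each fixed $\alpha$, we adopt a simple stochastic gradient descent algorithm on the training dataset to solve the optimization problem. For each observed rating $V_{ui} \in R_1$, we apply the following updating rules:

$$W \leftarrow W + \gamma \left((1 - \alpha)\, e_{ui} X_{ui} - \lambda W\right), \qquad s_{uv} \leftarrow s_{uv} + \gamma \left(\alpha\, e_{ui} V_{vi} - \lambda s_{uv}\right), \quad v \in N(u).$$

Here $\gamma$ is the learning rate. After obtaining $\alpha$, $W$, and $S_u$, given $D_2 = \{d_{m_1+1}, d_{m_1+2}, \ldots, d_m\}$, we can apply $\hat{V}_{ui} = (1 - \alpha)\, W^{T} X_{ui} + \alpha \sum_{v \in N(u)} s_{uv} V_{vi}$ to predict the rating of each review in $D_2$.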
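The two-stage procedure (traverse the mixing weight, then run stochastic gradient descent for each fixed value) can be sketched as follows. The sketch is illustrative only: the mixing weight is called `alpha` in the code, each sample carries a hypothetical text feature vector `x`, the ratings `nbrs` given to the same item by hand-picked neighbors, and the observed rating `r`, and a small held-out set stands in for the validation data. The grid, toy data, and hyperparameters are assumptions.

```python
from collections import defaultdict

def predict(alpha, W, S, s):
    """(1 - alpha) * text prediction + alpha * neighbour prediction."""
    text = sum(w * x for w, x in zip(W, s["x"]))
    cf = sum(S[(s["u"], v)] * r for v, r in s["nbrs"].items())
    return (1 - alpha) * text + alpha * cf

def train(samples, alpha, lam=0.001, gamma=0.02, epochs=1000):
    """SGD over W and S for one fixed value of alpha."""
    W = [0.0] * len(samples[0]["x"])
    S = defaultdict(float)
    for _ in range(epochs):
        for s in samples:
            e = s["r"] - predict(alpha, W, S, s)
            W = [w + gamma * ((1 - alpha) * e * x - lam * w)
                 for w, x in zip(W, s["x"])]
            for v, r in s["nbrs"].items():
                key = (s["u"], v)
                S[key] += gamma * (alpha * e * r - lam * S[key])
    return W, S

def mae(alpha, W, S, samples):
    return sum(abs(s["r"] - predict(alpha, W, S, s)) for s in samples) / len(samples)

def traverse_alpha(train_set, valid_set, grid):
    """Outer loop: try each alpha and keep the one with the lowest validation MAE."""
    best = None
    for alpha in grid:
        W, S = train(train_set, alpha)
        err = mae(alpha, W, S, valid_set)
        if best is None or err < best[0]:
            best = (err, alpha)
    return best

# Toy data (assumed): x = [count("good"), count("bad"), bias].
train_set = [
    {"u": "a", "x": [2, 0, 1], "nbrs": {"b": 4.0}, "r": 5.0},
    {"u": "a", "x": [1, 0, 1], "nbrs": {"b": 5.0}, "r": 4.0},
    {"u": "b", "x": [0, 1, 1], "nbrs": {"a": 3.0}, "r": 2.0},
    {"u": "b", "x": [0, 2, 1], "nbrs": {"a": 1.0}, "r": 1.0},
]
valid_set = [
    {"u": "a", "x": [1, 1, 1], "nbrs": {"b": 2.0}, "r": 3.0},
    {"u": "b", "x": [2, 1, 1], "nbrs": {"a": 5.0}, "r": 4.0},
]
grid = [step / 10 for step in range(11)]

best_err, best_alpha = traverse_alpha(train_set, valid_set, grid)
print(best_alpha, round(best_err, 3))
```

In this toy setting the text features explain the ratings well and the neighbor ratings are noisy, so a small weight on the neighbor part is expected to win, which mirrors the qualitative behavior reported for sparse rating matrices.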

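RRP Combining Linear Regression Model and Matrix Factorization.
Here, we choose the linear regression model as $f_1(\mathrm{RTC})$ and matrix factorization as $f_2(\mathrm{RIRM})$. The method is described by

$$\hat{V}_{ui} = (1 - \beta)\, W^{T} X_{ui} + \beta\, p_u^{T} q_i.$$

Here $X_{ui}$ is the feature vector of the review written by reviewer $u$ on item $i$, $W$ is the parameter vector of the linear regression model, $p_u$ and $q_i$ are the reviewer and item factor vectors, and $\beta \in [0, 1]$ weights the two parts. To obtain the optimum parameters $\beta$, $W$, $p_u$, and $q_i$, given the training datasets, we minimize the objective function according to the least squares error loss principle:

$$\min_{W, P, Q} \; \sum_{V_{ui} \in R_1} \left[ \frac{1}{2}\, e_{ui}^2 + \frac{\lambda}{2}\left(\|W\|^2 + \|p_u\|^2 + \|q_i\|^2\right) \right].$$

Here $\|W\|^2$, $\|p_u\|^2$, and $\|q_i\|^2$ are the regularization terms of the parameters, which serve to avoid overfitting, and $\lambda$ is the regularization coefficient. To estimate the parameters $\beta$, $W$, $p_u$, and $q_i$, we first traverse $\beta$ from 0 to 1. Second, for each fixed $\beta$, we adopt a simple stochastic gradient descent algorithm on the training dataset to solve the optimization problem. For each observed rating $V_{ui} \in R_1 = \{V_1, V_2, \ldots, V_{m_1}\}$, we apply the following updating rules:

$$W \leftarrow W + \gamma \left((1 - \beta)\, e_{ui} X_{ui} - \lambda W\right), \qquad p_u \leftarrow p_u + \gamma \left(\beta\, e_{ui}\, q_i - \lambda p_u\right), \qquad q_i \leftarrow q_i + \gamma \left(\beta\, e_{ui}\, p_u - \lambda q_i\right).$$

Here $e_{ui} = V_{ui} - (1 - \beta)\, W^{T} X_{ui} - \beta\, p_u^{T} q_i$, and $\gamma$ is the learning rate. After learning $\beta$, $W$, $p_u$, and $q_i$, for $D_2 = \{d_{m_1+1}, d_{m_1+2}, \ldots, d_m\}$, we can apply $\hat{V}_{ui} = (1 - \beta)\, W^{T} X_{ui} + \beta\, p_u^{T} q_i$ to predict the review ratings.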
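For one fixed value of the mixing weight, the joint stochastic gradient descent over the regression weights and the latent factors can be sketched as below. The sketch is an illustration under assumptions: the weight is called `beta` in the code and is fixed at 0.5 rather than traversed, two latent factors are used, and the sample layout, toy data, and hyperparameters are hypothetical.

```python
import random

def predict(beta, W, P, Q, s):
    """(1 - beta) * text prediction + beta * matrix factorization prediction."""
    text = sum(w * x for w, x in zip(W, s["x"]))
    mf = sum(pf * qf for pf, qf in zip(P[s["u"]], Q[s["i"]]))
    return (1 - beta) * text + beta * mf

def sgd_step(beta, W, P, Q, s, lam, gamma):
    """One update on a single observed rating; returns the new weight vector."""
    e = s["r"] - predict(beta, W, P, Q, s)
    new_W = [w + gamma * ((1 - beta) * e * x - lam * w) for w, x in zip(W, s["x"])]
    p, q = P[s["u"]], Q[s["i"]]
    for f in range(len(p)):
        pf, qf = p[f], q[f]  # use the old values in both factor updates
        p[f] += gamma * (beta * e * qf - lam * pf)
        q[f] += gamma * (beta * e * pf - lam * qf)
    return new_W

def train(samples, beta, k=2, lam=0.001, gamma=0.02, epochs=3000, seed=0):
    rng = random.Random(seed)
    W = [0.0] * len(samples[0]["x"])
    P = {s["u"]: [rng.uniform(0.1, 1.0) for _ in range(k)] for s in samples}
    Q = {s["i"]: [rng.uniform(0.1, 1.0) for _ in range(k)] for s in samples}
    for _ in range(epochs):
        for s in samples:
            W = sgd_step(beta, W, P, Q, s, lam, gamma)
    return W, P, Q

def mae(beta, W, P, Q, samples):
    return sum(abs(s["r"] - predict(beta, W, P, Q, s)) for s in samples) / len(samples)

# Toy data (assumed): x = [count("good"), count("bad"), bias].
samples = [
    {"u": "a", "i": "m1", "x": [2, 0, 1], "r": 5.0},
    {"u": "a", "i": "m2", "x": [0, 1, 1], "r": 2.0},
    {"u": "b", "i": "m1", "x": [1, 0, 1], "r": 4.0},
    {"u": "b", "i": "m2", "x": [0, 2, 1], "r": 1.0},
]

beta = 0.5
W, P, Q = train(samples, beta)
train_mae = mae(beta, W, P, Q, samples)
print(round(train_mae, 3))
```

Wrapping `train` in a loop over candidate values of `beta` and scoring each on held-out reviews would give the traversal step; it is omitted here to keep the sketch short.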

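RRP by Combining Linear Regression Model, the $k$-Nearest Neighbor, and Matrix Factorization.
In the RRP framework, we choose the linear regression model as $f_1(\mathrm{RTC})$ and the $k$-nearest neighbor model together with matrix factorization as $f_2(\mathrm{RIRM})$. The method is described by

$$\hat{V}_{ui} = (1 - \alpha - \beta)\, W^{T} X_{ui} + \alpha \sum_{v \in N(u)} s_{uv} V_{vi} + \beta\, p_u^{T} q_i.$$

Here $X_{ui}$ is the feature vector of the review written by reviewer $u$ on item $i$, $W$ is the parameter vector of the linear regression model, $s_{uv}$ is the similarity between reviewer $u$ and a neighbor $v \in N(u)$, $p_u$ and $q_i$ are the reviewer and item factor vectors, and $\alpha$ and $\beta$ weight the $k$-nearest neighbor part and the matrix factorization part, respectively. To obtain the optimum parameters $\alpha$, $\beta$, $W$, $S_u = (s_{uv})_{v \in N(u)}$, $p_u$, and $q_i$, given the training datasets, we follow the least squares error loss principle and minimize the objective function:

$$\min_{W, S, P, Q} \; \sum_{V_{ui} \in R_1} \left[ \frac{1}{2}\, e_{ui}^2 + \frac{\lambda}{2}\left(\|W\|^2 + \|S_u\|^2 + \|p_u\|^2 + \|q_i\|^2\right) \right],$$

$$e_{ui} = V_{ui} - (1 - \alpha - \beta)\, W^{T} X_{ui} - \alpha \sum_{v \in N(u)} s_{uv} V_{vi} - \beta\, p_u^{T} q_i.$$

Here $\|W\|^2$, $\|S_u\|^2$, $\|p_u\|^2$, and $\|q_i\|^2$ are the regularization terms of the parameters, which serve to avoid overfitting, and $\lambda$ is the regularization coefficient. To estimate the parameters, we first traverse $\alpha$ from 0 to 1. Second, for each fixed $\alpha$, we traverse $\beta$ from 0 to 1. Third, for each fixed $\alpha$ and $\beta$, we adopt a simple stochastic gradient descent algorithm on the training datasets to solve the optimization problem. For each observed rating $V_{ui} \in R_1 = \{V_1, V_2, \ldots, V_{m_1}\}$, we apply the following updating rules:

$$W \leftarrow W + \gamma \left((1 - \alpha - \beta)\, e_{ui} X_{ui} - \lambda W\right), \qquad s_{uv} \leftarrow s_{uv} + \gamma \left(\alpha\, e_{ui} V_{vi} - \lambda s_{uv}\right), \quad v \in N(u),$$

$$p_u \leftarrow p_u + \gamma \left(\beta\, e_{ui}\, q_i - \lambda p_u\right), \qquad q_i \leftarrow q_i + \gamma \left(\beta\, e_{ui}\, p_u - \lambda q_i\right).$$

Here $\gamma$ is the learning rate. After learning the parameters, for $D_2 = \{d_{m_1+1}, d_{m_1+2}, \ldots, d_m\}$, we can apply $\hat{V}_{ui} = (1 - \alpha - \beta)\, W^{T} X_{ui} + \alpha \sum_{v \in N(u)} s_{uv} V_{vi} + \beta\, p_u^{T} q_i$ to predict the review ratings.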
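The nested traversal of the two mixing weights can be illustrated with a deliberately simplified Python sketch. As an assumption made to keep the example short, it blends the precomputed predictions of three separately trained components instead of retraining all parameters jointly for every pair of weights, so it shows only the traversal and the three-way combination, not the joint stochastic gradient descent. The names `blend` and `traverse_alpha_beta` and the toy predictions are hypothetical.

```python
def blend(alpha, beta, text_pred, knn_pred, mf_pred):
    """Three-way combination: the text part receives weight 1 - alpha - beta."""
    return (1 - alpha - beta) * text_pred + alpha * knn_pred + beta * mf_pred

def traverse_alpha_beta(rows, steps=10):
    """Grid search over alpha and beta with alpha + beta <= 1, scored by MAE."""
    best = None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):  # integer bounds keep alpha + beta <= 1
            alpha, beta = a / steps, b / steps
            err = sum(abs(v - blend(alpha, beta, t, k, m))
                      for t, k, m, v in rows) / len(rows)
            if best is None or err < best[0]:
                best = (err, alpha, beta)
    return best

# Toy rows (assumed): (text prediction, kNN prediction, MF prediction, true rating).
# The MF column is made to match the true ratings exactly.
rows = [(4.0, 3.0, 5.0, 5.0),
        (2.0, 4.0, 1.0, 1.0),
        (3.0, 2.0, 4.0, 4.0)]

best_err, best_alpha, best_beta = traverse_alpha_beta(rows)
print(best_alpha, best_beta, best_err)
```

Because the toy matrix factorization predictions are perfect, the search puts all weight on them; with real components the selected weights reflect their relative accuracy.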

Datasets and Experimental Setup.
In order to verify the performance of our proposed framework and methods, we performed several experiments on two datasets from the popular review site http://www.douban.com/. This website is a reviewer-opinion website where reviewers can read and write reviews on movies, music, and books and mark a rating from 1 star to 5 stars. We downloaded information of both movies and reviewers, movie reviews, and their ratings through the API of http://movie.douban.com/ community. The rough description of the two datasets is shown in Table 2.
In Table 2, the reviewer-item rating matrix density (RIRMD) is calculated by

$$\mathrm{RIRMD} = \frac{\text{number of ratings}}{\text{number of reviewers} \times \text{number of items}}. \tag{20}$$

To evaluate the overall performance of our framework and methods, we divide each dataset into 10 parts at random and use 80% of the reviews for training and the remaining 20% for testing. We compare our framework and methods with methods based either on review text content or on the reviewer-item rating matrix through experiments on the two datasets. The six methods are abbreviated as follows: MBLR, the method based on the linear regression model over review text content; MBKNN, the method based on the $k$-nearest neighbor model; MBMF, the method based on matrix factorization; and MBLR + MBKNN, MBLR + MBMF, and MBLR + MBKNN + MBMF, the three combined methods under our framework.
There are mainly two factors influencing RRP in our framework: review text content information and reviewer-item matrix information. We did four experiments on the two datasets to answer the following four questions:
(1) How should the parameters $\alpha$ and $\beta$ be set, and what are the effects of different values of $\alpha$ and $\beta$ on the MAE and RMSE of RRP?
(2) Can our framework and methods decrease MAE and RMSE of RRP?
(3) Is the algorithm complexity of our methods higher than the three single methods?
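We use the mean absolute error (MAE) and the root mean square error (RMSE) as evaluation metrics:

$$\mathrm{MAE} = \frac{1}{\mathrm{total}} \sum \left|\hat{V} - V\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{\mathrm{total}} \sum \left(\hat{V} - V\right)^2},$$

where $\hat{V}$ is the rating predicted by a given method, $V$ is the actual rating in the test dataset, and total is the number of reviews in the test dataset.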
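For concreteness, the two error measures and the matrix density can be computed as in the following Python sketch. The density function assumes the usual definition (observed ratings divided by the number of reviewer-item cells) and at most one rating per reviewer-item pair; the function names and toy values are illustrative.

```python
import math

def mae(predicted, actual):
    """Mean absolute error over the test reviews."""
    return sum(abs(p - v) for p, v in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean square error over the test reviews."""
    return math.sqrt(sum((p - v) ** 2 for p, v in zip(predicted, actual)) / len(actual))

def rirmd(ratings):
    """Rating matrix density: observed ratings / (reviewers x items)."""
    reviewers = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    return len(ratings) / (len(reviewers) * len(items))

predicted = [4.0, 2.0, 5.0, 3.0]
actual = [5.0, 2.0, 4.0, 3.0]
ratings = [("a", "m1", 5), ("a", "m2", 3), ("b", "m1", 4)]

print(mae(predicted, actual))    # two of four predictions are off by one star
print(rmse(predicted, actual))
print(rirmd(ratings))            # 3 ratings in a 2 x 2 matrix
```

RMSE penalises large errors more heavily than MAE, which is why both are reported.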

Setting Parameters $\alpha$ and $\beta$ in Our Framework.
In our framework, there are two parameters that have to be set, namely, $\alpha$, the weight of the $k$-nearest neighbor part, and $\beta$, the weight of the matrix factorization part. We perform 10-fold cross validation on the training datasets to obtain the optimum values of $\alpha$ and $\beta$. Figure 1 shows how the MAE and RMSE of MBLR + MBKNN change with parameter $\alpha$ on the two training datasets. Figure 2 shows how the MAE and RMSE of MBLR + MBMF change with parameter $\beta$ on the two training datasets. Figure 3 shows how the MAE and RMSE of MBLR + MBKNN + MBMF change with parameter $\beta$ on the two training datasets.
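The 10-fold cross validation used to pick a parameter value can be sketched generically in Python. In this sketch, `evaluate` is a hypothetical caller-supplied function that trains on the given training indices with the candidate value and returns a validation error; the stand-in used below is purely illustrative and has its minimum at 0.3 by construction.

```python
import random

def kfold_splits(n, k=10, seed=0):
    """Yield (train_indices, valid_indices) for k folds over n shuffled reviews."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        valid = folds[f]
        train = [j for g in range(k) if g != f for j in folds[g]]
        yield train, valid

def select_parameter(grid, n, evaluate, k=10):
    """Return the grid value with the lowest mean validation error across folds."""
    best = None
    for value in grid:
        errs = [evaluate(value, tr, va) for tr, va in kfold_splits(n, k)]
        mean_err = sum(errs) / len(errs)
        if best is None or mean_err < best[0]:
            best = (mean_err, value)
    return best[1]

# Stand-in error function (assumed): ignores the data, minimal at 0.3.
grid = [g / 10 for g in range(11)]
chosen = select_parameter(grid, n=50, evaluate=lambda value, tr, va: abs(value - 0.3))
print(chosen)
```

In practice `evaluate` would train one of the combined methods on the training folds and return its MAE or RMSE on the held-out fold.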
From Figures 1, 2, and 3, we can see that the MAE and RMSE of RRP in our methods change with the parameters $\alpha$ and $\beta$, which shows that these parameters play very important roles in our methods. We therefore need to find the optimum values of $\alpha$ and $\beta$ to improve the performance of RRP.
In Figure 1, the MAE and RMSE of MBLR + MBKNN are lowest when $\alpha = 0.01$. The reason is that the reviewer-item rating matrices of dataset 1 and dataset 2 are very sparse. It is well known that the $k$-nearest neighbor method based on CF performs poorly on very sparse datasets. At the same time, the linear regression model based on review text content performs better than the $k$-nearest neighbor method based on CF on dataset 1 and dataset 2.
In Figure 2, the MAE and RMSE of MBLR + MBMF are lowest when $\beta$ is set between 0.5 and 0.6. The reason is that the MF method based on CF performs slightly better than the linear regression model based on review text content.
In Figure 3, we fix $\alpha = 0.01$. The MAE and RMSE of MBLR + MBKNN + MBMF are lowest when $\beta$ is chosen between 0.5 and 0.6. The reason is that both the MF method based on CF and the linear regression model based on review text content perform better than the $k$-nearest neighbor method based on CF when the reviewer-item rating matrix is very sparse. At the same time, the MF method based on CF performs slightly better than the linear regression model based on review text content.
Therefore, we set the parameters of our three methods according to the above results. The specific parameters of the three methods are shown in Table 3. The MAE and RMSE of the three single methods differ between dataset 1 and dataset 2, but their relative performance is basically unchanged. For example, the gap between the RMSE of MBKNN and that of MBLR is basically identical in dataset 1 and dataset 2 (0.353), and so is the gap between the RMSE of MBLR and that of MBMF (0.035). So when combining the three single methods, we can make a preliminary choice of the parameters according to their relative performance. The gap between the RMSE of MBKNN and MBLR is about 10 times larger than the gap between the RMSE of MBLR and MBMF.
From Figures 1, 2, and 3, we can see that our proposed methods consistently perform better than the single methods when the parameters are chosen according to the relative performance of the three single methods. Moreover, when the parameters vary within the range suggested by this relative performance, our proposed methods remain better than the single methods. For example, when parameter $\beta$ varies between 0.1 and 0.9, our methods always perform better than MBKNN, MBLR, and MBMF in Figures 2 and 3. This suggests that, in real-world applications, parameters obtained by training on one dataset can be applied to other datasets.

Effect of Different Methods on MAE and RMSE of RRP.
To verify the performance of our proposed framework and methods, we compare our methods with three baseline methods on the two datasets. The experimental results of the six methods on the two datasets are presented in Table 4.
In both datasets, the MAE and RMSE of MBLR + MBKNN are lower than those of MBLR and MBKNN; the MAE and RMSE of MBLR + MBMF are lower than those of MBLR and MBMF; and the MAE and RMSE of MBLR + MBKNN + MBMF are lower than those of MBLR, MBKNN, and MBMF. The experimental results show that combining text content information and reviewer-item matrix information can enhance the performance of RRP. This is because the two information sources are not redundant, so each contributes something the other lacks. When we combine the two types of information, text content information and reviewer-item matrix information, the performance of RRP improves to a certain extent.

Analyzing Complexity of Different Methods.
First, we compute the algorithm complexity of the three simple approaches in Section 3. A simple stochastic gradient descent algorithm is adopted to compute the parameters of the objective function in each of the three methods, so their algorithm complexity follows from the complexity of stochastic gradient descent. The algorithm complexity of these three methods on the two datasets is presented in Table 4. Next, we compute the algorithm complexity of the three specific RRP approaches in Section 4. As in Section 3, we adopt a simple stochastic gradient descent algorithm to minimize the objective function when computing the parameters. From the complexity of stochastic gradient descent and the computing procedure of the three specific RRP methods, we obtain their algorithm complexity, which is also presented in Table 4 for the two datasets. From Table 4, we can see that the algorithm complexity of our three proposed methods in Section 4 is of the same order of magnitude as that of the three simple methods in Section 3. Since our methods obtain better results than the individual methods, this cost is acceptable.

The Relations between Performance of RRP and RIRMD.
To evaluate the effect that the reviewer-item rating matrix density may have on review rating prediction, we experimented on two movie review datasets that differ in RIRMD. The experimental results are shown in Figures 4 and 5.
The RIRMD of dataset 2 is higher than that of dataset 1. From Figures 4 and 5, we can see that the MAE and RMSE values on dataset 2 are always lower than those on dataset 1 for our methods, which suggests that a higher RIRMD brings about lower MAE and RMSE. This is because a higher RIRMD means that more reviewer-item matrix information is available. When we combine this richer reviewer-item matrix information with review text content, we obtain statistically more accurate results.

Conclusion
In this paper, we studied the previous methods of RRP and proposed an RRP framework that combines review text content and the reviewer-item rating matrix to make full use of all information sources and improve prediction performance. Based on this framework, we further devised three specific RRP methods. Our methods significantly enhance the performance of RRP compared to methods based solely on review text content or collaborative filtering. In the future, we will further experiment on frameworks that combine review text content and the reviewer-item matrix while employing probabilistic graphical models.