Learning to Make Document Context-Aware Recommendation with Joint Convolutional Matrix Factorization



Introduction
With the explosive growth of the number of users and items in e-commerce services, user-to-item rating data often suffer from the data sparsity issue; that is, users interact with only a small number of items, and items are visible to only a limited number of users. To overcome this challenge, several context-aware recommendation (CR) methods [1-8] have been developed that consider not only the rating matrix but also context information (such as user demographics, social networks, and item description documents). For example, Wang and Blei [9] adopted topic modeling (LDA) techniques to discover latent aspects from reviews and, based on this, proposed collaborative topic regression (CTR) to improve traditional collaborative filtering (CF). Similarly, in [5,10], LDA was used to discover latent aspects from the reviews of items or users. Bao et al. [2] employed a topic modeling technique to model the latent topics in review texts and utilized matrix factorization (MF) for rating prediction. Chen et al. [6] proposed a context-aware hierarchical Bayesian method that takes rating context and social relationships into consideration for prediction. These methods have achieved significant improvements over methods that only utilize user-item ratings.
However, as these existing document-based modeling methods mainly adopt the bag-of-words model, they cannot fully capture the contextual information of documents, which leads to suboptimal results. To address this issue, Kim et al. [11] resorted to the CNN model, which has achieved great success in computational graphs and natural language processing, to gain a deeper understanding of the documents. Specifically, they integrated a CNN into probabilistic matrix factorization (PMF) and proposed the document context-aware recommendation method ConvMF, which not only captures the contextual information of item description documents but also improves the rating prediction accuracy. However, ConvMF only models the document context from an item view and ignores the following observations.
Firstly, in real-world scenarios, items are not i.i.d., and item relationships are important factors that affect a user's decisions. For example, when buying products on an e-commerce website, a user may tend to buy products related to what he/she has bought in the recent past. To incorporate item relationships into the recommendation method, many previous studies have focused on learning similarities between items from historical user-item interactions [12-16]. For example, Zhang et al. [17,18] explored the temporal relationship between items from users' historical check-in data when recommending locations, and the experimental results demonstrated that incorporating item relationships into the recommendation algorithm can improve the recommendation accuracy. However, these existing methods are not developed for content context-aware recommendation and cannot model the item relationships and the document contextual information simultaneously.
Secondly, as we are social animals, we often turn to our friends for recommendations. The social relationship among users is another key factor that can help users make the right choice. For example, when we want to go to a restaurant or a cinema, or hesitate between two T-shirts, we often ask friends for advice. Based on the social homogeneous theory [19] that friends often have similar preferences, Ma et al. [20] proposed a PMF-based social recommendation (SoRec) method, which not only compensates for data sparseness through social relationships but also improves the recommendation performance significantly. Besides local friends, users also tend to seek advice from users with high global reputations. Motivated by this, Tang et al. [21] developed the framework LOCABAL, which takes advantage of both local and global social context for recommendation. As a user may trust different subsets of friends in different domains, Yang et al. [22] developed a circle-based recommender system, which focuses on inferring category-specific social trust circles from available rating data combined with social network data. However, the above social recommendation methods mainly focus on modeling the user's social influence in the traditional user-to-item rating matrix, and the user's content information is ignored.
Thirdly, as users often express their interests by directly writing down their opinions in posted reviews, users' reviews provide a meaningful way to infer their preferences for items. Recently, several recommendation methods have been proposed that leverage content information to enhance the rating prediction task. For example, Tan et al. [5] exploited textual review information, as well as ratings, to model user preferences and item features in a shared topic space and subsequently introduced them into an MF model for recommendation. To take the document contextual information into account, Kim et al. [11] introduced a CNN into the PMF framework to model the document content and the rating matrix simultaneously. Shen et al. [23] proposed an automatic learning resource recommendation algorithm based on a CNN, where the CNN is used to predict the latent factors from the text information. In [24], Zhang et al. developed a hybrid model that jointly models content information and implicit user feedback to make accurate recommendations. However, these existing methods pay more attention to the content of items than to that of users. Modeling the content information of both users and items in a unified framework is still largely unexplored.

Motivating Example.
In a context-aware recommendation scenario, there are four kinds of information that can be utilized, that is, the reviews of both users and items and the relationships of both users and items. Suppose a user wants to buy a T-shirt on an e-commerce website. As shown in Figure 1, he/she will first ask his/her friends for recommendations, and the friends with the highest expertise in recommending clothes (they usually have many meaningful comments on the related products) will influence him/her the most. Then, the user will filter out the items with low ratings and negative reviews. Moreover, if a T-shirt is related to the pants that he/she has bought recently (e.g., they have the same brand or style), it will be recommended with high probability.
Motivated by the above observations, in this work, we focus on the CR problem and propose a JCMF method that considers the reviews of both users and items and the relationships of both users and items simultaneously. More specifically, to jointly model the document contextual information and the user-item rating matrix, we introduce ConvMF as our basic recommendation framework, which integrates a CNN into PMF to enhance the rating prediction accuracy. To exploit the relationships among items, we propose an item relation-aware method, CMF-I, which shares the item latent factor between the item relation network and the user-item matrix. To model the user's social influence, similar to CMF-I, we incorporate the user's social relationships into CMF-I by sharing the user latent factor between the social network and the user-item matrix. To consider the user's reviews, we exploit another CNN to model the document contextual information of users and finally reach the JCMF model. The main contributions of this study are summarized as follows:
(i) We systematically address CR in online systems and propose a joint convolutional matrix factorization (JCMF) method that considers the reviews of both users and items and the relationships of both users and items simultaneously
(ii) We propose an item relation-aware recommendation method, CMF-I, to exploit the relationships among items
(iii) We incorporate the user's social relationships into CMF-I by bridging the social network and the user-item matrix space with a shared user latent factor
(iv) We incorporate another CNN to model the document contextual information of users
(v) We conduct extensive experiments on the real-world dataset Yelp to demonstrate the effectiveness of our proposed method
The remainder of this work is organized as follows. Section 2 describes the research problem and our proposed solution. Sections 3 and 4 present the experimental settings and results, respectively. Section 5 conducts model analysis to validate the effectiveness of different model components. Section 6 reviews related work, and Section 7 concludes this work.

Recommendation Framework
In this section, we first formulate the recommendation problem and then describe our joint convolutional matrix factorization recommendation framework.

Task definition.
In this work, we use bold capital letters (e.g., $\mathbf{X}$) to represent matrices and graphs and calligraphic capital letters (e.g., $\mathcal{X}$) to denote sets. We employ bold capital letters with subscripts (e.g., $\mathbf{X}_i$) to denote vectors and normal lowercase or capital letters (e.g., $x$ or $X$) to denote scalars. All vectors are column vectors unless otherwise stated. The notations used in this work are summarized in Table 1.
In the CR scenario, there are four kinds of information that can be utilized, that is, the user-item rating matrix, the users' social network, the relationships among items, and the review documents of both users and items. Let $\mathcal{U} = \{u_1, u_2, \ldots, u_N\}$ be the set of users and $\mathcal{V} = \{v_1, v_2, \ldots, v_M\}$ be the set of items, where $N$ and $M$ are the numbers of users and items, respectively. We use $\mathbf{R} = (r_{ij})_{N \times M}$ to denote the user-item rating matrix, with each entry $r_{ij}$ indicating the rating of user $u_i$ on item $v_j$. We define $\mathbf{C} = (c_{ij})_{M \times M}$ to represent the item-item relationship matrix, with $c_{ij} = 1$ denoting that there exists a direct relationship between items $v_i$ and $v_j$, and $c_{ij} = 0$ otherwise. Similarly, we use $\mathbf{S} = (s_{ij})_{N \times N}$ to denote the adjacency matrix of the users' social network, where $s_{ij} = 1$ means that user $u_i$ has a direct social connection with $u_j$, and $s_{ij} = 0$ otherwise. Let $\mathcal{Y}$ and $\mathcal{X}$ be the sets of review texts from users and items, respectively. Then, the task of CR is defined as how to make accurate rating predictions in $\mathbf{R}$ for $\mathcal{U}$ and $\mathcal{V}$ by exploring the relation matrices $\mathbf{C}$ and $\mathbf{S}$ and the review texts $\mathcal{X}$ and $\mathcal{Y}$.
Input. User set $\mathcal{U}$, item set $\mathcal{V}$, user-item rating matrix $\mathbf{R}$, the adjacency matrix of the users' social network $\mathbf{S}$, the item relation matrix $\mathbf{C}$, the users' review texts $\mathcal{Y}$, and the items' review texts $\mathcal{X}$.
Output. A personalized score function that maps an item to a rating score, $f: \mathcal{V} \to \mathbb{R}$.

The ConvMF Method.
To capture the item's document context, we introduce ConvMF [11] as our basic recommendation framework, which integrates a convolutional neural network (CNN) into probabilistic matrix factorization (PMF) to capture the contextual information of documents and further enhance the rating prediction accuracy. The graphical model of ConvMF is shown in Figure 2.
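Suppose the observed rating matrix is denoted by $\mathbf{R} = (r_{ij})_{N \times M}$, where $N$ and $M$ are the numbers of users and items, respectively. Let $\mathbf{U} \in \mathbb{R}^{l \times N}$ and $\mathbf{V} \in \mathbb{R}^{l \times M}$ be the latent user and item factor matrices, with the column vectors $\mathbf{U}_i$ and $\mathbf{V}_j$ denoting user-specific and item-specific latent feature factors, respectively. Then, the conditional distribution over the observed ratings can be defined as

$$p(\mathbf{R} \mid \mathbf{U}, \mathbf{V}, \sigma^2) = \prod_{i=1}^{N} \prod_{j=1}^{M} \mathcal{N}\left(r_{ij} \mid \mathbf{U}_i^{\top} \mathbf{V}_j, \sigma^2\right)^{I_{ij}},$$

where $\mathcal{N}(x \mid \mu, \sigma^2)$ is the probability density function of the Gaussian distribution with mean $\mu$ and variance $\sigma^2$, and $I_{ij}$ is the indicator function that is equal to 1 if user $i$ rated item $j$ and 0 otherwise. We place a zero-mean spherical Gaussian prior on the user feature vectors:

$$p(\mathbf{U} \mid \sigma_U^2) = \prod_{i=1}^{N} \mathcal{N}\left(\mathbf{U}_i \mid \mathbf{0}, \sigma_U^2 \mathbf{I}\right).$$

For the item latent factor $\mathbf{V}$, ConvMF assumes that it is generated from three variables: (1) the internal weights $\mathbf{W}_1$ of the CNN, (2) the document $X_j$ of item $j$, and (3) the Gaussian noise (denoted by $\epsilon$) that is used to optimize the item's latent factor for the rating. Then, we can obtain the item latent factor by the following equations:

$$\mathbf{V}_j = \mathrm{cnn}(\mathbf{W}_1, X_j) + \epsilon_j, \qquad \epsilon_j \sim \mathcal{N}\left(\mathbf{0}, \sigma_V^2 \mathbf{I}\right).$$

Similar to the user latent factor $\mathbf{U}$, we also place a zero-mean spherical Gaussian prior on each weight $w^1_k$ in $\mathbf{W}_1$:

$$p(\mathbf{W}_1 \mid \sigma_{W_1}^2) = \prod_{k} \mathcal{N}\left(w^1_k \mid 0, \sigma_{W_1}^2\right).$$

Thus, we arrive at the conditional distribution over the item latent factor $\mathbf{V}$:

$$p(\mathbf{V} \mid \mathbf{W}_1, \mathcal{X}, \sigma_V^2) = \prod_{j=1}^{M} \mathcal{N}\left(\mathbf{V}_j \mid \mathrm{cnn}(\mathbf{W}_1, X_j), \sigma_V^2 \mathbf{I}\right),$$

where $\mathrm{cnn}(\mathbf{W}_1, X_j)$ is the item latent factor learned from the CNN model, which is used as the mean of this Gaussian distribution. It bridges the CNN and PMF and captures the document content features and the latent factors in the rating matrix simultaneously.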
However, ConvMF assumes that items are i.i.d. and only models the document context from an item view. The relationships of both items and users and the document context of users, which might be beneficial for the CR task, are not considered.

Recommendation with Item Relationships.
ConvMF ignores the relationships between items, which are an important factor in many recommendation scenarios. For example, in a product recommendation scenario, a user who bought a fishing rod in the past will probably buy fishing bait later. Items that are bought within a short time interval have a strong correlation with each other. With this intuition in mind, we construct the item relation network by the following policy: if two items are rated by the same user within a short time window (denoted by $\Delta T$; in experiments, we set $\Delta T$ to 24 hours), we assume that these two items are correlated. For items that are rated outside the time window or not rated by the same user, we cannot infer any correlation. The definition of the item correlation network is given as follows.
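Let $\mathbf{C} = (c_{ij})_{M \times M}$ denote the adjacency matrix of the item correlation network $G_V$, with each entry $c_{ij}$ denoting whether item $v_i$ correlates with $v_j$; that is, $c_{ij} = 1$ if $v_i$ directly connects to $v_j$ and 0 otherwise. We factorize $\mathbf{C}$ to learn a low-dimensional representation of the items' correlation relationships. We use $\mathbf{K} \in \mathbb{R}^{l \times M}$ and $\mathbf{V} \in \mathbb{R}^{l \times M}$ to represent the latent item relation and item factor matrices, with column vectors $\mathbf{K}_i$ and $\mathbf{V}_j$ denoting relation-specific and item-specific latent vectors, respectively. Then, the conditional distribution over the observed item correlation network can be defined as

$$p(\mathbf{C} \mid \mathbf{K}, \mathbf{V}, \sigma_C^2) = \prod_{i=1}^{M} \prod_{j=1}^{M} \mathcal{N}\left(c_{ij} \mid \mathbf{K}_i^{\top} \mathbf{V}_j, \sigma_C^2\right)^{I^C_{ij}},$$

where $I^C_{ij}$ is the indicator function. For the relation latent factor $\mathbf{K}$, we place a zero-mean spherical Gaussian prior on it:

$$p(\mathbf{K} \mid \sigma_K^2) = \prod_{i=1}^{M} \mathcal{N}\left(\mathbf{K}_i \mid \mathbf{0}, \sigma_K^2 \mathbf{I}\right).$$

As shown in Figure 3, to integrate the item relation network into ConvMF, we share the item latent factor $\mathbf{V}$ between the item relationships and the rating matrix. Then, the item latent factor is affected not only by the ratings and contents but also by the item correlations. After combining the item relationship matrix, the joint probability distribution of our convolutional matrix factorization with item relations (CMF-I) method can be written as

$$p(\mathbf{U}, \mathbf{V}, \mathbf{K}, \mathbf{W}_1 \mid \mathbf{R}, \mathbf{C}, \mathcal{X}, \sigma^2, \sigma_C^2, \sigma_U^2, \sigma_V^2, \sigma_K^2, \sigma_{W_1}^2) \propto p(\mathbf{R} \mid \mathbf{U}, \mathbf{V}, \sigma^2)\, p(\mathbf{C} \mid \mathbf{K}, \mathbf{V}, \sigma_C^2)\, p(\mathbf{U} \mid \sigma_U^2)\, p(\mathbf{V} \mid \mathbf{W}_1, \mathcal{X}, \sigma_V^2)\, p(\mathbf{K} \mid \sigma_K^2)\, p(\mathbf{W}_1 \mid \sigma_{W_1}^2).$$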
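The construction policy for the item relation network described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `build_item_relations`, the `(user, item, unix_time)` triple format, the inclusive window boundary, and the symmetric treatment of the relation are assumptions made here, not details given in the text.

```python
from collections import defaultdict

WINDOW_SECONDS = 24 * 3600  # Delta T = 24 hours, the value used in the experiments


def build_item_relations(ratings, num_items, window=WINDOW_SECONDS):
    """Return an M x M 0/1 matrix C from (user, item, unix_time) triples."""
    by_user = defaultdict(list)
    for user, item, ts in ratings:
        by_user[user].append((ts, item))

    C = [[0] * num_items for _ in range(num_items)]
    for events in by_user.values():
        events.sort()  # chronological order per user
        for a, (ts_a, item_a) in enumerate(events):
            for ts_b, item_b in events[a + 1:]:
                if ts_b - ts_a > window:
                    break  # later events are even further away in time
                if item_a != item_b:
                    C[item_a][item_b] = 1
                    C[item_b][item_a] = 1  # assumption: the relation is symmetric
    return C


# Toy data: user 0 rates items 0 and 1 one hour apart and item 2 days later;
# user 1 rates items 2 and 3 within a minute.
ratings = [(0, 0, 0), (0, 1, 3600), (0, 2, 200000), (1, 2, 50), (1, 3, 100)]
C = build_item_relations(ratings, num_items=4)
```

In this toy example, only the pairs (0, 1) and (2, 3) fall inside the window, so all other entries of `C` stay 0, which matches the statement that no correlation is inferred for items rated outside the window or by different users.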

Recommendation with Social Influence.
CMF-I only considers the influence of relationships from an item view and does not explore the social influence between users [20]. In the real world, however, as we often turn to our friends for recommendations, social relationships are also key factors that affect our decisions. To utilize social influence to improve the recommendation accuracy, we follow the method of Wang et al. [25] and propose a social-enhanced CMF-I method, convolutional matrix factorization with social and item relations (CMF-SI), which exploits a shared user latent factor to bridge the latent spaces of the user's social network and the rating matrix. Figure 4 shows the graphical model of CMF-SI. Let $\mathbf{S} = (s_{ij})_{N \times N}$ be the adjacency matrix of the social network $G_S$, with each entry $s_{ij}$ denoting whether user $u_i$ has a social connection with user $u_j$; that is, $s_{ij} = 1$ if $u_i$ directly connects to $u_j$ and 0 otherwise. As in CMF-I, we adopt a matrix factorization technique to factorize $\mathbf{S}$ and learn a low-dimensional latent representation of the user's social relationships. Let $\mathbf{U} \in \mathbb{R}^{l \times N}$ and $\mathbf{Z} \in \mathbb{R}^{l \times N}$ be the latent user and social feature factor matrices, with columns $\mathbf{U}_i$ and $\mathbf{Z}_j$ representing user-specific and social-specific latent factors, respectively. Then, the conditional distribution over the observed social relation matrix $\mathbf{S}$ can be defined as

$$p(\mathbf{S} \mid \mathbf{U}, \mathbf{Z}, \sigma_S^2) = \prod_{i=1}^{N} \prod_{j=1}^{N} \mathcal{N}\left(s_{ij} \mid \mathbf{U}_i^{\top} \mathbf{Z}_j, \sigma_S^2\right)^{I^S_{ij}},$$

where $I^S_{ij}$ is the indicator function. Similar to the item relation factor $\mathbf{K}$, a zero-mean spherical Gaussian prior is also applied to the social latent factor $\mathbf{Z}$:

$$p(\mathbf{Z} \mid \sigma_Z^2) = \prod_{j=1}^{N} \mathcal{N}\left(\mathbf{Z}_j \mid \mathbf{0}, \sigma_Z^2 \mathbf{I}\right).$$

As shown in Figure 4, we incorporate the social relations into CMF-I by sharing the user latent factor $\mathbf{U}$ between the user's social network and the user-item rating matrix; that is, the user latent factor $\mathbf{U}$ is affected not only by the social network $\mathbf{S}$ but also by the rating matrix $\mathbf{R}$.
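The joint probability distribution of CMF-SI can be written as

$$p(\mathbf{U}, \mathbf{V}, \mathbf{K}, \mathbf{Z}, \mathbf{W}_1 \mid \mathbf{R}, \mathbf{C}, \mathbf{S}, \mathcal{X}, \sigma^2, \sigma_C^2, \sigma_S^2, \sigma_U^2, \sigma_V^2, \sigma_K^2, \sigma_Z^2, \sigma_{W_1}^2) \propto p(\mathbf{R} \mid \mathbf{U}, \mathbf{V}, \sigma^2)\, p(\mathbf{C} \mid \mathbf{K}, \mathbf{V}, \sigma_C^2)\, p(\mathbf{S} \mid \mathbf{U}, \mathbf{Z}, \sigma_S^2)\, p(\mathbf{U} \mid \sigma_U^2)\, p(\mathbf{V} \mid \mathbf{W}_1, \mathcal{X}, \sigma_V^2)\, p(\mathbf{K} \mid \sigma_K^2)\, p(\mathbf{Z} \mid \sigma_Z^2)\, p(\mathbf{W}_1 \mid \sigma_{W_1}^2).$$

Recommendation with User Reviews.
In CMF-SI, the relationships of both users and items and the item review texts are modeled jointly in the ConvMF framework. However, it only considers the document contextual information of items. The user's document context, which is independent of the item's context information and should be treated as a different recommendation factor, is not investigated. To model the user's review texts, we follow the work in [11,25] and incorporate another CNN module into CMF-SI, which gives our final recommendation method, joint convolutional matrix factorization (JCMF). The graphical model of JCMF is shown in Figure 5, from which we can observe that a user latent factor is generated from three variables (similar to the item latent factor): (1) the internal weights $\mathbf{W}_2$ of the CNN, (2) the review information (denoted by $Y_i$) of user $i$, and (3) the Gaussian noise variable $\epsilon$. Thus, the user latent factor $\mathbf{U}_i$ can be obtained by the following equations:

$$\mathbf{U}_i = \mathrm{cnn}(\mathbf{W}_2, Y_i) + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}\left(\mathbf{0}, \sigma_U^2 \mathbf{I}\right).$$

For the internal weights $\mathbf{W}_2$ of the CNN, we also place a zero-mean spherical Gaussian prior on each weight $w^2_k$:

$$p(\mathbf{W}_2 \mid \sigma_{W_2}^2) = \prod_{k} \mathcal{N}\left(w^2_k \mid 0, \sigma_{W_2}^2\right).$$

Accordingly, the conditional distribution of the user latent factor can be modified as

$$p(\mathbf{U} \mid \mathbf{W}_2, \mathcal{Y}, \sigma_U^2) = \prod_{i=1}^{N} \mathcal{N}\left(\mathbf{U}_i \mid \mathrm{cnn}(\mathbf{W}_2, Y_i), \sigma_U^2 \mathbf{I}\right),$$

where $\mathrm{cnn}(\mathbf{W}_2, Y_i)$ is the user latent factor learned from the CNN module, which tries to capture the user latent features from the user's reviews and the user-item rating matrix simultaneously. Then, the joint probability distribution of JCMF can be written as

$$p(\mathbf{U}, \mathbf{V}, \mathbf{K}, \mathbf{Z}, \mathbf{W}_1, \mathbf{W}_2 \mid \mathbf{R}, \mathbf{C}, \mathbf{S}, \mathcal{X}, \mathcal{Y}, \sigma^2, \sigma_C^2, \sigma_S^2, \sigma_U^2, \sigma_V^2, \sigma_K^2, \sigma_Z^2, \sigma_{W_1}^2, \sigma_{W_2}^2) \propto p(\mathbf{R} \mid \mathbf{U}, \mathbf{V}, \sigma^2)\, p(\mathbf{C} \mid \mathbf{K}, \mathbf{V}, \sigma_C^2)\, p(\mathbf{S} \mid \mathbf{U}, \mathbf{Z}, \sigma_S^2)\, p(\mathbf{U} \mid \mathbf{W}_2, \mathcal{Y}, \sigma_U^2)\, p(\mathbf{V} \mid \mathbf{W}_1, \mathcal{X}, \sigma_V^2)\, p(\mathbf{K} \mid \sigma_K^2)\, p(\mathbf{Z} \mid \sigma_Z^2)\, p(\mathbf{W}_1 \mid \sigma_{W_1}^2)\, p(\mathbf{W}_2 \mid \sigma_{W_2}^2). \tag{15}$$

Figure 3: The graphical model of CMF-I; the item-relation part is at the top right (dashed blue).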

Model Optimization.
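To optimize the variables $\mathbf{U}$, $\mathbf{V}$, $\mathbf{K}$, $\mathbf{Z}$, $\mathbf{W}_1$, and $\mathbf{W}_2$ in equation (15), we use maximum a posteriori (MAP) estimation as the learning method:

$$\max_{\mathbf{U}, \mathbf{V}, \mathbf{K}, \mathbf{Z}, \mathbf{W}_1, \mathbf{W}_2} p(\mathbf{R} \mid \mathbf{U}, \mathbf{V}, \sigma^2)\, p(\mathbf{C} \mid \mathbf{K}, \mathbf{V}, \sigma_C^2)\, p(\mathbf{S} \mid \mathbf{U}, \mathbf{Z}, \sigma_S^2)\, p(\mathbf{U} \mid \mathbf{W}_2, \mathcal{Y}, \sigma_U^2)\, p(\mathbf{V} \mid \mathbf{W}_1, \mathcal{X}, \sigma_V^2)\, p(\mathbf{K} \mid \sigma_K^2)\, p(\mathbf{Z} \mid \sigma_Z^2)\, p(\mathbf{W}_1 \mid \sigma_{W_1}^2)\, p(\mathbf{W}_2 \mid \sigma_{W_2}^2). \tag{16}$$

By taking the negative logarithm of equation (16), the loss function of JCMF is formulated as

$$\mathcal{L} = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} I_{ij} \left(r_{ij} - \mathbf{U}_i^{\top} \mathbf{V}_j\right)^2 + \frac{\lambda_C}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} I^C_{ij} \left(c_{ij} - \mathbf{K}_i^{\top} \mathbf{V}_j\right)^2 + \frac{\lambda_S}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} I^S_{ij} \left(s_{ij} - \mathbf{U}_i^{\top} \mathbf{Z}_j\right)^2 + \frac{\lambda_U}{2} \sum_{i=1}^{N} \left\| \mathbf{U}_i - \mathrm{cnn}(\mathbf{W}_2, Y_i) \right\|_2^2 + \frac{\lambda_V}{2} \sum_{j=1}^{M} \left\| \mathbf{V}_j - \mathrm{cnn}(\mathbf{W}_1, X_j) \right\|_2^2 + \frac{\lambda_K}{2} \left\| \mathbf{K} \right\|_F^2 + \frac{\lambda_Z}{2} \left\| \mathbf{Z} \right\|_F^2 + \frac{\lambda_{W_1}}{2} \sum_{k} \left\| w^1_k \right\|_2^2 + \frac{\lambda_{W_2}}{2} \sum_{k} \left\| w^2_k \right\|_2^2, \tag{17}$$

where $\lambda_C = \sigma^2 / \sigma_C^2$, $\lambda_S = \sigma^2 / \sigma_S^2$, $\lambda_U = \sigma^2 / \sigma_U^2$, $\lambda_V = \sigma^2 / \sigma_V^2$, $\lambda_K = \sigma^2 / \sigma_K^2$, $\lambda_Z = \sigma^2 / \sigma_Z^2$, $\lambda_{W_1} = \sigma^2 / \sigma_{W_1}^2$, and $\lambda_{W_2} = \sigma^2 / \sigma_{W_2}^2$ are the regularization parameters, and $\| \cdot \|_F^2$ is the squared Frobenius norm.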
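Similar to ConvMF, we optimize the above loss function by coordinate descent; that is, we iteratively optimize one latent variable while fixing the remaining variables. For example, when updating $\mathbf{U}$, we temporarily assume that $\mathbf{V}$, $\mathbf{K}$, $\mathbf{Z}$, $\mathbf{W}_1$, and $\mathbf{W}_2$ are constant. Then, the loss function $\mathcal{L}$ (equation (17)) becomes a quadratic function with respect to $\mathbf{U}$, and the optimal solution of $\mathbf{U}$ can be computed in closed form. The analytical solutions of $\mathbf{U}_i$, $\mathbf{V}_j$, $\mathbf{K}_i$, and $\mathbf{Z}_j$ are as follows:

$$\mathbf{U}_i = \left(\mathbf{V} \mathbf{I}_i \mathbf{V}^{\top} + \lambda_S \mathbf{Z} \mathbf{I}^S_i \mathbf{Z}^{\top} + \lambda_U \mathbf{I}_l\right)^{-1} \left(\mathbf{V} \mathbf{R}_i + \lambda_S \mathbf{Z} \mathbf{S}_i + \lambda_U\, \mathrm{cnn}(\mathbf{W}_2, Y_i)\right), \tag{18}$$

$$\mathbf{V}_j = \left(\mathbf{U} \mathbf{I}_j \mathbf{U}^{\top} + \lambda_C \mathbf{K} \mathbf{I}^C_j \mathbf{K}^{\top} + \lambda_V \mathbf{I}_l\right)^{-1} \left(\mathbf{U} \mathbf{R}_j + \lambda_C \mathbf{K} \mathbf{C}_j + \lambda_V\, \mathrm{cnn}(\mathbf{W}_1, X_j)\right), \tag{19}$$

$$\mathbf{K}_i = \left(\lambda_C \mathbf{V} \mathbf{I}^C_i \mathbf{V}^{\top} + \lambda_K \mathbf{I}_l\right)^{-1} \lambda_C \mathbf{V} \mathbf{C}_i, \tag{20}$$

$$\mathbf{Z}_j = \left(\lambda_S \mathbf{U} \mathbf{I}^S_j \mathbf{U}^{\top} + \lambda_Z \mathbf{I}_l\right)^{-1} \lambda_S \mathbf{U} \mathbf{S}_j, \tag{21}$$

where $\mathbf{I}_i$, $\mathbf{I}^S_i$, and $\mathbf{I}^C_j$ are diagonal matrices whose diagonal elements are the corresponding indicator values, $\mathbf{R}_i$, $\mathbf{S}_i$, and $\mathbf{C}_j$ are column vectors of matrices $\mathbf{R}$, $\mathbf{S}$, and $\mathbf{C}$, respectively (the quantities with the other index are defined analogously), and $\mathbf{I}_l$ is the $l \times l$ identity matrix.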
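However, because $\mathbf{W}_1$ and $\mathbf{W}_2$ are tied to the CNN architecture, such as the convolution layer and the max-pooling layer, they cannot be updated by analytical solutions as we do for $\mathbf{U}$, $\mathbf{V}$, $\mathbf{K}$, and $\mathbf{Z}$. Nonetheless, when $\mathbf{U}$, $\mathbf{V}$, $\mathbf{K}$, and $\mathbf{Z}$ are temporarily constant, $\mathcal{L}$ can be interpreted as a squared error function with $L_2$ regularization terms. The loss functions of $\mathbf{W}_1$ and $\mathbf{W}_2$ are as follows:

$$\varepsilon(\mathbf{W}_1) = \frac{\lambda_V}{2} \sum_{j=1}^{M} \left\| \mathbf{V}_j - \mathrm{cnn}(\mathbf{W}_1, X_j) \right\|_2^2 + \frac{\lambda_{W_1}}{2} \sum_{k} \left\| w^1_k \right\|_2^2 + \text{constant}, \tag{22}$$

$$\varepsilon(\mathbf{W}_2) = \frac{\lambda_U}{2} \sum_{i=1}^{N} \left\| \mathbf{U}_i - \mathrm{cnn}(\mathbf{W}_2, Y_i) \right\|_2^2 + \frac{\lambda_{W_2}}{2} \sum_{k} \left\| w^2_k \right\|_2^2 + \text{constant}. \tag{23}$$

To optimize $\mathbf{W}_1$ and $\mathbf{W}_2$, we use back propagation on $\varepsilon(\mathbf{W}_1)$ and $\varepsilon(\mathbf{W}_2)$. The learning algorithm of JCMF is shown in Algorithm 1.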
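As an illustration of the coordinate-descent step for a single user, the following NumPy sketch solves the regularized least-squares problem that arises for $\mathbf{U}_i$ when all other variables are fixed. It is a minimal sketch, not the authors' implementation: it assumes that the social term is weighted by a parameter `lam_S` and the user-CNN term by `lam_U`, that indicator rows are given as 0/1 arrays, and that the CNN output `cnn_u` is supplied as a plain vector; all function and variable names are hypothetical.

```python
import numpy as np


def update_user(i, R, I, S, I_S, V, Z, cnn_u, lam_S, lam_U):
    """Closed-form U_i with V, Z, and the user-CNN output cnn_u held fixed."""
    l = V.shape[0]
    r_mask, s_mask = I[i], I_S[i]  # indicator rows for user i
    # A = V I_i V^T + lam_S Z I^S_i Z^T + lam_U I_l
    A = (V * r_mask) @ V.T + lam_S * (Z * s_mask) @ Z.T + lam_U * np.eye(l)
    # b = V R_i + lam_S Z S_i + lam_U cnn_u, restricted to observed entries
    b = V @ (r_mask * R[i]) + lam_S * (Z @ (s_mask * S[i])) + lam_U * cnn_u
    return np.linalg.solve(A, b)


def user_objective(u, i, R, I, S, I_S, V, Z, cnn_u, lam_S, lam_U):
    """The part of the loss that depends on U_i (used only to check the update)."""
    rating_err = I[i] * (R[i] - u @ V)
    social_err = I_S[i] * (S[i] - u @ Z)
    return (0.5 * rating_err @ rating_err
            + 0.5 * lam_S * social_err @ social_err
            + 0.5 * lam_U * np.sum((u - cnn_u) ** 2))


# Small random problem: l latent dimensions, N users, M items.
rng = np.random.default_rng(0)
l, N, M = 3, 4, 5
R = rng.integers(1, 6, size=(N, M)).astype(float)
I = (rng.random((N, M)) < 0.6).astype(float)
S = (rng.random((N, N)) < 0.5).astype(float)
I_S = np.ones((N, N))
V, Z = rng.normal(size=(l, M)), rng.normal(size=(l, N))
cnn_u = rng.normal(size=l)
args = (0, R, I, S, I_S, V, Z, cnn_u, 0.5, 2.0)
u_star = update_user(*args)
```

Because the per-user objective is strictly convex whenever `lam_U` is positive, `u_star` is its unique minimizer, so any perturbation of `u_star` can only increase `user_objective`. The updates for items, relation factors, and social factors follow the same pattern with the roles of the matrices exchanged.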

Experimental Setup
This section first introduces the research questions that we answer in the experiments and then describes the dataset and baseline methods that we use in this work.

Research Questions.
We conduct extensive experiments on the real-world dataset Yelp to answer the following research questions:
RQ1: how does our proposed method JCMF perform compared with other state-of-the-art recommendation algorithms?
RQ2: is the item relation network useful for improving the recommendation performance?
RQ3: is the user social network useful for improving the recommendation results?
RQ4: how does the document contextual information of users contribute to the performance of CR?
RQ5: can our model JCMF converge quickly compared to the ConvMF method?

Dataset.
In experiments, we use the real-world dataset Yelp as our data source (https://www.yelp.com/dataset). Yelp is an online review website that allows users to express their opinions by posting reviews of items (such as restaurants and businesses). Each user can also create social connections with users who have similar interests. The dataset we used consists of users, reviews, and businesses across 11 metropolitan areas in 4 countries. We use this dataset for business recommendation. As the original dataset is too large, we only select the businesses that are located in North Carolina (NC), Wisconsin (WI), and Ohio (OH) to conduct experiments. To reduce the data sparsity, we only keep users that have more than 3 ratings and businesses that have at least one review. The resulting NC dataset consists of 22,737 users who have rated and reviewed a total of 12,502 different items. The WI dataset consists of 8,386 users who have rated and reviewed a total of 4,593 different items. The OH dataset consists of 29,918 users who have rated and reviewed a total of 12,361 different items. The statistics of these three subdatasets are shown in Table 2.

Evaluation Protocols.
In experiments, we randomly select 80% of the user-item ratings as the training set ($D_{\text{train}}$) and 10% of the data as the validation set ($D_{\text{valid}}$) and use the remaining 10% for testing ($D_{\text{test}}$). The optimal hyperparameters are determined on the validation set.
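The random 80/10/10 split described above can be sketched as follows. The function name `split_ratings`, the fixed seed, and the triple format of the toy ratings are assumptions made for illustration only.

```python
import random


def split_ratings(ratings, seed=0):
    """Shuffle the ratings and cut them into 80% train, 10% validation, 10% test."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10  # integer arithmetic avoids rounding surprises
    n_valid = len(shuffled) // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Toy (user, item, rating) triples.
ratings = [(u, v, 1 + (u + v) % 5) for u in range(10) for v in range(10)]
train, valid, test = split_ratings(ratings)
```

Each rating lands in exactly one of the three parts, so hyperparameters tuned on the validation part never see the test ratings.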
We employ two popular metrics [26], root mean squared error (RMSE) and mean absolute error (MAE), as the evaluation measures. MAE measures the average magnitude of the errors in a set of predictions, while RMSE disproportionately penalizes large errors. This also means that RMSE is more prone to be affected by outliers or bad predictions. The formal definition of RMSE is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|D_{\text{test}}|} \sum_{(i,j) \in D_{\text{test}}} \left(r_{ij} - \hat{r}_{ij}\right)^2},$$

where $r_{ij}$ is the observed rating value from user $u_i$ to item $v_j$, $\hat{r}_{ij}$ is the corresponding prediction, and $|D_{\text{test}}|$ is the number of samples in the test set. The formal definition of MAE is

$$\mathrm{MAE} = \frac{1}{|D_{\text{test}}|} \sum_{(i,j) \in D_{\text{test}}} \left| r_{ij} - \hat{r}_{ij} \right|.$$
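The two metrics can be computed directly from pairs of observed and predicted ratings. The sketch below uses hypothetical toy values and shows how a single large error affects RMSE more strongly than MAE.

```python
import math


def rmse(pairs):
    """Root mean squared error over (observed, predicted) rating pairs."""
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (observed, predicted) rating pairs."""
    return sum(abs(r - p) for r, p in pairs) / len(pairs)


# Toy test set: three good predictions and one prediction that is off by 3.
pairs = [(5.0, 4.0), (3.0, 3.0), (1.0, 4.0), (4.0, 4.0)]
print(rmse(pairs), mae(pairs))
```

Here the absolute errors are 1, 0, 3, and 0, so MAE is 1.0, while RMSE is the square root of 2.5 (about 1.58) because the single error of 3 is squared before averaging.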

Baseline Methods.
To evaluate the effectiveness of our proposed JCMF method, we compare it with the following methods.
(i) PMF [27]: this is a matrix factorization-based recommendation method, which only utilizes the user-item rating matrix to learn the user's latent feature factors.
(ii) SoRec [20]: this is a social trust-aware recommendation method that factorizes the user-item rating matrix and the user's social trust network jointly.
(iii) RSTE [28]: learning to recommend with social trust ensemble is a classical algorithm that predicts a user's ratings by combining his/her own tastes and those of his/her trusted friends.
(iv) SocialMF [29]: this is a state-of-the-art social recommendation method that exploits social influence to enhance the recommendation process.
(v) TrustMF [30]: this is another state-of-the-art social recommendation method, which integrates the user's trust network into matrix factorization.
(vi) TrustSVD [31]: this is also a typical social trust-based recommendation method, which considers both the explicit and implicit influence of user trust.
(vii) SocialFD [32]: this method combines the matrix factorization model and distance metric learning. The positions of users and items are jointly determined by ratings and social relations.
(viii) DeepCoNN [33]: this is a state-of-the-art context-aware recommendation method that utilizes two parallel CNNs to process the reviews and feeds the user and item features into a top-level factorization machine to predict ratings.
(ix) ConvMF [11]: this is another state-of-the-art context-aware recommendation method that integrates a CNN into PMF to learn a deeper understanding of the item's document contextual information.
To evaluate the importance of the item relationships, the social network, and the user's reviews, we further compare JCMF with the following methods:
(i) CMF-I: this is a variant of JCMF that only considers the correlation network of items to make recommendations. It is used to demonstrate the importance of the item relationships.
(ii) CMF-SI: this is another variant of JCMF that jointly considers the item correlation network and the user social network. Compared with JCMF, it does not model the user's review information.
ALGORITHM 1: The learning algorithm of JCMF.
Input: The hyperparameters λ_U, λ_V, λ_K, λ_Z, λ_W1, and λ_W2; user-item rating matrix R; item relation matrix C; user social matrix S; user review text Y; item review text X; threshold
Output: Latent vectors U, V, K, and Z; internal weights of CNNs W_1 and W_2
  for each item j do
    Preprocess and represent j's review content as a word embedding-based input X_j;
  end for
  for each user i do
    Preprocess and represent i's review content as a word embedding-based input Y_i;
  end for
  Initialize U, V, K, Z, W_1, and W_2;
  while (|Loss − PrevLoss| / PrevLoss) ≥ threshold do
    for each user i do
      Get cnn(W_2, Y_i) from the user CNN model;
      Update U_i according to equation (18);
    end for
    for each item j do
      Get cnn(W_1, X_j) from the item CNN model;
      Update V_j according to equation (19);
    end for
    for each item i in C do
      Update K_i according to equation (20);
    end for
    for each user j in S do
      Update Z_j according to equation (21);
    end for
    Update W_1 according to equation (22);
    Update W_2 according to equation (23);
    Compute Loss according to equation (17);
  end while
  return U, V, K, Z, W_1, and W_2;

Following [3,11], we preprocess the reviews of both users and items by the following steps: (1) remove stop words, (2) calculate the tf-idf score for each word, (3) remove the words whose document frequency is higher than 0.5, (4) select the 8,000 words with the highest tf-idf scores as the vocabulary, and (5) remove all nonvocabulary words from the raw review documents.
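The five preprocessing steps can be sketched in plain Python as below. Several details are assumptions that the text does not specify: whitespace tokenization, the tiny hypothetical stop-word list, the tf-idf formula (term frequency times the logarithm of inverse document frequency), and scoring each word by its maximum tf-idf value over all documents.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "is", "and", "was"}  # hypothetical tiny stop-word list


def build_vocabulary(docs, vocab_size=8000, max_df=0.5, stop_words=STOP_WORDS):
    # Step 1: tokenize on whitespace (assumption) and remove stop words.
    tokenized = [[w for w in d.lower().split() if w not in stop_words] for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(w for doc in tokenized for w in set(doc))

    score = Counter()
    for doc in tokenized:
        for w, count in Counter(doc).items():
            # Step 3: drop words that occur in more than max_df of the documents.
            if doc_freq[w] / n_docs > max_df:
                continue
            # Step 2: tf-idf; a word's score is its maximum over documents (assumption).
            tfidf = (count / len(doc)) * math.log(n_docs / doc_freq[w])
            score[w] = max(score[w], tfidf)

    # Step 4: keep the vocab_size highest-scoring words.
    vocab = {w for w, _ in score.most_common(vocab_size)}
    # Step 5: remove all nonvocabulary words from the documents.
    cleaned = [[w for w in doc if w in vocab] for doc in tokenized]
    return vocab, cleaned


docs = [
    "the food was great and cheap",
    "the food was cold",
    "great food and great coffee",
    "the coffee is bitter",
]
vocab, cleaned = build_vocabulary(docs)
```

In this toy corpus, "food" appears in three of four documents and is therefore removed by the document-frequency filter, while the remaining content words are kept because the default vocabulary size exceeds the number of candidates.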
For the CNN architectures of both users and items, (1) the maximum length of review documents is set to 300. (2) The word latent vectors are randomly initialized with dimension 200. (3) In the convolution layer, we use various window sizes (3, 4, and 5) for shared weights to consider various lengths of surrounding words and set the number of shared weights per window size to 100. (4) To prevent the CNN from overfitting, we use dropout and set the dropout ratio to 0.2 for all three datasets. (5) To train the weights of the CNN, we use minibatch-based RMSprop and set the minibatch size to 64. We tune all hyperparameters on the validation set, repeat each setting 5 times, and report the average results. The details of tuning the hyperparameters are given in Section 5. After publishing, we will release our dataset and source code.

Experimental Results (RQ1)
Figure 6 reports the experimental results on NC, WI, and OH in terms of RMSE and MAE, from which we have the following observations: (1) Our JCMF method achieves the best performance on all three datasets (the improvement of JCMF over the baseline methods is significant with p < 0.01). This result demonstrates the effectiveness of our JCMF solution; that is, jointly considering the relationships of users and items and the review documents from both users and items is helpful for making recommendations in the CR task. (2) Neural network-based recommendation methods (i.e., DeepCoNN, ConvMF, CMF-I, CMF-SI, and JCMF) perform better than traditional MF-based methods, showing the effectiveness of neural networks in learning the user-item interactions because of their ability to capture the nonlinear and high-level latent features of users and items. (3) ConvMF achieves a better performance than traditional social recommendation methods (i.e., SoRec, SocialMF, TrustMF, RSTE, and SocialFD). This indicates that the document contextual information is helpful for the CR task and that considering it together with the ratings can make more accurate recommendations than utilizing the social network information only.

Model Analysis
This section investigates the importance of model components and hyperparameters.

Importance of Item Relations (RQ2).
To understand the importance of item relations in the CR task, we first compare CMF-I with ConvMF and then tune the hyperparameter λ C in JCMF to see how JCMF depends on the item relations.
The hyperparameter $\lambda_C$ in JCMF represents how much item relation information is utilized in JCMF. If $\lambda_C = 0$, JCMF only utilizes the rating matrix, the review documents, and the user's social network to make recommendations. If $\lambda_C = \infty$, JCMF only utilizes the item correlation information to learn the items' latent factors.
The experimental results are shown in Figure 6 and Table 3, from which we have the following observations: (1) CMF-I significantly outperforms ConvMF on all three datasets (p < 0.01), showing that the item relationships are beneficial to context-aware recommendation and that considering only the item's review documents yields worse results than combining both. (2) The performance of JCMF changes significantly with the value of $\lambda_C$. On NC, WI, and OH, as $\lambda_C$ increases, the performance of JCMF first improves (RMSE and MAE decrease). But when $\lambda_C$ surpasses a certain threshold (0.1 in NC and OH and 0.0001 in WI), the performance of JCMF degrades as $\lambda_C$ increases further. This result, again, demonstrates the importance of item relations; that is, we can further improve the recommendation performance through the shared item latent factor.

Importance of Social Relations (RQ3).
To validate the effectiveness of social relations, we compare CMF-SI with CMF-I and tune the hyperparameter $\lambda_S$ in JCMF to investigate the impact of the social relationships. The hyperparameter $\lambda_S$ controls how much JCMF depends on the user's social influence. It balances the information from the user-item rating matrix and the user's social network. Figure 6 and Table 4 present the experimental results of JCMF on NC, WI, and OH. From these results, we have the following observations: (1) CMF-SI achieves a better performance than CMF-I on all three datasets (as shown in Figure 6). This result demonstrates the importance of social relations and indicates that considering the user's social influence is helpful for the CR task. (2) The hyperparameter $\lambda_S$ impacts the performance of JCMF significantly (as shown in Table 4). As $\lambda_S$ increases, the recommendation performance of JCMF improves at first, but when $\lambda_S$ reaches a certain value (150 in NC and 70 in WI and OH), the performance of JCMF begins to degrade as $\lambda_S$ increases further. From the above results, we can observe that social influence is important to CR; jointly considering the social network and the user-item ratings can significantly improve the recommendation performance.

Importance of User's Review Documents (RQ4).
To demonstrate the importance of the user's review documents and answer RQ4, we further compare JCMF with CMF-SI and tune the hyperparameter $\lambda_U$ in JCMF on NC, WI, and OH; $\lambda_U$ balances the importance of the ratings and the review documents of users. From the experimental results shown in Figure 6 and Table 5, we can observe the following: (1) JCMF works better than CMF-SI on all of the NC, WI, and OH datasets, demonstrating the importance of considering the user's review documents. This result is also consistent with the result in [11]; that is, incorporating CNNs into PMF can effectively model the rating information and the document contextual information simultaneously.
(2) The parameter $\lambda_U$ impacts JCMF significantly. Specifically, when $\lambda_U$ is 15, 8, and 1, JCMF reaches its best performance on NC, WI, and OH, respectively, whereas the performance of JCMF drops sharply with other $\lambda_U$ values. In other words, a relatively low or high $\lambda_U$ fails to achieve the best performance. This phenomenon demonstrates the importance of the user's review documents and the effectiveness of our joint convolutional matrix factorization solution.

Convergence Analysis (RQ5).
In this section, we further conduct experiments to compare the convergence performance of JCMF and ConvMF on NC, WI, and OH. In the experiments, we set the maximum number of iterations to 200 and adopt early stopping once the expected prediction error is reached.
The experimental results are shown in Figure 7, from which we can observe that JCMF has convergence performance similar to that of ConvMF while achieving better recommendation results. Incorporating the relations of users and items and the user's review documents does not slow down the convergence of JCMF. In some cases, it even converges faster than ConvMF (as shown in Figure 7(c)). This result directly demonstrates the efficiency of our JCMF method and answers RQ5.
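The training protocol used in this comparison (a cap of 200 iterations combined with early stopping) can be sketched as follows. This is only an illustrative sketch: the callable `run_iteration`, the `target_error` threshold, and the toy error curve are hypothetical stand-ins, since the exact stopping criterion of the experiments is not specified beyond the description above.

```python
def train_with_early_stopping(run_iteration, max_iter=200, target_error=0.0):
    """Run training iterations until the error target or the cap is reached.

    run_iteration: hypothetical callable taking the 1-based iteration index
                   and returning the validation error after that iteration.
    Returns the list of validation errors, one per executed iteration.
    """
    history = []
    for iteration in range(1, max_iter + 1):
        error = run_iteration(iteration)
        history.append(error)
        if error <= target_error:
            break  # expected prediction error reached: stop early
    return history


# Toy error curve standing in for a real model: error decays as 1 / iteration.
toy_curve = lambda iteration: 1.0 / iteration

stopped_early = train_with_early_stopping(toy_curve, target_error=0.25)
ran_to_cap = train_with_early_stopping(toy_curve, target_error=0.0)
```

Comparing the lengths of such histories for two models is one simple way to read convergence curves like those in Figure 7.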

Related Work
This section briefly reviews the related work from three aspects: social recommendation, context-aware recommendation, and CNN-based content modeling methods.

Social Recommendation.
Due to the recent development of social networks, methods that leverage social network information to improve traditional recommender systems have been widely studied [20, 28–31, 34–38]. For example, to consider social network information when making recommendations, Ma et al. [20] proposed a fused matrix factorization framework to model the user's social influence and the rating matrix simultaneously. Zhao et al. [36] incorporated social connections into the local low-rank framework and solved the problem of constructing the submatrices by social regularization terms. Based on the intuition that users always prefer the items recommended by the friends they trust, Ma et al. [28] fused the tastes of users and their trusted friends together by an ensemble parameter. Guo et al. [31] extended the state-of-the-art recommendation algorithm SVD++ by further incorporating both the explicit and implicit influence of trusted users on the prediction of items for an active user.
Although the above methods have achieved great success in modeling the user's social influence, they focus only on social relationships, and other kinds of context information (such as description content, labels, and tags) are not investigated. To further improve the performance of social recommendation, recent works have started to find ways to merge this information [6,21,22,39]. For example, Jiang et al. [39] explored social recommendation on the basis of psychology and sociology studies and incorporated two social context factors (i.e., individual preference and interpersonal influence) into the probabilistic matrix factorization framework. Hu et al. [40] investigated the effectiveness of fusing social relations and review texts for rating prediction by aligning the latent factors found by social matrix factorization (SocialMF) with the topics found by topic MF. However, the above methods understand the content information only in a shallow way, and the performance of modeling document content with a CNN in social networks is still unexplored.

Context-Aware Recommendation.
To alleviate the inherent data sparsity issue of recommender systems, context-aware recommendation (CR) methods, which aim to leverage context information (such as temporal information, location, and content) to improve the performance of traditional recommender systems, have been widely studied in recent years. For example, Zhang and Chow [18] proposed a probabilistic framework called TICRec that takes time context information into account for recommendation. TICRec utilized temporal influence correlations of both weekdays and weekends for time-aware location recommendation and estimated the probability density over the continuous time of a user visiting a location to avoid the loss of time information. To incorporate multiple types of context for recommendation, Karatzoglou et al. [41] extended the concept of matrix factorization to that of tensor factorization and proposed the high-order singular value decomposition (HOSVD) method to construct a tensor decomposition model of "user, commodity, and context" to personalize recommendations. Baltrunas et al. [42] extended the classical MF approach by taking into account contextual information in the rating prediction step of a recommender system. Yao et al. [43] employed tensor factorization to model multidimensional contextual information, where they exploited a high-order tensor, rather than the traditional two-dimensional user-location matrix, to fuse heterogeneous contextual information about check-ins.
Besides temporal information, locations, and profiles, content contexts have also been considered recently. For example, Gao et al. [44] investigated various types of content information on LBSNs in terms of sentiment indications, user interests, and POI properties and then modeled the three types of information under a unified POI recommendation framework. Wang et al. [45] proposed a preference enrichment approach that leverages auxiliary data of online reviewers' aspect-level opinions to predict the buyers' missing preferences. To consider the opinion spam in product reviews, Raghavan et al. [46] proposed a two-stage collaborative filtering method using quality scores; that is, they first estimated the quality scores for individual ratings in the dataset. Then, they utilized the quality scores estimated in the previous stage as weights for the ratings in the probabilistic matrix factorization framework. However, the above content-based methods cannot model the contextual information of the content, leading to a shallow understanding of the content semantics. Furthermore, none of the above methods model the relationships and content context of both users and items simultaneously, which motivates us to develop a joint context-aware recommendation method. Recently, Wang et al. [25] explored review documents and social relations to improve traditional recommendation, but they assumed that items are i.i.d. and did not investigate the influence of item relationships in the CR task.

CNN-Based Content Modeling Method.
CNN, as one of the most powerful feature representation techniques, has achieved great success in many research areas, such as computer vision and natural language processing. Recently, researchers have found that CNN can also be utilized to model document content. For example, as CNN can effectively capture the local features of documents through modeling components such as local receptive fields and shared weights, Kim et al. [11] utilized CNN to capture the contextual information of documents and proposed a convolutional matrix factorization (ConvMF) method to enhance the rating prediction accuracy. To cope with the ambiguity and variability of linguistic expression, He et al. [47] modeled each sentence using a CNN that extracts features at multiple levels of granularity and uses multiple types of pooling, and proposed a model that compares sentences from a multiplicity of perspectives. Ruder et al. [48] used a CNN for both aspect extraction and aspect-based sentiment analysis and proved its effectiveness in sentence modeling tasks. Zheng et al. [33] utilized two parallel CNNs to jointly model the review text of both users and items and proposed a deep cooperative neural network (DeepCoNN). In their method, a shared layer is introduced on top to couple these two networks together. However, DeepCoNN does not consider the relationships of users and items and thus achieves only suboptimal results. The above methods demonstrate the ability of CNN to extract document features and provide us with an effective way to mine semantic clues from review documents.
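As an illustration of how local receptive fields and shared weights capture local document features, the following minimal sketch slides each filter over consecutive word embeddings and keeps the strongest response per filter (max-over-time pooling). The function name `conv_max_pool`, the ReLU activation, and the toy embeddings and filters are assumptions for illustration; this is not the exact architecture of ConvMF, DeepCoNN, or JCMF.

```python
def conv_max_pool(embeddings, filters, bias=0.0):
    """Apply 1D convolution filters over a document and max-pool over time.

    embeddings: list of word vectors (document length n, dimension d)
    filters:    list of filters; each filter is a list of `window` weight
                vectors of dimension d, shared across all document positions
    Returns one pooled feature per filter.
    """
    features = []
    for filt in filters:
        window = len(filt)
        if len(embeddings) < window:
            raise ValueError("document is shorter than the filter window")
        activations = []
        for start in range(len(embeddings) - window + 1):
            response = bias
            for offset in range(window):
                word = embeddings[start + offset]
                response += sum(w * x for w, x in zip(filt[offset], word))
            activations.append(max(0.0, response))  # ReLU (assumed activation)
        features.append(max(activations))  # max-over-time pooling
    return features


# Toy document of three words with two-dimensional embeddings.
document = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_filters = [
    [[1.0, 0.0], [0.0, 1.0]],  # window of two words
    [[-1.0, -1.0]],            # window of one word
]
doc_features = conv_max_pool(document, toy_filters)
```

Because each filter spans several adjacent words, the pooled feature depends on word order within the window, which a bag-of-words representation cannot express.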

Conclusions
In this work, we investigated the CR problem and proposed a joint convolutional matrix factorization (JCMF) method that jointly considers the review text and the connections of both users and items in a unified recommendation framework. More specifically, to consider the impact of item correlations, we assume that items bought/clicked by the same user within a short time window have a strong correlation with each other. Then, we incorporated the predefined correlation network into ConvMF through a shared item latent feature space. To consider the impact of the user's social influence, we further integrated the social relationships of users by bridging the user latent space between the user-item matrix and the social network. To consider the impact of the user's review documents, we exploited another CNN module to extract the user's document contextual information.
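The item-correlation assumption stated above can be sketched as follows: two items are linked when the same user interacted with both within a given time window. The function name `build_item_network`, the `(user, item, timestamp)` record layout, and the window length are hypothetical choices for illustration and are not taken from the experimental setup.

```python
from collections import defaultdict
from itertools import combinations


def build_item_network(interactions, window):
    """Link items that one user bought/clicked within `window` time units.

    interactions: iterable of (user, item, timestamp) records
    Returns a set of undirected edges, each stored as a sorted (item, item) pair.
    """
    events_by_user = defaultdict(list)
    for user, item, timestamp in interactions:
        events_by_user[user].append((timestamp, item))

    edges = set()
    for events in events_by_user.values():
        events.sort()  # chronological order within each user
        for (t_first, a), (t_second, b) in combinations(events, 2):
            if a != b and t_second - t_first <= window:
                edges.add((min(a, b), max(a, b)))
    return edges


# Hypothetical interaction log.
log = [
    ("u1", "A", 0), ("u1", "B", 5), ("u1", "C", 100),
    ("u2", "B", 10), ("u2", "C", 12),
]
item_edges = build_item_network(log, window=10)
```

In this toy log, items A and B are linked through user u1 and items B and C through user u2, while A and C remain unlinked because u1 interacted with them too far apart in time.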
The experimental results on the real-world dataset demonstrate the superiority of our proposed method compared with other baseline methods.

Data Availability
The data supporting this study are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.