A Hybrid Deep Collaborative Filtering Approach for Recommender Systems

Recommender Systems (RS) help users by efficiently showing them relevant products and items based on their preferences and historical interactions with other users and items. Collaborative filtering is one of the most powerful recommender system techniques and provides personalized recommendations by predicting ratings. Many recommender systems are modelled only on implicit user feedback, which makes them challenging to build. Conventional Collaborative Filtering (CF) techniques such as matrix decomposition, which models a user's rating for an item as a linear combination of latent features of user preferences, have limited learning capacity. They also suffer from the data sparsity and cold-start problems due to insufficient data. To overcome these problems, an integration of conventional collaborative filtering with deep neural networks is proposed: a Weighted Parallel Deep Hybrid Collaborative Filtering method based on Singular Value Decomposition (SVD) and the Restricted Boltzmann Machine (RBM). In this approach, a user-item relationship matrix with explicit ratings is constructed. First, the user-item matrix is passed to SVD, which decomposes it into the best lower-rank approximation of the original matrix. Second, the user-item matrix is fed into a deep neural network model called the Restricted Boltzmann Machine (RBM).


Introduction
A recommender system (RS) is one of the most powerful decision-making tools. The fast growth of information on the internet creates an information overload problem. To address this issue, several recommender systems have been developed to help users find better products [21]. These systems provide personalized recommendations based on user profiles and user behaviour [4]. They have been used in various products and services on the internet to help users find and select items such as books, movies, food and music. Many web services such as Amazon, Netflix, YouTube and other social networks have adopted recommender systems. The literature also reports the use of recommender systems to discover new songs for music listeners [18] and like-minded friends in social networks [20].
The input to a recommender system is generally the user rating (a numerical value), demographic data (age, gender, occupation) and content data, that is, textual analysis of the items rated by the user; it also depends on the type of filtering algorithm used, such as Collaborative Filtering (CF), Content-Based Filtering (CBF) or Demographic Filtering (DF). The output of a recommender system can be either a prediction or a top-N recommendation. A prediction is a numerical value $r_{a,j}$ that represents the anticipated opinion of the active user $u_a$, where $a = 1, 2, \ldots, m$, on item $i_j$, where $j = 1, 2, \ldots, n$. A recommendation is a list of the top $N$ items, where $N \leq n$, that the active user is expected to like most.
Collaborative Filtering (CF) is one of the key techniques of personalized recommender systems [6]. CF methods are classified as memory-based and model-based. The main idea of Collaborative Filtering is that if two users liked similar items in the past, they will also like similar items in the future.
These recommendations are based not only on the user's past behaviour but also on users' explicit and implicit ratings. Traditional CF creates the user-item matrix based only on the preferences that users give for particular items. Two limitations of CF are the cold-start and data sparsity problems. The cold-start problem concerns recommendations for a new user, a new item or a new community. When new users or new items arrive, the system does not have sufficient data about their preferences to make recommendations. If a new item has no user ratings, the user-item matrix is very sparse and the recommendation accuracy usually drops noticeably.
Matrix factorization (MF) is one of the most popular collaborative filtering techniques. The MF approach is one of the most accurate approaches for reducing the problem of high data sparsity [1]. MF maps both users and items to a joint low-dimensional latent factor space by factoring the user-item interaction matrix. Using matrix decomposition techniques, user-item interactions are modelled as inner products in that space [8]. In the latent space, the recommender system predicts a personalized ranking over a set of items for each individual user based on the similarities among users and items. In this way the model uncovers the hidden structure behind the data. Popular decomposition models are Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Probabilistic Matrix Factorization (PMF) and Non-Negative Matrix Factorization (NMF). However, MF reduces the dimension of the data, which makes it difficult to improve the prediction accuracy further. It also suffers from the cold-start and data sparsity problems. To address these issues, the idea is to build a deep learning-based hybrid recommendation method.
Recently, deep learning, a powerful representation learning approach, has been successfully applied in various fields, including recommender systems [16]. An AutoEncoder (AE) is a neural network model that attempts to reconstruct its input data in the output layer and consists of an encoder and a decoder component [19]. There are many variants of autoencoders, such as the marginalized denoising autoencoder, sparse autoencoder, denoising autoencoder, contractive autoencoder and variational autoencoder. Autoencoders and denoising autoencoders have also been applied to recommendation [10]. These methods reconstruct the users' ratings by learning hidden structures from the explicit historical ratings.
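The two output types described above can be illustrated with a minimal sketch. It assumes the predicted scores for one active user are already available as an array; the function name `top_n` and the toy scores are hypothetical and not part of the original work.

```python
import numpy as np

def top_n(scores, n_items, already_rated=()):
    """Return the indices of the n_items highest-scoring items, best first.

    scores        : predicted ratings for one active user, one entry per item
    already_rated : indices of items the user has rated (never recommended)
    """
    scores = np.asarray(scores, dtype=float).copy()
    rated = np.asarray(list(already_rated), dtype=int)
    scores[rated] = -np.inf                      # exclude rated items
    order = np.argsort(-scores, kind="stable")   # descending by score
    return order[:n_items].tolist()

# Hypothetical predictions for four items
predicted = [0.1, 0.9, 0.5, 0.7]
recommended = top_n(predicted, 2)
```

A prediction is a single entry of `predicted`, while the recommendation is the ranked list returned by `top_n`.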

Researchers have recently explored collaborative filtering based on Deep Neural Networks (DNN) for recommendation [21]. However, traditional CF methods and deep learning models are trained in batch mode, that is, on the entire data set at once. As information grows, continuous data streams are stored in the system from time to time, and user interests and preferences keep changing. When new data arrives in the system, traditional CF and deep learning methods have to retrain the model from scratch, so retraining and updating the model's parameters is expensive [12].
A Convolutional Neural Network (CNN) model is used to extract user-item features from user and item side information to resolve the new data arrival problem in the deep bias probabilistic matrix factorization (DBPMF) model [8]. One of the major limitations of CF is the data sparsity problem, which has been addressed with fuzzy inference rules to enhance recommendation accuracy [1]. This model first categorizes users' likes, users' dislikes and the likes and dislikes that users have in common, and finally measures the loss at the attribute and feature level due to sparsity.
The inference rules play a vital role in product rating and measure the correlation of the recommendation matrix.
Many recommendation models focus on user ratings, users' past behaviour, item information and user/item side information. This information alone cannot fully expose user-item relations for accurate prediction. Hybrid weighting is a hybridization technique that computes the prediction score by giving a weight to each recommendation approach and summing the weighted scores to produce a new output recommendation. To alleviate the cold-start problem, a Weighted Parallel Deep Hybrid Collaborative Filtering recommendation method is proposed.

The proposed work is summarized as follows:
 A user-item relationship matrix with explicit ratings is constructed.

Problem formulation
The goal of a recommender system is to recommend new items, or to predict the rating of a certain item, for a particular user based on the user's previous information. In a collaborative filtering strategy, there is a list of $m$ users and $n$ items with an extremely sparse rating matrix $R \in \mathbb{R}^{m \times n}$. Each entry $r_{ij}$ represents the rating of user $i$ on item $j$. If $r_{ij} \neq 0$, the rating of user $i$ on item $j$ is observed; otherwise it is not observed. $U \in \mathbb{R}^{k \times m}$ and $V \in \mathbb{R}^{k \times n}$ denote the user and item latent factor matrices respectively, where $k$ is the dimensionality of the latent space. Moreover, the additional side information of users and items is denoted by $X \in \mathbb{R}^{p \times m}$, where $p$ is the dimension of the user side information, and $Y \in \mathbb{R}^{q \times n}$, where $q$ is the dimension of the item side information.
Then, the corresponding matrix forms of the latent factors for users and items are $U = U_{1:m}$ and $V = V_{1:n}$ respectively. Given the sparse rating matrix $R$ and the side information matrices $X$ and $Y$, the goal is to learn the user latent factors $U$ and the item latent factors $V$ and hence to predict the missing ratings in $R$.
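As a small illustration of this setup, the sketch below builds a rating matrix from (user, item, rating) triples and marks which entries are observed. The helper name `build_rating_matrix` and the toy triples are hypothetical; zero is used for "not observed", following the convention stated above.

```python
import numpy as np

def build_rating_matrix(triples, m, n):
    """Return an m x n matrix R with R[i, j] = rating and 0 where unobserved."""
    R = np.zeros((m, n))
    for i, j, r in triples:
        R[i, j] = r
    return R

# Hypothetical toy data: (user index, item index, rating)
triples = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 0, 1), (2, 2, 2)]
R = build_rating_matrix(triples, m=3, n=3)

observed = R != 0                  # mask of observed ratings
sparsity = 1.0 - observed.mean()   # fraction of entries with no rating
```

On real data such as MovieLens the fraction of missing entries is far higher than in this toy case, which is what makes the prediction of missing ratings difficult.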

Matrix Factorization
Matrix Factorization (MF) maps both users and items to a joint latent factor space of dimensionality $k$ by factoring the user-item interaction matrix. Using matrix decomposition techniques, user-item interactions are modelled as inner products in that space [4]. MF decomposes the original rating matrix $R$ into low-rank matrices $U$ and $V$ consisting of the user and item latent factor vectors respectively, such that $R \approx U^{T}V$. Given the latent factor vectors for users and items, a user's rating for a movie is predicted by the inner product of those vectors. $U \in \mathbb{R}^{k \times m}$ and $V \in \mathbb{R}^{k \times n}$ denote the user and item latent factor matrices respectively, where the dimensionality of the latent space is denoted as $k$.
Accordingly, each user $i$ is associated with a vector $u_i \in \mathbb{R}^{k}$, and each item $j$ is associated with a vector $v_j \in \mathbb{R}^{k}$. For a given item $j$, the elements of $v_j$ measure the degree to which the item possesses those factors, positive or negative. The resulting dot product $u_i^{T} v_j$ captures the interaction between user $i$ and item $j$, that is, the user's overall interest in the item's characteristics. This approximates user $i$'s rating of item $j$, which is denoted by $r_{ij}$, leading to the estimate
$$\hat{r}_{ij} = u_i^{T} v_j$$
To learn the factor vectors ($u_i$ and $v_j$), the system minimizes the regularized squared error on the set of known ratings:
$$\min_{U, V} \sum_{(i,j) \in \kappa} \left[ \left( r_{ij} - u_i^{T} v_j \right)^2 + \lambda \left( \lVert u_i \rVert^2 + \lVert v_j \rVert^2 \right) \right]$$
Here, $\kappa$ is the set of $(i, j)$ pairs for which $r_{ij}$ is known (the training set). The constant $\lambda$ controls the extent of regularization and is usually determined by cross-validation.
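The regularized squared-error objective can be minimized in several ways. The following is a minimal sketch using stochastic gradient descent, which is an assumption here since the text does not name an optimizer. The function names, the toy matrix and the hyperparameter values are all hypothetical.

```python
import numpy as np

def train_mf(R, k=2, lam=0.05, lr=0.02, epochs=300, seed=0):
    """Learn U (k x m) and V (k x n) so that R is approximated by U.T @ V."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = 0.1 * rng.standard_normal((k, m))
    V = 0.1 * rng.standard_normal((k, n))
    known = [(i, j) for i in range(m) for j in range(n) if R[i, j] != 0]
    for _ in range(epochs):
        for i, j in known:
            err = R[i, j] - U[:, i] @ V[:, j]          # r_ij - u_i^T v_j
            u_old = U[:, i].copy()
            U[:, i] += lr * (err * V[:, j] - lam * U[:, i])
            V[:, j] += lr * (err * u_old - lam * V[:, j])
    return U, V

def regularized_loss(R, U, V, lam):
    """Squared error plus lam * (|u_i|^2 + |v_j|^2), summed over known ratings."""
    rows, cols = np.nonzero(R)
    err = R[rows, cols] - np.sum(U[:, rows] * V[:, cols], axis=0)
    reg = np.sum(U[:, rows] ** 2) + np.sum(V[:, cols] ** 2)
    return float(np.sum(err ** 2) + lam * reg)

# Hypothetical 4 x 4 rating matrix, 0 = not observed
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

U0, V0 = train_mf(R, epochs=0)      # random initialization only
U, V = train_mf(R, epochs=300)      # same seed, after training
R_hat = U.T @ V                     # predictions for every (user, item) pair
```

The entries of `R_hat` at positions where `R` is zero are the predicted missing ratings; in practice `lam` would be chosen by cross-validation as described above.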

Proposed Hybrid Model
A hybrid recommendation model can combine different algorithms, even ones that do entirely different things, by exposing a common interface on every recommender. There is more than one way to build a hybrid recommender system. For example, some slots of the top-N list can be reserved for one recommender while another recommender supplies the rating predictions. Alternatively, rating predictions can be generated by several recommenders in parallel and averaged before ranking. Figure 1 shows the architecture of the Weighted Parallel Deep Hybrid Collaborative Filtering technique for recommender systems. In this approach the user-item matrix with explicit ratings is given as input to both the SVD and RBM methods. To alleviate the cold-start problem of recommendation, user-item side information is also integrated as input. RBM is also a non-deterministic approach; that is, RBM nodes make stochastic decisions about whether to turn on or off for a given input. RBM is used in rating prediction to learn user preferences and item ratings by inferring latent features. The restriction in an RBM is that there is no intra-layer communication.
This restriction allows for more efficient training algorithms. To build a robust generative model, the RBM performs several forward and backward passes to learn the optimal weights.
In the recommendation result, the SVD and RBM algorithms each produce their own prediction rating score. By experiment, weight values W1 and W2 are assigned to each algorithm to compute the final weighted average prediction score for hybridization. This hybridization leverages the benefits of both algorithms to learn more expressive models. Weighted parallel hybridization can overcome the shortcomings that the two techniques have when implemented stand-alone, because the drawbacks of each technique are covered by the advantages of the other.

Singular Value Decomposition (SVD)
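Singular Value Decomposition (SVD) is a powerful technique for dimensionality reduction. The key issue in an SVD decomposition is producing a low-rank approximation. Given an $m \times n$ matrix $A$ with rank $r$, the singular value decomposition $SVD(A)$ is defined as
$$SVD(A) = U \times S \times V^{T}$$
where $U$ and $V$ are orthogonal matrices and $S$ is a diagonal matrix holding the singular values of $A$. Retaining only the $k$ largest singular values gives the reduced matrices $U_k$, $S_k$ and $V_k^{T}$. The reconstructed matrix $A_k$ is the closest rank-$k$ approximation to the original matrix $A$. The best rank-$k$ approximation of $A$ with respect to the Frobenius norm can be represented by:
$$A_k = U_k \times S_k \times V_k^{T}$$
Prediction generation using SVD: Once the $m \times n$ ratings matrix $A$ is decomposed and reduced into three SVD component matrices with $k$ features, $U_k$, $S_k$ and $V_k^{T}$, predictions can be generated by computing the cosine similarities (dot products) between the $m$ pseudo-customers $U_k \cdot \sqrt{S_k}^{T}$ and the $n$ pseudo-products $\sqrt{S_k} \cdot V_k^{T}$. In particular, the prediction score $P_{i,j}$ for the $i$-th customer on the $j$-th product is obtained by adding the row average $\bar{r}_i$ to the similarity:
$$P_{i,j} = \bar{r}_i + U_k \cdot \sqrt{S_k}^{T}(i) \cdot \sqrt{S_k} \cdot V_k^{T}(j)$$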
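The prediction step can be sketched with NumPy as follows. The text does not say how unobserved entries are handled before the decomposition, so this sketch assumes they are filled with the item (column) mean and that the row average is subtracted before factoring; both choices, the function name `svd_predict` and the toy matrix are assumptions.

```python
import numpy as np

def svd_predict(R, k):
    """Predict all ratings from a rank-k SVD of the mean-centred rating matrix."""
    mask = R != 0
    # Assumption: fill unobserved entries with the mean rating of the item.
    col_cnt = mask.sum(axis=0)
    col_mean = R.sum(axis=0) / np.maximum(col_cnt, 1)
    filled = np.where(mask, R, col_mean)
    # Assumption: centre each row on its average, which is added back below.
    row_mean = filled.mean(axis=1, keepdims=True)
    A = filled - row_mean
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    customers = Uk * np.sqrt(sk)             # m pseudo-customers, m x k
    products = np.sqrt(sk)[:, None] * Vtk    # n pseudo-products,  k x n
    return row_mean + customers @ products   # P[i, j] = row average + dot product

# Hypothetical 4 x 4 rating matrix, 0 = not observed
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

P_low = svd_predict(R, k=2)                  # rank-2 predictions
P_full = svd_predict(R, k=min(R.shape))      # no truncation
```

With no truncation the observed ratings are reproduced exactly, so the value of `k` controls how much the model smooths over the data.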

Restricted Boltzmann Machines (RBM)
An RBM is a two-layer stochastic neural network consisting of visible and hidden units. It has one layer of visible units (users' movie preferences), one layer of hidden units (the latent factors) and a bias unit (whose state is always on and which adjusts for the different inherent popularities of each movie), as shown in Figure 2. Each visible unit is connected to all the hidden units by undirected connections, so each hidden unit is also connected to all the visible units, and the bias unit is connected to all visible and hidden units. To make learning easier, the network is restricted so that there are no connections within the visible layer or within the hidden layer.

Figure 2. A Framework of Restricted Boltzmann Machine
The stochastic, binary visible units encode user preferences on the items from the training data, so the state of every visible unit is known. Hidden units are also stochastic, binary variables that capture the latent features. The network assigns a probability to each possible pair of a hidden and a visible vector via this energy function:
$$p(v, h) = \frac{1}{Z} e^{-E(v, h)}, \qquad E(v, h) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i,j} v_i h_j w_{ij}$$
where $E$ is the energy of the system and $Z$ is a normalizing factor as defined in [5]. Here $v_i$ and $h_j$ are the binary states of visible unit $i$ and hidden unit $j$, $a_i$ and $b_j$ are their biases and $w_{ij}$ is the weight between them. To train the weights, a contrastive divergence method was proposed by Hinton (Hinton, 2002).
Consider $M$ movies, $N$ users and integer ratings on a scale from 1 to $K$. The first problem in applying an RBM to movie ratings is dealing efficiently with the large number of missing ratings. The solution proposed in [16] is to consider that the visible units corresponding to the movies that the user did not rate simply do not exist. In practice, these visible units are always turned off and hence their state is always zero. Alternatively, this can be seen as using a unique RBM per user, all sharing the same units but each including only the softmax units for the movies rated by that user.

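Figure 3. Restricted Boltzmann Machine for Collaborative Filtering
In Figure 3, the binary visible units are replaced with softmax units. For each user, the RBM includes only the softmax units for the movies rated by that user.
Consider a user $u$ who rated $m$ movies. Let the visible units $V$ be a $K \times m$ matrix such that $v_i^k = 1$ if $u$ rated movie $i$ as $k$ and 0 otherwise. Let $h_j$, $j = 1, \ldots, F$, be the binary values of the hidden units (user features). Columns of $V$ are modelled using a multinomial (a softmax) and the hidden latent features $h$ are modelled just as in equations (6) and (7):
$$p(v_i^k = 1 \mid h) = \frac{\exp\left( b_i^k + \sum_{j=1}^{F} h_j W_{ij}^k \right)}{\sum_{l=1}^{K} \exp\left( b_i^l + \sum_{j=1}^{F} h_j W_{ij}^l \right)} \qquad (8)$$
$$p(h_j = 1 \mid V) = \sigma\left( b_j + \sum_{i=1}^{m} \sum_{k=1}^{K} v_i^k W_{ij}^k \right) \qquad (9)$$
where $\sigma(x) = 1 / (1 + e^{-x})$ is the logistic function, $W_{ij}^k$ is the weight on the connection between rating $k$ of movie $i$ and hidden unit $j$, $b_i^k$ is the bias of rating $k$ for movie $i$ and $b_j$ is the bias term of hidden unit $j$.
The marginal distribution over the visible ratings $V$ is given by equation (10):
$$p(V) = \sum_{h} \frac{\exp(-E(V, h))}{\sum_{V', h'} \exp(-E(V', h'))} \qquad (10)$$
with an energy term given by:
$$E(V, h) = -\sum_{i=1}^{m} \sum_{j=1}^{F} \sum_{k=1}^{K} W_{ij}^k h_j v_i^k - \sum_{i=1}^{m} \sum_{k=1}^{K} v_i^k b_i^k - \sum_{j=1}^{F} h_j b_j \qquad (11)$$
The movies with missing ratings do not make any contribution to the energy function.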
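To make the encoding and the two conditional distributions concrete, here is a minimal NumPy sketch for a single user. It assumes the weights are stored as an array `W[k, i, j]` of shape (K, m, F); that layout, the function names and the random toy parameters are hypothetical choices.

```python
import numpy as np

def one_hot_ratings(ratings, K):
    """Encode m integer ratings in 1..K as a K x m matrix V with v_i^k = 1."""
    ratings = np.asarray(ratings)
    V = np.zeros((K, len(ratings)))
    V[ratings - 1, np.arange(len(ratings))] = 1.0
    return V

def p_hidden(V, W, b_hid):
    """p(h_j = 1 | V): logistic function of b_j + sum_i sum_k v_i^k W_ij^k."""
    act = b_hid + np.einsum("ki,kij->j", V, W)
    return 1.0 / (1.0 + np.exp(-act))

def p_visible(h, W, b_vis):
    """p(v_i^k = 1 | h): softmax over the K ratings of each movie i."""
    act = b_vis + np.einsum("kij,j->ki", W, h)
    act = act - act.max(axis=0, keepdims=True)   # numerical stability
    e = np.exp(act)
    return e / e.sum(axis=0, keepdims=True)

# Hypothetical user who rated m = 3 movies with ratings 5, 3 and 1 (K = 5)
K, m, F = 5, 3, 4
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((K, m, F))
b_vis = np.zeros((K, m))
b_hid = np.zeros(F)

V = one_hot_ratings([5, 3, 1], K)
ph = p_hidden(V, W, b_hid)      # one probability per hidden unit
pv = p_visible(ph, W, b_vis)    # one rating distribution per movie
```

Movies the user did not rate have no column in `V`, which mirrors the statement that their visible units simply do not exist for that user.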

Learning
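The parameter updates required to perform gradient ascent in the log-likelihood can be obtained from equation (10):
$$\Delta W_{ij}^k = \epsilon \frac{\partial \log p(V)}{\partial W_{ij}^k} = \epsilon \left( \langle v_i^k h_j \rangle_{data} - \langle v_i^k h_j \rangle_{model} \right)$$
where $\epsilon$ is the learning rate. The expectation $\langle v_i^k h_j \rangle_{data}$ defines the frequency with which movie $i$ with rating $k$ and feature $j$ are on together when the features are being driven by the observed user-rating data from the training set using equation (9), $p(h_j = 1 \mid V)$, and $\langle v_i^k h_j \rangle_{model}$ is an expectation with respect to the distribution defined by the model.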
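The model expectation is intractable to compute exactly, which is why contrastive divergence is used. The sketch below shows one common variant, CD with a single Gibbs step that uses reconstruction probabilities rather than sampled visible states; that variant, the array layout `W[k, i, j]`, the function names and the toy settings are assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax_cols(a):
    a = a - a.max(axis=0, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=0, keepdims=True)

def reconstruct(V, W, b_vis, b_hid):
    """Mean-field reconstruction of the rating distributions for one user."""
    ph = sigmoid(b_hid + np.einsum("ki,kij->j", V, W))
    return softmax_cols(b_vis + np.einsum("kij,j->ki", W, ph))

def cd1_update(V, W, b_vis, b_hid, lr, rng):
    """One contrastive divergence (CD-1) step for a single user, in place."""
    # Positive phase: hidden probabilities driven by the observed ratings.
    ph_data = sigmoid(b_hid + np.einsum("ki,kij->j", V, W))
    h = (rng.random(ph_data.shape) < ph_data).astype(float)   # sample hidden
    # Negative phase: reconstruct the ratings, then recompute the hidden units.
    pv = softmax_cols(b_vis + np.einsum("kij,j->ki", W, h))
    ph_model = sigmoid(b_hid + np.einsum("ki,kij->j", pv, W))
    # <v h>_data - <v h>_model, approximated with the one-step reconstruction.
    W += lr * (np.einsum("ki,j->kij", V, ph_data) - np.einsum("ki,j->kij", pv, ph_model))
    b_vis += lr * (V - pv)
    b_hid += lr * (ph_data - ph_model)
    return W, b_vis, b_hid

# Hypothetical user with ratings 5, 3, 1 on m = 3 movies (K = 5, F = 4)
K, m, F = 5, 3, 4
rng = np.random.default_rng(0)
V = np.zeros((K, m))
V[[4, 2, 0], [0, 1, 2]] = 1.0
W = 0.01 * rng.standard_normal((K, m, F))
b_vis, b_hid = np.zeros((K, m)), np.zeros(F)

before = float((reconstruct(V, W, b_vis, b_hid) * V).sum())
for _ in range(200):
    cd1_update(V, W, b_vis, b_hid, lr=0.1, rng=rng)
after = float((reconstruct(V, W, b_vis, b_hid) * V).sum())
```

`before` and `after` measure the probability mass the model places on the ratings the user actually gave, so an increase indicates that the updates move the model towards the training data.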

Making Recommendations
A Restricted Boltzmann Machine can model a rating distribution over the visible units. In order to infer the missing ratings, the user's ratings are clamped on the softmax units and a single Gibbs sampling step is performed over all the missing ratings. The exact prediction algorithm is given below:

Algorithm: RBM - Making Recommendations
Input: a user $u$, an item $i$
Output: an estimation of $r(u, i)$
1. Clamp the ratings of $u$ over the softmax units of the RBM.
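Only the first step of the algorithm is given above. One plausible way to complete it, shown here purely as an assumption, is a mean-field pass: compute the hidden probabilities from the clamped ratings, compute the rating distribution of the target item from them, and return the expected rating. The function name, the array layout `W[k, i, j]` over all movies and the toy parameters are hypothetical.

```python
import numpy as np

def predict_rating(rated_idx, ratings, target, W, b_vis, b_hid):
    """Estimate the rating of one user for the item `target`.

    rated_idx : indices of the movies the user has rated
    ratings   : the corresponding integer ratings in 1..K
    W         : weights of shape (K, M, F) over all M movies
    """
    K = W.shape[0]
    # Step 1: clamp the user's ratings on the softmax units.
    V = np.zeros((K, len(rated_idx)))
    V[np.asarray(ratings) - 1, np.arange(len(rated_idx))] = 1.0
    # Assumed step: hidden probabilities given the clamped ratings.
    act_h = b_hid + np.einsum("ki,kij->j", V, W[:, rated_idx, :])
    p_h = 1.0 / (1.0 + np.exp(-act_h))
    # Assumed step: distribution over the K ratings of the target item.
    act_v = b_vis[:, target] + W[:, target, :] @ p_h
    act_v = act_v - act_v.max()
    p = np.exp(act_v)
    p = p / p.sum()
    # Assumed step: return the expected rating under that distribution.
    return float(np.arange(1, K + 1) @ p), p

# Hypothetical RBM with K = 5 ratings, M = 6 movies and F = 4 hidden units
K, M, F = 5, 6, 4
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((K, M, F))
b_vis, b_hid = np.zeros((K, M)), np.zeros(F)

estimate, dist = predict_rating([0, 2, 3], [5, 3, 4], target=5,
                                W=W, b_vis=b_vis, b_hid=b_hid)
```

Taking the expectation gives a real-valued estimate suitable for RMSE and MAE; taking the most probable rating instead would be an equally simple alternative.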

Weighted Parallel Deep Hybrid Model
Hybrid weighting is a hybridization technique that computes the prediction score by giving a weight to each recommendation approach and summing the weighted scores to produce a new output recommendation.
There are two ways to determine the weights, namely empirical bootstrapping and dynamic weighting. The SVD and RBM recommendation approaches are combined using a weighted strategy, and the prediction score of user $u$ for item $i$ is computed as follows:
$$P_h(u, i) = \sum_{k=1}^{n} W_k \cdot P_k(u, i)$$
where $W_k$ denotes the weight of algorithm $P_k(u, i)$. The proposed model combines two techniques, so $n = 2$, and the prediction score can be written as:
$$P_h(u, i) = W_1 \cdot P_{svd}(u, i) + W_2 \cdot P_{rbm}(u, i)$$
The weight values $W_1$ and $W_2$ are determined by experiment. Finally, the proposed weighted parallel hybrid model provides accurate prediction for personalized recommendation. The cold-start problem is alleviated by the weighted hybrid prediction score, which combines the benefits of both SVD and RBM.
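The weighted combination itself is a one-line computation. The sketch below also shows one simple way of choosing the weights empirically, a grid search on held-out ratings with the assumption that the two weights sum to one. That constraint, the search procedure, the function names and the toy numbers are assumptions; the text only states that the weights are assigned by experiment.

```python
import numpy as np

def hybrid_score(p_svd, p_rbm, w1, w2):
    """Weighted parallel hybrid: w1 * SVD prediction + w2 * RBM prediction."""
    return w1 * np.asarray(p_svd, dtype=float) + w2 * np.asarray(p_rbm, dtype=float)

def rmse(actual, predicted):
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def search_weights(p_svd, p_rbm, actual, steps=11):
    """Try w1 in [0, 1] with w2 = 1 - w1 and keep the pair with the lowest RMSE."""
    best_w1, best_err = None, np.inf
    for w1 in np.linspace(0.0, 1.0, steps):
        err = rmse(actual, hybrid_score(p_svd, p_rbm, w1, 1.0 - w1))
        if err < best_err:
            best_w1, best_err = float(w1), err
    return best_w1, 1.0 - best_w1, best_err

# Hypothetical held-out ratings and the two models' predictions for them
actual = [4.0, 3.0, 5.0, 2.0]
p_svd = [3.5, 3.2, 4.1, 2.9]
p_rbm = [4.4, 2.5, 5.0, 1.5]

w1, w2, best_err = search_weights(p_svd, p_rbm, actual)
```

Because the grid includes the endpoints, the selected hybrid can never do worse on the held-out ratings than either model alone; the weights should be chosen on data that is not used for the final evaluation.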

Table 1. Description of the MovieLens Datasets
Both datasets also include additional information about users and items, such as the age, gender and occupation of users and the release date and genres of movies.

Experimental setup
The proposed model is implemented on a 64-bit Windows 10 operating system running on an x64-based Intel® Core™ i7-7500U CPU @ 2.70 GHz with a 500 GB hard disk. The hybrid model is implemented in Python 3.7 with TensorFlow 3.0 as the backend.

Parameter settings
The parameters for the RBM are as follows: the hidden layer dimension is 50, the number of rating values is 10, the learning rate is 0.001 and the batch size is 100.

Evaluation Metric
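The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) are used as evaluation metrics:
$$RMSE = \sqrt{\frac{1}{N} \sum_{i,j} \left( r_{ij} - \hat{r}_{ij} \right)^2}$$
$$MAE = \frac{1}{N} \sum_{i,j} \left\lvert r_{ij} - \hat{r}_{ij} \right\rvert$$
where $N$ is the number of testing data samples, $r_{ij}$ is the rating of user $i$ on item $j$ and $\hat{r}_{ij}$ is the predicted rating. Smaller values of RMSE and MAE indicate better performance of the method.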
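Both metrics are straightforward to compute. A minimal sketch follows; the function names and the three toy ratings are hypothetical.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error over the test ratings."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(actual, predicted):
    """Mean absolute error over the test ratings."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(diff)))

# Hypothetical test ratings and predictions: the errors are 1, 0 and 1
actual = [4, 3, 5]
predicted = [3, 3, 4]
test_rmse = rmse(actual, predicted)   # sqrt(2/3)
test_mae = mae(actual, predicted)     # 2/3
```

RMSE penalizes large errors more heavily than MAE because the differences are squared before averaging, so the two metrics can rank models differently.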

Performance analysis
The proposed hybrid algorithm is compared with the traditional recommendation algorithms given below.

RBM (Restricted Boltzmann Machines) [16]: RBM is a two-layer stochastic neural network consisting of visible and hidden units. To make learning easier, the network is restricted so that no visible unit is connected to any other visible unit and no hidden unit is connected to any other hidden unit.
SVD (Singular Value Decomposition) [14]: SVD is a powerful technique for dimensionality reduction. In this algorithm, the rating matrix is decomposed into three matrices, which are then used to make predictions.

Table 2. The performance of RMSE and MAE on MovieLens 100K.
Table 3. The performance of RMSE and MAE on MovieLens 1M.
The proposed model is a Weighted Parallel Deep Hybrid RS, which combines two techniques: collaborative filtering and deep learning. A weighted hybridization technique combines two or more recommender systems by computing weighted sums of their prediction rating scores. These scores are hybridized using a uniform weighting scheme, which enhances the overall performance of the system.

Figure 1. Architecture diagram of Proposed Hybrid Recommendation System
A Weighted Deep Parallel hybrid method: The Weighted Parallel hybrid model makes use of the user-item rating matrix as input. A weighted hybridization technique combines the two recommender systems by computing weighted sums of their prediction rating scores. The SVD and RBM algorithms each have their own prediction rating score.
The evaluation metrics RMSE and MAE are used to measure the performance of the different models. Five-fold cross-validation has been used on each dataset. The RMSE and MAE values achieved by the proposed model and the other algorithms in comparison are given in Table 2 and Table 3.

Figures 6 and 7 show the convergence of the RMSE and MAE values, respectively, for the proposed hybrid method and its individual algorithms, SVD and RBM, on the MovieLens-100K dataset. The gradual decrease in both metrics indicates the superiority of the hybrid approach.

PMF (Probabilistic Matrix Factorization) [12]: A widely used matrix factorization model that factorizes the user-item matrix for recommendation.
SCC (Self-Constructing Clustering) [11]: SCC is a recommendation algorithm based on clustering. It forms item-based clusters using a self-constructing clustering method.
RMbDn (A recommendation model based on deep neural networks) [13]: This model uses QPR (Quadric Polynomial Regression) to obtain the latent features of users and items and combines them with a DNN (Deep Neural Network) model to predict the user's rating score.

Conclusion
The proposed hybrid collaborative filtering model combines the matrix factorization method SVD and the deep learning technique RBM. The hybrid model can make use of both the user-item rating matrix and side information for better prediction in recommendation. The performance of the proposed model is evaluated on two datasets, MovieLens-100K and MovieLens-1M, using the RMSE and MAE metrics with five-fold cross-validation. The experimental results show that the proposed hybrid model achieves better RMSE and MAE values than the existing methods on both datasets.

Computing the SVD is far more expensive than most of the other techniques. SVD also provides an inferior solution for cold-start problems. Hence, SVD alone is not suitable for recommendation.
The model integrates user-item side information such as the age, gender and occupation of users and the release date and genres of movies as input for both the SVD and RBM methods. In the recommendation result, the SVD and RBM algorithms each have their own prediction rating score, $P_{svd}$ and $P_{rbm}$. By experiment, weight values $W_1$ and $W_2$ are assigned to each algorithm to compute the final weighted average prediction score $P_h$ for hybridization.
Collaborative filtering can predict accurately, but the cold-start problem exists in this technique. SVD is a powerful technique for dimensionality reduction and for producing a low-rank approximation with excellent scalability and accuracy.
RBM is an energy-based model. It is a probabilistic, unsupervised, generative deep machine learning algorithm. An RBM can learn a probability distribution over its set of inputs and is an efficient learning algorithm. The RBM's objective is to find the joint probability distribution that maximizes the log-likelihood function.
$U$ and $V$ are orthogonal matrices with dimensions $m \times m$ and $n \times n$ respectively. $S$ is a diagonal matrix, also called the singular matrix, with dimension $m \times n$ and non-negative real entries. The first $r$ diagonal values of $S$, $(\sigma_1, \sigma_2, \ldots, \sigma_r)$, are all positive with $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r$. The first $r$ columns of $U$ are eigenvectors of $AA^{T}$ and represent the left singular vectors of $A$. Similarly, the first $r$ columns of $V$ are eigenvectors of $A^{T}A$ and represent the right singular vectors of $A$. $A_k$ provides the best low-rank approximation of the original matrix $A$. It is obtained by retaining the first $k$ diagonal values of $S$, by removing $r - k$ columns from $U$ and by removing $r - k$ rows from $V^{T}$, which can be represented as follows:
$$A_k = U_k \times S_k \times V_k^{T}$$

Table 2 and Table 3 report the results for MovieLens 100K and MovieLens 1M, respectively. The tables show that the proposed hybrid method gives better performance in terms of both RMSE and MAE than the other algorithms on both datasets.

Figure 7. MAE performance on MovieLens 100K.
Based on the experimental results, the proposed hybrid model has the highest recommendation accuracy under both metrics on the two datasets.