Word Sequential Using Deep LSTM and Matrix Factorization to Handle Rating Sparse Data for E-Commerce Recommender System

Recommender systems are essential engines for delivering product recommendations in e-commerce businesses. Successful adoption of recommender systems can significantly influence the growth of marketing targets. Collaborative filtering is a type of recommender system model that uses customers' past activities, such as ratings. Unfortunately, the ratings collected from customers are sparse, amounting to less than 4%. The latent factor model is a kind of collaborative filtering that uses matrix factorization to generate rating predictions. However, using matrix factorization alone results in inaccurate recommendations. Several models include product review documents to increase the effectiveness of their rating prediction. Most of them use methods such as TF-IDF and LDA to interpret product review documents. However, traditional models such as LDA and TF-IDF have shortcomings, in that they show less contextual understanding of the document. This research integrates matrix factorization with a novel model that interprets product review documents using LSTM and word embedding. According to the experiment report, this model outperformed the traditional latent factor model by more than 16% on average and improved on the previous best performance by 1% on average, based on the RMSE evaluation metric. Contextual insight into the product review document is an important aspect of improving performance on a sparse rating matrix. In future work, generating contextual insight using bidirectional word sequences is required to increase the performance of e-commerce recommender systems with sparse data issues.


Introduction
The development of recommender systems (RS) aims to support marketing by increasing target selling. RS have been developed to generate product recommendations automatically and help customers choose a product. RS have been adopted by many large e-commerce companies such as Amazon, Google, Netflix, iTunes, Facebook, eBay, and Alibaba. Many experts have explained that the successful adoption of recommender systems could significantly influence the marketing target [1]. Most e-commerce companies in the world have decided to implement recommender systems to increase service satisfaction by making it more enjoyable for customers to look for the products they need. A recommender system is an essential tool to promote products and services on many websites and mobile applications. For instance, 80% of the movies watched on Netflix came from recommendations [2], and 60% of video clicks on YouTube came from home page recommendations [3]. According to Schafer et al. [4], sales agents with recommendations from the NetPerceptions system achieved a 60% higher average cross-sell value and 50% higher cross-sell success rate than agents using traditional cross-sell techniques, based on experiments conducted at a UK-based retail and business group.
Based on the general algorithmic approach [5][6][7][8][9], e-commerce RS are divided into four types: (1) content-based, which generates recommendations according to a product classification approach and involves information retrieval; (2) knowledge-based, which develops specific recommendations for products that individuals rarely need (e.g., houses, loans, insurance, and cars); (3) demographic-based, which establishes product recommendations according to demographic information; and (4) collaborative filtering, which produces recommendations based on the user's past behaviour, such as product ratings, product reviews, comments, testimonies, and purchases.
Collaborative filtering is considered the most successful recommendation technique in many large e-commerce companies, as it can provide recommendations with special characteristics such as product fit information, relevant information, high accuracy, and serendipity [10]. Most collaborative filtering methods adopt ratings as explicit feedback and use them as the basis for computing the similarity in users' behaviors. Unfortunately, the number of ratings is very small. In general, customers are reluctant to rate a product. The MovieLens collection from GroupLens is the most popular e-commerce dataset containing a movie rating matrix; it includes ML-100k, ML-1M, ML-10M, and ML-20M [11]. Amazon is the second most popular dataset, and it contains less than 1% of the possible ratings. The most common problem in collaborative filtering is generating rating predictions when the rating matrix is sparse. Traditional collaborative filtering used the popular memory-based neighborhood model to obtain rating predictions. Most traditional statistical approaches were created by researchers during the early emergence of collaborative filtering in the mid-90s. Collaborative filtering finds the nearest neighbors among users with similar behaviors with respect to product interest. Unfortunately, computing the neighbor vectors requires heavy computation on large-scale datasets. From a practical point of view, memory-based methods rely on neighbor heuristics, so they face several challenges on large datasets. The neighborhood algorithm uses several kinds of traditional statistical mechanisms, such as cosine similarity, Spearman's rank, and Pearson's correlation.
An example of the nearest neighborhood model to calculate the similarity in user behaviors is shown in the following equation:
The memory-based model produces product recommendations in a simple way: it is easy to implement and requires no training data to obtain a product ranking. This is the benefit of memory-based collaborative filtering. However, memory-based methods face a serious problem with respect to scalability. An increasing number of users and products makes the computation heavy. This is the essential reason for the emergence of the modern collaborative filtering model, popularly known as the model-based or latent factor model, which exploits the latent correspondence between users and products.
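As an illustration of the memory-based neighborhood idea described above, the following is a minimal sketch of Pearson's correlation between two users, computed over the items both have rated. The function name `pearson_similarity`, the use of `np.nan` for missing ratings, and the toy rating matrix are assumptions made for this example; the source does not state which similarity measure its equation used.

```python
import numpy as np

def pearson_similarity(ratings, a, b):
    """Pearson correlation between users a and b over co-rated items.

    ratings is a users x items array in which np.nan marks a missing rating.
    Returns 0.0 when fewer than two items are co-rated or a user has no variance.
    """
    both = ~np.isnan(ratings[a]) & ~np.isnan(ratings[b])
    if both.sum() < 2:
        return 0.0
    x = ratings[a, both] - ratings[a, both].mean()
    y = ratings[b, both] - ratings[b, both].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0

# Toy rating matrix (hypothetical values): 3 users, 4 items.
ratings = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, 2.0, 5.0, np.nan],
    [1.0, 3.0, 2.0, 5.0],
])

print(pearson_similarity(ratings, 0, 1))  # users 0 and 1 agree on their co-rated items
print(pearson_similarity(ratings, 0, 2))  # users 0 and 2 have opposite tastes
```

Every pair of users has to be compared this way, which is why the cost grows quickly as users and products are added.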
Matrix factorization, popularly called the model-based approach, has become more popular than the memory-based approach since the Netflix competition was held in 2006. The model-based approach uses matrix factorization to complete the rating matrix. In fact, the matrix factorization model was introduced by Sarwar et al. [12] in the early 2000s using a low-rank dimensional implementation called Singular Value Decomposition (SVD). SVD calculates the latent relationship between users and items. Koren et al. [13] proposed a novel SVD that improves the effectiveness of the traditional SVD in generating rating predictions by including the time stamp of each rating given by the user (temporal effects); the model is named TimeSVD. According to an experiment report, TimeSVD succeeded in improving the performance of Sarwar et al.'s traditional SVD. Koren et al. also enhanced the previous work by combining SVD with a neighborhood representation [14]. Another model, proposed by Salakhutdinov and Mnih [15], takes a mathematical and statistical approach that considers only the rating information. Salakhutdinov and Mnih popularized the integration of a probabilistic approach with matrix factorization, called probabilistic matrix factorization (PMF). The PMF model is claimed to be an extended version of the SVD model. PMF works by transforming user and item information into a 2D vector dimension using a Gaussian normal distribution. The PMF model succeeds in generating rating predictions on large datasets, and, surprisingly, it is also robust when faced with imbalanced data. Figure 1 shows an example of the rating matrix representation of collaborative filtering, where the red color represents the unrated items. Latent factor models have succeeded in improving the accuracy of rating prediction over memory-based models. However, they still have a shortcoming when dealing with extremely sparse data.
Several experts have proposed various models to support latent factor performance. These researchers considered integrating the product review with matrix factorization. Among them are Ling et al. [16], who proposed a novel model that uses item reviews to support the latent factor model. A product review document represents a user's satisfaction with a product. In that research, Ling et al. used the Latent Dirichlet Allocation (LDA) model to interpret product review documents. The LDA model proposed by Ling et al. succeeded in improving on the traditional latent factor models based on SVD, TimeSVD, and PMF. Wang and Blei [17] also proposed a model that uses LDA to interpret product documents and integrates them with a latent factor model, called Collaborative Topic Regression (CTR). Wang and Blei employed probabilistic matrix factorization to produce rating predictions. Both LDA models integrated into matrix factorization were successful in increasing the effectiveness of rating prediction. Table 1 shows the previous state-of-the-art methods, including the traditional latent factor models and the hybrids between the latent factor model and product review documents. The interpretation of text documents is an essential factor in the field of Natural Language Processing (NLP). The traditional Bag of Words (BOW) mechanism was a popular method in the early decades and has been applied in several commonly used NLP applications. Unfortunately, the LDA model fails to capture the contextual understanding of sentence documents. Some experts have tried to refine the BOW mechanism by exploiting deep learning models. For example, for sentence classification, they applied the Convolutional Neural Network (CNN), which improved on the accuracy of traditional sentence classification in previous works [23].
In recent years, the application of CNN to recommender systems has been proposed by Kim et al. [20]. Another researcher applied a subclass of deep learning, called the autoencoder (AE), to refine the performance of matrix factorization [24]. According to an experiment report, a deep learning model, either AE or CNN, increases the effectiveness of rating prediction compared with the traditional BOW mechanism. However, from the perspective of contextual semantic insight, most of these models ignore the contextual understanding of product documents. The contextual understanding of a sentence can be captured through two essential aspects: (1) considering word order or word sequence and (2) considering the subtle relationships between words. A novel collaborative filtering method involving social information representation, called SSDAE, integrates collaborative filtering based on PMF with the social behaviour of the user [25]. This approach differs in that it uses social information documents to support matrix factorization as the latent factor representation. Several previous works adopted only the product document representation, which may have limited the representation of user information. Similar to ConvMF, SSDAE also uses PMF as the latent factor model to obtain rating predictions.
Long Short-Term Memory (LSTM) is a subclass of deep learning. Unlike other deep neural networks, LSTM can recognize sequential information. This is an important aspect of learning the contextual semantics of a document. LSTM can be integrated into a recommender system algorithm to improve sentence document interpretation. The implementation of LSTM is expected to support matrix factorization and increase the effectiveness of rating prediction. In this research, we propose a novel method that uses LSTM to transform the product review document into a 2D semantic latent space and integrates it with probabilistic matrix factorization (PMF). We evaluated our model using the Root Mean Square Error (RMSE) evaluation metric. We also applied our model to two real datasets: MovieLens (ML-1M) and Amazon Instant Video (AIV). Our algorithm combines LSTM, GLOVE, and PMF. The research contribution is presented in Table 2, where contextual understanding using word embedding and LSTM is a novel hybrid latent factor model. Our proposed model is called LSTM-PMF.

Table 1 (Method: Description)
SVD: A collaborative filtering recommender system based on the latent factor model that uses singular value decomposition (SVD) as the low-rank dimensional factorization to generate rating predictions [12].
PMF: An advanced version of the SVD model that takes a probabilistic approach to enhance the correspondence between users and items. PMF has become a standard rating prediction approach that involves only ratings for collaborative filtering [15].
LDA: An early model that integrates product review documents with matrix factorization and interprets the documents with LDA to increase the effectiveness of rating prediction [16].
CTR: A state-of-the-art recommendation model that combines collaborative filtering (PMF) and topic modeling (LDA) to utilize both ratings and documents [17].
CDL: Another state-of-the-art recommendation model, aimed at enhancing the accuracy of rating prediction by analyzing product documents with a deep learning approach based on the autoencoder (AE), integrated into a latent factor model based on PMF [18].
DCCR: Deep collaborative conjunction recommender (DCCR), a model that combines a multilayer perceptron (MLP) and an autoencoder (AE). The autoencoder extracts the latent features of the item representation, and the MLP detects the correspondence between user and item based on fusion [19].
ConvMF: A collaborative filtering model that involves the traditional matrix factorization model and the product review document. Product document understanding is captured by a convolutional neural network (CNN) with a dimensional reduction feature and word embedding [20]. This model is an enhancement of the CDL and CTR models.
Att-ConvCF: A version of the collaborative filtering approach that combines matrix factorization and product review documents using the attention method in the convolutional process. Matrix factorization is responsible for producing rating predictions [21].
SRCMF: Social reviews from customers integrated into matrix factorization to generate rating predictions more effectively. This approach also requires a product document to be integrated into matrix factorization [22].
In this paper, we make two contributions: (a) a novel model that captures the context of documents by considering their sequential aspects with LSTM and word embedding and (b) the integration of the contextual document representation into probabilistic matrix factorization. The experimental results are evaluated with the RMSE evaluation metric to measure the performance achieved.

Materials and Methods
This research uses two essential methods, namely, PMF and LSTM. PMF is responsible for generating rating predictions by learning the correspondence between item and user information. Meanwhile, the role of LSTM is to support the latent factors and make rating prediction more effective. LSTM uses product review documents to produce a 2D document vector space. The details of the proposed method are explained in the sections below. The architecture of LSTM-PMF is presented in Figure 2. It consists of five layers, each of which carries out a specific task. The first layer, at the top, is responsible for collecting datasets, including ML-1M, ML-10M, and AIV. The second layer preprocesses the documents using an NLTK module and converts the preprocessing results with word embedding based on GLOVE. The third layer generates contextual understanding by detecting word sequences using LSTM. This process is also responsible for transforming the product review document into a 2D vector space of size 50. The fourth layer is responsible for bridging the user latent space and the item document latent space. The second task of this layer is to generate a rating prediction by learning the correspondence between variable U as the user representation and variable V as the item representation. In this layer, probabilistic matrix factorization links the item document and the user representation. The last layer is responsible for evaluating the rating prediction output using RMSE evaluation metrics. A detailed description of the computation method is presented in the methodology section.

Probabilistic Matrix Factorization.
Since the latent factor model was first exploited in collaborative filtering in early 2006, several researchers have tried to solve its major problems, specifically the sparse data issue. The latent factor model based on matrix factorization is a very effective method to generate rating predictions. Rating is an essential factor in producing product recommendations. Using a rating matrix obtained from customers, the recommender machine produces a product ranking, which is then presented to the customer or customer candidate. The basic principle of matrix factorization is to rotate, invert, and reduce the matrix content, so that a complete rating matrix can be obtained [27]. SVD is an example of a successful low-rank matrix factorization model that is used to learn the correspondence between items and users. PMF is claimed to be an extension of SVD that uses the Gaussian normal distribution to generate a rating distribution based on probabilistic rules. The factorization of the rating matrix into two lower-dimensional matrices can be described as follows: M is the number of movies, N is the number of users, and each rating is an integer from 1 to K. $R_{ij}$ is the rating of user i for movie j. Also, $U \in \mathbb{R}^{D \times N}$ and $V \in \mathbb{R}^{D \times M}$. The variables U and V represent the latent user and movie matrices, respectively. The rating prediction for a given user i and movie j is computed as $R_{ij} = U_i^T V_j$. The basic concept of matrix factorization in collaborative filtering is illustrated in Figure 3. The idea of PMF was initially proposed by Salakhutdinov and Mnih when the Netflix competition was held in mid-2006 [15]. PMF improved on Netflix's recommender system by up to 4%. Unfortunately, because the improvement was below 10%, the PMF model could not win the competition.
PMF is a probabilistic linear approach that uses the Gaussian normal distribution, in which the vector representations of users and movies are learned from the distribution of the corresponding ratings. The distribution is formulated in the following equation:
$$p(R \mid U, V, \sigma^2) = \prod_{i=1}^{N} \prod_{j=1}^{M} \left[ \mathcal{N}\left(R_{ij} \mid U_i^T V_j, \sigma^2\right) \right]^{I_{ij}}$$
where $I_{ij}$ is equal to 1 if user i rated movie j and 0 otherwise. For the latent feature vectors of the items, the model uses a zero-mean spherical Gaussian prior, as follows:
$$p(V \mid \sigma_V^2) = \prod_{j=1}^{M} \mathcal{N}\left(V_j \mid 0, \sigma_V^2 I\right)$$
For the latent feature vectors of the users, the model likewise uses a zero-mean spherical Gaussian prior, as follows:
$$p(U \mid \sigma_U^2) = \prod_{i=1}^{N} \mathcal{N}\left(U_i \mid 0, \sigma_U^2 I\right)$$
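The PMF mechanism described above can be sketched in a few lines of NumPy: latent matrices are drawn from zero-mean Gaussian priors, ratings are predicted as $U^T V$, and only the observed entries contribute to the error. This is a minimal sketch, not the authors' implementation; the toy sizes, the standard deviations, the regularization weights, and the name `pmf_objective` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, D = 4, 5, 3             # users, movies, latent dimension (toy sizes)
sigma_u, sigma_v = 0.1, 0.1   # assumed prior standard deviations

# Zero-mean spherical Gaussian priors on the latent user and movie matrices.
U = rng.normal(0.0, sigma_u, size=(D, N))
V = rng.normal(0.0, sigma_v, size=(D, M))

# Predicted rating matrix: entry (i, j) is U_i^T V_j.
R_hat = U.T @ V

# Sparse observations: (user, movie, rating) triples, hypothetical values.
R = np.zeros((N, M))
I = np.zeros((N, M))          # indicator: 1 where a rating is observed
for i, j, r in [(0, 1, 4.0), (2, 3, 5.0), (3, 0, 1.0)]:
    R[i, j] = r
    I[i, j] = 1.0

def pmf_objective(R, I, U, V, lam_u, lam_v):
    """Regularized squared error over observed ratings only."""
    err = I * (R - U.T @ V)
    return (0.5 * np.sum(err ** 2)
            + 0.5 * lam_u * np.sum(U ** 2)
            + 0.5 * lam_v * np.sum(V ** 2))

print(R_hat.shape)
print(pmf_objective(R, I, U, V, lam_u=0.01, lam_v=0.01))
```

Minimizing this objective is equivalent to maximizing the log posterior under the Gaussian likelihood and priors, with the regularization weights playing the role of variance ratios.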

Capturing the Contextual Insight of a Product Document Using LSTM.
The contextual understanding of a sentence depends on its word sequence and on the subtle relationships between words. Most neural network models map an input directly to an output. Unlike them, LSTM processes the input as a sequence, step by step over time. One interesting aspect of LSTM is that it can link past information to the current step, for instance, allowing past video frames to inform the understanding of the current frame. In natural language, the sequential perspective is essential for revealing the contextual meaning of a sentence document, because semantics depend on word order. Long Short-Term Memory (LSTM) is a specific type of RNN that is designed to learn long-term dependencies. It is also an enhancement of the RNN architecture. It was first published by Hochreiter and Schmidhuber [28]. The model has since been improved and popularized by many people, as it suits several tasks in the field of computer science. The workings of the hidden state of LSTM, which allow it to learn sequential aspects from an input, are explained in Figure 4 and equation (5). The hidden layer of LSTM consists of several processes that handle the input layer, the output layer, the previous hidden state, and the output of the hidden state. The ability of LSTM to detect sequential aspects comes from several essential computations in the hidden state. Every variable obtains its output through a learned calculation, as follows:
i, f, o: i represents the input gate, f represents the forget gate, and o represents the output gate. All of them share the same form of equation and differ only in their parameter matrices. They are known as gates because the sigmoid function squashes their values into the range between 0 and 1.
g: It represents the candidate hidden state, which is calculated from the current input and the past hidden state.
c_t: It represents the internal memory of the hidden state. It is a combination of the previous memory c_{t−1} multiplied by the forget gate and the new candidate hidden state g multiplied by the input gate.
h_t: It represents the output of the hidden state. It is computed from the internal memory c_t and multiplied by the output gate.
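The gate description above can be made concrete with a single LSTM time step in NumPy. This is a minimal sketch under stated assumptions: the parameter names (`Ui`, `Wi`, and so on) are hypothetical, bias terms are omitted for brevity, and `tanh` is assumed for the candidate state and the output, which is the common LSTM formulation rather than something the text specifies.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step (biases omitted).

    x_t: input word vector, h_prev: previous hidden state,
    c_prev: previous internal memory, params: dict of weight matrices.
    """
    # Gates share the same form and differ only in their parameter matrices.
    i = sigmoid(x_t @ params["Ui"] + h_prev @ params["Wi"])  # input gate
    f = sigmoid(x_t @ params["Uf"] + h_prev @ params["Wf"])  # forget gate
    o = sigmoid(x_t @ params["Uo"] + h_prev @ params["Wo"])  # output gate
    # Candidate hidden state from the current input and the past hidden state.
    g = np.tanh(x_t @ params["Ug"] + h_prev @ params["Wg"])
    # Internal memory: old memory scaled by f plus candidate scaled by i.
    c_t = c_prev * f + g * i
    # Hidden output: squashed memory scaled by the output gate.
    h_t = np.tanh(c_t) * o
    return h_t, c_t

# Toy run over a short sequence of word vectors (random, for illustration).
rng = np.random.default_rng(0)
embed_dim, hidden_dim, seq_len = 4, 3, 6
params = {}
for gate in "ifog":
    params["U" + gate] = rng.normal(0.0, 0.5, size=(embed_dim, hidden_dim))
    params["W" + gate] = rng.normal(0.0, 0.5, size=(hidden_dim, hidden_dim))

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(seq_len, embed_dim)):
    h, c = lstm_step(x_t, h, c, params)
print(h)
```

Because `h` and `c` are carried from one word to the next, the final state depends on the order of the words, which is the property that a bag-of-words representation lacks.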

Preprocessing of Product Review Document.
In this research, preprocessing is carried out as a standard research process to extract the raw documents based on some previous work standards [24,29]. Preprocessing is necessary for the computational process to produce a document with representative meaning. A detailed description of the preprocessing step is given in Table 3 below.

Transform the Raw Document into 2D Vector Space.
After the preprocessing phases presented above, the results are transformed into a 2D vector space. In this process, the contextual meaning is expected to be captured so that the user's expression can be correctly understood. The steps for transforming the product review document into a 2D vector space are presented in Figure 5. First, datasets from AIV were collected. Only the product review documents related to the MovieLens movie catalog were selected. According to the LSTM mechanism, every word vector produced by the GLOVE word embedding is placed in its own hidden layer [30]. As a result of the previous section, every product has a product review containing 300 words in the form of word vectors. The complete process for transforming the raw document into a 2D vector of size 50 is explained in Figure 5.
Figure 3: The common approach of the latent factor model using matrix factorization to produce rating prediction.
Figure 4: Basic concept of LSTM work.
After the product review documents in the training set have been processed with LSTM and word embedding, the output, a 2D vector of size 50, is integrated into probabilistic matrix factorization, which is expected to help handle the sparse data problem. PMF plays an important role in bridging the user latent model and the item document latent model in order to learn the correspondence between them.

Hybrid LSTM and PMF.
LSTM on its own is not suited to regression tasks such as rating prediction in a collaborative filtering recommender system. The output of LSTM, a 2D vector representation, cannot be applied directly to predict the rating. To handle this problem, LSTM needs to be integrated with matrix factorization, such as PMF. PMF is responsible for calculating the relationship between the user latent model and the product latent space, which strengthens the correlation between users and items. For example, let N be the number of users and M the number of items. The rating matrix is $R \in \mathbb{R}^{N \times M}$, while the user and item representations are given by $U \in \mathbb{R}^{K \times N}$ and $V \in \mathbb{R}^{K \times M}$, respectively. Finally, the product $U^T V$ is used to reconstruct the rating matrix R. From the probabilistic perspective, the conditional distribution over the observed ratings is the following normal distribution:
$$p(R \mid U, V, \sigma^2) = \prod_{i=1}^{N} \prod_{j=1}^{M} \mathcal{N}\left(r_{ij} \mid u_i^T v_j, \sigma^2\right)^{I_{ij}}$$

where $\mathcal{N}(x \mid \mu, \sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$, and $I_{ij}$ is an indicator function that is part of the generative model (1 if user i rated item j, 0 otherwise). The probabilistic model is a key factor in developing the LSTM-PMF model. Figure 6 shows the role of PMF in bridging the item latent representation and the user latent model within a document vector representation. The blue area is the matrix factorization territory consisting of U, V, and R. The red area is the item document representation territory. GLOVE-LSTM supports the document representation by generating the weights of the W variables.

Table 3 (Method: Step description)
Set the maximum words: The product review contains long sentences. However, in this experiment, the number of words in a document is limited to a maximum of 300. Most movie reviews consist of fewer than 300 words. Following the previous works, this limit is sufficient to represent the user's expression.
Remove stop words: Stop words are a category of words that can be filtered out. In search engine applications, these are usually short function words such as "there," "are," "in," "where," "also," and "on," especially in phrases like "the on," "the also," "there are," or "in where." Some search engines also remove very common lexical words, such as "need," from a query in order to improve performance.
Remove frequently occurring words: This step removes corpus-specific stop words, that is, words with a document frequency higher than 0.5. It is essential to prevent words that appear too often from dominating.
Remove non-English vocabulary: This step removes all non-English vocabulary words from a catalog document. As an output, the average number of words per document is 97.09 and 92.05 on the MovieLens 1 million dataset (ML-1M) and Amazon Instant Video (AIV), respectively. In this step, items without a description document in every dataset catalog, and specifically in the Amazon dataset table, are removed. Besides, users with fewer than three ratings are also removed. As an output, the data tables describe three datasets with distinct specifications. Even though many users were removed in preprocessing, the Amazon dataset remained sparser than the other data.
The LSTM-PMF model, as illustrated above, comprises both item and user latent model processes. The two processes are explained in detail in the following sections.

User Latent Model Representation.
The user information collected by MovieLens contains user and rating information only. The user latent model uses a zero-mean spherical Gaussian prior with the variance of the user data $\sigma_U^2$, given by the following equation:
$$p(U \mid \sigma_U^2) = \prod_{i=1}^{N} \mathcal{N}\left(u_i \mid 0, \sigma_U^2 I\right)$$

Item/Product Latent Model Representation.
Item information representation is collected from AIV in the form of item documents. A 2D vector of size 50 is obtained after the document passes through the LSTM mechanism. From the probabilistic point of view, the item latent model is built from the LSTM output plus a noise term, so that the item variable $v_j$ is obtained as follows:
$$v_j = \mathrm{lstm}(W, X_j) + \epsilon_j \quad (9)$$
Here W denotes the weights of the LSTM, $X_j$ the document of item j, and $\epsilon_j$ the noise term. The corresponding probability density function is a normal distribution, and the document latent representation produced by word embedding and LSTM is therefore also expressed as a normal distribution. Optimization of the latent space model over the variables U, V, and W is explained in the following sections.
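The distributions described in words above can be written out under one assumption: that the model follows the same derivation as ConvMF [20], with the CNN replaced by the LSTM. Under that assumption, and with $w_k$ (the individual weights of W) and $\sigma_W^2$, $\sigma_V^2$ taken from that formulation rather than from the text, the item latent model would read:

```latex
\begin{align}
p(W \mid \sigma_W^2) &= \prod_{k} \mathcal{N}\left(w_k \mid 0, \sigma_W^2\right) \\
v_j &= \mathrm{lstm}(W, X_j) + \epsilon_j \\
\epsilon_j &\sim \mathcal{N}\left(0, \sigma_V^2 I\right) \\
p(V \mid W, X, \sigma_V^2) &= \prod_{j=1}^{M} \mathcal{N}\left(v_j \mid \mathrm{lstm}(W, X_j), \sigma_V^2 I\right)
\end{align}
```

The last line says that each item vector is centred on the document vector produced by the LSTM, so the review text acts as an informative prior for items with few ratings.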

Optimizing the Latent Space Dimension and Generating the Rating Model.
The optimization process strengthens the correspondence among all variables: the user latent variable, the item latent variable, and the shared weight and bias variables of the LSTM. We adopted Maximum A Posteriori (MAP) estimation [15]. MAP is a Bayesian method that estimates an unknown quantity as the mode of its posterior distribution. Specifically, the learning variables are optimized by maximizing the log posterior over the user and movie features with fixed hyperparameters. For training, this experiment takes the negative logarithm of the posterior and minimizes it as the loss function $\mathcal{L}$, in which $\lambda_U = \sigma^2/\sigma_U^2$ weights the user term, $\lambda_V = \sigma^2/\sigma_V^2$ weights the item term, and $\lambda_W = \sigma^2/\sigma_W^2$ weights the W term. Because the loss is a squared function of the latent variables, coordinate descent is used to learn U, V, and W, updating one variable at a time while the others are held fixed.
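The coordinate descent step can be sketched as follows. The sketch assumes the ConvMF-style loss [20], that is, squared error on observed ratings plus $\lambda_U \lVert u_i \rVert^2$ and $\lambda_V \lVert v_j - \mathrm{lstm}(W, X_j) \rVert^2$ penalties, for which each user and item vector has a closed-form update. The array `theta` stands in for the LSTM document vectors and is held fixed here, so the update of W by backpropagation and the $\lambda_W$ term are left out; all names and toy values are assumptions.

```python
import numpy as np

def loss(R, I, U, V, theta, lam_u, lam_v):
    """Squared error on observed ratings plus the two regularization terms."""
    err = I * (R - U.T @ V)
    return (0.5 * np.sum(err ** 2)
            + 0.5 * lam_u * np.sum(U ** 2)
            + 0.5 * lam_v * np.sum((V - theta) ** 2))

def update_users(R, I, U, V, lam_u):
    """Closed-form minimizer for every user vector with V held fixed."""
    K = U.shape[0]
    for i in range(U.shape[1]):
        rated = I[i] > 0
        Vi = V[:, rated]
        A = Vi @ Vi.T + lam_u * np.eye(K)
        U[:, i] = np.linalg.solve(A, Vi @ R[i, rated])
    return U

def update_items(R, I, U, V, theta, lam_v):
    """Closed-form minimizer for every item vector with U held fixed."""
    K = V.shape[0]
    for j in range(V.shape[1]):
        rated = I[:, j] > 0
        Uj = U[:, rated]
        A = Uj @ Uj.T + lam_v * np.eye(K)
        b = Uj @ R[rated, j] + lam_v * theta[:, j]  # pulled toward the document vector
        V[:, j] = np.linalg.solve(A, b)
    return V

# Toy data (random, for illustration only).
rng = np.random.default_rng(0)
N, M, K = 6, 5, 3
R = rng.integers(1, 6, size=(N, M)).astype(float)
I = (rng.random((N, M)) < 0.6).astype(float)
U = rng.normal(0.0, 0.1, size=(K, N))
V = rng.normal(0.0, 0.1, size=(K, M))
theta = rng.normal(0.0, 0.1, size=(K, M))   # stand-in for lstm(W, X_j)

losses = [loss(R, I, U, V, theta, 1.0, 1.0)]
for _ in range(10):
    U = update_users(R, I, U, V, lam_u=1.0)
    V = update_items(R, I, U, V, theta, lam_v=1.0)
    losses.append(loss(R, I, U, V, theta, 1.0, 1.0))
print(losses[0], losses[-1])
```

Each update solves its block exactly, so the loss cannot increase from one sweep to the next; in the full model a sweep would be followed by a backpropagation step that refits W so that the LSTM output tracks the updated item vectors.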

We used a backpropagation algorithm to optimize W, which represents the weight and bias variables of every layer. To optimize all the variables, including V, U, and W, the updates are repeated until convergence. The formula used to predict an unknown rating is given below:
$$\hat{r}_{ij} \approx u_i^T v_j = u_i^T \left( \mathrm{lstm}(W, X_j) + \epsilon_j \right)$$

Datasets.
MovieLens is one of the most popular datasets for e-commerce experiments. It was initially developed in 1997 by the School of Computing, University of Minnesota [11]. The majority of recommender system experiments use MovieLens datasets [11,31]. It aims to provide information for personal suggestions. The MovieLens datasets come in several categories that differ in the number of ratings, the number of users, the number of products, and the density of the ratings. This experiment adopted the product review documents from AIV, which is a popular dataset collected from Amazon [32][33][34]. The characteristics of the datasets are described in Table 4. This experiment involves two MovieLens categories: ML-1M, which contains 1 million ratings with a density of 4.64%, and ML-10M, which contains 10 million ratings with a density of 1.41%. This makes it possible to observe the performance of LSTM-PMF under various sparsity conditions.
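The density figures quoted for the datasets can be understood as the fraction of the user-item matrix that holds an observed rating. The following minimal sketch shows the calculation on hypothetical toy data; the function name and the triples are assumptions for illustration and do not reproduce the counts in Table 4.

```python
def rating_density(ratings, n_users, n_items):
    """Fraction of the user-item matrix that holds an observed rating.

    ratings is an iterable of (user, item, rating) triples.
    """
    observed = {(user, item) for user, item, _ in ratings}
    return len(observed) / (n_users * n_items)

# Toy data: 5 observed ratings in a 3 x 4 matrix (hypothetical values).
toy_ratings = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 0, 2), (2, 3, 1)]
density = rating_density(toy_ratings, n_users=3, n_items=4)
print(f"density = {density:.2%}")
```

A lower density means that more entries of the rating matrix have to be predicted from fewer observations.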

Evaluation Result.
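The performance of LSTM-PMF needs to be evaluated. The root mean squared error (RMSE) is the most popular metric for evaluating the effectiveness of rating predictions [35,36]. The experiment is divided into nine scenarios, each of which splits the dataset into training and test portions at a different ratio. RMSE is computed as

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(i,j)} \left( r_{ij} - \hat{r}_{ij} \right)^2 } \tag{16} \]

where $r_{ij}$ is the actual rating, $\hat{r}_{ij}$ is the predicted rating, and N is the number of ratings evaluated. In essence, the rating predicted by LSTM-PMF is compared with the actual rating in the dataset.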
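The evaluation procedure above can be sketched in a few lines of NumPy. The nine training percentages (10% through 90%) are an assumption inferred from the ratios mentioned later in the text, the 200-rating count is a toy value, and the helper names are hypothetical.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def split_indices(n_ratings, train_percent, rng):
    """Shuffle rating indices and cut them into train and test parts."""
    order = rng.permutation(n_ratings)
    n_train = n_ratings * train_percent // 100
    return order[:n_train], order[n_train:]


rng = np.random.default_rng(0)
scenarios = {}
for train_percent in range(10, 100, 10):    # assumed: 10/90 up to 90/10
    train_idx, test_idx = split_indices(200, train_percent, rng)
    scenarios[train_percent] = (len(train_idx), len(test_idx))

# Two predictions, each off by one rating point.
example_error = rmse([4, 2], [3, 3])
print(scenarios[10], example_error)
```

In a full experiment the model would be trained on the ratings at `train_idx` and `rmse` would be applied to its predictions for the ratings at `test_idx`.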

Experiment Tools.
In this research, several tools and library modules were used so that the experiment follows the standards of previous work, including deep learning tools, hardware, and supporting modules. The tools and libraries are listed in Table 5.

Results and Discussion
The results of the rating prediction under several training data scenarios are shown in the following figures. The comparisons include the traditional matrix factorization baseline, PMF, and the previous best result, which combines CNN with matrix factorization. This experiment consisted of two training scenarios on real datasets: ML-1M and ML-10M.

Experiment Scenario on ML-1M.
ML-1M is a medium-sized dataset in terms of scalability and has a moderate sparsity level, with a density of 4.64%. To investigate the performance of our model, we applied it to real datasets. The MovieLens dataset provides sparse rating data without product reviews. Meanwhile, the Amazon dataset is a categorized e-commerce dataset without a rating matrix but with rich product information in the form of product review documents. According to the experimental results depicted in the nine figures above, product reviews are very helpful in enhancing the effectiveness of rating predictions, even under extremely sparse rating conditions. As reported in Figure 15, PMF does not use product reviews, whereas CNN-PMF and LSTM-PMF, which do use product review documents, are more accurate and converge faster. Moreover, the LSTM model, which aims to capture the contextual meaning of the product reviews, outperforms the CNN model because it produces a higher share weight than CNN.
As shown in Figure 16, the models that use product review documents are superior to the traditional PMF model even with very little training data (e.g., 10%, 20%, 30%, and 40%). The LSTM model slightly outperforms CNN in every training scenario. As reported in Table 6, LSTM improves on both the traditional latent factor model using PMF and the modern deep learning model using CNN. LSTM-PMF improved by 15% on average compared to the traditional PMF model and by 0.71% on average compared to CNN-PMF.
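An average improvement of this kind is commonly computed as the relative reduction in RMSE per scenario, averaged over the scenarios. The sketch below assumes that definition, which the text does not state explicitly; the RMSE values are toy numbers chosen for easy arithmetic and are not results from this experiment.

```python
def relative_improvement(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE relative to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse * 100.0


def average_improvement(baseline_scores, model_scores):
    """Mean of the per-scenario relative improvements."""
    gains = [relative_improvement(b, m)
             for b, m in zip(baseline_scores, model_scores)]
    return sum(gains) / len(gains)


# Toy RMSE values for two scenarios (not taken from the paper).
toy_baseline = [2.0, 1.0]
toy_model = [1.5, 0.9]

single_gain = relative_improvement(toy_baseline[0], toy_model[0])
mean_gain = average_improvement(toy_baseline, toy_model)
print(single_gain, round(mean_gain, 2))
```

Because lower RMSE is better, a positive value means the model improved on the baseline.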

Experiment Scenario on ML-10M.
The ML-10M dataset is a large dataset in terms of scalability. It is also extremely sparse, with a density of 1.4%. The training results, shown in Figures 17 to 25, indicate that LSTM-PMF significantly outperformed the traditional matrix factorization; however, its advantage over CNN-PMF disappeared in the sparsest training scenarios.
A summary of the evaluation training scenarios on ML-10M is shown in Figure 26. LSTM-PMF significantly improved the effectiveness of the rating predictions compared to PMF, even though the ratings were very sparse, with a density of 1.4%.
A detailed comparison on ML-10M is presented in Table 7. It can be concluded that LSTM improved the effectiveness of the rating prediction over both the traditional matrix factorization and the modern matrix factorization that incorporates CNN-based deep learning. LSTM-PMF improved by 10% on average over the traditional PMF and by 1.41% on average over CNN-PMF. In this case, LSTM-PMF performed better under normal conditions than under sparse training ratios such as 10/90 and 20/80, where its performance was similar to that of CNN-PMF. The gain was quite significant when the model was applied to training ratios of 50:50 and above. Compared to the ML-1M results, LSTM-PMF was more powerful here, achieving an improvement of 1.44% on average over CNN-PMF. The strong performance of LSTM-PMF over the traditional PMF is due to the document latent vector, which is the key factor behind the improvement. The document latent representation produced with W helps the item latent model V learn the correspondence between items and users. The document latent representation also made the training process more efficient, in that fewer iterations were required to reach convergence than with the traditional PMF.

Conclusion
Data sparsity caused by a lack of ratings is a major concern in recommender systems. In this research, we proposed a latent factor model using LSTM, word embedding, and PMF. LSTM and word embedding take word order into account when interpreting a document, which captures the contextual insight of the product review documents. According to our experiment report, our model was superior to previous works. We attribute the superior performance of LSTM-PMF to the contextual representation of the document, which supports the PMF latent factors and makes rating prediction more effective. Moreover, incorporating product documents through LSTM and GloVe also made the training process more efficient and helped it reach convergence in every training scenario. Contextual insight can also be learnt through Bidirectional Encoder Representations from Transformers (BERT). Using a bidirectional model to enhance contextual understanding of the document could improve the performance of matrix factorization in predicting the rating matrix. This remains a challenge for future research. PMF is one variant of the matrix factorization method. LSTM-PMF can be extended by combining it with other matrix factorization methods that consider only the rating factor, for example, SVD, SVD++, and nonnegative matrix factorization. Combining LSTM-PMF with some of these approaches could boost the effectiveness of rating prediction with sparse data in large datasets.

Data Availability
Data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this work.