Article

Improved DeepFM Recommendation Algorithm Incorporating Deep Feature Extraction

School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(23), 11992; https://doi.org/10.3390/app122311992
Submission received: 17 October 2022 / Revised: 1 November 2022 / Accepted: 22 November 2022 / Published: 23 November 2022

Abstract
In recent years, deep learning has been applied to recommendation, where it can learn complex user interaction features and produce better recommendations. However, deep learning focuses on the interaction of high-order features and neglects low-order features. The DeepFM model combines the linear FM (Factorization Machines) model and the deep DNN (Deep Neural Network) model to learn low-order and high-order feature interactions jointly, but it does not take into account that user interests change dynamically over time, and it cannot make effective recommendations when data sparsity is high. To address this, an improved DeepFM recommendation algorithm incorporating deep feature extraction, named fDeepFM, is proposed. First, word features are transformed into low-dimensional dense vectors through an Embedding layer. Doc2Vec is then used to mine item features with contextual semantics, and the two are concatenated as the input to the FM and DNN models. Next, user features are input to the GRU (Gated Recurrent Unit) model according to different cycles to mine user characteristics. Finally, the results of the FM, DNN, and GRU models are combined by linear concatenation as the overall output of the fDeepFM model. Experiments were carried out on the MovieLens-20M and Amazon datasets. The results show that, compared with DeepFM, MAE, RMSE, F1-score, and AUC on the MovieLens-20M dataset are improved by 1.69%, 2.4%, 1.67%, and 2.28%, respectively; on the Amazon dataset, MAE, RMSE, F1-score, and AUC are improved by 3.2%, 3.86%, 1.63%, and 2.2%, respectively.

1. Introduction

With the development of the Internet, there are now many kinds of service platforms online, such as shopping, music, news, and short video. These platforms urgently need a way to help users quickly find items suited to them. Recommendation systems arose to meet this need: they can surface information that users are interested in from huge amounts of data, even when users' needs are not clearly expressed. Recommendation systems alleviate the information overload problem, and personalized recommendation has become a fixture of modern online services.
In recommendation systems, the CTR (Click-Through Rate) prediction task estimates the probability that a user will click on an item, so improving CTR prediction accuracy is important for recommendation algorithms. Rendle first proposed the FM model [1], a machine learning algorithm based on matrix factorization that addresses the feature combination problem under sparse data. Juan et al. added the field concept to the FM model and proposed the FFM (Field-aware Factorization Machines) model [2]. However, the weights of the cross features in FFM are all the same; building on this, Xiao et al. introduced an attention mechanism to assign different weights to different cross features and proposed the AFM model [3]. Pan et al. proposed the FwFM model [4], which, like FFM, considers interaction weights between different fields but assigns a single weight to each field pair. These models are all second-order feature interaction models, which are prone to combinatorial explosion. Hence, He et al. added deep neural networks to the FM model to improve its expressiveness and proposed the NFM model [5]. Tao et al. proposed the HoAFM model [6] because the AFM model cannot extract higher-order features. However, all of the above methods are linear improvements of the FM model, and the results are still unsatisfactory. The PNN model [7] was proposed by a team at Shanghai Jiao Tong University at the ICDM conference; it obtains cross information between features in a more targeted way. Cheng et al. proposed the classical Wide & Deep model [8], which combines a linear model with deep learning, but the Wide part relies on manual feature screening, and the model cannot extract high-order and low-order features simultaneously. Guo et al. proposed the DeepFM model [9] based on the Wide & Deep model; it combines high-order and low-order features while alleviating the insufficient representation of low-order features and the tendency of high-order features to overfit, and it has attracted considerable attention. Wang et al. combined the advantages of the DeepFM and FwFM models, considered changes in the external environment and internal perception, and constructed the FG_DRFwFm model to learn interactions of high- and low-order features; experiments showed better recommendation results [10]. Wang et al. introduced an attention mechanism into the DeepFM model and proposed the DIFMN model, which better accounts for the diversity of user interests [11]. Inspired by the DeepFM model, Chen et al. proposed the DCFM model by combining a Factorization Machine crossover network with a deep neural network, incorporating user and item characteristics in many respects [12]. Yu et al. improved the parameters of the DeepFM model and combined IoMT sensors to build a healthcare system for disease prediction, which performed better in terms of accuracy and time [13].
Time is often an important factor influencing a user's choices, and items preferred within the same period are likely to be of the same type. Effective extraction of item and user features has a great impact on the recommendation effect. However, the algorithms and models above do not extract item features effectively and do not take into account the change in users' interests over time. To address these problems, this paper proposes an improved DeepFM recommendation algorithm incorporating deep feature extraction. The proposed method makes the following contributions:
  • Word features are trained through an Embedding layer, text description features are processed by Doc2Vec to obtain feature vectors containing contextual semantics, and the two are concatenated into item features, making the extracted features more detailed;
  • The GRU model is introduced to learn users' changing interests through its gated memory mechanism;
  • The FM, DNN, and GRU models are trained in parallel, so the model not only learns combinations of high- and low-order features but also captures how user interest changes over time.
This paper is structured as follows. Section 2 introduces the fDeepFM model. Section 3 presents experiments that verify the effectiveness of the fDeepFM model. Section 4 concludes the paper and discusses future work.

2. Improved DeepFM Recommendation Algorithm Incorporating Deep Feature Extraction

For a recommendation algorithm, effective extraction of item features is an important factor affecting the recommendation effect. Users' interests also change over time, and recommending only what a user liked in the past often renders the recommendation meaningless, so the algorithm should consider timeliness as well. Moreover, data sparsity is a common problem in recommendation datasets: word vectors are sparse after One-hot coding, and for textual item descriptions in particular, One-hot coding only exacerbates the sparsity. Therefore, a new model, named fDeepFM, is designed in this paper.
The structure of the algorithm is shown in Figure 1. The model is divided into three parts: the FM model, the DNN model, and the GRU model. First, item features are divided into word features and text description features. The word features are trained through the Embedding layer to alleviate data sparsity, the text description features are processed with Doc2Vec [14] to encode the item description information, and the two are spliced to form a new vector representation of the item, which serves as input to the FM and DNN models. Second, since the GRU model can analyze long-term sequences to predict user behavior at the next moment, the user's historical behavior records are trained according to different cycles through the GRU [15] model to obtain the user's feature representation. Finally, the three parts of the model are trained in parallel, a fully connected layer is added to the output features, and the final results are output after the activation function is applied. The model improves the effect of feature learning and learns how the user's historical interests change.

2.1. Distributed Vector Representation Incorporating Contextual Semantics

Datasets in the recommendation domain are often sparse, and sparse data can lead to ineffective recommendations. Extracting features separately for item description text and for item name words therefore captures features in more detail; the two are then stitched together to form the feature representation of the item.
Item features have traditionally been vectorized with bag-of-words models such as One-hot coding and TF-IDF. However, such models do not consider lexical order: they treat each word as independent, which loses word-order features and ignores semantics. Word vector models, in contrast, take word position relations into account, which compensates for the shortcomings of the bag-of-words model and characterizes words better.
Doc2Vec is an unsupervised deep learning algorithm that can be trained without a fixed sentence length and can learn fixed-length feature representations from long texts. Because it takes the word-order relationships of the context into account during training, the resulting vectors contain a semantic understanding of word order.
The algorithm uses jieba to segment the item text description i_c, where i_c = i_{c1} + i_{c2} + ... + i_{cm} and m is the number of segmented words, and then inputs the result to the PV-DM model of Doc2Vec for training, as shown in Figure 2.
The algorithm obtains document vectors by training a neural network that predicts the probability distribution of words in a paragraph. Given randomly sampled words from the passage, it generates a word vector W for each word and a document vector D for each document, trains the weights of the Softmax hidden layer, and then infers vectors for new content descriptions using gradient descent until convergence. Finally, a low-dimensional vector representation i_d of the item description is obtained.
Phrases such as item names, i_{wj} (1 ≤ j ≤ k), are trained with word embeddings. One-hot encoded vectors are sparse and high dimensional, so an Embedding layer is trained to transform them into dense vectors. The concept of field from FFM is introduced here: features of the same nature are grouped into the same field, the input dimensionality can differ between fields, but the output dimensionality is the same. This yields the item feature representation i_w, where i_w = i_{w1} + i_{w2} + ... + i_{wk} and k is the number of features. The final item feature vector i is obtained by concatenating i_d and i_w.
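To make the pipeline above concrete, the following is a minimal sketch (not the authors' code) of how the item representation i could be assembled: jieba segments the description text, gensim's PV-DM Doc2Vec produces i_d, a learned embedding table produces i_w, and the two are concatenated. Names such as item_descriptions and name_token_ids and the random embedding table are illustrative assumptions; only the Doc2Vec vector size (100) and the embedding dimension (64) follow Section 3.2.

import jieba
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Illustrative item descriptions (in practice, movie summaries / product details).
item_descriptions = [
    "A toy cowboy feels threatened when a new space-ranger toy arrives.",
    "A young wizard discovers his magical heritage and attends a hidden school.",
]

# 1) Segment each description with jieba and tag it with its item index for PV-DM.
corpus = [TaggedDocument(words=[w for w in jieba.lcut(text) if w.strip()], tags=[idx])
          for idx, text in enumerate(item_descriptions)]

# 2) Train PV-DM (dm=1) to obtain a fixed-length document vector i_d per item.
d2v = Doc2Vec(corpus, dm=1, vector_size=100, window=5, min_count=1, epochs=20)
i_d = np.vstack([d2v.dv[idx] for idx in range(len(item_descriptions))])

# 3) Word-level features (e.g., item-name tokens) go through an embedding table that
#    would be learned jointly with the model; random values here only show the shapes.
num_tokens, embed_dim = 10_000, 64
embedding_table = np.random.normal(scale=0.01, size=(num_tokens, embed_dim))
name_token_ids = [[12, 87], [403, 9]]                      # token ids of each item name
i_w = np.vstack([embedding_table[ids].sum(axis=0) for ids in name_token_ids])

# 4) Concatenate i_d and i_w to obtain the final item representation i.
i = np.concatenate([i_d, i_w], axis=1)                     # shape: (num_items, 100 + 64)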

2.2. DeepFM Model

The DeepFM model consists of two parallel neural networks and is an optimization of the Wide & Deep model. It inherits the advantages of Wide & Deep and replaces the LR model of the Wide part with an FM model. The model structure is shown in Figure 3: a Factorization Machine (FM) part on the left and a Deep Neural Network (DNN) part on the right, forming end-to-end training and avoiding manual feature engineering. The FM part and the DNN part share the feature vectors and are trained in parallel, learning low-order and high-order feature crossings simultaneously, which gives the model good memorization and generalization ability.
The FM part extracts the low-order features of the input data by introducing cross terms into a linear regression model. It trains the weights of combined features by learning a latent vector for each feature; using the inner product of two latent vectors as the weight not only alleviates the data sparsity problem but also effectively learns feature combinations. The calculation formula is as follows:
y(x) = \omega_0 + \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \omega_{ij} x_i x_j    (1)
where ω_{ij} is the weight of the feature combination x_i x_j, factorized in FM as the inner product of the latent vectors of features i and j.
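As a reference, the sketch below evaluates Equation (1) with the factorized pairwise weights; the O(nk) rewriting of the pairwise term is the standard FM computation and is not specific to this paper, and all variable names are illustrative.

import numpy as np

def fm_forward(x, w0, w, V):
    """Equation (1): x is an (n,) feature vector, w0 the bias, w the (n,) linear
    weights, and V the (n, k) latent vectors, so that w_ij = <v_i, v_j>."""
    linear = w0 + np.dot(w, x)
    # Pairwise term rewritten in O(n*k):
    # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2]
    xv = V.T @ x                       # shape (k,)
    xv_sq = (V ** 2).T @ (x ** 2)      # shape (k,)
    pairwise = 0.5 * np.sum(xv ** 2 - xv_sq)
    return linear + pairwise

# Toy usage with 6 features and 4-dimensional latent vectors.
rng = np.random.default_rng(0)
n, k = 6, 4
y_fm = fm_forward(rng.random(n), 0.1, rng.normal(size=n), rng.normal(size=(n, k)))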
On the right side is the DNN (Deep Neural Network) part, a feed-forward neural network for learning high-order feature crossings; in its fully connected layers, the feature vectors are combined to form higher-order features.
The input of the DNN layer is
a^{(0)} = [e_1, e_2, \ldots, e_m]
where e_i is the i-th feature embedding vector and m is the total number of features. a^{(0)} is then fed into the DNN, and the forward process is as follows:
a^{(l+1)} = \sigma(W^{(l)} a^{(l)} + b^{(l)})    (2)
where σ denotes the activation function, a^{(l)} denotes the output of layer l, W^{(l)} denotes the weight matrix of layer l, and b^{(l)} denotes the bias of layer l.
The final output of the DNN model is shown in Equation (3)
y_{DNN} = \sigma(W^{(H+1)} a^{(H)} + b^{(H+1)})    (3)
where H denotes the number of hidden layers of the network.
The final output of the model is obtained from the sum of the two parts:
y = \mathrm{sigmoid}(y_{FM} + y_{DNN})    (4)
where y_{FM} is the output of the FM part and y_{DNN} is the output of the DNN part.
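The sketch below illustrates Equations (2)-(4): a feed-forward network over the concatenated field embeddings and the sigmoid combination with the FM output. The layer sizes follow Section 3.2 (three hidden layers of 32 units), but the ReLU placement and all names are illustrative assumptions rather than the authors' implementation.

import torch
import torch.nn as nn

class DeepPart(nn.Module):
    """DNN branch of Equations (2)-(3): a^(l+1) = σ(W^(l) a^(l) + b^(l))."""
    def __init__(self, input_dim, hidden_dim=32, num_layers=3):
        super().__init__()
        layers, dim = [], input_dim
        for _ in range(num_layers):
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
            dim = hidden_dim
        self.hidden = nn.Sequential(*layers)
        self.out = nn.Linear(dim, 1)        # y_DNN = W^(H+1) a^(H) + b^(H+1)

    def forward(self, a0):                  # a0: concatenated field embeddings (batch, m*k)
        return self.out(self.hidden(a0)).squeeze(-1)

def deepfm_output(y_fm, y_dnn):
    """Equation (4): CTR estimate from the two parallel parts."""
    return torch.sigmoid(y_fm + y_dnn)

# Toy usage: a batch of 8 samples with a 164-dimensional concatenated embedding.
y_hat = deepfm_output(torch.zeros(8), DeepPart(164)(torch.randn(8, 164)))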

2.3. Representation of User Characteristics Incorporating Temporal Factors

A basic user representation can be obtained from the user's historical interaction items, but the sequence of items a user has interacted with is temporal in nature: in general, recently interacted items influence the user's next choice more than earlier ones. The GRU model can process and predict sequential data; its output at each step is influenced by the memory of previous steps, and it mitigates the vanishing-gradient problem. Therefore, this paper trains a GRU model to obtain the vector representation of the user. The model is shown in Figure 4.
The overall representation of user characteristics can be obtained from the historical items v that the user has interacted with, defined as v = {v_{u1}, v_{u2}, ..., v_{un}}. Human behavior generally varies with weekly and monthly cycles; thus, the historical sequence of a user's interactions is divided by weeks and by months and then input into the GRU model separately, as shown in Equation (5):
v^T = v_1^T + v_2^T + \cdots + v_k^T    (5)
where v_j^T denotes the set of items the user interacted with in the j-th cycle when T is the period.
The GRU obtains the reset gate r and the update gate z from the state h_{t-1} passed down from the previous step and the input x_t of the current step. The equations are as follows:
r = \sigma(W_r [h_{t-1}, x_t])    (6)
z = \sigma(W_z [h_{t-1}, x_t])    (7)
where σ is the sigmoid function, which maps values into the range (0, 1), and W_r and W_z are learned parameters; the closer the update gate z is to 1, the more information is kept.
After the gating signals are obtained, the reset gate produces the reset state h'_{t-1} = h_{t-1} ⊙ r, where ⊙ is the Hadamard product. Then h'_{t-1} is concatenated with the input x_t, and the result is squashed into the range (−1, 1) by the tanh function to obtain the candidate state h̃:
\tilde{h} = \tanh(W_h [h'_{t-1}, x_t])    (8)
where W_h is a parameter to be learned. The final memory update combines forgetting and remembering, as shown in Equation (9):
h_t = (1 - z) \odot h_{t-1} + z \odot \tilde{h}    (9)
This step drops some dimensions of the previous state and adds some of the current input: (1 − z) ⊙ h_{t-1} removes unimportant information from h_{t-1}, while z ⊙ h̃ selectively retains information from the current step. The final h_t is the user vector representation.
The user representation obtained with a weekly cycle is u^{T1}, the user representation obtained with a monthly cycle is u^{T2}, and the two are concatenated to obtain the final user feature representation u.
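A minimal sketch of the user branch follows, assuming the interaction history has already been grouped into weekly and monthly sequences of item vectors; torch.nn.GRU implements the gating of Equations (6)-(9) internally. The sequence lengths, dimensions, and names are illustrative assumptions rather than the authors' code.

import torch
import torch.nn as nn

class UserGRU(nn.Module):
    """User branch: one GRU per cycle length (weekly / monthly), outputs concatenated."""
    def __init__(self, item_dim, hidden_dim=32):
        super().__init__()
        # nn.GRU realizes the reset/update gating of Equations (6)-(9) internally.
        self.weekly_gru = nn.GRU(item_dim, hidden_dim, batch_first=True)
        self.monthly_gru = nn.GRU(item_dim, hidden_dim, batch_first=True)

    def forward(self, weekly_seq, monthly_seq):
        # *_seq: (batch, num_cycles, item_dim); each step aggregates one cycle's items.
        _, h_week = self.weekly_gru(weekly_seq)      # final state -> u^{T1}
        _, h_month = self.monthly_gru(monthly_seq)   # final state -> u^{T2}
        return torch.cat([h_week[-1], h_month[-1]], dim=-1)   # user representation u

# Toy usage: 4 users, 8 weekly steps and 2 monthly steps, 64-dimensional item vectors.
u = UserGRU(item_dim=64)(torch.randn(4, 8, 64), torch.randn(4, 2, 64))   # shape (4, 64)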

2.4. The Output of the fDeepFM Models

The output of the fDeepFM model is obtained by stitching together the outputs of the three parts. The formula is as follows:
y_{fDeepFM} = \mathrm{sigmoid}(y_{FM} + y_{DNN} + y_{GRU})    (10)
The concatenated output features are passed through a fully connected layer, and the final result is produced after the activation function. The model mines the relationship between users and items for recommendation; it has good memorization as well as generalization ability and can effectively alleviate the data sparsity problem.
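Putting the branches together, a sketch of Equation (10) might look as follows. It assumes the FM and DNN branches each output a scalar logit and that the user vector u is reduced to y_GRU by a small fully connected layer, as described above; this is one interpretation of the paper's description, not its released code.

import torch
import torch.nn as nn

class FDeepFMHead(nn.Module):
    """Equation (10): combine the FM, DNN, and GRU branches into one CTR estimate."""
    def __init__(self, user_dim):
        super().__init__()
        self.gru_proj = nn.Linear(user_dim, 1)   # reduce the user vector u to a scalar y_GRU

    def forward(self, y_fm, y_dnn, u):
        y_gru = self.gru_proj(u).squeeze(-1)
        return torch.sigmoid(y_fm + y_dnn + y_gru)

# Toy usage with a batch of 4 samples and a 64-dimensional user vector.
y_hat = FDeepFMHead(user_dim=64)(torch.zeros(4), torch.zeros(4), torch.randn(4, 64))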

3. Experimental Verification

3.1. Description of the Dataset

This subsection validates the proposed algorithm on the Amazon product data and MovieLens-20M datasets. The Amazon dataset was collected between July 1995 and March 2013 and contains about 35 million records, including user information, item information, and review information. We conduct experiments on the Electronics subset, which contains 192,403 users, 63,001 items, 801 categories, and 1,689,188 samples. User behavior in this dataset is rich, with more than 5 reviews per user and per item. In this paper, a user's rating of an item is treated as a positive sample. The MovieLens-20M dataset was collected between January 1995 and March 2015 and contains 20,000,263 ratings and 465,564 tags from 138,493 users on 27,278 movies. Ratings are on a 5-point scale; samples with a rating of 4 or 5 are labeled positive and the rest negative. Since this dataset does not include information such as movie summaries, a web crawler was used to collect movie content data (summaries, storylines, etc.) from IMDB, the source website of the movies in this dataset. Punctuation, special symbols, and some stopwords are removed from the text. For the content text, pre-processing such as word segmentation and word frequency analysis is performed with jieba to collect useful content information.
A user's choices are closely related to attributes such as gender, age, and occupation, and the rating level also indicates the user's preference for an item. To make the classification more accurate when training the model, for the MovieLens-20M dataset the selected user dimensions are user ID, user age, occupation information, user characteristics, and user rating, and the movie dimensions are movie ID, movie summary, movie type, and storyline synopsis. Similarly, for the Amazon dataset, the selected user dimensions are user ID, user age, occupation information, user characteristics, and user rating, and the product dimensions are product ID, product name, product details, and product price.
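The following is a minimal sketch of the labeling and text-cleaning steps described in this subsection, assuming the standard ml-20m ratings.csv layout; the stopword set, the example summary, and the variable names are illustrative assumptions.

import re
import jieba
import pandas as pd

# Label MovieLens-20M ratings: 4-5 stars -> positive, the rest -> negative.
ratings = pd.read_csv("ml-20m/ratings.csv")             # columns: userId, movieId, rating, timestamp
ratings["label"] = (ratings["rating"] >= 4).astype(int)

stopwords = {"the", "a", "an", "of", "and", "to"}        # illustrative stopword set

def clean_and_segment(text):
    """Remove punctuation/special symbols, segment with jieba, drop stopwords."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [w for w in jieba.lcut(text) if w.strip() and w not in stopwords]

# Example crawled summary keyed by movie id (illustrative content).
movie_summaries = {1: "A toy cowboy feels threatened when a new space-ranger toy arrives."}
movie_tokens = {mid: clean_and_segment(s) for mid, s in movie_summaries.items()}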

3.2. Evaluation Indicator

In this paper, the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) between the actual and predicted scores are used to measure the prediction errors of different algorithms. The smaller these values are, the smaller the prediction error and the higher the accuracy. They are defined as follows:
RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2}    (11)
MAE = \frac{1}{m} \sum_{i=1}^{m} |y_i - \hat{y}_i|    (12)
The F1-Score value is used to measure the recommendation effectiveness of different algorithms. The higher the value, the higher the recommendation accuracy. The F1-Score is calculated from Precision and Recall and is defined by the following formulas:
Precision = \frac{TP}{TP + FP}    (13)
Recall = \frac{TP}{TP + FN}    (14)
F1\text{-}Score = \frac{2 \times Precision \times Recall}{Precision + Recall}    (15)
Here, TP (True Positive) is the number of positive samples correctly classified as positive; FN (False Negative) is the number of positive samples misclassified as negative (missed detections); FP (False Positive) is the number of negative samples misclassified as positive (false alarms); and TN (True Negative) is the number of negative samples correctly classified as negative.
AUC (Area Under Curve) is used to measure the ranking ability of the algorithm. AUC is the area under the ROC (Receiver Operating Characteristic) curve. The larger the value, the better the recommendation effect. The formula is defined as follows:
AUC = \frac{\sum I(P_p, P_n)}{M \times N}    (16)
I(P_p, P_n) = \begin{cases} 1, & P_p > P_n \\ 0.5, & P_p = P_n \\ 0, & P_p < P_n \end{cases}    (17)
where M denotes the number of positive samples, N denotes the number of negative samples, M × N denotes the number of positive-negative sample pairs, and P_p and P_n are the predicted scores of a positive and a negative sample, respectively.
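For reference, Equations (11)-(17) correspond to standard metrics that can be computed with scikit-learn and NumPy; the example labels, scores, and the 0.5 decision threshold used for F1 below are illustrative assumptions.

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, f1_score, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0])                # ground-truth labels
y_score = np.array([0.9, 0.3, 0.6, 0.4, 0.2])     # model predictions

mae = mean_absolute_error(y_true, y_score)                      # Equation (12)
rmse = np.sqrt(mean_squared_error(y_true, y_score))             # Equation (11)
f1 = f1_score(y_true, (y_score >= 0.5).astype(int))             # Equations (13)-(15)
auc = roc_auc_score(y_true, y_score)                            # Equations (16)-(17)
print(f"MAE={mae:.4f}  RMSE={rmse:.4f}  F1={f1:.4f}  AUC={auc:.4f}")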
The experiments randomly divide each dataset into a training set (80%), test set (10%), and validation set (10%) in the ratio of 8:1:1; the training set is used to train the model, and the results on the validation set are used to evaluate the prediction performance of the algorithms. The Doc2Vec vector length is 100 dimensions, and the embedding dimension is set to 64. The GRU has 5 feature dimensions and 32 neuron nodes. The DNN has 3 hidden layers with 32 neuron nodes each; the layers are fully connected and computed without an activation function. The batch size is set to 256 and 1024 on the two datasets, the learning rate is 0.001, the optimizer is Adam, and the number of epochs is set to 7. In the output layer, the activation function is ReLU, the loss function is the logarithmic loss, and an L2 regularization factor is used to prevent overfitting during training.
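A minimal sketch of the training configuration described above (Adam with learning rate 0.001, log loss, L2 regularization) is given below; the weight_decay value and the placeholder model standing in for the assembled fDeepFM network are illustrative assumptions, not the authors' settings.

import torch
import torch.nn as nn

model = nn.Linear(164, 1)          # placeholder standing in for the assembled fDeepFM network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)  # Adam + L2 (value assumed)
criterion = nn.BCELoss()           # logarithmic (log) loss on the predicted CTR probability

def train_step(features, labels):
    """One gradient step; `labels` is a float tensor of 0/1 targets."""
    optimizer.zero_grad()
    preds = torch.sigmoid(model(features)).squeeze(-1)
    loss = criterion(preds, labels)
    loss.backward()
    optimizer.step()
    return loss.item()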

3.3. Performance Comparison Experiments of Different Algorithms

This paper compares the proposed algorithm with other recommendation algorithms of the FM family: AFM, NFM, ONN, DeepFM, and xDeepFM.
AFM: The AFM model extends the FM model with an attention mechanism to learn the importance of different second-order cross features;
NFM: The NFM model uses a Bi-Interaction Pooling structure for feature-cross learning to improve FM's ability to capture multi-order interaction information between features;
ONN [16]: The ONN model distinguishes between different Embedding operations on the same feature and proposes a new Embedding model;
xDeepFM [17]: xDeepFM proposes a new neural module called a compressed interaction network to replace the FM part of DeepFM.
On the MovieLens-20M dataset, the evaluation metric values of each algorithm are shown in Table 1, and the training time of models is shown in Figure 5. On the Amazon dataset, the evaluation metric values of each algorithm are shown in Table 2.
From Table 1 and Table 2, it can be seen that because the Amazon dataset is sparser, all algorithms generally work better on the MovieLens-20M dataset. AFM introduces an attention mechanism but does not learn higher-order features, so its results are poor on every index. ONN, DeepFM, xDeepFM, and fDeepFM all modify the Embedding layer to improve the shallow representation and then use a DNN to learn higher-order features, which makes these algorithms increasingly effective. fDeepFM works best because item features are divided into word features and text description features and extracted separately, so the item representation integrates more sentiment and contextual semantics, and because the GRU model is introduced to learn how user interest changes over time, which further improves the recommendation effect. Compared with DeepFM, the MAE, RMSE, F1-Score, and AUC values improve by 1.69%, 2.4%, 1.67%, and 2.28%, respectively, on the MovieLens-20M dataset; on the Amazon dataset, the MAE, RMSE, F1-Score, and AUC values improve by 3.2%, 3.86%, 1.63%, and 2.2%, respectively.
From Figure 5, it can be seen that AFM and NFM have short training times, with AFM the shortest, indicating that the attention mechanism can shorten training time. ONN takes the longest to train because it trades extra model complexity for recommendation accuracy. The training times of DeepFM, xDeepFM, and fDeepFM increase in that order; fDeepFM takes longer to train than DeepFM, but its accuracy is improved.

3.4. Comparison Experiments of Algorithm Effects under Different Scales of Sparsity

To verify the performance of each algorithm under different scales of data sparsity, the MovieLens-20M dataset is divided into training and test sets at different ratios, with the training set accounting for 80%, 70%, 60%, 50%, and 40%, respectively. Each algorithm is then tested to obtain its RMSE. The results are shown in Table 3.
The table shows that the recommendation performance of every algorithm decreases as the proportion of the training set decreases. Among them, the AFM model performs worst because it is a shallow model and cannot alleviate the data sparsity problem well. The other models perform better, which indicates that adding a neural network can mitigate the effect of data sparsity. The fDeepFM algorithm performs best, which indicates that it can effectively alleviate the data sparsity problem.

4. Conclusions

This paper proposes an improved DeepFM recommendation algorithm incorporating deep feature extraction to address the problems that the DeepFM model does not consider the effect of time on user choices and has low recommendation accuracy under sparse data. The model makes full use of item attributes and user features and mines low-order and high-order features for training; at the same time, it learns the changing pattern of user interest. In comparison experiments against several algorithms of the same family on the MovieLens-20M and Amazon datasets, the proposed algorithm achieves the best results on four evaluation indexes: MAE, RMSE, F1-Score, and AUC. This shows that the proposed method alleviates the data sparsity problem and effectively improves the accuracy of the recommendation algorithm. Future work will further study the application of deep learning algorithms in recommendation systems.

Author Contributions

Conceptualization, M.M.; Formal analysis, M.M.; Funding acquisition, G.W.; Methodology, M.M.; Writing—original draft, M.M.; Writing—review and editing, G.W. and T.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Key R & D Program of China (2019YFB1802700).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: [Amazon product data: https://snap.stanford.edu/data/web-Amazon.html, accessed on 16 October 2022; MovieLens-20M: https://files.grouplens.org/datasets/movielens/ml-20m.zip, accessed on 16 October 2022].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Rendle, S. Factorization Machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, Sydney, NSW, Australia, 13–17 December 2010; pp. 995–1000. [Google Scholar]
  2. Juan, Y.; Zhuang, Y.; Chin, W.; Lin, C. Field-aware Factorization Machines for CTR Prediction. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; ACM: New York, NY, USA, 2016; pp. 43–50. [Google Scholar]
  3. Xiao, J.; Ye, H.; He, X.; Zhang, H.; Wu, F.; Chua, T.S. Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks. arXiv 2017, arXiv:1708.04617. [Google Scholar]
  4. Pan, J.; Xu, J.; Ruiz, A.L.; Zhao, W.; Pan, S.; Sun, Y.; Lu, Q. Field-weighted factorization machines for click-through rate prediction in display advertising. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 1349–1357. [Google Scholar]
  5. He, X.N.; Chua, T. Neural Factorization Machines for Sparse Predictive Analytics. arXiv 2017, arXiv:1708.05027. [Google Scholar]
  6. Tao, Z.; Wang, X.; He, X.; Huang, X.; Chua, T.S. HoAFM: A High-order Attentive Factorization Machine for CTR Prediction. Inf. Process. Manag. 2020, 57, 102076. [Google Scholar] [CrossRef]
  7. Qu, Y.; Cai, H.; Ren, K.; Zhang, W.; Yu, Y.; Wen, Y.; Wang, J. Product-based neural networks for user response prediction. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1149–1154. [Google Scholar]
  8. Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, New York, NY, USA, 15 September 2016; pp. 7–10. [Google Scholar]
  9. Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X. DeepFM: A factorization-machine based neural network for CTR prediction. arXiv 2017, arXiv:1703.04247. [Google Scholar]
  10. Wang, S.W.; Ou, O.; Zhang, W.J.; Ouyang, F. Depth recommendation based on FG_DRFwFm model. Comput. Appl. Res. 2021, 38, 3030–3034. [Google Scholar] [CrossRef]
  11. Wang, R.P.; Jia, Z.; Liu, C.; Chen, Z.W.; Li, T.R. Deep interest factor decomposition machine network based on DeepFM. Comput. Sci. 2021, 48, 226–232. [Google Scholar]
  12. Chen, B.; Zhang, R.M.; Zhang, Q. DCFM: Hybrid recommendation model based on deep learning. Comput. Eng. Appl. 2021, 57, 150–155. [Google Scholar]
  13. Yu, Z.; Amin, S.U.; Alhussein, M.; Lv, Z. Research on Disease Prediction Based on Improved DeepFM and IoMT. IEEE Access 2021, 9, 39043–39054. [Google Scholar] [CrossRef]
  14. Le, Q.; Mikolov, T. Distributed Representations of Sentences and Documents. In Proceedings of the International Conference on Machine Learning (ICML), Beijing, China, 21–26 June 2014; PMLR: New York, NY, USA, 2014; pp. 1188–1196. [Google Scholar]
  15. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  16. Yang, Y.; Xu, B.; Shen, S.; Shen, F.; Zhao, J. Operation-aware neural networks for user response prediction. Neural Netw. 2020, 121, 161–168. [Google Scholar] [CrossRef]
  17. Lian, J.; Zhou, X.; Zhang, F.; Chen, Z.; Xie, X.; Sun, G. xDeepFM: Combining explicit and implicit feature interactions for recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1754–1763. [Google Scholar]
Figure 1. Algorithm structure diagram.
Figure 2. PV-DM model.
Figure 3. DeepFM model.
Figure 4. GRU model structure.
Figure 5. Comparison of training time.
Table 1. Numerical comparison of the evaluation indexes of each algorithm on the MovieLens-20M dataset.

Algorithm    MAE      RMSE     F1-Score   AUC
AFM          0.7451   0.9442   0.8523     0.7861
NFM          0.7367   0.9395   0.8475     0.7794
ONN          0.7309   0.9321   0.8584     0.7843
DeepFM       0.7334   0.9354   0.8623     0.7950
xDeepFM      0.7256   0.9240   0.8738     0.8074
fDeepFM      0.7165   0.9114   0.8790     0.8178
Table 2. Numerical comparison of the evaluation indexes of each algorithm on the Amazon dataset.

Algorithm    MAE      RMSE     F1-Score   AUC
AFM          1.5723   1.7452   0.6853     0.6165
NFM          1.5358   1.6736   0.6754     0.5952
ONN          1.5007   1.6513   0.6881     0.6193
DeepFM       1.5048   1.6519   0.7023     0.6221
xDeepFM      1.4921   1.6434   0.7148     0.6360
fDeepFM      1.4728   1.6133   0.7186     0.6441
Table 3. Comparison of algorithm effects (RMSE) under different scales of sparsity (training-set proportion).

Algorithm    80%      70%      60%      50%      40%
AFM          0.9442   0.9573   0.9668   0.9831   1.1892
NFM          0.9395   0.9490   0.9592   0.9778   1.0323
ONN          0.9321   0.9489   0.9634   0.9790   1.0528
DeepFM       0.9354   0.9499   0.9684   0.9743   1.0044
xDeepFM      0.9240   0.9391   0.9430   0.9577   0.9733
fDeepFM      0.9114   0.9273   0.9322   0.9481   0.9694

