Research on location and similar comments in point-of-interest recommendation system for users

By combining deep learning algorithms with latent probabilistic topic models, this paper proposes a new modeling strategy for individual users and constructs an effective point-of-interest recommendation system. First, Doc2vec is used to vectorize users' comments, the bag-of-words (BoW) model is used to measure the similarity between points of interest, and a latent probabilistic model is used to cluster points of interest by location. Then, based on the user's historical comment data, a Top-K ranked recommendation list is obtained. Experimental results show that, on a large-scale location-based social network data set, the proposed algorithm performs well for personalized recommendation. The proposed model, which combines user location information with similar comments on points of interest, is better aligned with user behavior needs, both subjectively and objectively. While preserving the real-time performance of recommendation, the model improves recommendation accuracy over traditional point-of-interest recommendation algorithms.


Development status
The personalized recommendation system recommends information and products of interest to users based on the user's points of interest and historical purchase behavior.
Literature [1] shows the influence of data sparsity on experimental results, and literature [2] likewise reports that a lack of data has a negative impact on the results. Literature [3] considers the information contained in comments and associates the comments with the content posted by users; recommendations are then made through collaborative filtering, yielding a new recommendation model with improved accuracy. Literature [4] addresses data sparsity by establishing a new recommendation model through collaborative training combined with relevant recommendation content. Literature [5] improves recommendation accuracy using a stacked denoising autoencoder.
This paper proposes the LSCRS model (Location and Similar Comments Point-of-interest Recommendation System for Users). In this model, the user comment data is first vectorized and the similarity between comments is computed, so that users' potential points of interest can be discovered. Next, a latent probabilistic model is used to find, for each point of interest, the latent region to which it most probably belongs. The region to which the user belongs is then obtained from the user's current location, and the potential interest points found in the first step are used to find similar interest points within that region. The bag-of-words model is used to compute the similarity between different points of interest. Finally, the interest points in the recommendation list are sorted by similarity and the top K items are taken as the Top-K recommendation.

LSCRS method
The framework of the LSCRS method is shown in Figure 1.

Regional division of space projects
In reality, users may visit multiple different points of interest in a certain geographic area. Therefore, this article divides the geographic space into R regions and uses a multinomial distribution θu over geographic locations to model the distribution of user u over the R regions (as shown in Figure 2). Given the hyperparameters γ and τ and the observable data v and lv, the goal is to infer the latent variables θ and φ. For each user's latent region r, the posterior distribution is used rather than treating r as a parameter to be estimated. Since this posterior distribution is difficult to calculate directly, this article uses the Markov chain Monte Carlo (MCMC) method for sampling.
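The region r is sampled according to the following posterior probability:
$$ p(r \mid \mathbf{r}_{\neg u}, v) \propto \frac{n_{u,r}^{\neg u,v} + \gamma_r}{\sum_{r'} \left( n_{u,r'}^{\neg u,v} + \gamma_{r'} \right)} $$
where $n_{u,r}$ is the number of times user u has visited region r, $n_{r,v}$ is the number of times interest point v is generated from region r, and the superscript $\neg u,v$ indicates that the count excludes the current visit of user u to v.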
For simplicity, this article sets the hyperparameters to γ = 50/R and τ = 0.01. In each iteration, equations (2) and (3) are used to update the parameters of the Gaussian distribution. About 3000 iterations are required in total. The parameters θ and φ are estimated as follows:
$$ \theta_{u,r} = \frac{n_{u,r} + \gamma_r}{\sum_{r'} \left( n_{u,r'} + \gamma_{r'} \right)} $$
After the iterations are complete, a Gaussian distribution is fitted to each region r, and the location of interest point v is modeled as $l_v \sim \mathcal{N}(\mu_r, \Sigma_r)$, as in (3). Through this latent semantic model, each point of interest is finally assigned to the region to which it most probably belongs.
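The two final steps above (estimating θ from visit counts, then assigning each point of interest to its most probable region) can be illustrated with a minimal sketch. It assumes the counts have already been collected by the sampler. The function names, the toy counts, the toy region parameters, and the diagonal-covariance simplification of Σr are assumptions made for illustration and are not taken from the paper.

```python
import math

def estimate_theta(visit_counts, gamma):
    """Smoothed region distribution for one user:
    theta_r = (n_ur + gamma) / sum over r' of (n_ur' + gamma)."""
    total = sum(visit_counts) + gamma * len(visit_counts)
    return [(n + gamma) / total for n in visit_counts]

def gaussian_log_density(point, mean, var):
    """Log density of a Gaussian with diagonal covariance
    (an assumed simplification of the full covariance matrix)."""
    log_p = 0.0
    for x, m, v in zip(point, mean, var):
        log_p += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
    return log_p

def assign_region(location, regions):
    """Return the index of the region whose Gaussian gives `location`
    the highest density."""
    scores = [gaussian_log_density(location, mean, var) for mean, var in regions]
    return max(range(len(regions)), key=lambda r: scores[r])

# Toy example with R = 2 regions (hypothetical numbers).
R = 2
gamma = 50 / R                      # hyperparameter setting from the text
theta = estimate_theta([30, 10], gamma)

# Each region is (mean, per-dimension variance) in a toy 2-D coordinate space.
regions = [((0.0, 0.0), (1.0, 1.0)), ((5.0, 5.0), (1.0, 1.0))]
region_of_poi = assign_region((4.2, 4.9), regions)
```

With γ = 50/R the prior is strong relative to small counts, so θ stays close to uniform for users with few visits; this is a property of the smoothing formula rather than of this particular sketch.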

Analysis of review similarity
Statistical analysis of the user comment data in the recommendation system shows that the comments are mainly long texts, and short text is usually handled very differently from long text. This article therefore uses sentence vectors (Doc2vec). Doc2vec has two variants: the distributed memory model of sentence vectors (PV-DM) and the distributed bag-of-words model of sentence vectors (PV-DBOW). This article uses PV-DBOW. For a given sentence, $w_1, w_2, \ldots, w_T$ denote the words in the sentence and D denotes the sentence. The objective function g(w) of the PV-DBOW model maximizes the average log probability:
$$ g(w) = \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t, D_j) \qquad (5) $$
In formula (5), c is the size of the training context window. The larger the value of c, the higher the accuracy of the model may be. The PV-DBOW model uses the hierarchical softmax function to define $p(w_{t+j} \mid w_t, D_j)$.
After the PV-DBOW sentence vector model is trained, each comment text is represented as a K-dimensional real-valued vector. The similarity between comment sentences can then be computed from these vectors.

Similarity analysis of space items
In the data set used in this article, most content descriptions of points of interest are based on tags; that is, they consist of multiple nouns in no particular order. Given this characteristic of the data, this paper adopts the bag-of-words model (BoW model).
The similarity between two points of interest is measured by cosine similarity, as shown in formula (6):
$$ \cos(a, b) = \frac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\ \sqrt{\sum_{i} y_i^2}} \qquad (6) $$
In formula (6), a and b are two high-dimensional vectors, and $x_i$ and $y_i$ are the values of the i-th dimension of a and b, respectively.
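A minimal sketch of the bag-of-words representation and the cosine similarity described above follows. The tag lists, the point-of-interest names, and the helper names `bow_vector`, `cosine_similarity`, and `top_k_similar` are hypothetical and chosen only for illustration; the paper does not specify its implementation.

```python
import math
from collections import Counter

def bow_vector(tags, vocabulary):
    """Count how often each vocabulary term occurs in `tags` (order is ignored)."""
    counts = Counter(tags)
    return [counts[term] for term in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_similar(target_tags, candidates, k):
    """Rank candidate points of interest by tag similarity to the target
    and return the k best as (name, score) pairs."""
    vocabulary = sorted(set(target_tags).union(*candidates.values()))
    target = bow_vector(target_tags, vocabulary)
    scored = [
        (name, cosine_similarity(target, bow_vector(tags, vocabulary)))
        for name, tags in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical candidates located in the user's current region.
candidates = {
    "cafe_a": ["coffee", "bakery", "wifi"],
    "bar_b": ["cocktails", "music"],
    "cafe_c": ["coffee", "wifi"],
}
ranking = top_k_similar(["coffee", "wifi", "breakfast"], candidates, k=2)
```

Because tags are unordered nouns, raw term counts are enough here; weighting schemes such as TF-IDF would be a different design choice that the paper does not mention.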

Experimental results and analysis
This section first describes the evaluation method, the algorithms used for comparison, and the experimental setup, and then presents the experimental results.

Dataset description
This experiment uses Yelp's real-world data set. The details of this data set are shown in Table 1.

Algorithm comparison and analysis methods of this model
This article compares the LSCRS model with the UPS-CF algorithm [7] and the CKNN algorithm [8]. To evaluate the performance of the model fully, this article follows the evaluation framework proposed in [6], [9], and [10] to calculate Accuracy@k. For a single test sample, hit@k is defined as 1 if the interest point v appears in the top-k results and 0 otherwise.
The overall Accuracy@k is defined over all test samples:
$$ \text{Accuracy@}k = \frac{\#\text{hit@}k}{|D_{test}|} $$
where #hit@k is the number of hits in all test samples and $|D_{test}|$ is the number of all test samples.
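The metric can be computed directly from ranked recommendation lists. The sketch below assumes each test sample is a pair of a ranked list and the held-out interest point; the sample data and the function names are hypothetical.

```python
def hit_at_k(ranked_pois, target_poi, k):
    """Return 1 if the held-out point of interest is among the top k, else 0."""
    return 1 if target_poi in ranked_pois[:k] else 0

def accuracy_at_k(test_cases, k):
    """Fraction of test samples whose held-out point of interest is in the top k."""
    if not test_cases:
        return 0.0
    hits = sum(hit_at_k(ranked, target, k) for ranked, target in test_cases)
    return hits / len(test_cases)

# Hypothetical test set: (ranked recommendation list, held-out interest point).
test_cases = [
    (["v1", "v2", "v3"], "v2"),
    (["v4", "v5", "v6"], "v6"),
    (["v7", "v8", "v9"], "v0"),
]
acc_at_1 = accuracy_at_k(test_cases, 1)
acc_at_3 = accuracy_at_k(test_cases, 3)
```

Accuracy@k can only stay the same or grow as k grows, since a hit at a smaller k is also a hit at any larger k.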

Accuracy analysis
To show the improvement in accuracy, the LSCRS algorithm is compared with CKNN and UPS-CF in Figure 3.
Figure 3. Comparison of accuracy between the LSCRS algorithm and the CKNN and UPS-CF recommendation algorithms.
Figure 3 shows that the recommendation accuracy of the LSCRS algorithm is greatly improved. Introducing geographic location parameters makes it easier for the recommendation system to find the place the current user wants to go, and narrowing the geographic scope makes the recommendation more precise. In addition, this article uses user comment data as the user's historical behavior, which increases the diversity of the recommended content; recommendations are not limited to the content of the points of interest.
For the LSCRS algorithm, this paper also tests users with different numbers of comments to show the accuracy of the final recommendation results for different categories of users, as shown in Figure 4. Figure 4 shows that, for every value of K, the more historical comments a user has, the higher the recommendation accuracy of the LSCRS algorithm. The richer the user's historical record, the more evidence is available and the more accurately the user's interests can be captured.
To observe the effect of region clustering, this article divides the space into different numbers of regions and compares the experimental results, as shown in Figure 5. Figure 5 shows that as the number of regions increases, the accuracy decreases. The best parameter setting for point-of-interest recommendation is R=10. R reflects the complexity of the model. When R is too small, the model's ability to describe the data is limited. Once R exceeds a certain threshold, the model has enough expressive power to process the data, so further increasing R has little effect on the model. In addition, increasing R increases the time cost of training, so this article sets R to 10.

Conclusions
Since personalized recommendation for individual users has become a research hotspot, this paper proposes a new LSCRS model. First, Doc2vec is used to vectorize user comments. Second, the BoW model is used to measure the similarity of spatial items, and a latent probabilistic model is used to divide the interest points into geographic regions. The model finds points of interest that match the user's potential interests through the user's historical comment data, determines the user's region from the user's location, and finally finds the points of interest that the user may visit in this region to obtain the Top-K recommended items.