A Markov Chain Position Prediction Model Based on Multidimensional Correction

User location prediction in location-based social networks can predict the density of people flow well, which is useful for intelligent transportation: traffic can be adjusted in time to keep it smooth, reduce fuel consumption, reduce greenhouse gas emissions, and help build a green, circular, low-carbon transportation system. This paper proposes a Markov chain position prediction model based on multidimensional correction (MDC-MCM). Firstly, we extract the corresponding information from the user's historical check-in position sequence as a position-position conversion map. Secondly, the influence of check-in period, space distance, and other factors on the position prediction is linearly weighted and merged with the position prediction of the n-order Markov chain to construct MDC-MCM. Finally, we conduct a comprehensive performance evaluation of MDC-MCM using the dataset collected from Brightkite. Experimental results show that, compared with other advanced location prediction technologies, MDC-MCM achieves better location prediction results.


Introduction
With the development of the world's industrial economy, the rapid increase in population, and unrestrained production and lifestyles, the world climate faces increasingly serious problems. Greenhouse gas emissions are increasing, and the earth's ozone layer is suffering from unprecedented crises. Catastrophic climate changes have repeatedly appeared globally and have seriously endangered the living environment, health, and safety of human beings. The communication network and the positioning system are combined to form a new type of social network, the location-based social network (LBSN) [1]. In a location-based social network, people can share their location and location information at any time through communication devices, which is also known as sign-in. These data can be used for user location prediction, friend relationship prediction, and the analysis of personal behavior patterns [2][3][4]. User location prediction is of great use in intelligent transportation. It can predict the density of people flow so that corresponding adjustments can be made in time to keep traffic smooth [5], reduce fuel consumption, reduce greenhouse gas emissions, and help build a green, circular, low-carbon transportation system. In addition, it also plays an important role in smart cities and in research on epidemic transmission.
Currently, many methods of position prediction have emerged. Among them, Yuan et al. [6] explored the influence of time and space on location prediction in location-based social networks. Ye et al. [7] used a power law distribution to model spatial factors and combined user preferences and friend relationships to predict location. Cheng et al. [8] used a first-order Markov chain based on the influence of the most recently visited location on the next location and integrated the matrix decomposition method to predict the location. Based on the high-order influence of an n-order weighted Markov chain, Zhang and Chow [9] combined time and space with friend relationship and popularity factors for location prediction.
In this paper, we adopt the n-order Markov chain [10] and then consider the period of check-in, space distance, friend relationship, and popularity of check-in points and propose a Markov chain position prediction model based on multidimensional correction (MDC-MCM), which realizes the position prediction for LBSNs.
In short, our contribution to this research work has three aspects.
Firstly, we link user location prediction in location-based social networks with intelligent transportation to help build a green, circular, low-carbon transportation system. Secondly, the Markov chain position prediction model based on multidimensional correction (MDC-MCM) comprehensively considers the check-in time period, spatial distance, friend relationship, and check-in point popularity, so the dimensions considered are more comprehensive. Finally, we evaluate the proposed location prediction method on the Brightkite dataset. The experimental results show that our proposed location prediction method has better prediction performance compared with other methods.
The rest of the paper is organized as follows. Section 2 describes the Markov chain position prediction model based on multidimensional correction in detail. In Section 3, we evaluate the proposed model on the Brightkite dataset and discuss the results. Finally, Section 4 draws the conclusion and describes future work.

Graph [11]. A data structure composed of a set of vertices and a set of relations between vertices, defined as Graph = (V, E).

LLTG Diagram.
Out Degree [11]. The number of edges associated with a vertex is called its degree. In a directed graph, the out-degree of a vertex is the number of arcs that start from that vertex, that is, the arcs for which the vertex is the tail.
Location to Location Transition Graph (LLTG). This graph contains a set of vertices $L$ and a set of edges $E = L^2$. Each vertex $l_i$ ($l_i \in L$) represents a point of interest and has an out-degree, denoted as $O_{Count}(l_i)$, and the transition frequency from $l_i$ to $l_j$ is denoted as $T_{Count}(l_i, l_j)$. For example, in Figure 1, the out-degree of location node $l_1$ is 8, the out-degree of location node $l_2$ is 3, the out-degree of location node $l_3$ is 7, and the out-degree of location node $l_4$ is 0.
It can be seen from Figure 1 that the LLTG describes the transition frequency from one location node to another location node and the out-degree of each node.
The transition probability is the probability of moving from one location node to another, and the transition probability from $l_i$ to $l_j$ is written as $TP(l_i \to l_j)$. Considering that the out-degree of a location node may be 0, we assume that the transition probability of such a node is 1:
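The counting behind the LLTG can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the out-degree is assumed to be the sum of outgoing transition frequencies, and a node with out-degree 0 is assumed to stay where it is with probability 1, which is only one possible reading of the zero out-degree rule above.

```python
from collections import Counter, defaultdict


def build_lltg(sequence):
    """Count transitions between consecutive check-in locations."""
    t_count = defaultdict(Counter)  # t_count[a][b] plays the role of T_Count(a, b)
    for src, dst in zip(sequence, sequence[1:]):
        t_count[src][dst] += 1
    return t_count


def out_degree(t_count, loc):
    """O_Count(loc), assumed here to be the total number of transitions leaving loc."""
    return sum(t_count[loc].values()) if loc in t_count else 0


def transition_probability(t_count, src, dst):
    """TP(src -> dst) = T_Count(src, dst) / O_Count(src).

    Assumption: a node with out-degree 0 keeps the user in place,
    so TP(src -> src) = 1 and every other transition from it is 0.
    """
    o_count = out_degree(t_count, src)
    if o_count == 0:
        return 1.0 if src == dst else 0.0
    return t_count[src][dst] / o_count


# Hypothetical check-in sequence of one user.
checkins = ["l1", "l2", "l1", "l3", "l1", "l2", "l4"]
graph = build_lltg(checkins)
print(out_degree(graph, "l1"), transition_probability(graph, "l1", "l2"))
```

Storing counts rather than probabilities keeps the graph cheap to update when new check-ins arrive; probabilities are derived on demand.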

n-Order Markov Chain.
A Markov chain [12] is a sequence of random variables $X_1, X_2, X_3$, and so on. The range of these variables, that is, the set of all their possible values, is called the state space. If the state corresponding to time n is $X_n$ and $X_{n+1}$ is regarded as a function of $X_1, \ldots, X_n$, the chain is called an n-order Markov chain [10], which has n-order memory. The matrix composed of transition probabilities is the transition probability matrix.
Assuming that user u has m location nodes and is now at time n at location node $l_n$, the transition probability matrix is as follows:
$$P = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1m} \\ P_{21} & P_{22} & \cdots & P_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ P_{m1} & P_{m2} & \cdots & P_{mm} \end{bmatrix}$$
Among them, $P_{ij} = TP(l_i \to l_j)$. The probability distribution vector of the initial state is $P_1 = [1, 0, \ldots, 0]$. Then, the probability distribution vector of user u going to each location node at time n + 1 is as follows:
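As a rough illustration of how a distribution over location nodes can be propagated with a transition probability matrix, the sketch below repeatedly multiplies a row vector by a row-stochastic matrix. It assumes plain repeated multiplication starting from the one-hot initial vector; the exact n-order update used by MDC-MCM may differ, and the matrix values and function names are made up for the example.

```python
def step(dist, matrix):
    """One step: multiply a row vector by a row-stochastic matrix."""
    m = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(m)) for j in range(m)]


def propagate(initial, matrix, steps):
    """Apply `steps` transitions to the initial distribution."""
    dist = list(initial)
    for _ in range(steps):
        dist = step(dist, matrix)
    return dist


# Hypothetical 3-node transition matrix; each row sums to 1.
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
p1 = [1.0, 0.0, 0.0]  # the user starts at the first location node

after_one = propagate(p1, P, 1)
after_two = propagate(p1, P, 2)
print(after_one, after_two)
```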

Time Zone When Signing in.
Studies have shown that user sign-in behavior is largely regular in time [13]. Therefore, analyzing the data from the perspective of time is essential to improve the accuracy of position prediction. Using the Brightkite dataset, we plot the distribution of user sign-ins over the days of the week and over the hours of the day (Figures 2 and 3). Figure 2 shows that the proportion of check-ins varies periodically over the week. The number of check-ins from Monday to Thursday is relatively even, the number of check-ins on Friday and Saturday increases significantly, with Saturday being the highest, and the number of check-ins on Sunday is similar to that from Monday to Thursday.
Figure 3 shows that the proportion of check-ins changes periodically with the hour. From 0:00, the number of user check-ins shows a downward trend until it reaches its lowest point at about 10 am. The number of check-ins then increases and peaks at about 7 pm, after which it fluctuates within a small range. According to this pattern, a day is divided into three time intervals: interval 1, interval 2, and interval 3. Let $T = \{\text{interval 1}, \text{interval 2}, \text{interval 3}\}$; the corresponding time ranges are 0:00-10:00, 10:00-19:00, and 19:00-24:00.
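The interval assignment can be sketched as a small helper. The three ranges share their boundary hours (10:00 and 19:00 each appear in two ranges), so this sketch assumes that a boundary hour belongs to the later interval; the function names are illustrative only.

```python
from datetime import datetime


def time_interval(timestamp):
    """Map a check-in time to interval 1, 2, or 3.

    Assumption: 10:00 belongs to interval 2 and 19:00 to interval 3.
    """
    hour = timestamp.hour
    if hour < 10:
        return 1
    if hour < 19:
        return 2
    return 3


def day_of_week(timestamp):
    """Day of the week as 1 (Monday) to 7 (Sunday)."""
    return timestamp.isoweekday()


sample = datetime(2024, 1, 1, 18, 30)  # a Monday evening, chosen only as an example
print(day_of_week(sample), time_interval(sample))
```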
We consider the day of the week and the time interval together to study user sign-in location prediction.
Define the probability $P^h_{u,l_i}$ of the user u checking in at location $l_i$ in the time interval h as follows:
Among them, h is an element of the previously defined set of time intervals T, and m is the size of the location set. Therefore, the check-in probability $P^t_{u,l_i}$ of the user u at the location node $l_i$ on the t-th day of the week can also be obtained:
Among them, $M^h_t$ is the number of check-ins in interval h on the t-th day of the week and $M_t$ is the total number of check-ins on the t-th day of the week.
To simplify the calculation, the obtained probability is subjected to min-max normalization processing [14]:
Then, the probability distribution vector of user u going to each location node at time n + 1 is as follows:

2.4. Spatial Distance.
Since the spatial distance between two consecutive check-in points varies, it is necessary to estimate how this distance is distributed. The spatial samples are collected from the check-in set D as follows:
Among them, the Haversine distance formula [15] is as follows:
where r is the radius of the Earth, about 6371 km. Assuming that the spatial distance d between two consecutive check-in points approximately obeys a power law distribution [16], the probability density formula of the power law distribution is as follows:
According to the maximum likelihood estimation method [17], the parameters of the distribution can be estimated from the sample D:
The Brightkite dataset is used to plot the probability density against the spatial distance between two consecutive check-in points, as shown in Figure 4.
In Figure 4, we find that the empirical probability density of the spatial distance between two consecutive check-in points is very similar to the estimated power law distribution. This shows that our hypothesis is reasonable and effective: the spatial distance d between two consecutive check-in points can be regarded as obeying the power law distribution.
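For reference, the distance samples between consecutive check-ins can be computed with the standard Haversine formula, as in the sketch below. The Earth radius of 6371 km is the value given in the text; the function names and the representation of a check-in as a (latitude, longitude) pair in degrees are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # radius of the Earth as stated in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def consecutive_distances(points):
    """Distances between each pair of consecutive (lat, lon) check-ins."""
    return [haversine_km(*a, *b) for a, b in zip(points, points[1:])]


# Hypothetical trajectory: one degree of longitude along the equator, then no movement.
distances = consecutive_distances([(0.0, 0.0), (0.0, 1.0), (0.0, 1.0)])
print(distances)
```

The resulting list is the kind of sample from which a distance distribution could then be fitted.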
Assuming that user u has m location nodes and is now at time n at location node $l_n$, the probability of going to each node is as follows:
Among them, $l_i$ is the i-th sign-in point.
To simplify the calculation, the obtained probability is subjected to min-max normalization processing [14].
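Min-max normalization rescales a list of scores to the range [0, 1]. The sketch below shows the usual form; returning all zeros when every score is identical is an assumption made here to avoid division by zero, since the text does not say how that case is handled.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: with no spread, every location gets the same score of 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([2.0, 4.0, 6.0])
print(normalized)
```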
Then, the probability distribution vector of user u going to each location node at time n + 1 is as follows:

2.5. Friendship.
Based on previous research [9], user sign-in points are related to friends, and different friends have different influences. To measure the influence of different friends, we introduce the Jaccard coefficient to measure the similarity and difference between friends.

Jaccard Coefficient.
The Jaccard coefficient [18] is widely used in the field of information retrieval. It is often used as an index to measure the similarity of two objects, that is, to judge the probability that a certain characteristic is shared by two objects. Here, the characteristic is defined in terms of common friends, that is, the proportion that the number of friends shared by two users makes up of the total number of friends of the two users. The formula is as follows:
Among them, Γ(i) is the set of neighbors of user node i and Γ(j) is the set of neighbors of user node j. The larger the Jaccard coefficient, the higher the similarity between friends and the closer the relationship.
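A minimal sketch of the friend similarity, assuming the standard Jaccard definition (size of the intersection of the two neighbor sets divided by the size of their union). The user identifiers are made up, and returning 0 when neither user has any friends is an assumption.

```python
def jaccard(neighbors_i, neighbors_j):
    """Standard Jaccard coefficient of two neighbor (friend) sets."""
    a, b = set(neighbors_i), set(neighbors_j)
    union = a | b
    if not union:
        return 0.0  # assumption: two users without friends have similarity 0
    return len(a & b) / len(union)


# Hypothetical friend lists of two users.
similarity = jaccard({"u2", "u3", "u4"}, {"u3", "u4", "u5"})
print(similarity)
```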
Assuming that user u has m location nodes and p friends and is now at time n at location node $l_n$, the probability that friends influence user u's check-in at location node $l_i$ is as follows:
Among them, $T_k(l_i)$ represents the check-in frequency of the k-th friend of the user u at the location node $l_i$.
To simplify the calculation, the obtained influence probability is subjected to min-max normalization processing [14]:
$$P^{*}_{Jaccard}(l_i) = \frac{P_{Jaccard}(l_i) - \min\{P_{Jaccard}(l_1), P_{Jaccard}(l_2), \ldots, P_{Jaccard}(l_m)\}}{\max\{P_{Jaccard}(l_1), P_{Jaccard}(l_2), \ldots, P_{Jaccard}(l_m)\} - \min\{P_{Jaccard}(l_1), P_{Jaccard}(l_2), \ldots, P_{Jaccard}(l_m)\}}, \quad 1 \le i \le m. \tag{17}$$
Then, the probability distribution vector of user u going to each location node at time n + 1 is as follows:
Assuming that user u has m location nodes and is now at time n at location node $l_n$, the probability of going to each node is as follows:
Among them, $T_u(l_i)$ represents the historical check-in frequency of the user u at the check-in node $l_i$.
To simplify the calculation, the obtained probability is subjected to min-max normalization processing [14]. Then, the probability distribution vector of user u going to each location node at time n + 1 is as follows:
The various predicted probabilities proposed above that affect the next check-in position are combined by linear weighted fusion to obtain the Markov chain position prediction model based on multidimensional correction (MDC-MCM).
The probability distribution vector of each check-in point of user u at time n + 1 is as follows:
Among them, $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ are all correction coefficients.
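One way to picture the linear weighted fusion is the sketch below. It rests on assumptions that the text does not confirm: the Markov chain prediction enters with weight 1, each of the four corrections (time, distance, friendship, popularity) is scaled by one correction coefficient in that order, and candidate locations are ranked by the fused score. All vectors and weights are invented for the example.

```python
def fuse(markov, corrections, weights):
    """Add weighted correction vectors to the Markov chain prediction.

    Assumption: fused = markov + sum(mu_k * correction_k).
    """
    if len(corrections) != len(weights):
        raise ValueError("one weight is needed per correction vector")
    fused = list(markov)
    for vector, mu in zip(corrections, weights):
        for i, value in enumerate(vector):
            fused[i] += mu * value
    return fused


def top_k(scores, k):
    """Indices of the k highest-scoring location nodes."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical normalized predictions for three location nodes.
markov = [0.6, 0.3, 0.1]
time_p = [0.0, 1.0, 0.5]
dist_p = [1.0, 0.0, 0.2]
friend_p = [0.0, 0.0, 1.0]
popularity_p = [1.0, 0.5, 0.0]

scores = fuse(markov, [time_p, dist_p, friend_p, popularity_p], [0.5, 0.5, 0.25, 0.25])
print(top_k(scores, 2))
```

Because each correction vector is min-max normalized beforehand, the coefficients directly express how strongly each factor is allowed to shift the ranking produced by the Markov chain.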

Experiment
In this section, the proposed model is compared with recent position prediction techniques, and the precision and recall of each method are measured on the Brightkite dataset [2].

Brightkite Dataset.
The Brightkite dataset consists of user sign-in data from the Brightkite LBSN sign-in website. The data format of a check-in in the Brightkite dataset is <userid, check-in time, latitude, longitude, locationid>. Brightkite is the second-largest sign-in site after Foursquare. The statistics of the dataset are shown in Table 1.
We need to preprocess the data in Table 1 to ensure the quantity and quality of the data. In the preprocessing, to prevent sparse data from affecting the experimental results, users with fewer than ten check-ins and points of interest with fewer than ten check-ins in total are filtered out. The check-in data are then divided by check-in time into a training set and a test set: the first 80% of the check-in data are used as the training set, and the last 20% are used as the test set. In the experiment, the Markov chain position prediction model based on multidimensional correction is built on the training set and used to predict the test data.
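The preprocessing described above can be sketched as follows. Records are assumed to be tuples in the order <userid, check-in time, latitude, longitude, locationid>. The sketch applies the filter in a single pass and splits all check-ins together by time; whether the original experiment filtered iteratively or split per user is not stated, so both choices are assumptions.

```python
from collections import Counter


def filter_sparse(checkins, min_count=10):
    """Drop users and locations that have fewer than min_count check-ins."""
    user_counts = Counter(record[0] for record in checkins)
    location_counts = Counter(record[4] for record in checkins)
    return [
        record for record in checkins
        if user_counts[record[0]] >= min_count
        and location_counts[record[4]] >= min_count
    ]


def chronological_split(checkins, train_fraction=0.8):
    """Sort by check-in time; the earliest part becomes the training set."""
    ordered = sorted(checkins, key=lambda record: record[1])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical data: ten check-ins of user "u1" at "p1", one of "u2" at "p2".
raw = [("u1", t, 0.0, 0.0, "p1") for t in (5, 3, 9, 1, 7, 0, 2, 8, 6, 4)]
raw.append(("u2", 10, 1.0, 1.0, "p2"))

kept = filter_sparse(raw)
train, test = chronological_split(kept)
print(len(kept), len(train), len(test))
```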

Evaluation Technology.
We compare the Markov chain position prediction model based on multidimensional correction (MDC-MCM) with previous location recommendation techniques, including the following:
STI. This method considers time and space factors: it independently predicts the user's preference for location nodes in each time interval and assumes that users are more inclined to visit nearby points of interest [6].
USG. This method models spatial factors with a power law distribution and combines user preferences and friend relationships [7].
FMC. This method is based on a first-order Markov chain, which uses the influence of the most recently visited location on the next location and incorporates the matrix factorization method [8].
AMC. This method uses a sequence prediction algorithm based on an n-order weighted Markov chain, combined with a simple weight decay method, so that the recommendation results are more inclined toward places that are closer [19].
LORE. This method uses a high-order sequential influence based on an n-order weighted Markov chain and combines time and space with friend relationship and popularity factors [9].
MDC-MCM. The MDC-MCM proposed in this paper is based on the high-order sequential influence of the n-order Markov chain and combines the check-in period, space distance, friend relationship, and check-in point popularity factors.

Performance Metrics.
To evaluate the performance of each method, we selected two metrics, precision [20] and recall [20], as follows:
Among them, |U| is the number of users to be predicted, $H_u$ is the predicted hit number of user u, $R_u$ is the number of location prediction sequences of user u, and $C_u$ is the set of locations visited by user u in the test set.
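The two metrics can be computed as in the sketch below. It assumes the common per-user averaging: precision is the mean of $H_u / R_u$ and recall is the mean of $H_u / |C_u|$ over the users to be predicted, with $R_u$ taken as the number of locations predicted for user u. These forms are inferred from the symbol definitions above and may not match the paper's equations exactly; the data are invented.

```python
def precision_recall(predictions, visited):
    """Average per-user precision and recall.

    predictions: user -> list of predicted locations (the top-k list)
    visited:     user -> set of locations the user visited in the test set
    """
    users = [u for u in predictions if predictions[u] and visited.get(u)]
    if not users:
        return 0.0, 0.0
    precision = recall = 0.0
    for u in users:
        hits = len(set(predictions[u]) & visited[u])  # H_u
        precision += hits / len(predictions[u])       # H_u / R_u (assumed form)
        recall += hits / len(visited[u])              # H_u / |C_u| (assumed form)
    return precision / len(users), recall / len(users)


# Hypothetical top-2 predictions and test-set visits for two users.
predicted = {"u1": ["a", "b"], "u2": ["c", "d"]}
actual = {"u1": {"a", "x"}, "u2": {"y"}}

precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```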

Result.
The number of next positions (top-k) for each prediction is set from 1 to 20. The correction coefficients are adjusted repeatedly on the training set, which finally yields $\mu_1 = 0.67$, $\mu_2 = 0.84$, $\mu_3 = 0.24$, and $\mu_4 = 0.13$. With these coefficients, better prediction results are obtained on the test set, and the precision and recall curves are drawn together with those of the other position prediction techniques. The results are shown in Figures 5 and 6.

Analysis.
Here, we analyze the experimental results.
The Number of Check-In Points Recommended for Users Top-k. In Figures 5 and 6, it can be observed that as the number of recommended check-in points (top-k) increases, the precision gradually decreases and the recall gradually increases. This is in line with expectations. As top-k increases, if the location visited by the user is already among the recommended check-in points, the additional recommendations only lower the proportion of recommended check-in points that the user actually visits, so the precision decreases; at the same time, the more check-in points are recommended, the more likely it is that the places the user visits are among them, so the recall increases.
In Figures 5 and 6, comparing the prediction curve of the FMC method with that of the STI method shows that the time factor plays an important role in position prediction. Comparing the prediction curve of the STI method with that of the USG method shows that the friend relationship plays an important role in location prediction. Comparing the prediction curve of the USG method with that of the AMC method shows that spatial distance plays an important role in position prediction. Comparing the prediction curve of the AMC method with that of the LORE method shows that the popularity of the check-in point plays an important role in location prediction. MDC-MCM models the sequential influence based on the n-order Markov chain and considers the influence of check-in period, space distance, friend relationship, and check-in point popularity, which makes MDC-MCM superior to the other location prediction algorithms. However, MDC-MCM uses an n-order Markov chain and has many correction parameters, which makes each run very long and makes parameter adjustment cumbersome. In the future, we will consider using the community as a unit and making predictions within each community to reduce the computational workload. In addition, we will consider deploying the model on a distributed computing platform, which is expected to shorten the running time considerably and make it easier to adjust the correction parameters.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.