Random Walk Based Location Prediction in Wireless Sensor Networks

With the development of wireless sensor network (WSN) technologies, WSNs have been applied in many areas. In all WSN technologies, localization is a crucial problem. Traditional localization approaches in WSNs mainly focus on calculating the current location of sensor nodes or mobile objects. In this paper, we study the problem of future location prediction in WSNs. We treat the location histories of mobile objects as a rating matrix and then use a random walk based social recommender algorithm to predict the future locations of mobile objects. Experiments show that the proposed algorithm has better prediction accuracy and handles the rating matrix sparsity problem more effectively than related works.


Introduction
Recently, WSNs have been applied in many areas, such as environmental monitoring [1], target tracking [2], and intrusion detection [3]. In all of these areas, location is very important, because WSN data without location information is of little use. Localization is a hot research topic in WSNs, and it comes in two kinds: localization of sensor nodes in the WSN itself and localization of mobile objects using a WSN. For localization of sensor nodes in the WSN itself, several anchor nodes with known locations are selected, and the locations of the other sensor nodes are calculated from these predefined anchor nodes [4]. For localization of mobile objects using a WSN, all sensor nodes are assumed to be anchor nodes, and the location of a mobile object is calculated from the observed anchor nodes [5]. The most popular localization system of this kind is the Global Positioning System (GPS).
Given a WSN in which all sensor nodes are anchor nodes, the locations of mobile objects can be calculated with the WSN. Furthermore, if we save all location records of mobile objects, then we can predict the future locations of mobile objects from the observed location histories and the WSN. In this paper, we study the problem of future location prediction in WSNs. Here, we model a WSN as a connected network, where each sensor node has connections with all its neighbors. The reason is that if a mobile object visits a location, it will probably also visit that location's neighborhood. In a WSN, each node records the number of visits of every mobile object to itself. Hence, we have a network of sensor nodes and a rating matrix of sensor nodes on all mobile objects. The purpose of this paper is to predict which mobile objects will probably visit the desired locations or sensor nodes in the future.

Related Works
If we treat each sensor node as a user and each mobile object as an item or a commodity, then the numbers of visits from mobile objects to sensors form a |User| × |Item| rating matrix and the sensor network becomes a |User| × |User| graph; thus the location prediction problem can be transformed into a social recommender problem. In this section, we review related works on recommender systems.

Traditional Recommender Systems.
As mentioned by Huang et al. [6], the collaborative filtering (CF) method is one of the most commonly used and successfully deployed methods in recommender systems. CF recommender methods can be classified into memory-based and model-based methods [3,4].
The memory-based CF method is heuristic, and it aims to find a neighborhood of similar users [7,8] or items [9][10][11] for recommendation. The user-based method predicts ratings based on ratings of neighborhood users, and the item-based method predicts ratings based on ratings of neighborhood items. Here, the similarity between users or items could be Pearson correlation coefficient [12], vector space similarity [13], cosine similarity [14], and so on.
In the model-based CF method, one builds a model from the prior |User| × |Item| matrix and predicts unknown user-item ratings with the model [15,16]. Clustering models divide the matrix into several smaller clusters and use a neighborhood cluster to predict unknown user-item ratings [17,18]. If the user-item ratings belong to different categories, then classification models can be applied [19]. Here, the prior |User| × |Item| matrix, where each element belongs to a category, is used as training data. With the training data, a classification model can be used to predict unknown ratings.

Trust-Based Recommender Systems.
Traditional recommender systems have been well studied and developed in both academia and industry, but they all assume that users are independent and identically distributed and ignore the relationships between users. However, in practice, recommendations from different users should have different weights. For example, a preference or advice from a close friend should be more trustable than that of a stranger. Based on this idea, many researchers have recently begun to analyze trust-based recommender systems [25][26][27][28][29][30]. Trust-based recommender systems use both the rating matrix and the network of users to predict unknown ratings.
TidalTrust [25] uses the social network to find neighbors and aggregates their ratings weighted by the trust between the source user and each neighbor. The search for neighbors is a breadth-first search, which finds the nearest raters in the social network, and the trust between the source user and a rater is obtained by aggregating the trust values between the source user's direct neighbors and the rater, weighted by the trust values between the source user and those direct neighbors. MoleTrust [26] is similar to TidalTrust, except that the computation of trust between the source user and a rater is based on direct neighbors of the raters.
Golbeck and Hendler [27] studied how trust information can be mined and integrated into social-rating systems. They defined trust as "trust in a person is a commitment to an action based on a belief that the future actions of that person will lead to a good outcome," presented two algorithms for inferring trust relationships between individuals that are not directly connected in the network, and proposed a prototype email client, TrustMail, which could filter spam for an email system.
TrustWalker [28] is a combination of item-based CF and trust-based recommendation, and it considers both ratings of similar items by trustable friends and ratings of the exact item by further neighbors in the social network. The random walk model of TrustWalker is as follows. Starting from the source user $u_0$, at step $k$ and at node $v$, if $v$ has rated the target item $i$, then it returns the rating $r_{v,i}$; otherwise, with the probability $\Phi_{v,i,k}$, the random walk stops, randomly chooses an item $j$ rated by $v$, and returns $r_{v,j}$; and with the probability $1 - \Phi_{v,i,k}$, the random walk goes to a direct neighbor of $v$. If $\Phi_{v,i,k} = 0$, TrustWalker is a pure trust-based recommender system; if $\Phi_{v,i,k} = 1$, the random walk never starts, and thus TrustWalker becomes a traditional item-based CF recommender.
The works in [25][26][27][28] are all heuristic or memory based. However, model-based recommender methods, such as SocialTrustEnsemble [29] and SocialMF [30], can also be used in social recommender systems. Both SocialTrustEnsemble and SocialMF apply the matrix factorization method, and they factor both the |User| × |User| matrix and the |User| × |Item| matrix into two low-rank matrices. When choosing the neighborhood users, both use direct neighbors of the source user, but they weight them differently. SocialTrustEnsemble gives different weights to ratings from different neighborhood users, whereas SocialMF assumes that the weight of a rating recursively depends on the neighbor's neighbors.

Other Types of Recommendation.
Besides recommending items for users, the social recommender systems can also recommend users [31][32][33], communities [34,35], and tags [36,37], as well as all kinds of recommendations for a group of users [38][39][40]. For more details about social recommender systems, readers can refer to [5].

Problem Definition.
In this paper, we consider each sensor node as a user, each mobile object as an item, and the number of visits from a mobile object to a sensor as the rating of user (sensor node) on that item (mobile object), and then we have a |User| × |User| graph and a |User| × |Item| rating matrix.
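Assuming a WSN with $n$ sensor nodes and $m$ mobile objects, the WSN network $G = (V, E)$ can be represented as an $n \times n$ adjacency matrix $A$ (1) or a transition probability matrix $P$,

$$P_{u,v} = \frac{A_{u,v}}{|No_u|}, \quad (2)$$

where $No_u$ is the set of nodes linked by outgoing edges of node $u$ and $|No_u| = \sum_{v=1}^{n} A_{u,v}$ is the cardinality. Moreover, the rating matrix $R$ is $n \times m$, and

$$R_{u,i} = \begin{cases} r_{u,i}, & \text{if user } u \text{ has rated item } i \\ \text{null}, & \text{otherwise.} \end{cases} \quad (3)$$

If we consider $G$ as an unweighted graph, then the adjacency matrix of $G$ is

$$A_{u,v} = \begin{cases} 1, & \text{if } (u, v) \in E \\ 0, & \text{otherwise,} \end{cases} \quad (4)$$

and the transition probability matrix of $G$ is

$$P_{u,v} = \begin{cases} 1/|No_u|, & \text{if } (u, v) \in E \\ 0, & \text{otherwise,} \end{cases} \quad (5)$$

where $|No_u|$ is the same as in (2). Given a WSN with a network $G$ and a rating matrix $R$, the purpose of the paper is to predict the unknown ratings $\hat{r}_{u,i} = f(u, i)$ in $R$ for $r_{u,i} = \text{null}$, where the prediction function $f$ is built with $G$ and $R$. Table 1 lists all notations used in the paper.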
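The construction above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function names, the tiny three-sensor network, and the visit log are hypothetical, and null ratings are represented as NaN.

```python
import numpy as np

def transition_matrix(adjacency):
    """Row-normalize an adjacency matrix into transition probabilities."""
    adjacency = np.asarray(adjacency, dtype=float)
    out_degree = adjacency.sum(axis=1, keepdims=True)  # row sums of the adjacency matrix
    # Nodes without outgoing edges keep an all-zero row instead of dividing by zero.
    return np.divide(adjacency, out_degree,
                     out=np.zeros_like(adjacency), where=out_degree > 0)

def rating_matrix(visits, n_sensors, n_objects):
    """Count visits per (sensor, object) pair; unvisited pairs stay NaN (null)."""
    ratings = np.full((n_sensors, n_objects), np.nan)
    for sensor, obj in visits:
        if np.isnan(ratings[sensor, obj]):
            ratings[sensor, obj] = 1.0
        else:
            ratings[sensor, obj] += 1.0
    return ratings

# Hypothetical line network 0 - 1 - 2 with two mobile objects.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
P = transition_matrix(A)
# Hypothetical visit log: (sensor index, mobile object index).
R = rating_matrix([(0, 0), (0, 0), (1, 0), (2, 1)], n_sensors=3, n_objects=2)
```

Keeping null ratings as NaN rather than 0 preserves the distinction between "never visited" and a genuine rating value, which the prediction step relies on.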

Overall Recommender System.
While predicting an unknown rating of user $u$ on item $i$, our overall recommender system works as follows.
(1) It constructs a social-rating graph from the WSN graph and the rating matrix. Here, we merge the common user nodes of the two graphs. Figure 1 is an example of a social-rating graph, and Table 2 is its rating matrix, where $u_k$ ($1 \le k \le 7$) are users and $i_k$ ($1 \le k \le 5$) are items.
(2) While predicting the rating $\hat{r}_{u,i}$ of user $u$ on item $i$, we compute the random walk based similarity, defined in Section 3.3, between user $u$ and the other users.
(3) Once we get the similarities between $u$ and the other users, we can weigh the ratings of each user based on its similarity to $u$. With the rating matrix, we design a social slope one algorithm to predict the rating $\hat{r}_{u,i}$, and this is described in Section 3.4.

Random Walk Based Similarity.
While predicting the rating $\hat{r}_{u,i}$ of user $u$ on item $i$, we can compute the similarities between user $u$ and all other users, that is, $V - \{u\}$, with a random walk with restart model. The main idea of the model is as follows:
(i) start from the source node $u$;
(ii) at each step $k$ and at user node $v$:
(a) with probability $\alpha$, return to $u$;
(b) with probability $1 - \alpha$, continue the random walk:
(1) firstly, randomly walk to a direct neighbor user $w$ of $v$ in $G$;
(2) secondly, randomly walk to a direct neighbor item $j$ of $w$ in $R$;
(3) thirdly, randomly walk to a direct neighbor user of $j$ in $R$.
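The walk described above can be approximated by a power iteration. The sketch below is one plausible reading, not the paper's exact formula: it assumes each of the three hops picks a neighbor uniformly at random, that a binary "has rated" matrix drives the user-item hops, and that the restart probability and iteration count (hypothetical defaults here) are chosen by the reader.

```python
import numpy as np

def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    matrix = np.asarray(matrix, dtype=float)
    totals = matrix.sum(axis=1, keepdims=True)
    return np.divide(matrix, totals, out=np.zeros_like(matrix), where=totals > 0)

def rwr_similarity(adjacency, rated, source, restart=0.15, iterations=50):
    """Random walk with restart over user -> user -> item -> user hops.

    adjacency: n x n user graph; rated: n x m matrix with 1 where a user rated an item.
    restart and iterations are illustrative defaults, not values from the paper.
    """
    rated = np.asarray(rated, dtype=float)
    user_to_user = row_normalize(adjacency)   # hop 1: neighbor user in the network
    user_to_item = row_normalize(rated)       # hop 2: item rated by that user
    item_to_user = row_normalize(rated.T)     # hop 3: user who rated that item
    personalized = np.zeros(user_to_user.shape[0])
    personalized[source] = 1.0                # all restart mass goes to the source user
    scores = personalized.copy()
    for _ in range(iterations):
        # Three matrix-vector products per iteration, one per hop.
        walked = item_to_user.T @ (user_to_item.T @ (user_to_user.T @ scores))
        scores = (1.0 - restart) * walked + restart * personalized
    return scores

# Hypothetical example: 3 users on a line graph, 2 items.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
rated = [[1, 0], [1, 1], [0, 1]]
sim = rwr_similarity(A, rated, source=0)
```

Applying the three sparse hops one after another, instead of forming their product first, keeps each iteration to three matrix-vector multiplications.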

International Journal of Distributed Sensor Networks

The fingerprint of a random walk is a chain of nodes, which is $\langle \text{user}, \underbrace{\langle \text{user}, \text{item}, \text{user} \rangle}_{1}, \underbrace{\langle \text{user}, \text{item}, \text{user} \rangle}_{2}, \ldots, \underbrace{\langle \text{user}, \text{item}, \text{user} \rangle}_{k} \rangle$. After the $k$th step, the random walk stops at a user node. For example, $\langle u_1, \langle u_3, i_3, u_5 \rangle, \langle u_4, i_2, u_2 \rangle \rangle$ is a fingerprint on Figure 1 with $k = 2$. We use the stable distribution, that is, the probabilities of stopping at each user node, to represent the similarities between the source user and the others, and call them random walk based similarities. For a source user $u$, the random walk based similarity vector is defined by formula (6), where the superscript $T$ denotes the transpose of a matrix and the personalized vector has the entry 1 for the source user $u$ and 0 for all other users.

The Social Slope One Algorithm.
As the slope one algorithm is an effective CF algorithm in recommender systems, we apply the idea of slope one for predicting unknown ratings. The slope one algorithm tries to find slope one suggestions in the rating matrix and uses them to predict unknown ratings. However, in a social recommender system, the rating matrix is sparse, and the slope one algorithm cannot predict ratings for users who do not have any slope one suggestion. To address the sparsity of the rating matrix, we propose a social slope one algorithm. While predicting a rating for user $u$ on item $i$, our proposed algorithm works as follows.

Predicting Ratings for New Users.
In social recommender systems, a new user $u$ is a person who has not rated any item yet, that is, $No_u = \emptyset$. In this situation, we can take suggestions for the specified item from other users. However, the suggestions from different users should have different weights. Here, we use the average of the ratings that other users have given to the specified item, weighted by their similarities to the active user,

$$\hat{r}_{u,i} = \sum_{v \in Ni_i} s_{u,v} \cdot r_{v,i}, \quad (8)$$

where $Ni_i$ is the set of users who have rated item $i$ and $s_{u,v}$ is the normalized similarity between $u$ and $v$. Taking Table 2, for example, $u_6$ is a new user and $\hat{r}_{6,4}$ is the average of $r_{2,4}$ and $r_{5,4}$ weighted by $s_{6,2}$ and $s_{6,5}$, respectively, that is, $\hat{r}_{6,4} = s_{6,2} \cdot r_{2,4} + s_{6,5} \cdot r_{5,4}$.
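A minimal sketch of this similarity-weighted average follows. It assumes that "normalized" means the similarities of the users who rated the item are rescaled to sum to 1, that null ratings are stored as NaN, and that the function name and sample numbers are purely illustrative.

```python
import numpy as np

def predict_for_new_user(similarity, ratings, item):
    """Weighted average of existing ratings for `item`.

    similarity: similarity of the active user to every user (length n).
    ratings: n x m matrix with NaN for null entries.
    Returns None when nobody rated the item or all weights are zero.
    """
    raters = np.flatnonzero(~np.isnan(ratings[:, item]))
    weights = similarity[raters]
    if raters.size == 0 or weights.sum() <= 0:
        return None
    weights = weights / weights.sum()  # assumed normalization: weights sum to 1
    return float(weights @ ratings[raters, item])

# Hypothetical data: user 0 is new, users 1 and 2 rated item 0.
ratings = np.array([[np.nan], [4.0], [2.0]])
similarity = np.array([0.0, 0.3, 0.1])
prediction = predict_for_new_user(similarity, ratings, item=0)
```

With these numbers the normalized weights are 0.75 and 0.25, so the more similar user dominates the prediction.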

Predicting Ratings for Non-Cold-Start Users.
While predicting the rating $\hat{r}_{u,i}$ of user $u$ on item $i$, if $u$ is not a new user and $i$ is not a new item, then we try to construct slope one suggestions from the rating matrix, so that the slope one algorithm can be used to predict unknown ratings. If all items that $u$ has rated have been rated only by $u$ itself, that is, $Ni_j = \{u\}$ for all $j \in No_u$, then we can treat $\hat{r}_{u,i}$ as a new user problem and use (8) to predict the rating. Taking user $u_1$ and item $i_5$ in Table 2, for example, $No_{u_1} = \{i_1\}$ and $Ni_{i_1} = \{u_1\}$, so $\hat{r}_{1,5}$ is predicted with (8).
Otherwise, if $Ni_j \ne \{u\}$ for at least one $j \in No_u$, then we can construct slope one suggestions from the rating matrix, so that the slope one algorithm can be used. For each item $j \in No_u$ that has been rated by others besides the user $u$, we construct slope one suggestions using $Ni_i$ and $Ni_j$. While constructing the rating differences between items $i$ and $j$, at each time, we select one user $v$ from $Ni_i$ and one user $w$ from $Ni_j$ so as to minimize the distance metric $\mathrm{dis}(v, w)$ defined in (10). Equation (10) means that it is better to choose two ratings for items $i$ and $j$ from users similar to each other, and moreover, both of them are similar to the active user. If users $v$ and $w$ are the same user, then $\mathrm{dis}(v, w) = 0$ is the smallest, and thus the slope one suggestion between $i$ and $j$ is $r_{v,i} - r_{v,j}$ and the weight is $(s_{u,v} + s_{u,v})/2 = s_{u,v}$. If users $v$ and $w$ are not the same user, then the slope one suggestion is $r_{v,i} - r_{w,j}$, and the weight is $(s_{u,v} + s_{u,w})/2$. For $Ni_i$ and $Ni_j$, we can construct $\min(|Ni_i|, |Ni_j|)$ slope one suggestions and add them together weighted by the normalized distance to obtain the predicted rating $\hat{r}_{u,i}$, where the difference $\mathrm{diff}_l$ and its weight $\mathrm{dis}_l$ ($1 \le l \le \min(|Ni_i|, |Ni_j|)$) are calculated by Algorithm 1.

Algorithm 1: SocialDiff.
Input: $Ni_i$ and $Ni_j$
Output: $\langle \mathrm{diff}_l, \mathrm{dis}_l \rangle$ for $1 \le l \le \min(|Ni_i|, |Ni_j|)$
(1) For $l = 1$ to $\min(|Ni_i|, |Ni_j|)$ do
(2) Find $v \in Ni_i$, $w \in Ni_j$, such that $\min \mathrm{dis}(v, w)$;
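The pairing step can be sketched as a greedy loop. Several parts are assumptions rather than the paper's definitions: the distance function is passed in as a parameter because the distance metric itself is not reproduced here, each user is assumed to be used in at most one pair, and the toy distance in the example is a hypothetical stand-in that merely returns 0 for identical users.

```python
def social_diff(raters_i, raters_j, item_i, item_j, ratings, sim_to_active, dis):
    """Greedily pair raters of item_i with raters of item_j by smallest distance.

    ratings: dict {(user, item): value}; sim_to_active: dict {user: similarity}.
    dis: callable (v, w) -> float, a stand-in for the paper's distance metric.
    Returns a list of (rating difference, weight) pairs.
    """
    left, right = set(raters_i), set(raters_j)
    suggestions = []
    while left and right:
        # Pick the closest remaining pair; sorting makes tie-breaking deterministic.
        v, w = min(((a, b) for a in sorted(left) for b in sorted(right)),
                   key=lambda pair: dis(pair[0], pair[1]))
        diff = ratings[(v, item_i)] - ratings[(w, item_j)]
        weight = (sim_to_active[v] + sim_to_active[w]) / 2
        suggestions.append((diff, weight))
        left.discard(v)    # assumption: each user contributes to one pair only
        right.discard(w)
    return suggestions

# Hypothetical data for two items "i" and "j".
sim = {1: 0.5, 2: 0.3, 3: 0.2}
ratings = {(1, "i"): 5.0, (2, "i"): 3.0, (1, "j"): 4.0, (3, "j"): 2.0}

def toy_dis(v, w):
    """Illustrative distance only: 0 for the same user, smaller for similar users."""
    return 0.0 if v == w else 1.0 - (sim[v] + sim[w]) / 2

pairs = social_diff([1, 2], [1, 3], "i", "j", ratings, sim, toy_dis)
```

In this example user 1 rated both items, so the first pair is the same-user pair with distance 0, and the remaining users 2 and 3 form the second pair.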

Complexity Analysis.
In the problem of location prediction, our approach includes two steps, that is, computing the nodes' similarities with the random walk model and computing recommender results with the social slope one algorithm.

Theorem 1. The computational complexity of the random walk based similarity, defined by formula (6), is $O(tn^2 + 2tnm)$, where $t$ is the number of iterations.
Proof. The computation of formula (6) is an iterative process, and the number of iterations decides the precision of the results. Here, we assume the number of iterations is $t$. At each iteration, there are three matrix-vector multiply operations, and their computational complexities are $n \times n$, $n \times m$, and $m \times n$, respectively. So, the computational complexity of formula (6) is $O(tn^2 + 2tnm)$.

Theorem 2. The computational complexity of the social slope one algorithm is $(1/3) \cdot (q/n) \cdot (q/m)^4$, where $q$ is the number of nonnull elements in the rating matrix.
Proof. We assume that the rating matrix contains $q$ nonnull elements, and then each item is rated by about $q/m$ users and each user rates about $q/n$ items. While constructing slope one suggestions from one item, the SocialDiff($q/m$, $q/m$) algorithm iterates $q/m$ times (in line 1). At iteration $l$, the algorithm needs $l^2$ ($1 \le l \le q/m$) time (from line 2 to line 4), that is, $(q/m) \cdot ((q/m) + 1) \cdot (2(q/m) + 1)/6$ in total. So, the computational complexity of the social slope one algorithm is $(1/3) \cdot (q/n) \cdot (q/m)^4$.
From Theorems 1 and 2, we can have the following corollary.

Experiments
In this section, we perform experiments on two datasets, report the experimental results, and compare the results with existing CF algorithms. As sensor network data are very difficult to acquire, we use two social network datasets instead; both have characteristics similar to WSN data.

Datasets.
The datasets in our experiments are Epinions [26] and Flixster [30], and both of them contain a social network and a rating matrix. The Flixster dataset comes from a social network service where users can rate movies; the relationships between users are bilateral, and the rating values are 10 discrete numbers in the range [0.5, 5] with step size 0.5. The Epinions dataset also comes from a social network service and contains a social network and a rating matrix. However, the relationships in the Epinions dataset are unilateral, and the rating values are integers from 1 to 5. Basic statistics of the two datasets are in Table 3.

Metrics.
In the experiments, we treat the social network as known data and use 90% of the rating data as the training set and the remaining 10% as the test set. Our evaluation metrics are RMSE (root mean square error), precision, and coverage.
The RMSE of a model prediction with respect to the estimated variable $\hat{r}_{u,i}$ is defined as the square root of the mean squared error:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i) \in R_{\text{test}}} (r_{u,i} - \hat{r}_{u,i})^2}{|R_{\text{test}}|}},$$

where $|R_{\text{test}}|$ is the cardinality of the test set $R_{\text{test}}$. Precision (also called positive predictive value) is the fraction of retrieved instances that are relevant:

$$\mathrm{Precision@}N = \frac{\#\text{hit}}{N},$$

where $N$ is the number of highest-rated items that we recommend to the active user and #hit is the number of recommended items that are actually in the user's top-$N$ list. In our experiments, we choose $N = 5$, that is, Precision@5.
Coverage is the ratio of successful recommendations in the whole test set:

$$\mathrm{Coverage} = \frac{\#\text{success}}{|R_{\text{test}}|},$$

where #success is the number of test ratings for which we can successfully make a recommendation.
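The three metrics can be computed with a few lines of plain Python. This is a sketch with hypothetical function names and sample values; it assumes that a failed recommendation is recorded as None and that RMSE is evaluated on the ratings for which a prediction exists.

```python
import math

def rmse(actual, predicted):
    """Square root of the mean squared error over paired ratings."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def precision_at_n(recommended, relevant, n=5):
    """Fraction of the top-n recommended items that are truly relevant."""
    hits = sum(1 for item in recommended[:n] if item in relevant)
    return hits / n

def coverage(predictions):
    """Share of test ratings for which a prediction could be produced."""
    return sum(p is not None for p in predictions) / len(predictions)

# Hypothetical sample values.
error = rmse([4.0, 2.0], [3.0, 4.0])
precision = precision_at_n(["a", "b", "c", "d", "e"], {"a", "c", "x"})
covered = coverage([3.5, None, 4.0, 2.0])
```

Reporting coverage next to RMSE matters because an algorithm that declines to predict hard cases can otherwise appear more accurate than it is.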

Baseline Algorithms.
In our experiments, we compare the prediction results of different algorithms. The labels that we use to denote each of these algorithms are described below.
PearsonCF. It is a user-based CF algorithm with the Pearson correlation coefficient as the similarity measure [12].
SlopeOneCF. It is the slope one CF algorithm proposed in [16]. However, the algorithm does not take the social network into consideration and cannot deal with sparse matrices effectively.
TrustWalker [28]. This algorithm takes the social network into consideration and considers both ratings of similar items by trustable friends and ratings of the exact item by more distant users.
SocialMF [30]. It is a model-based matrix factorization algorithm that considers both the social network and the rating matrix.

Results.
In this subsection, we present the results of our experiments, first for all users and then for cold start nodes, and finally evaluate the impact of number of neighbors on the results of the proposed SocialSlopeOne algorithm.

Performance on All Users.
We evaluate the performances (RMSE, precision, and coverage) of all algorithms on the whole test set, and the results are listed in Table 4. To illustrate the results better, we plot the results of Table 4 in Figures 2 and 3. As shown in Figure 2, for the Flixster dataset, SocialSlopeOne has a prediction error similar to that of SlopeOneCF, and both of them have lower prediction error than the other algorithms; for the Epinions dataset, SocialSlopeOne has the lowest prediction error. In Figure 3, for precision, SocialMF is the best in the Flixster dataset, whereas SocialSlopeOne is the best in the Epinions dataset; for coverage, SocialSlopeOne is the best in both datasets. Moreover, SocialSlopeOne is the best for the sum of coverage and precision for all datasets. Hence, we conclude that SocialSlopeOne has the best prediction accuracy and coverage in both datasets.

Performance on Cold Start Nodes.
We also evaluate the performances (RMSE, precision, and coverage) of all algorithms on cold start nodes, and the results are listed in Table 5. In our experiments, we define cold start nodes as ratings for new items and new users. To illustrate the results better, we plot the results of Table 5 in Figures 4 and 5. As shown in Figure 4, SocialSlopeOne has the lowest prediction error in both datasets for cold start nodes. In Figure 5, for precision, SocialSlopeOne and TrustWalker have similar precision, and they are better than the others; for coverage, SocialSlopeOne is also the best in both datasets. Moreover, SocialSlopeOne is the best for the sum of coverage and precision for all datasets. Hence, we can conclude that SocialSlopeOne has the best prediction accuracy and coverage for cold start nodes in both datasets.

Impact of $K$ on the Results.
In the SocialSlopeOne algorithm, we need to construct $\min(|Ni_i|, |Ni_j|)$ slope one suggestions and distances from $Ni_i$ and $Ni_j$. For large $Ni_i$ and $Ni_j$, constructing all slope one suggestions takes a long time. However, a small number of these suggestions, especially those from the same user, can approximate the results effectively. We set $K$ as the number of suggestions with the smallest distances defined in (10) and use the top-$K$ slope one suggestions to approximate the results.
This top-$K$ method cannot be applied to new items, so we observe the RMSE as a function of $K$ for new users and all nodes. From Figure 6 we can see that, for both all nodes and new users, the RMSEs decrease as $K$ rises. For small $K$, the RMSEs decrease quickly, but as $K$ increases, they decrease more and more slowly, which means that we can use a small $K$ to approximate the results for large $Ni_i$ and $Ni_j$ while taking suggestions from other users.

Conclusions
Traditional localization technologies focus on calculating the location of sensor nodes or mobile objects using WSNs. In contrast, this paper studies the problem of future location prediction for mobile objects in WSNs. In addition to the WSN, we treat the location histories of all mobile objects as a rating matrix and transform the future location prediction problem into a social recommender problem. In social recommender systems, although current recommender algorithms can give recommender results, they have either low prediction accuracy or high computational complexity and cannot deal with sparse data efficiently. In this paper, we propose a random walk based similarity metric to find similar users and use slope one suggestions from similar users to construct recommender results. With our defined similarity metric, we can find more similar users, and this addresses the matrix sparsity problem. In addition, we propose the social slope one recommender method, which has higher prediction accuracy and lower computational complexity. Experiments show that the proposed algorithm has better prediction accuracy and handles the rating matrix sparsity problem more effectively than related works.