Mobile Localization Based on Received Signal Strength and Pearson’s Correlation Coefficient



Introduction
With the proliferation of mobile devices and the corresponding improvements in wireless communication, localization in mobile networks has become one of the hottest topics in wireless and mobile computing research [1,2]. However, acquiring sufficient localization accuracy for location-based services (LBSs) remains a key problem. According to the U.S. Enhanced 911 (E-911) regulations adopted by the U.S. Federal Communications Commission, all emergency calls made from cellular phones have to be localized within an accuracy of 125 m in 67% of the cases [3]. GPS is favorable due to its high accuracy, but it is the most power-consuming positioning method. When the GPS receiver is turned on, the battery of a current mobile phone lasts only a few hours.
Despite its low accuracy, Cell-ID positioning is regarded as the basic positioning method in most cellular-communication systems [4,5]. It reports the identity or a geographical description of the cell to which the terminal is connected, and it takes the center of the associated cell (usually the nearest one) as the estimated position. Because Cell-IDs are transmitted over the control channel, they are easy to obtain. The method can be used in any GSM device without additional hardware or prior statistical knowledge. Therefore, it is applicable in almost all situations where there is cellular coverage. Another advantage of the Cell-ID method is its short response time, because the Cell-ID is generally stored in the mobile terminal together with other basic information related to the connection. Due to its simplicity and low cost, Cell-ID has become the most widely preferred method for mobile localization.
The main drawback of Cell-ID is that its accuracy depends on the cell size, because the estimate is always the center of the cell area [6,7]. The conventional Cell-ID positioning method can only provide low accuracy because the cell size in GSM networks, especially in rural areas, is relatively large. This has led to different attempts to enhance the accuracy of the Cell-ID positioning method. For instance, the timing advance value is adopted to reduce the effective cell size and improve the accuracy [8].
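The behavior of the basic Cell-ID method can be illustrated with a minimal sketch. The cell table, the identifiers, and the coordinates below are hypothetical values chosen for illustration; they are not taken from the experiments in this paper.

```python
import math

# Hypothetical cell table: Cell-ID -> (x, y) of the cell center in metres.
CELL_CENTERS = {
    101: (0.0, 0.0),
    102: (1000.0, 0.0),
    103: (500.0, 866.0),
}

def cell_id_estimate(serving_cell_id, cell_centers=CELL_CENTERS):
    """Return the center of the serving cell as the position estimate."""
    return cell_centers[serving_cell_id]

def position_error(estimate, truth):
    """Euclidean distance (m) between the estimate and the true position."""
    return math.hypot(estimate[0] - truth[0], estimate[1] - truth[1])

# A terminal at (300, 400) served by cell 101 is reported at the cell center.
estimate = cell_id_estimate(101)
error_m = position_error(estimate, (300.0, 400.0))
```

In this sketch the error equals the distance from the terminal to the cell center, so it can grow up to the cell radius, which is why large rural cells give poor accuracy.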
In this paper, an improved Cell-ID method is proposed for GSM localization. Not restricted to the Cell-ID of the serving BS only, the proposed method fully utilizes the information of all seven cells. With received signal strength indication (RSSI) and Pearson's correlation coefficient (PCC), the proposed method can accurately estimate the mobile location. Furthermore, environmental interference and shadow fading can be restrained effectively with redundant information. Experimental results show that the proposed method can acquire higher accuracy than conventional Cell-ID and its enhanced version.
International Journal of Distributed Sensor Networks
The contributions of this paper can be summarized as follows:
(1) Compared with TOA, AOA, city-wide WiFi, and augmented sensor-based systems, the proposed method uses RSSI, which requires no additional hardware, so it is easier to implement on common mobile devices. Furthermore, the proposed algorithm does not depend upon prior statistics, so it can be used in any place covered by GSM signal.
(2) We fully utilize the information of all available cells, including the serving cell and its six neighboring cells, to attain better localization accuracy. By contrast, typical Cell-ID methods only use the information of the serving cell.
(3) We use Pearson's correlation coefficient, instead of the Euclidean distance between two vectors, as the evaluation function, which provides better robustness.

Related Works
A broad spectrum of solutions, such as received signal strength (RSS), time of arrival (TOA) [9], time difference of arrival (TDOA) [10], and angle of arrival (AOA) [11], has been proposed to attain mobile localization by measuring the radio signal traveling between a mobile terminal and base stations (BSs) [12,13]. Some researchers have proposed a number of methods including fingerprinting and max-min box [5]. These techniques, except for RSS, often depend on additional hardware and databases, which means additional cost and more computational burden. For example, AOA-based methods always require an antenna array to identify the signal's angle, while TOA-based methods often need strict time synchronization. Furthermore, cellular radio propagation often degrades these methods. When obstacles exist in the propagation path of the signal, these methods suffer from non-line-of-sight (NLOS) and multipath propagation. Because of the electromagnetic propagation properties, particularly in urban areas [14], NLOS errors are very likely to corrupt the original signal and increase the estimation error significantly [15]. Comparatively, fingerprint positioning can attain good performance, but it always requires a time-consuming site survey [5] and cannot adapt to a dynamic environment.
As the fundamental positioning method of most cellular-communication systems, Cell-ID has been extensively researched and implemented. For typical Cell-ID based positioning, the area under study is divided into several cells. Generally, the shape of the cells is irregular and highly depends on the propagation environment. For the classical Cell-ID approach, the smaller the cells are, the better the accuracy one can get from Cell-ID based localization [6]. Therefore, an investigation of cell sizes can give one a rough idea about the accuracy that can be obtained. In [8], the statistical modeling of user motion and the measurements are done via a hidden Markov model (HMM). The obtained results show smaller cells in the Pre-WiMAX network than in the GSM network. Hence, using Cell-ID positioning in the Pre-WiMAX network will provide better accuracy than in the GSM network. However, this might not be the case for other parts of the world because the Pre-WiMAX network covers only some countries.
In many places in the world, the density of cell towers is so low that the cell tower information available for localization is very limited. To enhance the accuracy of localization, a probabilistic approach can be utilized. In [13], the signal strength history from only the associated cell tower is utilized to achieve accurate GSM localization. Compared to current RSSI-based GSM localization systems, the authors report at least 156% enhancement in median error in rural areas and 68% in urban areas. To some extent, uncertainties in power-distance mapping and the dynamics of propagation models can degrade the performance of the positioning system. In [16], the authors also present a Cell-ID Aided Positioning System (CAPS), which leverages near-continuous mobility and the position history of a user. CAPS is designed based on the insight that users exhibit consistency in routes traveled and that the Cell-ID transition points that the user experiences can uniquely identify position on a frequently traveled route. With a Cell-ID sequence matching technique, CAPS estimates the user's position based on the history of Cell-ID and GPS position. In [17], the authors propose a time-delay neural network to efficiently learn the mobile location from sequential received signal strength. By embedding the temporal structures of RSS into the spatial structures of networks, the proposed algorithms can extract location information from the temporal variation of RSSs rather than removing it.
Generally, the positioning techniques based on RSSI can give a more precise estimation than Cell-ID [18]. RSSI-based positioning does not require directional antennas or extra time synchronization hardware. In fact, some enhanced Cell-ID algorithms exploit radio measurements to determine a distance to the terminal. Measurements of path loss or round-trip time (RTT) have been proposed. However, the path-loss measurement suffers from shadow-fading effects. In [7], the author proposes an algorithm that groups all the reference points into several clusters and allows multiple reference points per region for mobile localization. Each cluster is tagged according to the detected set of neighbor cells, auxiliary connection information, and auxiliary measurements that are simultaneously performed with high-precision positioning. This method can produce areas, with a high prespecified confidence, of a size equal to 20%-50% of the original cell. It can also be viewed as a robust fingerprinting algorithm. Collecting realistic RSS data in the target area may also reduce the uncertainties, but it requires a site survey, which is time consuming. To develop a calibration-free RSS-based localization system, [19] proposes to utilize the pairwise information between base stations to localize the user based on multidimensional scaling. This approach further considers the geometric structure between base stations to compensate for distance estimation. Therefore, it can achieve better accuracy.
The mobile station continuously measures signal strengths from both the serving cell and its neighbouring cells. Undoubtedly, more information is better for target positioning. In [20], a novel Cell-ID localization algorithm based on a hidden semi-Markov model (HSMM) is proposed. All the Cell-IDs detected by mobile nodes are utilized. Furthermore, the positioning results are acquired by maximizing a posteriori estimation criterion via HSMM. The method can obtain superior positioning accuracy of 455 m compared with the classical Cell-ID approach on average. By evaluating measurements from each neighbouring site presented in the network measurement report (NMR), the original area obtained from the Cell-ID (and possibly time arrival) can be cropped down to a smaller one by removing the parts that are unlikely to enclose the terminal's location [21,22]. Absolute RSS values received from a base station change with time, but the relative RSS (RRSS) values, which refer to the relations of the RSS values between different BSs, are more stable. In [23,24], the authors propose the Database Correlation Method (DCM) on the basis of a database of premeasured RSS values. A real test shows that the mean positioning accuracy is about 29 m in urban areas and velocity estimation is about 1 km/h in rural areas.
Some enhanced Cell-ID algorithms utilize the signals of neighboring cells perceived by the GSM device. In [25], the authors consider the effect of shadow fading and obtain several propagation distance candidates between the MS and each BS. A Gaussian model is built to represent the RSSI of each BS. One of its disadvantages is that some parameters must be obtained from an empirical model decided by the environment around the user. Therefore, it cannot be widely used in different environments without modification. Another data-fusing algorithm to enhance Cell-ID (ECID) is proposed in [26]. This method is based on a standard parameter separation least-square algorithm that follows the convergence of the gradient-descent algorithm to determine the MS location. It uses an iterative algorithm, which takes time on an embedded system. Furthermore, the least-square algorithm only solves the problem of using different antennas but ignores environmental change. Similar problems occur in most fingerprint-based or database-matching algorithms [27,28]. These algorithms only work with databases that are built beforehand. Additionally, a database built for a particular phone model will not fit all the phones in the world.
Cell-ID localization accuracy may also be improved by further techniques, such as map-snapping, movement prediction, or combination with other technologies [29]. Some research also proposes the combination of GPS and GSM Cell-ID positioning, in which case energy efficiency must be considered [30].

Problems in Triangulation.
With the RSSI and the output power of a base station (BS), it is possible to estimate the distance between the mobile device and the BS. If the device can get its distance from three nearby cells or BSs, RSSI-based mobile location turns into the well-known triangulation positioning problem, which is shown in Figure 1. This problem can be described as below:

$$(x - x_i)^2 + (y - y_i)^2 = d_i^2, \quad i \in \{1, 2, 3\}. \tag{1}$$

Here, $(x_i, y_i)$, $i \in \{1, 2, 3\}$, is the position of the $i$th BS, $d_i$, $i \in \{1, 2, 3\}$, is the distance from the mobile station (MS) to the $i$th BS, and $(x, y)$ is the position of the MS. Equation (1) has two unknown variables ($x$ and $y$) and three equations. If $d_i$ can be acquired accurately, the solution of (1) will be the position of the MS. The problem is that the distance $d_i$ is calculated from the RSSI. During propagation, the signal suffers interference and shadow fading, so $d_i$ is not the exact distance between the BS and the MS. In this case, (1) evolves into

$$(x - x_i)^2 + (y - y_i)^2 = \left(d_i + \varepsilon_i(t)\right)^2, \quad i \in \{1, 2, 3\}, \tag{2}$$

where $\varepsilon_i(t)$, $i \in \{1, 2, 3\}$, are the distance errors caused by the interference and shadow fading, which are functions of time $t$. Figure 2 shows the situation. Equation (2) may not have an analytical solution. Though it might be solved with numerical algorithms, for example, Newton iterations, the computational cost is too high for a low-end embedded system.
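The sensitivity of triangulation to distance errors can be sketched as follows. This is a minimal illustration, not the solver used in this paper: it subtracts the third circle equation from the first two to obtain a 2x2 linear system, and the BS coordinates and the distance errors are hypothetical values.

```python
import math

def trilaterate(bs, d):
    """Solve (x - x_i)^2 + (y - y_i)^2 = d_i^2 for i = 1..3.

    Subtracting the third equation from the first two cancels the
    quadratic terms and leaves a 2x2 linear system in (x, y).
    """
    (x1, y1), (x2, y2), (x3, y3) = bs
    d1, d2, d3 = d
    a11, a12 = 2 * (x3 - x1), 2 * (y3 - y1)
    a21, a22 = 2 * (x3 - x2), 2 * (y3 - y2)
    b1 = d1**2 - d3**2 - x1**2 - y1**2 + x3**2 + y3**2
    b2 = d2**2 - d3**2 - x2**2 - y2**2 + x3**2 + y3**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("base stations are collinear")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

# Hypothetical layout (metres) and true MS position.
BS = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
TRUE_POS = (300.0, 400.0)
exact_d = [math.hypot(TRUE_POS[0] - bx, TRUE_POS[1] - by) for bx, by in BS]

# Exact distances recover the true position.
exact_fix = trilaterate(BS, exact_d)

# Hypothetical distance errors standing in for interference and fading.
noisy_d = [d + e for d, e in zip(exact_d, (40.0, -30.0, 25.0))]
noisy_fix = trilaterate(BS, noisy_d)
noisy_error_m = math.hypot(noisy_fix[0] - TRUE_POS[0], noisy_fix[1] - TRUE_POS[1])
```

With exact distances the linear system returns the true position, while distance errors of a few tens of metres move the estimate by a comparable amount, since three circles carry no redundancy to absorb the error.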

Enhanced Cell-ID Algorithms.
From its neighboring cells, a GSM device can get information such as the RSSI, Cell-ID number, and LAC number. The latter two can determine the position of the cell. The device can then get the location and RSSI of seven cells, including six neighboring cells and one serving cell. These seven cells are shown in Figure 3.
With Bayesian estimation, we can form an enhanced Cell-ID method [25]. A Gaussian model is built to represent the RSSI of each BS:

$$p(r_i \mid x, y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(r_i - m(d_i)\right)^2}{2\sigma^2}\right),$$

where $r_i$ is the RSSI from the $i$th BS, $d_i$ is the distance between the $i$th BS and $(x, y)$, $m(d_i)$ is the long-term median (in dB) at $d_i$, and $\sigma$ is the standard deviation (in dB) of the Gaussian model. The long-term median is calculated with an empirical formula with respect to $d_i$, for example, the Okumura prediction method. By using Bayes' rule, the Probability Distribution Function (PDF) of the MS being located at $(x, y)$ when the RSSI $r_i$ is obtained can be estimated as follows:

$$p(x, y \mid r_i) = \frac{p(r_i \mid x, y)\, p(x, y)}{p(r_i)} = C_i\, p(r_i \mid x, y),$$

where $p(x, y)$ is the PDF of the MS being located at $(x, y)$, $p(r_i \mid x, y)$ is the PDF of $r_i$ when the MS is located at $(x, y)$, $p(r_i)$ is the PDF of $r_i$ over all $(x, y)$, and $C_i$ is the constant $p(x, y)/p(r_i)$ of the $i$th BS. Note that $p(r_i)$ is independent of $(x, y)$ and $p(x, y)$ is assumed to be uniformly distributed, so $C_i$ can be a constant. The PDF of the device location given $\vec{r} = (r_1, \ldots, r_7)$ can be obtained according to the following equation:

$$p(x, y \mid \vec{r}) \propto \prod_{i=1}^{7} p(r_i \mid x, y).$$

Here, the location that maximizes $p(x, y \mid \vec{r})$ is the estimated position.
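The Bayesian estimate can be sketched as a grid search over candidate positions. This is a minimal illustration under stated assumptions, not the implementation of [25]: `median_rssi` is a hypothetical log-distance model standing in for the empirical (Okumura-type) median formula, the 8 dB spread is an assumed value, and the hexagonal BS layout and grid are made up for the example.

```python
import math

def median_rssi(d, tx_power_dbm=43.0, loss_at_1m_db=30.0, slope=35.0):
    """Hypothetical long-term median RSSI (dBm) at distance d (m)."""
    return tx_power_dbm - loss_at_1m_db - slope * math.log10(max(d, 1.0))

def log_posterior(x, y, bs, rssi, sigma_db=8.0):
    """Sum of Gaussian log-likelihoods of the measured RSSIs at (x, y).

    With a uniform prior this equals the log-posterior up to a constant.
    """
    total = 0.0
    for (bx, by), r in zip(bs, rssi):
        d = math.hypot(x - bx, y - by)
        total += -((r - median_rssi(d)) ** 2) / (2.0 * sigma_db ** 2)
    return total

def ecid_estimate(bs, rssi, xs, ys):
    """Return the grid point with the largest posterior."""
    return max(((x, y) for x in xs for y in ys),
               key=lambda p: log_posterior(p[0], p[1], bs, rssi))

# Seven BSs: a serving cell at the origin and six neighbours on a hexagon.
BS = [(0.0, 0.0)] + [(1000.0 * math.cos(k * math.pi / 3),
                      1000.0 * math.sin(k * math.pi / 3)) for k in range(6)]
TRUE_POS = (300.0, 200.0)
# Noise-free measurements generated with the same model the estimator uses.
RSSI = [median_rssi(math.hypot(TRUE_POS[0] - bx, TRUE_POS[1] - by))
        for bx, by in BS]
GRID = [float(v) for v in range(-500, 501, 50)]
estimate = ecid_estimate(BS, RSSI, GRID, GRID)
```

The estimate is exact here only because the measurements come from the same median model that the estimator assumes. When the real environment deviates from that model, the posterior peak shifts, which is the weakness discussed for parameter-dependent methods.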

Pearson's Correlation Coefficient Localization. In this paper, we propose a localization algorithm for GSM mobiles based on RSSI and Pearson's Correlation Coefficient (RPCC).
The main idea is to make good use of the seven-cell information, which is redundant for locating a two-dimensional position, in order to minimize the influence of, for example, disturbance, barriers, or NLOS errors on performance.
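GSM signal propagation and attenuation can be described by the Okumura-Hata model:

$$L = 69.55 + 26.16 \lg f - 13.82 \lg h_b - a(h_m) + \left(44.9 - 6.55 \lg h_b\right) \lg d,$$

where $f$ denotes the frequency of the signal and $h_m$ denotes the height of the mobile station, which is assumed to be 1 m. $h_b$ denotes the height of the BS, $d$ denotes the distance of propagation, $L$ denotes the attenuation, and $a(h_m)$ denotes a correction function which is decided by the environment. For GSM localization, $f$ (frequency), $h_m$ (height of the mobile station), $h_b$ (height of the BS), and $a(h_m)$ are constants. Then, we can get

$$L = \left(44.9 - 6.55 \lg h_b\right) \lg d + C_1, \quad C_1 = 69.55 + 26.16 \lg f - 13.82 \lg h_b - a(h_m),$$

such that

$$R_0 = P_0 - L + G_0,$$

where $R_0$ denotes the RSSI in practice, $P_0$ denotes the transmitting power of the BS in practice, which is generally a time-invariant constant, and $G_0$ denotes a constant for different antenna gain. Set

$$K_1 = 6.55 \lg h_b - 44.9.$$

Then

$$R_0 = K_1 \lg d - C_1 + P_0 + G_0. \tag{12}$$

The RSSI in theory is

$$R = P - \left(44.9 - 6.55 \lg h_0\right) \lg d - C_1 + G,$$

where $R$ denotes the RSSI in theory, $P$ denotes the transmit power of the BS in theory, $G$ denotes a constant for different antennas, and $h_0$ is a predicted value of $h_b$. Set $K_2 = 44.9 - 6.55 \lg h_0$ and $C_2 = P - C_1 + G$. Then

$$R = -K_2 \lg d + C_2. \tag{14}$$

According to (12) and (14), we can get

$$R_0 = a R + b,$$

where $a = -(K_1 / K_2)$ and $b = K_1 C_2 / K_2 - C_1 + P_0 + G_0$. Evidently, there is a linear relationship between $R_0$ and $R$. Even if $h_0$ is different from $h_b$, the linear relationship still exists. Pearson's Correlation Coefficient (PCC) quantifies the linear relationship between two vectors $\vec{u}$ and $\vec{v}$ of length $n$:

$$\rho(\vec{u}, \vec{v}) = \frac{\sum_{i=1}^{n} (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum_{i=1}^{n} (u_i - \bar{u})^2}\, \sqrt{\sum_{i=1}^{n} (v_i - \bar{v})^2}},$$

where $\bar{u}$ and $\bar{v}$ are the mean values of the elements of $\vec{u}$ and $\vec{v}$. The absolute value of PCC is at most 1. The closer it is to 1, the stronger the linear relationship between the two vectors.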
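The linear relationship between the RSSI in practice and the RSSI in theory can be checked numerically. The following is a minimal sketch under stated assumptions: the small/medium-city form of the correction $a(h_m)$ is assumed because the correction used in this paper is not reproduced here, the units (MHz, m, km, dBm) follow the usual Hata convention, and all powers, gains, heights, and distances are hypothetical.

```python
import math

def hata_loss_db(d_km, f_mhz, h_b, h_m):
    """Okumura-Hata median path loss in dB (f in MHz, heights in m, d in km).

    The small/medium-city correction a(h_m) is an assumption of this sketch.
    """
    lg_f = math.log10(f_mhz)
    a_hm = (1.1 * lg_f - 0.7) * h_m - (1.56 * lg_f - 0.8)
    return (69.55 + 26.16 * lg_f - 13.82 * math.log10(h_b) - a_hm
            + (44.9 - 6.55 * math.log10(h_b)) * math.log10(d_km))

def rssi_dbm(d_km, p_dbm, g_db, f_mhz, h_b, h_m):
    """RSSI = transmit power - path loss + antenna gain constant."""
    return p_dbm - hata_loss_db(d_km, f_mhz, h_b, h_m) + g_db

def pearson(u, v):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Hypothetical distances (km) from one location to seven BSs.
distances_km = [0.36, 0.70, 0.73, 1.04, 1.08, 1.32, 1.33]

# "In practice": true power, gain, and heights, unknown to the estimator.
measured = [rssi_dbm(d, 43.0, 2.0, 900.0, 45.0, 1.5) for d in distances_km]
# "In theory": guessed power, gain, and heights.
theory = [rssi_dbm(d, 40.0, 0.0, 900.0, 30.0, 1.0) for d in distances_km]

rho = pearson(theory, measured)
gap_db = math.sqrt(sum((a - b) ** 2 for a, b in zip(theory, measured)))
```

Both vectors are affine functions of $\lg d$, so `rho` stays at 1 up to rounding even though the guessed parameters are wrong, whereas the Euclidean distance `gap_db` between the two vectors is many decibels. This is the property that motivates using PCC rather than the Euclidean distance as the evaluation function.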
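As shown in Figure 4, for any location $(x_0, y_0)$ in cell number 1, the distances to the seven cells are

$$d_i = \sqrt{(x_0 - x_i)^2 + (y_0 - y_i)^2}, \quad i \in \{1, \ldots, 7\}.$$

If the location $(x_0, y_0)$ is exactly the location of the device, there must be a linear relationship between the theoretical RSSI vector $\vec{R}$ and the measured RSSI vector $\vec{R}_0$; then the evaluation function satisfies $F(\vec{R}, \vec{R}_0) = 0$. As a function of $(x, y)$, we set $F = F(x, y)$. Figure 6 shows the distribution of $F(x, y)$ calculated under the environment shown in Figure 5.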
There are seven BSs in the area. Assuming that $h_m$, $h_b$, and $f$ are constants, the RSSIs can be calculated according to (11).
As shown in Figure 6, the closer a location is to the device, the smaller the evaluation function $F(x, y)$. The location with the minimum $F(x, y)$ should be the real location of the device. The idea of RPCC is to find the location at which $F(x, y)$ reaches its minimum value.
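The search for the minimum of the evaluation function can be sketched as a grid search. This is a minimal illustration under stated assumptions: the evaluation function is taken to be $1 - \rho$ (one choice that is 0 for a perfect positive linear relationship; the exact form used in this paper is not reproduced here), the theoretical RSSI is computed only up to an additive constant, and the BS layout, grid, and measurement parameters are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def theoretical_rssi(x, y, bs, h0=30.0):
    """RSSI in theory up to an additive constant: -K2 * lg(d)."""
    k2 = 44.9 - 6.55 * math.log10(h0)
    return [-k2 * math.log10(max(math.hypot(x - bx, y - by), 1.0))
            for bx, by in bs]

def rpcc_cost(x, y, bs, measured):
    """Assumed evaluation function: 1 - PCC(theoretical, measured)."""
    return 1.0 - pearson(theoretical_rssi(x, y, bs), measured)

def rpcc_estimate(bs, measured, xs, ys):
    """Return the grid point that minimizes the evaluation function."""
    return min(((x, y) for x in xs for y in ys),
               key=lambda p: rpcc_cost(p[0], p[1], bs, measured))

# Seven BSs: a serving cell at the origin and six neighbours on a hexagon.
BS = [(0.0, 0.0)] + [(1000.0 * math.cos(k * math.pi / 3),
                      1000.0 * math.sin(k * math.pi / 3)) for k in range(6)]
TRUE_POS = (300.0, 200.0)
# Measurements from an environment the estimator knows nothing about
# (hypothetical offset and slope in dB).
MEASURED = [-52.0 - 31.7 * math.log10(math.hypot(TRUE_POS[0] - bx,
                                                 TRUE_POS[1] - by))
            for bx, by in BS]
GRID = [float(v) for v in range(-500, 501, 50)]
estimate = rpcc_estimate(BS, MEASURED, GRID, GRID)
```

The estimator never uses the transmit power, the antenna gain, or the true path-loss slope: PCC is invariant to a positive scaling and an offset of either vector, so only the pattern of $\lg d$ across the seven BSs matters.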
A random disturbance is applied in this simulation: its expectation is zero, and its standard deviation is 8% of the mean value of the disturbed quantity. Figure 7 shows the results.
As shown in Figure 7, ECID works well because all parameters are set to match the environment. RPCC gives a similar estimation in this situation.
To observe the environmental influence, we changed some environmental parameters including the height of the BSs, the height of the MS, and the frequency of the signal. At the same time, the antenna gain is also changed. According to the simulation, the results of RPCC and ECID are shown in Figures 8 and 9, respectively.
As Figure 9 shows, the distribution of the ECID evaluation function over $(x, y)$ is influenced by the parameter change. The location of its minimum value moved to (500, 0), about 220 m from the location of the device. By contrast, the distribution of the RPCC evaluation function is influenced, but not as seriously as that of ECID. The location of its minimum value changed to around (375, 150), about 90 m from the location of the device. RPCC adapts to the environmental change better than ECID.

Statistics Test without Environmental Change.
To fully test the performance of the proposed algorithm, we run a simulation in the area shown in Figure 10. In a cellular network, if the serving cell of the device is not on the edge of the network, the device must be in a serving cell surrounded by six neighboring cells. The situation in this simulation is therefore an ordinary case in the real world.
The location of every device is calculated with both ECID and RPCC. The RSSI in this simulation is calculated according to (12), without any environmental change.
As Figure 11 shows, ECID and our proposed algorithm have similar performance. Both algorithms have maximum errors of about 50 m. With a maximum error of about 100 m, Cell-ID is less accurate than ECID and RPCC.

Statistics Test with Environmental Change.
Although ECID and RPCC perform well in the previous test, the environmental parameters differ from place to place. An algorithm can be widely used only if it adapts to different environments without any modification. In this test, we increased the gain of the antenna by 0.2 dB from 2 dB. Simultaneously, we added a random variation of −10 to +10 m to the height of the BSs to emulate their different heights and increased the height of the MS by 3 m. We also changed the frequency from 900 MHz to 1800 MHz. Figure 12 shows the accuracy of Cell-ID, ECID, and RPCC under the changed environmental parameters.
According to Figure 12, Cell-ID does not suffer from the environmental change and shows the same performance as in Figure 11. Suffering from the change of environmental parameters, ECID loses accuracy significantly: its maximum error is over 150 m. The accuracy of RPCC and Cell-ID stays the same as in the previous test. According to the result of this test, under environmental changes, the probability of an error less than 50 m is about 50% for ECID and almost 100% for RPCC.
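Statistics such as "the probability of an error less than 50 m" are empirical cumulative distribution values of the positioning error. A minimal sketch of that computation follows; the error samples are made-up numbers for illustration only and are not the simulation results of this paper.

```python
def fraction_below(errors_m, threshold_m):
    """Empirical probability that the positioning error is below threshold."""
    return sum(1 for e in errors_m if e < threshold_m) / len(errors_m)

# Hypothetical positioning errors (m) of one algorithm over eight trials.
sample_errors_m = [10.0, 20.0, 45.0, 60.0, 120.0, 30.0, 49.0, 80.0]

p_below_50 = fraction_below(sample_errors_m, 50.0)
max_error_m = max(sample_errors_m)
```

Evaluating `fraction_below` over a range of thresholds gives the error curve of one algorithm, and a curve that lies above another indicates the more accurate algorithm at every threshold.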

Implementation and Field Test
As a localization algorithm for mobile devices, RPCC is easy to implement on a mobile device with an MCU, for example, a smart phone. Figure 13 shows the flowchart of a program that implements RPCC on a smart phone.
The details of subprocess to calculate the position using RPCC are shown in Figure 14.
[Figure 14 flowchart labels: Cell information; Generate grid initial bingo node; Calculate RSSI of next node according to (8)]
To put RPCC to a practical test, we implement it on a smart phone with an embedded system. The smart phone can get the information, such as the RSSI, LAC code, and Cell-ID code, of the seven nearest cells, including the serving cell and six neighboring cells. Using this information, the locations of the seven cells can be acquired from a free cell localization service on the Internet [32]. Then the location of the smart phone can be calculated by its MCU using RPCC and Cell-ID. MSL is not used because it needs the transmit power of the base station (not necessary for RPCC or Cell-ID), which is easy to get in a simulation but hard to get in a practical test. Additionally, MSL proved to be worse than Cell-ID in the simulation test, so it is not necessary in the practical test.
We put the device in a pocket and rode through the city. The collected data are shown in Figure 15.
The green line is the route the device follows. The red line with squares shows the error of RPCC, and the blue line with plus signs shows the error of Cell-ID. RPCC clearly shows higher accuracy than Cell-ID in the practical test. The result of the statistical analysis is shown in Figure 16.
The accuracy of RPCC and Cell-ID is lower than in the simulation because the density of BSs in the real test is lower than that in the simulation. Additionally, the distribution of the cells in practice is not as regular as it is in the simulation. The maximum error of RPCC is less than 550 m, and the probability of an error below 300 m is about 80%. The maximum error of Cell-ID is less than 650 m, and the probability of an error below 300 m is about 20%. The curve of RPCC lies above that of Cell-ID, which means that RPCC is more accurate than Cell-ID.

Conclusion
The main idea of RPCC is to fuse the information of seven BSs to reduce the influence of interference and shadow fading upon mobile localization. The proposed RPCC algorithm is compared with another data-fusing algorithm, ECID, which also uses the information of seven BSs. Both work well in the simulation test without environmental changes. However, RPCC shows more immunity to changes of environmental parameters in the second simulation test, which changes the parameters of the environment and the antenna gain.
RPCC can estimate the position precisely without any additional devices or prior statistical knowledge. Since it does not rely on complex computing, it is easy to implement on most mobile devices. Compared with Cell-ID, which is widely used in mobile devices, RPCC performs better in both the simulation and the practical test.