Evolutionary Optimization Algorithms for Sunlight-Based Positioning Sensor Networks

The sunlight intensity-based global positioning system (SGPS) geolocates outdoor objects by detecting sunlight intensity. This paper presents the integration of SGPS into a sensor network in order to improve the overall accuracy using evolutionary algorithms. Another contribution of the paper is to theoretically solve both the global and the relative positioning of the sensors composing the network within the same framework, without satellite-based GPS technology. Results show that this approach is promising and has the potential to be improved further.


Introduction
Nowadays global positioning is taken for granted, even more so since the arrival of smartphones. However, the only widely accepted solution so far is the satellite-based global positioning system (GPS) [1]. Although it demonstrates good accuracy, the main drawbacks of this system are often ignored. First, it is government-dependent, so there is no guarantee that it will always be publicly available.
Second, GPS is vulnerable to solar storms. As a matter of fact, the US Government reported that the solar storm which occurred in early March 2012 affected satellite communications (http://www.gps.gov/news/2012/03/solarstorm/, Sep. 2013). Although the solar activity of the following years is not expected to be that intense, this still represents a serious threat to GPS. In fact, the US Department of Homeland Security has carried out a study of the risks to US critical infrastructure from global positioning system disruptions (http://www.gps.gov/news/2013/06/2013-06-NRE-publicsummary.pdf, Sep. 2013). This is sufficient motivation to investigate satellite-independent global positioning systems.
Many different satellite-independent approaches have been proposed, based on measuring sunlight intensity, irradiance, temperature, or magnetic field (see the next section for a detailed state of the art). The accuracy of these systems is often on the order of hundreds of kilometers. Although none of them is yet a real substitute for satellite-based GPS, many applications could benefit from sunlight-based positioning methods: weather monitoring, ocean tide and wildlife tracking, and extraplanetary location. For example, current environmental research uses GPS-based wireless sensor networks to measure temperature, light, pollution, ocean temperature, tides, and so forth. However, GPS limits the battery life, so it is necessary to change the batteries every few days. This is especially challenging and expensive in wildlife and marine environments. The use of sunlight-based systems in sensor networks (or standalone intelligent sensors) would allow the system to run for months (or even years) without battery replacement.
In this paper, we focus on the sunlight-intensity-based global positioning system (SGPS) presented in [2,3]. SGPS is a novel global positioning system based on sunlight intensity detection. It provides the earth coordinates of a static object without any GPS-based component by measuring the sunlight intensity over one day, with the date as the only input. The main advantage of SGPS (and of most sunlight-based systems) is its low power consumption: it can run for months or years on a set of regular batteries, and even small solar cells could be used as both power source and light intensity sensor. On the other hand, these systems suffer from low positioning accuracy, which limits their practical use. Therefore, the research presented in this paper aims to enhance the accuracy of SGPS by combining it with sensor networks. As this is the first approach to this problem, we focus on networks in which the distances among nodes are known [4]. This integration is able to solve three different problems using the same framework: (1) relative positioning of the nodes in the network, which in the case of wireless sensor networks (WSN) has been a very active research field in recent years [5,6]; (2) global positioning of the nodes on the earth, which in the literature is done by GPS-based systems, where at least one of the nodes of the network has an integrated GPS or cellular-based localization system [7]; (3) improvement of the positioning accuracy of the SGPS as a standalone system, where the sensor network is built exclusively to improve the SGPS accuracy.
Although the tendency is to use wireless sensor networks (WSN), wired networks have been extensively applied in industry [8] and other fields [9,10]. Hybrid configurations have also been proposed [11]. Throughout this paper we assume any kind of sensor network, since we focus on how to combine data from different nodes.
The proposed framework, identified with the acronym SGPSNet, includes a probabilistic model of the error of SGPS, which allows us to compensate it by combining the individual SGPS results of the different nodes with the geometry of the sensor network.
The paper is organized as follows: next section provides the state of the art of satellite-independent geopositioning. In Section 3 the SGPS is introduced. Section 4 analyzes the error of the SGPS. Then, the SGPSNet framework is detailed in Section 5. Experimental setup is described in Section 6 and results and discussion are in Section 7. Finally, the conclusions are presented in Section 8.

State of the Art
Satellite-independent geopositioning systems have been studied for several years. Obtaining good positioning accuracy has been a challenge [12]. However, these systems have been successfully applied to some problems, such as tracking wildlife migratory movements [13]. In fact, many interesting improvements have been suggested. Sunrise and sunset times can be identified more accurately by taking into account the temperature together with the light intensity [14], since the temperature signal is more stable. Kalman filters help to decrease the positioning error over the days [15] when the object to geolocate is moving. A simple motion model is given, namely, the maximum distance a specific animal could travel in one day. Results are then corrected when they are in conflict with the motion model.
Ekstrom [16] proposes a complex analysis of the twilight for a template-fitting-based approach to irradiance data, with an estimation of the error [17].
The influence of weather, topography, and vegetation on the light intensity and its measurement has been studied [18]. Therefore, by combining sunlight intensity sampling with other sensors (altitude, humidity, atmospheric pressure, etc.), the location accuracy could be improved.
Other approaches to GPS-less geolocation have also been proposed. For instance, a simple webcam can be used to estimate geolocation by analyzing picture brightness [19,20]. The magnetic field of the earth has also been tested as a means to automatically geolocate objects [21,22].

Statement of Contributions.
A probabilistic error model for the SGPS is computed, which is a first in this field. Then, a methodology is proposed to combine SGPS solutions from different nodes within a sensor network, taking this error model into account. This allows the accuracy of the SGPS results to be improved probabilistically for every node. To our knowledge, this is the first time a satellite-independent geopositioning system is combined with a sensor network; state-of-the-art approaches are based on initial GPS measurements.

System Model
The SGPS is able to geolocate stationary outdoor objects (longitude and latitude coordinates) by measuring sunlight intensity. Since the basis of this system is described in detail in [3], only its outline is included in this paper.
The SGPS operation is presented in Figure 1. On the hardware side, the system is designed to be simple and inexpensive, comprising only a light sensor and a microprocessor.

Sunlight-Based Global Positioning System.
The mathematical model of the SGPS relies upon a celestial model which takes into account the rotation and translation movements of the earth. The daylight parameters are influenced by the longitude and latitude coordinates of a given place as shown in Figure 2.
International Journal of Distributed Sensor Networks
Hereafter, the following convention is employed: times are measured in decimal hours, in the range $[0, 24)$. Degrees are only shown in the results in order to help the reader interpret them. For the longitude, the range is from −180° to 180°, with the zero reference at the Greenwich meridian and positive coordinates to the east. For the latitude, the range is from −90° to 90°, where zero represents the equator and positive coordinates represent north.
From the sunlight intensity measurements for a given day $d$, the sunrise and sunset times ($t_{sr}$ and $t_{ss}$, resp.) are identified (see Section 3.2). Therefore, the solar noon, $t_{md}$, for that day is simply
$$t_{md} = \frac{t_{sr} + t_{ss}}{2}.$$
If $t_{md} < t_{sr}$ (equivalently, $t_{ss} < t_{sr}$), a fractional day occurs in which the sunset actually happened in the previous UTC day. In these cases, the computed $t_{md}$ corresponds to the midnight time. To solve this problem, the real noon time $\hat{t}_{md}$ for the day is approximated as follows:
$$\hat{t}_{md} = \operatorname{mod}(t_{md} + 12,\, 24),$$
where mod refers to the modulo operation.
The next step is to compute the angular sunset $\omega_{ss}$, which depends on the fractional year $\gamma$ (see [3] for the explicit expressions). Finally, the coordinates of the object, longitude $\lambda$ and latitude $\varphi$, can be obtained [24]; the latitude requires numerically solving an equation. Note that these formulae are well known in astronomy. They are usually applied in a "forward" fashion, known as the sunrise equation [24]: given longitude and latitude, compute the sunrise and sunset times. In this case, we are solving the "inverse" problem, which is not trivially solved from the sunrise equation.
It is important to note that sensors usually work with UTC-referred civil times, whereas the SGPS celestial model is based on solar times. Measured times must therefore be converted; the conversion is straightforward.
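The inverse problem described above can be illustrated with a minimal sketch. It is not the SGPS formulation of [3]: it assumes the textbook sunrise equation $\cos\omega = -\tan\varphi\tan\delta$, Cooper's approximation for the solar declination $\delta$, and it ignores the equation of time and atmospheric refraction. The function names are hypothetical. Under these simplifications the latitude has a closed form, whereas the full model requires a numerical solution.

```python
import math


def solar_declination(day_of_year):
    """Cooper's approximation of the solar declination, in radians (assumed model)."""
    return math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)


def noon_time(t_sr, t_ss):
    """Solar noon in decimal UTC hours from sunrise and sunset times."""
    t_md = (t_sr + t_ss) / 2.0
    if t_ss < t_sr:
        # Fractional day: the midpoint is solar midnight, so shift by 12 h.
        t_md = (t_md + 12.0) % 24.0
    return t_md


def locate(t_sr, t_ss, day_of_year):
    """Return (longitude, latitude) in degrees from sunrise/sunset UTC times.

    Ill-conditioned near the equinoxes, where the declination is close to zero.
    """
    t_md = noon_time(t_sr, t_ss)
    day_length = (t_ss - t_sr) % 24.0
    omega_ss = math.pi * day_length / 24.0       # hour angle at sunset, radians
    longitude = 15.0 * (12.0 - t_md)             # 15 degrees per hour from 12 UTC
    delta = solar_declination(day_of_year)
    # Inverse of cos(omega) = -tan(phi) * tan(delta).
    latitude = math.degrees(math.atan(-math.cos(omega_ss) / math.tan(delta)))
    return longitude, latitude


def sun_times(longitude, latitude, day_of_year):
    """Forward sunrise equation, used here only to check the inversion."""
    delta = solar_declination(day_of_year)
    omega = math.acos(-math.tan(math.radians(latitude)) * math.tan(delta))
    half_day = 12.0 * omega / math.pi            # half of the day length, hours
    t_md = (12.0 - longitude / 15.0) % 24.0
    return (t_md - half_day) % 24.0, (t_md + half_day) % 24.0
```

Feeding the output of `sun_times` back into `locate` recovers the original coordinates, including for western longitudes where the sunset falls in the early hours of the UTC day.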

Experimental Results.
In order to test the system with real measurements, data from the National Oceanic and Atmospheric Administration (NOAA) public FTP server (ftp://aftp.cmdl.noaa.gov/data/radiation/surfrad, August 2013) have been employed. In this case, a simple zero-crossing algorithm is enough to accurately detect the sunrise and sunset times. In fact, the sunlight intensity cannot be negative but, due to the hardware employed, inverse currents cause the measurements to be negative during night-time. The sunlight intensity value used for detection was determined by using data from only two different days. This ensures a worst-case scenario, since more sophisticated methods could be employed. Sunrise and sunset detection is the critical point of the system in terms of accuracy. However, the objective of this study is to improve accuracy by other means, since a deeper study of sunrise and sunset detection is very hardware-dependent.
Figure 3 shows the results of the SGPS in a statistical way. They represent the current state of the art of sunlight-based geolocation. Note that these results contain data for almost every day during 10 years across 6 different stations. Although how different conditions affect the system has not been deeply evaluated, results show that they do not have a high impact on the error. Clouds, for instance, affect the maximum sunlight value and therefore the rate of change of the measured sunlight. However, at sunrise and sunset times there is still enough light to measure. Actually, results show that there were very few days in which the relative error is excessively high. Histograms for latitude and longitude relative errors are shown, computed as
$$\varepsilon_{\text{latitude}}\,\% = \frac{\varepsilon_{\text{latitude}}}{180}\cdot 100, \qquad \varepsilon_{\text{longitude}}\,\% = \frac{\varepsilon_{\text{longitude}}}{360}\cdot 100.$$
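A zero-crossing detector of the kind mentioned above can be sketched as follows. This is only an illustration under stated assumptions: readings are signed (negative at night), the crossing instant is linearly interpolated between samples, and the `level` parameter and function names are hypothetical rather than taken from the SGPS implementation.

```python
def detect_sun_times(times, intensity, level=0.0):
    """Return (sunrise, sunset) as interpolated crossings of `level`.

    times:     increasing sample times in decimal hours
    intensity: signed sensor readings, negative during night-time
    The first upward crossing is taken as sunrise, the last downward one as sunset.
    """
    sunrise = sunset = None
    for k in range(1, len(times)):
        a = intensity[k - 1] - level
        b = intensity[k] - level
        if (a < 0) == (b < 0):
            continue  # no sign change in this sampling interval
        # Linear interpolation of the crossing instant (a != b is guaranteed here).
        t_cross = times[k - 1] + (times[k] - times[k - 1]) * (-a) / (b - a)
        if a < 0:
            if sunrise is None:
                sunrise = t_cross
        else:
            sunset = t_cross
    return sunrise, sunset


def relative_error_percent(error_deg, span_deg):
    """Relative error in percent; span is 180 for latitude and 360 for longitude."""
    return error_deg / span_deg * 100.0
```

For hourly samples of a triangular intensity profile that is positive between 6.5 h and 17.5 h, `detect_sun_times` returns those two instants. Because only the previous sample needs to be kept, the same logic can run online without storing the whole day.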

SGPS Error Analysis and Modeling
One critical part of the system in terms of accuracy is the sunrise and sunset detection. A bias of several minutes can result in an error of hundreds of kilometers. Since it is not possible to accurately predict the error introduced by the sunrise and sunset times, the error of the system is modeled in probabilistic terms. Note that the error model (and hence the rest of the framework) is hardware-independent because of the SGPS mathematical model. For a given calibration, it is not possible to have a larger error in longitude than in latitude, since they are highly correlated. Only the parameters of the error model would vary with the sensor calibration. In any case, we show a worst-case scenario in which results are obtained with a trivial calibration.
If all the errors of the previous experiments are merged and referred to a global reference frame instead of their local frame (the corresponding station), they can be plotted as a dispersion chart, as shown in Figure 4. Analyzing this plot, it is possible to see that the latitude error is much larger than the longitude error. However, a dispersion pattern is observed: errors are concentrated around the (0, 0) point, and the farther we get from the origin, the fewer errors are found. This suggests that the error can be modeled by a standard bidimensional Gaussian distribution:
$$f(\mathbf{x}) = \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right),$$
where $\boldsymbol{\mu}$ is the means vector, representing the average value of the errors in both longitude and latitude, and $\Sigma$ is the covariance matrix (2 × 2), symmetric and positive definite, giving the dispersion values in both axes. The variable $\mathbf{x}$ is the vector composed of a longitude-latitude pair, $\mathbf{x} = (\text{long}, \text{lat})$. Both $\boldsymbol{\mu}$ and $\Sigma$ are computed using standard methods.
The parameters of the fitting (covariance matrix and means vector) depend on the hardware and signal processing techniques used during the measurement and the sunrise and sunset identification processes.
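As an illustration of the "standard methods" mentioned above, the following sketch estimates the means vector and the covariance matrix from a list of (longitude error, latitude error) pairs and evaluates the resulting bidimensional Gaussian density. It assumes the unbiased sample covariance (division by the number of samples minus one); the function names are hypothetical.

```python
import math


def fit_gaussian_2d(errors):
    """Estimate (mu, sigma) from a list of (long_error, lat_error) pairs.

    Uses the sample mean and the unbiased sample covariance (an assumption).
    sigma is returned as a 2x2 tuple of tuples.
    """
    n = len(errors)
    mu_x = sum(e[0] for e in errors) / n
    mu_y = sum(e[1] for e in errors) / n
    sxx = sum((e[0] - mu_x) ** 2 for e in errors) / (n - 1)
    syy = sum((e[1] - mu_y) ** 2 for e in errors) / (n - 1)
    sxy = sum((e[0] - mu_x) * (e[1] - mu_y) for e in errors) / (n - 1)
    return (mu_x, mu_y), ((sxx, sxy), (sxy, syy))


def gaussian_pdf_2d(x, mu, sigma):
    """Density of a bidimensional Gaussian with mean mu and covariance sigma."""
    (a, b), (_, d) = sigma
    det = a * d - b * b
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T sigma^{-1} (x - mu) for a symmetric 2x2 matrix.
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))
```

The density is highest at the means vector and decays with the distance measured through the inverse covariance, which is the property exploited when candidate positions are later compared against the initial SGPS estimates.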

SGPS Network Integration
The error model described in the previous section suggests that using more than one sensor to measure may improve the accuracy. In this fashion, errors can be decreased by combining the distance among two or more sensors with the output of the SGPS algorithm for each node.
The algorithm described in the following paragraphs aims to be integrated within any kind of sensor network. This paper focuses on networks in which the distances among nodes are known or can be accurately computed. It is also assumed that the nodes of the network measure sunlight intensity throughout the whole day. In order to be as general as possible, let us assume that localization is not the main task of the network, but that it is required (e.g., for data geotagging). Therefore, the SGPS algorithm will run "in the background." Network nodes will independently sample sunlight intensity throughout the day, from 00 hours UTC until 23.99 hours UTC. Once the day has finished, every node will analyze the data to identify the sunrise and sunset times. For this, we employed a zero-crossing algorithm as detailed in Section 3.2. Note that this can be done in an online fashion so that no data need to be stored. Longitude and latitude are computed for every node according to the SGPS formulae described in Section 3. Then, the SGPS solutions are combined probabilistically with the objective of improving the location estimates.
In the case of an infrastructure-less network, where there is no central controller, one of the nodes could temporarily act as the central node in order to carry out the combination of the SGPS solutions. As will be explained, the proposed fitness evaluation and the chosen optimization method are based only on summations and multiplications, meaning that they do not require high computational capacity. Besides, the algorithm does not need to be computed in real time. Therefore, computational limitations are not a problem. Also, the optimization parameters could be tuned to reduce the number of computations required and thus save energy.

Formulation as an Optimization Problem.
The proposed approach formulates the problem as an optimization trying to minimize two factors: (1) distance error among sensors and (2) deviation of the candidate solution with respect to the initial SGPS location. Let us consider the scenario shown in Figure 5, composed of three nodes. "SGPS initial result" is the nonoptimized calculated position, while "candidate solution" is the optimized position.
Let us suppose a sensor network with $n$ nodes, with a real distance between nodes $i$ and $j$ of $d(\mathbf{x}_i, \mathbf{x}_j) = d(\mathbf{x}_j, \mathbf{x}_i)$ for all $i, j$, which we assume to be known. The set of coordinates of node $i$ of the sensor network is denoted as $\mathbf{x}_i = (\text{long}_i, \text{lat}_i)$. Consequently, $\mathbf{x}_{s,i}$ is the set of coordinates of the initial solution of the SGPS for that node, and $\mathbf{x}_{c,i}$ is the candidate solution for node $i$. Intuitively, the farther a candidate solution goes from the initial SGPS solution, the less likely it is (given the proposed error model). Therefore, we use the Mahalanobis distance $d_M(\mathbf{x}_{c,i}, \mathbf{x}_{s,i})$ to compute the distance between the initial SGPS solution and the current candidate solution for node $i$. It computes the distance between two random variables subject to the same probability distribution: in our case, the probabilistic error model defined by $(\boldsymbol{\mu}, \Sigma)$ computed in Section 4. It is computed as follows:
$$d_M(\mathbf{x}_{c,i}, \mathbf{x}_{s,i}) = \sqrt{(\mathbf{x}_{c,i} - \mathbf{x}_{s,i})^{T}\,\Sigma^{-1}\,(\mathbf{x}_{c,i} - \mathbf{x}_{s,i})}.$$
Since the error model is centered at $\mathbf{x}_{s,i}$, $d_M(\mathbf{x}_{c,i}, \mathbf{x}_{s,i})$ expresses in terms of distance how likely the location $\mathbf{x}_{c,i}$ is. The purpose of including the Mahalanobis distance is to keep the candidate solutions in the most probable places according to the error model.
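The Mahalanobis distance for a 2 × 2 covariance matrix can be written without any linear algebra library, as in the following sketch. It uses the standard definition of the distance between two points under a shared covariance; the function name and the tuple-based representation of the matrix are assumptions made for illustration, and the longitude wrap-around at ±180° is not handled.

```python
import math


def mahalanobis(x_c, x_s, sigma):
    """Mahalanobis distance between a candidate x_c and the SGPS estimate x_s.

    x_c, x_s: (longitude, latitude) pairs in degrees
    sigma:    symmetric positive definite 2x2 covariance, as ((a, b), (b, d))
    """
    (a, b), (_, d) = sigma
    det = a * d - b * b
    dx, dy = x_c[0] - x_s[0], x_c[1] - x_s[1]
    # (x_c - x_s)^T sigma^{-1} (x_c - x_s), using the closed-form 2x2 inverse.
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(q)
```

With an identity covariance the result reduces to the Euclidean distance; a larger variance along one axis makes displacements along that axis cheaper, which is how the larger latitude dispersion of the error model enters the optimization.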
On the other hand, let us define the distance error $e_d(i, j)$ as the comparison between the real distance between nodes $i$ and $j$ and the distance between the candidate solutions for those nodes:
$$e_d(i, j) = \left| d(\mathbf{x}_i, \mathbf{x}_j) - d_H(\mathbf{x}_{c,i}, \mathbf{x}_{c,j}) \right|,$$
where $d_H(\mathbf{x}_{c,i}, \mathbf{x}_{c,j})$ is the Haversine distance between the candidate solutions for two different nodes. The Haversine distance is defined as the shortest distance between two points on a sphere, expressed in terms of their longitudes $\lambda$ and latitudes $\varphi$. Using the earth's parameters (distances in kilometers and angles in degrees), the Haversine distance is computed as follows:
$$d_H(\mathbf{x}_1, \mathbf{x}_2) = 2R \arcsin\left(\sqrt{h}\right),$$
where $R$ is the earth's radius in kilometers and $h$ is
$$h = \sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right).$$
Finally, a fitness function $f$ is defined. Intuitively, this fitness function is a weighting (with $\alpha$ as weighting factor) between the Mahalanobis distances from the candidate locations to the original SGPS locations and the total error of the distances between nodes.
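A minimal sketch of the Haversine distance and of the distance error follows. The mean earth radius of 6371 km and the use of an absolute difference for the error are assumptions made here for illustration; points are given as (longitude, latitude) pairs in degrees, matching the convention used for the node coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    lon1, lat1 = math.radians(p[0]), math.radians(p[1])
    lon2, lat2 = math.radians(q[0]), math.radians(q[1])
    h = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2)
    # Clamp to 1 to guard against rounding slightly above the valid arcsin range.
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def distance_error(real_distance_km, cand_i, cand_j):
    """Absolute mismatch between the known distance and the candidates' distance."""
    return abs(real_distance_km - haversine_km(cand_i, cand_j))
```

Two points on the equator separated by 90° of longitude are a quarter of a great circle apart, so `haversine_km((0, 0), (90, 0))` equals `EARTH_RADIUS_KM * math.pi / 2`.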
The components of the fitness function have different orders of magnitude. The distance errors are usually around hundreds or thousands of kilometers. The Mahalanobis distance could reach such orders of magnitude, but once the optimal solution is approached it is usually less than 1. Therefore, saturation is applied in order to ensure that outliers are not decisive when evaluating the fitness function. The two saturation levels, one for the Mahalanobis distance and one for the distance error, are chosen experimentally because they depend strongly on the error model parameters. Once saturated, the fitness components are normalized to lie between 0 and 1.
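The saturation and normalization steps can be sketched as below. The exact form of the fitness function is not reproduced here: the sketch assumes that each nonnegative component is clipped at its saturation level and divided by that level, that the components are averaged, and that they are combined as a convex combination with weight `alpha`. All of these choices, as well as the function and parameter names, are hypothetical.

```python
def saturate_and_normalize(value, level):
    """Clip a nonnegative value at `level` and scale the result to [0, 1]."""
    return min(value, level) / level


def fitness(mahalanobis_terms, distance_errors, alpha, d_m_sat, e_d_sat):
    """Weighted combination of the two normalized components (assumed form).

    mahalanobis_terms: one Mahalanobis distance per node
    distance_errors:   one distance error (km) per pair of nodes
    alpha:             weighting factor in [0, 1]
    d_m_sat, e_d_sat:  saturation levels for each component
    """
    m = sum(saturate_and_normalize(v, d_m_sat) for v in mahalanobis_terms)
    m /= len(mahalanobis_terms)
    e = sum(saturate_and_normalize(v, e_d_sat) for v in distance_errors)
    e /= len(distance_errors)
    return alpha * m + (1.0 - alpha) * e
```

Because both components end up in [0, 1], a single outlier of several thousand kilometers cannot dominate the value returned by `fitness`, which is the purpose of the saturation.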
Finally, the objective is to find the set of coordinates for all the nodes, $\mathbf{X} = \langle \mathbf{x}_{c,1}, \ldots, \mathbf{x}_{c,n} \rangle$, that minimizes the fitness function:
$$\mathbf{X}^{*} = \arg\min_{\mathbf{X}} f(\mathbf{X}).$$
Among all the existing optimization methods, the differential evolution (DE) algorithm [25] has been chosen, more specifically an implementation optimized to solve large-scale problems [26]. This choice is not critical for the performance, since a low computational time is not an objective. However, DE has proved to work efficiently in many applications [27], so it is robust enough to provide good results for the proposed approach.

Differential Evolution Algorithm Setup
In order to test the validity of the proposed optimization model, a subset of days from the NOAA database has been chosen; these days are common to all the available stations, which are displayed in Figure 6. The SGPS algorithm was applied to every node individually and the aforementioned optimization was carried out. The test bench consists of applying the algorithm to between 3 and 6 NOAA stations.
In the DE algorithm, the latitude and longitude values for every node are introduced as variables to optimize. Therefore, there are $2n$ variables, where $n$ is the number of stations included. As there is no information about the initial conditions, the first population is created using a uniform distribution within [−90, 90] for latitude and [−180, 180] for longitude. Each candidate, a vector composed of $2n$ values ($n$ latitudes and $n$ longitudes), is evaluated according to the fitness function defined in Section 5. Longitude coordinates are enforced to lie within −180° and 180°, whereas candidates with latitude values out of range are assigned an infinite fitness value: a latitude of 91° is not equivalent to any other latitude, while a longitude of 181° is equivalent to −179°. The population size is set to 100 individuals, with a maximum of 1000 iterations. The DE step size (known as parameter $F$) is set to 0.7 while the crossover probability (CR) is set to 1. This increases the speed of the optimization for the proposed problem. Finally, the DE strategy employed is DE/local-to-best/1/bin, which means that the mutation takes a random element and the best element of the previous population. This strategy attempts a balance between robustness and fast convergence.
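The setup described above can be illustrated with a compact, self-contained DE loop. It is a sketch, not the large-scale implementation of [26]: it assumes the common definition of the local-to-best/1 mutation, $\mathbf{u} = \mathbf{x} + F(\mathbf{x}_{best} - \mathbf{x}) + F(\mathbf{x}_{r1} - \mathbf{x}_{r2})$, and, since CR = 1, takes every component of the trial vector from the mutant. Candidates are stored as `[lon_1, lat_1, ..., lon_n, lat_n]`; this layout and all function names are hypothetical.

```python
import math
import random


def wrap_longitude(lon):
    """Map any longitude to [-180, 180), so that 181 becomes -179."""
    return (lon + 180.0) % 360.0 - 180.0


def guarded(fitness):
    """Return a fitness that is infinite when any latitude is out of range."""
    def wrapped(candidate):
        if any(abs(lat) > 90.0 for lat in candidate[1::2]):
            return math.inf
        return fitness(candidate)
    return wrapped


def differential_evolution(fitness, n_nodes, pop_size=100, iterations=1000,
                           F=0.7, seed=0):
    """Minimal DE with local-to-best/1 mutation and CR = 1 (assumed variant)."""
    rng = random.Random(seed)
    f = guarded(fitness)
    # Uniform initial population: even indices are longitudes, odd are latitudes.
    pop = [[rng.uniform(-180.0, 180.0) if k % 2 == 0 else rng.uniform(-90.0, 90.0)
            for k in range(2 * n_nodes)] for _ in range(pop_size)]
    scores = [f(ind) for ind in pop]
    for _ in range(iterations):
        best = pop[min(range(pop_size), key=scores.__getitem__)]
        new_pop, new_scores = [], []
        for i, target in enumerate(pop):
            r1, r2 = rng.sample([j for j in range(pop_size) if j != i], 2)
            trial = [t + F * (b - t) + F * (x1 - x2)
                     for t, b, x1, x2 in zip(target, best, pop[r1], pop[r2])]
            trial[0::2] = [wrap_longitude(v) for v in trial[0::2]]
            s = f(trial)
            # Greedy selection: keep the trial only if it is not worse.
            if s <= scores[i]:
                new_pop.append(trial)
                new_scores.append(s)
            else:
                new_pop.append(target)
                new_scores.append(scores[i])
        pop, scores = new_pop, new_scores
    k = min(range(pop_size), key=scores.__getitem__)
    return pop[k], scores[k]
```

Because selection is greedy, the best fitness in the population never increases from one iteration to the next; this guarantees a fitness that is no worse than the initial one, but, as discussed in the results, not necessarily a smaller positioning error.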

Results and Discussion
First, the Gaussian distribution for the error model is computed as described in Section 4, using the data belonging to the first two years available in the NOAA dataset (1995 and 1996), totaling 2077 days. In this case the Gaussian is defined by its means vector and covariance matrix, given in degrees. The means of the distribution are slightly north-biased since the data available in the NOAA FTP are only from the United States.
In our experiments, we simulated the sensor network using a computer as a central node in order to perform the computations. SGPS was independently applied to all NOAA stations, obtaining a set of longitude and latitude estimates. Then, the optimization proposed in Section 5 was carried out for each available day. Therefore, we assumed perfect communication among nodes and the existence of a central node (which can be any of the network nodes).
As the optimization procedure is stochastic, it is possible for the final results to be worse than the initial results, since a lower fitness value does not assure a better final result. Although this may seem counterintuitive, no other variables are available in the optimization. The initial error is defined as the sum, over all the nodes, of the distances between the real locations of the nodes and the SGPS locations. The final error is defined as the sum of the distances between the enhanced locations and the real node locations. Table 1 shows the number of days in which the final results are improved. Figure 7 shows the mean of the initial and final errors plotted against the number of stations. As can be seen from Table 1 and Figure 7, the improvement ratio increases with the number of stations. On average, the final error is reduced compared to the initial error. This means that, when there is an improvement, the accuracy gain is more significant than the worsening in the rest of the cases. This improvement also increases as more stations are used in the algorithm. In Figure 8 a geometric representation of the results for a specific day is shown. While the final position for station 4 is worsened, stations 3, 5, and 6 are improved. Stations 1 and 2 remained in the surroundings of their initial positions.
In order to analyze these results in more depth, let us divide the year into two different parts: the central part, comprising the days between the first equinox ($d = 81$) and the second equinox ($d = 265$), and the lateral part, comprising the days between the second equinox ($d = 265$) and the first equinox of the following year ($d = 81$). In our case, the days in the central part of the year are favored due to the bias of the Gaussian error model. Table 2 and Figure 9 detail the same results as shown before, but only taking into account the days of the central part of the year. The improvement ratios are higher. However, Figure 9 shows that the results in the worsened cases (mainly in the lateral part of the year) are almost the same as if no algorithm were applied, since the final error is close to that plotted in Figure 7. One possible solution is to fit two different Gaussian distributions, one for each part of the year. In any case, the distribution chosen is enough to prove the validity of the proposed method.
Next, the results are analyzed from the point of view of the centroid of the sensor network. In this case, the initial error is measured as the distance from the centroid of all the independent SGPS solutions to the real centroid. The final error is then computed as the distance from the centroid of all the enhanced SGPS positions to the real centroid. In Figure 10 and Table 3 these errors are depicted for the whole year. As seen in the figure, the final error is much lower for the centroid computed with the enhanced SGPS locations. Intuitively, the graph would be expected to decrease linearly, as in Figures 7 and 9. However, this is not the case; a possible explanation is that the same centroid can be obtained from infinite combinations of positions, giving a more stochastic character to these results. Although the proposed method does not explicitly optimize the centroid error, it reduces it substantially, which is worth exploring in future work. Table 3 shows the improvement ratio of the centroid distances. Comparing the results with those shown in Table 1, it turns out that the algorithm works better for optimizing the centroid of the stations.
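The two error measures used in this section, the sum of per-node distances and the centroid distance, can be sketched as follows. The sketch assumes a mean earth radius of 6371 km and takes the centroid as the arithmetic mean of the longitude and latitude values, which is a simplification that is reasonable only when the nodes are far from the ±180° meridian; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    lon1, lat1 = math.radians(p[0]), math.radians(p[1])
    lon2, lat2 = math.radians(q[0]), math.radians(q[1])
    h = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def total_error(estimates, truths):
    """Sum over all nodes of the distance between estimated and real locations."""
    return sum(haversine_km(e, t) for e, t in zip(estimates, truths))


def centroid(points):
    """Arithmetic mean of longitudes and latitudes (assumed centroid definition)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def centroid_error(estimates, truths):
    """Distance between the centroid of the estimates and the real centroid."""
    return haversine_km(centroid(estimates), centroid(truths))
```

Evaluating both functions on the initial SGPS solutions and on the optimized positions gives the initial and final errors. Symmetric displacements of two nodes cancel in `centroid_error` while still contributing to `total_error`, which illustrates why the same centroid can arise from many different sets of positions.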

Conclusion
Throughout this paper various novelties have been presented. First, the error of the SGPS has been modeled as a probabilistic function, allowing us to leverage this model in order to improve the accuracy. A novel method based on the application of the SGPS to sensor networks has been proposed to geolocate the nodes both globally and locally, with a focus on improving the global positioning accuracy. This integration of SGPS and sensor networks enhances the accuracy of the system by reducing the global positioning error of the SGPS, which is one of the main drawbacks of this system together with the refresh rate of the position. The proposed approach is modeled as an optimization problem. The accuracy is improved stochastically and, thanks to the DE metaheuristic optimization method, an improvement is obtained in most of the cases.
Note that our proposed approach uses SGPS as an underlying system, but any method capable of estimating global coordinates for an object fits the SGPSNet formulation. The only requirement is that the error of such a method can be accurately modeled. Throughout this paper several assumptions were made, namely, that nodes are far away from each other and that the distances among them are known. Future work will focus on creating a more general methodology so that it can be applied to smaller networks (in terms of distances among nodes), in which the overlap among the probabilistic error functions is higher. Also, the application to WSN, in which the distances among nodes are unknown and have to be computed online by automatic methods, is one of the main points of future work.