Time Synchronization in Wireless Sensor Networks Using Max and Average Consensus Protocol

This paper proposes a novel distributed time synchronization scheme for wireless sensor networks, which uses max consensus to compensate for clock drift and average consensus to compensate for clock offset. The main idea is to achieve a global synchronization just using local information. The proposed protocol has the advantage of being totally distributed, asynchronous, and robust to packet drop and sensor node failure. Finally, the protocol has been implemented in MATLAB. Through several simulations, we can see that this protocol can reduce clock error to ±10 ticks, adapt to dynamic topology, and be suitable to large-scale applications.


Introduction
As in all distributed systems, time synchronization is very important in wireless sensor networks (WSNs) since the design of many protocols and implementation of applications require precise time, for example, forming an energy-efficient radio schedule, conducting in-network processing (data fusion, data suppression, data reduction, etc.), distributing an acoustic beamforming array, performing acoustic ranging (i.e., measuring the time of flight of sound), logging causal events during system debugging, and querying a distributed database.
Time synchronization is a research area with a very long history. Various mechanisms and algorithms have been proposed and extensively used over the past few decades. However, several unique characteristics of WSNs often preclude the use of the existing synchronization techniques in this domain. First, since the amount of energy available to batterypowered sensors is quite limited, time synchronization must be implemented in an energy-efficient way. Second, some messages need to be exchanged for achieving synchronization while limited bandwidth of wireless communication discourages frequent message exchanges among sensor nodes. Third, the small size of a sensor node imposes restrictions on computational power and storage space. Therefore, traditional synchronization schemes such as network time protocol (NTP) and global positioning system (GPS) are not suitable for WSNs because of complexity and energy issues, cost efficiency, limited size, and so on.
In the context of WSNs, time synchronization refers to the problem of synchronizing clocks across a set of sensor nodes that are connected to one another over a single-hop or multihop wireless networks. To achieve time synchronization in wireless sensor networks, we have to face the following four challenges.

Nondeterministic Delays.
There are many sources of message delivery delays. Kopetz and Ochsenreiter [1] describe the components of message latency, which they call the Reading Error, as being comprised of 4 distinct components plus the local granularity of the nodes clocks. Their work was later expanded by [2] to include transmission and reception time. The most nondeterministic delay is called Access Time, which is incurred in the MAC layer waiting for access to the transmit channel, its orders of magnitude is larger than the synchronization precision required by the network.

Clock Drift.
Manufacturers of crystal oscillators specify a tolerance value in parts per million (PPM) relative the nominal frequency at 25 ∘ C, which determines the maximum amount that the skew rate will deviate from 1. For the nodes used in WSNs, the tolerance value is typically in the order of 5 to 20 PPM. If no drift compensation applied, two synchronized nodes will be out of step soon.

Robustness.
Since sensor networks are often left unattended for long periods of time in possibly hostile environments, synchronization schemes should be robust against link and node failures. Mobile nodes can also disrupt routing schemes, and network partitioning may occur.

Convergence Speed.
Nodes in wireless sensor networks always distribute in large scales, one node may get in touch with another by many hops. This increases the difficulty in reducing the convergence speed in time synchronization algorithm design.
Up to now, many protocols have been designed to address this problem. These protocols all have some basic features in common: a simple connectionless messaging protocol, exchange of clock information among nodes, mitigating the effect of nondeterministic factors in message delivery, and processing utilizing different schemes and algorithms, respectively. They can be classified into two types: centralized synchronization and distributed synchronization.
Centralized synchronization protocol, such as RBS [4], TPSN [2], and FTSP [3], usually has fast convergence speed and little synchronization error. This kind of protocol needs a physical node acting as the whole network's reference clock, so it has to divide the nodes into different roles, for example, client node and beacon node in RBS. If the node with the special role, such as beacon node in RBS, is out of work, the protocol will suffer from big damage. To deal with the WSNs' dynamic topology, centralized synchronization protocol is often designed with complexity logic. Another disadvantage of centralized synchronization protocol is that synchronization error grows with the increase of network hops.
Distributed synchronization protocol, such as TDP [5]/GCS [6]/RFA [7]/ATS [8]/CCS [9], can use local information to achieve the whole network synchronization. This kind of protocol can easily adapt to WSNs' dynamic topology property with lite computation. Currently, the disadvantage of distributed synchronization protocol is that the convergence speed may be a bit slow, relating to the network topology. This paper describes a new distributed protocol for time synchronization in wireless sensor networks called time synchronization using max and average consensus protocol (TSMA). We adapt a number of techniques to take up the challenges time synchronization has in WSNs. To eliminate the nondeterministic delays, we make use of MAC layer timestamp technique. To compensate for the clock drift, we adapt max consensus protocol, and we use average consensus protocol to compensate for the clock offset. This protocol has the advantages of being computationally light, scalable, asynchronous, robust to node and link failure, and it does not require a master or controlling node. The rest of the paper is organized as follows. Section 2 summarizes the related work. Section 3 introduces some mathematical tools and definitions that will be instrumental for the proof of convergence of the proposed TSMA algorithm. Section 4 introduces a model for the clock dynamics and formally defines the synchronization objectives, while Section 5 presents the TSMA algorithm in details. This is followed by MATLAB simulations in Section 6. Finally, Section 7 briefly summarizes the results obtained and proposes potential research directions.
TPSN [2] is designed as a flexible extension of NTP [10] for use in wireless sensor networks, which consists of two phases: a level discovery phase and a synchronization phase. The level discovery phase organizes the whole network into a hierarchical tree topology with the master node at its root, then perform the pair wise synchronization along the branches of the hierarchical structure using the classical sender-receiver synchronization handshake exchange in the synchronization phase. TPSN is a centralized synchronization protocol, utilizes MAC layer timestamping to reduce message delivery nondeterminism and improve synchronization accuracy, its convergence speed increases linearly with the max network hops. This approach suffers from two limitations. The first limitation arises because if the root node or parent node dies, then a new root election or parent-discovery procedure needs to be initiated, thus adding substantial overhead to the code and long periods of network desynchronization. The second limitation is due to the fact that geographically closed nodes might be far in terms of the tree distance, which is directly related to increase the clocks error locally.
RBS [4] aims to provide synchronization amongst a set of client nodes located within the single-hop broadcast range of a beacon node. Compared to the traditional protocols working on an LAN, its main contribution is to directly remove two of the largest sources of nondeterminism involved in message transmission, transmission time, and access time, by exploiting the concept of a time critical path, which is the path of a message that contributes to nondeterministic synchronization errors. This centralized protocol uses least squares linear regression to compensate for the clock drift, its convergence speed also increases linearly with the max network hops. Using the beacon node makes this protocol vulnerable to node failure and node mobility.
International Journal of Distributed Sensor Networks 3 FTSP [3] is an ad-hoc, multihop time synchronization protocol for WSNs. It achieves high accuracy by utilizing timestamping of radio messages in low layers of radio stack, completely eliminating access time to the radio channel required in CSMA MAC protocols. Further accuracy is achieved by compensating for clock skews of participating nodes via linear regression. Several mechanisms are used to provide robustness against node and link failures, most notably periodic flooding of synchronization messages throughout the whole network, but this approach still do not completely solve the limitations aforementioned in TPSN.
Nodes employing TDP [5] periodically self-determine to become master/diffusion leader nodes using an election/reelection procedure. Master nodes then engage neighboring nodes in a peer evaluation procedure to isolate problem nodes. Timing information messages are broadcasted from master nodes and then rebroadcasted by the diffused leader nodes for a fixed number of hops, forming a radial tree structure. This approach is fully distributed, but it does not compensate for the clock drift. To the application needing small tolerance time, this algorithm has to increase the synchronization frequency, which greatly decreases the node's battery life.
In GCS [6], nodes take turns to broadcast a synchronization request to their neighbors who each respond with a message containing their local time. The receiving node (at the center of this exchange) averages the received timestamps and broadcasts this value back to its neighbors which adopt this value as their new time. This is repeated by each node in the network until network wide synchronization is achieved. This approach is fully distributed, but it does not compensate for the clock drift. To get higher time accuracy, it has to decrease the time synchronization period. RFA [7] is inspired by firefly synchronization mechanism. In this algorithm, every node periodically broadcasts a synchronization message, and anytime they hear a message they advance, of a small quantity, the phase of their internal clock that schedules the periodic message broadcasting. Eventually all nodes will advance their phase till they are all synchronized, that is, they fire a message at the same time. This approach, however, does not compensate for clock drift, therefore the firing period needs to be rather small.
ATS [8] uses a cascade of two consensus algorithms to tune compensation parameters and converge nodes to a steady state virtual clock. In the first stage, nodes broadcast local timestamps in order to estimate the clock skew rates relative to each other. Nodes then broadcast their current estimate of the virtual clock skew rate and receiving nodes combine this with their relative skew estimates to adjust their own virtual clock estimate. The same principle is then applied to remove the offset errors. This is a fully distributed protocol; its convergence speed is a bit slow, related to the network topology.
CCS [9] utilizes average consensus algorithm to compensate the clock offset. By the accumulated offset error that they remove during each round of offset compensation nodes can observe how much their own clocks drift away from the consensus time, then they use this information to compensate the clock drift, which can be seen as an enhancement of ATS. This approach is also fully distributed and has the disadvantage of slow convergence speed. The TSMA protocol proposed in this paper is similar to ATS and CCS. It is fully distributed, that is, it does not require any special root, including skew compensation, and exploits MAC-layer timestamping for higher accuracy. Unlike ATS and CCS, the proposed protocol uses max consensus method to compensate the clock drift, which has faster skew convergence speed as the centralized protocol, that is, it increases linearly with the max network hops. The features of all the protocols mentioned in this section are summarized in Table 1.

Mathematical Preliminaries
We assume that links in WSNs are symmetric, so the network can be modeled as an undirected graph = ( , ), where = {V | = 1, 2, . . . , } represents the nodes in the WSNs, and the edge set represents the available communication links, that is, (V , V ) ∈ if node sends information to node . We refer to V and V as the tail and head of the edge (V , V ), respectively. The orientation of the graph is a choice of heads and tails for each undirected edge. The set of edges of a fix orientation of the graph is denoted by 0 . Thus, 0 contains one and only one of the two edges (V , V ), (V , V ) ∈ . We use to present edge number and the vertex number, for example, = | |, = | 0 |. The set of neighbors of node is denoted by = {V | , ∈ }. The degree of node V is the number of its neighbors | | and is denoted by (1) Let denote the adjacency matrix of . Then, the Laplacian of graph is defined by The matrix has the following features.
It is a well-known fact that this property holds regardless of the choice of the orientation of [11]. Let denote a scalar real value assigned to V . Then, = ( 1 , 2 , . . . , ) denotes the state of the graph . We define the Laplacian potential of the graph as follows: Lemma 1 (Laplacian potential [12]). Ψ ( ) has the following properties.
(i) Undirected graph 's Laplacian potential is positive semidefiniteness, and Lemma 2 (connectivity and graph Laplacian [11]). Assume graph G has c connected components, then Particularly, for a connected graph with = 1, rank( ) = − 1. Proof. is an undirected graph, then is a real symmetric matrix. After diagonalization, the rank of matrix stays the same. is a connected graph, then based on Lemma 2, the rank of matrix is − 1, then there must be a zero in 's eigenvalues, and all the other eigenvalues of are nonzero. is a real symmetric matrix, so 's eigenvalues are real. Based on Lemma 1, is a positive semidefinite matrix, then 's eigenvalues are nonnegative. In conclusion, there must be a zero in 's eigenvalues, and all the other eigenvalues of are positive and real.
Above all, when zero-input response reaches the equilibrium status, the system's full response is In other words, every node in the network finally has the same status.

Clock Model
Clock synchronization is of significant importance in WSNs. Before delving into the details of synchronizing clocks, we first define some terminologies used in this paper. ( ) denotes a reference clock, where is the real time, that is, coordinated universal time (UTC). Every sensor node maintains its own local clock, which is a monotonically nondecreasing function of . This local clock is an ensemble of hardware and software components, essentially a timer that counts the oscillations of a quartz crystal running a particular frequency. In general, the timer is programmed to generate an interrupt, which is called a clock tick. At each clock tick, the interrupt procedure increments the clock value stored in memory.
For any two nodes' local clocks ( ) and ( ), the clock ( ) is considered correct at time if ( ) = ( ), or the clock ( ) is considered accurate at time if ( )/ = ( )/ , and two clocks ( ) and ( ) are synchronized at time if ( ) = ( ). The aforementioned definitions show that the two synchronized clocks are not always correct or accurate; time synchronization is not related to time correctness and accuracy. As for most applications in WSNs, it is sufficient to achieve clock synchronization. Since the oscillator frequency is time varying due to ambient conditions such as temperature changes, variations of electric supply voltage, and air pressure, clock ( ) of node can be modeled as where is the clock offset, the difference between the time reported by clock ( ) and the real time at the initial instant 0 (i.e., = ( 0 ) − ( 0 )), and is defined as the skew of the clock ( ). To achieve perfect synchronization in a sensor network, all nodes = 1, 2, . . . , must precisely compensate for their clock parameters and so that all clocks have zero offset error and all tick at the same rate.

TSMA Method
In this section, we present our time synchronization algorithm, that is, TSMA. Instead of trying to synchronize to an external reference like absolute time or UTC, the TSMA method aims to achieve an internal consensus within the network on what time is, and how fast it travels. In every synchronization round, the TSMA algorithm updates the compensation parameters for each node and over time the network clocks converge to a consensus, where ( ) is the consensus clock. The consensus clock is not a physical clock, it is a new virtual clock, that is, generated from the network of nodes running the TSMA algorithm. The consensus clock has its own skew rate and offset relative to the absolute time.
By expanding the clock functions from both sides of (20), we can see the compensation parameters that all nodes must obtain in order to synchronize to the consensus clock, To achieve these compensation parameters, nodes repeat the TSMA algorithm in synchronization rounds which consist of two main tasks; skew compensation and offset compensation. In skew compensation, the nodes iteratively select the max skew rate estimation from its neighbors in order to improve their skew compensation parameter. In the offset compensation phase, nodes average their neighbors' clock readings to synchronize nodes to a common time.

Skew Compensation.
The goal of skew compensation is to ensure all compensated clocks in the network tick at the same rate, that is, 6 International Journal of Distributed Sensor Networks We use max consensus protocol to compensate for clock skew. For perfect skew compensation, the skew rate of the consensus clock is given by = max ( 1 , 2 , . . . , ) . (24) Nodes execute the following algorithm to converge their clock skew to the consensus value.
Step 1. Before running the synchronization algorithm, nodes set their skew estimation to 1, and set their respect life time life to 0.
Step 2. At the beginning of each synchronization round, every node increases their own life time life by 1. If node 's life time life > 3, then node broadcasts a synchronization packet containing its own local clock value and estimated skew using a CSMA/CA MAC scheme once per synchronization round. Like other synchronization techniques, MAC layer timestamping is required to ensure that clock values are transmitted instantly.
Step 3. If node receives a synchronization packet from node at time , then this node should immediately record its own local time ( ) and node 's information ( ( ), ( )). If node has not stored a history local time pair before, then create the local time pair ( (old), (old)) and assign it with ( (old) , (old)) = ( ( ) , ( )) .

Offset Compensation.
After the skew compensation algorithm is applied, all local estimators will eventually have the same drift, that is, they will run at the same speed. At this point, it is only necessary to compensate for possible offset errors. To achieve this, each node aims to accurately estimate the instantaneous average time of all its neighbors' clock in the network and set their clocks to this time. From Theorem 5, we can see that every node in the network will finally reach the same status, In reality, however, it is not practically possible for nodes to calculate the instantaneous global average time of all its neighbors' clock in the network. Since sensor networks may consist of a large number of nodes, and one node may have lots of neighbors. It is impossible for nodes to exchange clock values instantaneously, and nodes may suffer from packet loss and other network dynamics. Instead, the TSMA algorithm converges network clocks to an approximation of ( ) using the locally available information within the single-hop transmission range of each node. Nodes execute the following algorithm to converge their clock offset to the consensus value.
Step 1. Before running the synchronization algorithm, nodes set their offset estimation to their current local time and set their respect life time life to 0.
Step 2. At the beginning of each synchronization round, nodes reset their confidence parameter Conf to 1 and increase their life time life by 1. If node 's life time life > 3, then node broadcasts a synchronization packet containing its estimated offset and confidence parameter Conf using a CSMA/CA MAC scheme once per synchronization round. MAC layer timestamping is required to ensure that clock values are transmitted instantly.
Step 3. For each broadcast, all other nodes within receiving range update their own clocks by combining this value with their own clocks using the confidence weighted running average equation and then increase their confidence parameter Conf by 1, The confidence parameters give weighting to the clock values from nodes that have already received synchronization packets in the current round. These nodes are generally closer to a local consensus than those who have not received International Journal of Distributed Sensor Networks any sync packets, and this was found to increase the rate of convergence.

Simulation
In this paper, we use max consensus to compensate for clock skew and use average consensus to compensate for clock offset. To show the correctness and robustness of the TSMA algorithm, we make a simulation in MATLAB; the following is the simulation environment: there are 100 WSN nodes in the 10 * 10 grid, one grid one node, and we place the node in the center of the grid, the base value of nodes' oscillator is 32.768 KHz, the skew rates are assigned randomly from a normal distribution with mean = 1 and standard deviation = 20 PPM, nodes are given an initial error distribution between 0 and 1000 ticks.

Offset Compensation.
In this simulation, we set the time synchronization period to 1 minute. Nodes broadcast one sync packet per round in random order with a transmission range of 2, which means if the distance between two nodes is less than 2, then they can communicate with each other. Figure 1 shows the clock errors from average time measured in clock cycles. In the first three synchronization periods, nodes' life time is less than 3, so they do not transmit any synchronization packet to their neighbors, and then in 3 minutes, nodes start to compensate for the clock skew and offset. After seven rounds compensation, 100 nodes in the network achieve the synchronization state, the synchronization error between any two nodes is included between ±10 ticks.
The geographic distribution of clock errors in the network is shown in Figure 2. It can be seen that before Figure 2(a), the first round of offset compensation, the clock error relative to the average time can reach up to 550 ticks, but after Figure 2(b), the first round offset compensation, the clock error reduces to 140 ticks. Also we can see that the clock error after compensation transfer smoothly in geographic distribution, the local error between any neighboring nodes is no larger than 50 ticks.
A histogram of the clock errors is shown in Figure 3. It can be seen that before Figure 3(a), the first round of offset compensation, the standard deviation of clock error reaches up to 286.32, but after Figure 3(b), the first round of offset compensation, this value reduces to 70.90. From Figures 2 and 3, we can see that after the first round of offset compensation, the clock errors among nodes are significantly reduced. Figure 4 shows the convergence of the skew rates of 100 nodes in this simulation. From Figure 4, we can see that after two synchronization periods all the 100 nodes' skew rates converge on the maximum clock skew. Figure 5 shows the transmission range's effect to algorithm performance. We set the transmission range to 1, 2, and 3, respectively, then compare their convergence speed with each other. From Figure 5, we can see that synchronization accuracy and convergence speed are found to improve as the transmission range of the nodes increased, since nodes could receive synchronization packets from a greater proportion of the network population. On the other side, to increase the transmission range, we also increase the energy consumption of nodes, so in the practical application, we should make a tradeoff between energy consumption and convergence speed. Figure 6, is intended to study the robustness properties of the TMSA protocol subject to node failure and node replacement, as well as the performance in terms of convergence speed and steady state synchronization error. The synchronization period is set to 1 minute which is sufficiently large to exhibit the effects of different clock speeds. The simulation was run for about 50 minutes and presents 4 different regions of operation indicated by the letters A, B, C, and D which model potential node failure or the addition of the new nodes. In Region A, all nodes are turned on simultaneously with random initial conditions of their local clocks. After 2 synchronization periods, the synchronization error between any two nodes is included between ±10 ticks, that is, the maximum error is smaller than 20 ticks. At the beginning of Region B, about 20% of the nodes chosen random in the grid are switched off and then switched on at different random times. Once a node is switched on, it starts updating its estimated time using TSMA protocol but does not transmit any message for the first three synchronization periods to avoid injecting large disturbances into the already synchronized network, and then it starts to transmitting and receiving messages equally. The plot in Figure 6 clearly shows that the nodes get synchronized as soon as they are turned on without perturbing the overall network behavior. At the beginning of Region C, about 20% of the nodes turned off their radio, that is, they stop updating their parameters , , then we turn on the 20% nodes' radio at about 35 minutes. In this procedure, we can see that all nodes keep the synchronized state, the reason is that their skew rates have already converged to the maximum skew rate in the network, these 20% nodes still synchronize with others. At the beginning of Region D, we replace the node in the top left corner with one node having 40 PPM drift, which is the max skew rate in the network now, and then all the other nodes will compensate themselves to this skew value. From Figure 6, we can see that three synchronization period after the new node joining the network, it starts to transmitting messages to its neighbors, after a short transient the nodes quickly synchronize again.

Conclusions
This paper presented a new synchronization algorithm for WSN, the time synchronization using consensus protocol, which is based on consensus algorithms whose main idea is to average local information to achieve a global agreement on a specific quantity of interest. The proposed algorithm is fully distributed, asynchronous, includes skew compensation, and is computationally light. Moreover, it is robust to dynamic network topologies due, for example, to node failure or replacement. Finally, a set of simulations are presented to show the good performance of the proposed protocol.
International Journal of Distributed Sensor Networks  However, extensive work is still necessary to compare the performance of our proposed approach relative to FTSP [3] and other protocols over large scale multihop sensor network and over longer periods.