Energy-Efficient Monitoring in Software Defined Wireless Sensor Networks Using Reinforcement Learning: A Prototype

Software defined wireless networks (SDWNs) present an innovative framework for virtualized network control and flexible architecture design of wireless sensor networks (WSNs). However, the decoupled control and data planes and the logically centralized control in SDWNs may cause high energy consumption and resource waste during system operation, hindering their application in WSNs. In this paper, we propose a software defined WSN (SDWSN) prototype to improve the energy efficiency and adaptability of WSNs for environmental monitoring applications, taking into account the constraints of WSNs in terms of energy, radio resources, and computational capabilities, and the value redundancy and distributed nature of data flows in periodic transmissions for monitoring applications. Particularly, we design a reinforcement learning based mechanism to perform value-redundancy filtering and load-balancing routing according to the values and distribution of data flows, respectively, in order to improve the energy efficiency and self-adaptability to environmental changes for WSNs. The optimal matching rules in flow table are designed to curb the control signaling overhead and balance the distribution of data flows for achieving in-network fusion in data plane with guaranteed quality of service (QoS). Experiment results show that the proposed SDWSN prototype can effectively improve the energy efficiency and self-adaptability of environmental monitoring WSNs with QoS.


Introduction
Wireless sensor networks (WSNs) are application-oriented information-centric networks, which are characterized by limited energy and constrained radio resources [1]. One typical application of WSNs is environmental monitoring, where data-gathering based environmental monitoring tasks are executed by nodes with heterogeneous sensing and programmable functions. Each node in WSNs could be equipped with multiple sensors for different sensing purposes, for example, temperature, humidity, light, and vibration. In WSNs, the time-varying wireless communication environment and random interference may lead to unreliable communication links, while switching on/off of network nodes due to energy constraints can cause unpredictable topology changes, making it difficult to guarantee reliable and adaptive data-gathering for monitoring applications.
Software defined wireless networks (SDWNs) enable programmable network control and virtualization of network equipment by decoupling the control plane from the data plane [2]. The logical centralization and simplified abstraction of the control plane can improve scalability and multitasking efficiency [3]. The combination of an SDWN-based architecture and WSNs, that is, a software defined wireless sensor network (SDWSN), would bring the following advantages: (i) SDWN-based abstraction of the network control plane can effectively reduce the cost of WSN expansion and operation. (ii) SDWN-based virtualization of network equipment and programmable control of common hardware and software enable flexible task configuration, high resource utilization, and simplified network management in WSNs.

International Journal of Distributed Sensor Networks
However, realizing the above advantages of SDWSN for monitoring applications is not without difficulties. The control-data decoupled structure of SDWNs relies on cross-plane control traffic, which may result in excessive communication overhead and transmission delay. In SDWSN, although different virtual networks can work together on top of the same physical infrastructure, the centralized control plane may lead to high energy costs due to the information collection needed for a global view, and the multiple virtual networks may compete for common physical network resources. If a large number of flows simultaneously request a switch to forward data, network congestion or even a crash may occur. Furthermore, energy- and resource-constrained WSNs might not have sufficient network resources to realize the dynamic resource allocation and QoS of SDWNs. Therefore, the energy and resource utilization of SDWSN needs to be carefully designed for resource-constrained and application-oriented WSNs.
Most existing works on SDWSN focus on providing QoS guarantees or optimizing network management for monitoring applications. The software defined information centric network (SDN-IC) [4] floods the network with packets so as to leave reverse path information at routers, but this method causes frequent duplication of packets and leads to huge communication loads, which increase not only the end-to-end delay but also the energy consumption. The resource allocation in a software-driven wide-area network (SWAN) was optimized by an agent-based traffic engineering scheme [5], which requires excessive information exchange between the controller and switches for tracking changes in network topology and traffic distribution. As the network density increases, SWAN would be plagued by the large overhead caused by collisions between candidate relays contending for the medium. The software defined vehicular ad hoc network (SDV) [6] uses network virtualization to allocate network traffic in a programmable fashion, where surveillance packets are delivered following a position-aided data-gathering mechanism with greedy perimeter stateless routing (GPRS) [7] in case of controller failure. However, the SDV controller needs to gather and maintain a large amount of information for transmission power control, which is not practical for large-scale monitoring WSNs. In [8], the energy consumption of a multitask SDWSN was minimized for monitoring applications with guaranteed quality-of-sensing by solving a mixed integer linear programming problem at a high computational complexity. In [9,10], the load-balancing routing algorithms for WSNs construct an optimal routing tree by minimizing the total weight of routing paths, where the path weights are modeled as a function of energy consumption.
However, none of these works has adequately considered the application-oriented features of flows and in-network data fusion in complex and dynamic monitoring environments for SDWSN, thereby significantly limiting their energy efficiency and environmental adaptability.
In this paper, we develop an energy-efficient cognitive SDWSN prototype for environmental monitoring applications, where the high-complexity management of data fusion and data routing is centralized in the control plane, while the low-complexity execution of algorithms is implemented in the data plane. The cognitive mechanism based on reinforcement learning (RL) [11] is embedded in the control plane for information processing, where the interactions (in terms of reward or punishment) between agents and the environment are utilized to enhance the intelligence in policy decision making and to improve the self-adaptability of the energy-saving mechanisms in dynamic environments. Particularly, we propose to mine the application-specific value redundancy of flows in periodic transmissions of monitoring data using an autoregressive moving average (ARMA) [12] based time series forecast model. We design RL based mechanisms to perform value-redundancy filtering and load-balancing routing according to the values and distribution of flows, respectively, in order to improve the energy efficiency and self-adaptability to environmental changes of WSNs. Furthermore, the actions of the control plane are mapped to low-complexity vector calculations and rule matching in the switch's flow table. The rules in the flow table are designed to curb the control signaling overhead and balance the distribution of data flows for achieving in-network fusion in the data plane.
The novel aspects of the proposed energy-efficient SDWSN prototype are (i) energy saving with guaranteed QoS is achieved by mining the application-specific value redundancy and distribution of data flows in SDWSN, taking into account the inherent constraints of WSNs in terms of energy, radio resources, and computational capabilities; (ii) the RL based mechanisms for value-redundancy filtering and load-balancing routing can adapt to the varying environment and network status, thus improving the self-adaptability of SDWSN for monitoring applications.
The rest of the paper is organized as follows. Section 2 elaborates the cognitive SDWSN prototype and its functional architecture. Section 3 presents a specific implementation of the proposed prototype. In Section 4, the performance of the proposed SDWSN prototype in terms of energy efficiency and self-adaptability is evaluated through experiments in comparison with existing WSN approaches for monitoring. Finally, conclusions are drawn in Section 5.

Functional Architecture of SDWSN Prototype
In this section, we propose a cognitive SDWSN prototype, where RL is incorporated into the network information process for an integrated consideration of the energy- and resource-constrained trait of WSNs, complex features of monitoring applications, and dynamic nature of WSN deployment environments. As shown in Figure 1, the fundamental functionalities of the SDWSN prototype include an information QoS setting module, cognitive information middleware (CIM), and an information processing module. Following the design principles of SDWNs, the application plane of the SDWSN prototype is designed to meet the QoS requirements of monitoring applications, supported by the hardware of sensors. The application plane interacts with the control plane through an application programming interface (API). The functionalities of the data plane are dynamically configured using the Over-the-Air Programming (OTAP) technique [13], which can run multiple tasks simultaneously with QoS and can reduce the energy consumption in online task scheduling. The data plane is abstracted into a weighted directed graph, G = (V, L, E), which forms a reverse multicast tree with V being the set of vertexes, L being the set of links, and E being the set of link weights. Each node in the vertex set V maintains a hierarchical cluster, where a switch acts as the cluster leader and other programmable nodes become cluster members constituting the monitoring information generating (IG) module. The data flows generated by members of the same cluster form a programmable set of packets that share certain properties, because the packets of a flow are handled by matching "Field" and "Rules" in a switch's flow table (see Figure 1) and by imposing the "Action" set to execute the preset policy. The weight of each link indicates the status of flow property distribution and frequency bandwidth allocation during a certain data-gathering round and is defined as a function of the link bandwidth utilization (BWU) (see Section 3).
In the control plane, CIM is a part of the controller that performs adaptive data mining of network information using machine learning schemes with QoS guarantee. Information mapping (IM) module in CIM is responsible for preprocessing information received from data plane (i.e., information mining). It has two main duties: to perform online evaluation of the value of current monitoring data flow using an ARMA model and to build a flow distribution map and a network interconnection map in G. Information Q-learning (IQ) module utilizes the results from IM module to produce optimal strategies, which are then transformed by policy defining (PD) module into a set of policies to be inserted into switch's flow table for a lightweight implementation of data plane.
Routing decisions are made by CIM in the controller and then translated into rules and actions to be deployed in flow tables. APIs are used to configure flow tables for routing, in conjunction with a floodless service discovery mechanism. As part of the operating system, Sensor OpenFlow (SOF) [14] channel is used to establish an end-to-end connection between the controller and a switch. SOF also supports queries on packet streams and automatically splits queries between the data plane and the control plane, thus avoiding the increase of traffic in the data plane due to queries. Value matching and path matching are designed by CIM and executed by lightweight actions of flow tables in an on-demand driven energy-saving mode. This reduces the amount of information exchange between the operating system and the data plane. After data-gathering routing has been established, data packets can be forwarded and processed in the data plane. Subsequent (follow-up) packets in a flow are forwarded in the data plane based on the configured routing in flow tables without any further participation from the control plane. This can reduce the data-gathering traffic in the data plane and decrease control overhead in the control plane. The features specified in the MAC layer (as part of the operating system) are logically partitioned into two different modules: the lower MAC module, which depends on the proprietary Hardware Abstraction Layer (HAL) and controls time critical functions to achieve value-redundancy fusing in the data plane based on service differentiation access control; and the upper MAC module, which is responsible for delay-tolerant control plane functions.
The proposed cognitive SDWSN prototype takes into account the inherent constraints of WSNs in terms of energy, radio resources, and computational capabilities. Energy saving is achieved through the design of value-redundancy data fusing and load-balancing data routing technologies in CIM. By using machine learning schemes for energy saving in the control plane and by incorporating lightweight execution using a flow table at each switch in the data plane, intelligence and controllability can be achieved in all stages of the information operation chain in SDWSN. The introduction of in-network processing with low computational complexity in the data plane facilitates the centralization of QoS management in the control plane, thus reducing the total overhead for cross-plane communications. Moreover, low-complexity numerical operations on the flow entries are enabled by the IM module, which matches CIM's outputs to vector constant parameters.
Based on the proposed SDWSN prototype, programmability and resource reutilization in data plane can be improved through OTAP. The overhead for cross-plane control signaling can be reduced by introducing a data fusion mechanism into data plane, which also improves the controllability of packet routing and the efficiency of resource utilization.

Design of RL Based Energy-Saving Mechanisms
In this section, the implementation of the proposed SDWSN prototype for environmental monitoring applications will be discussed with a focus on RL based energy-saving mechanisms.

Design of Energy-Saving Mechanisms in Control Plane.
In event-detection based monitoring applications, the periodic transmissions of monitoring data usually have low duty cycles and high time-domain correlation, resulting in data value redundancy. In the following, we exploit this value redundancy to save transmission energy. RL is an agent-based learning approach that uses trial and error to find a reward-maximizing behavior in a dynamic environment. RL can adapt to the dynamic environment at a relatively low complexity, which makes it well suited to WSNs that have limited resources and operate in unpredictable environments. Therefore, we design the energy-saving policy Γ (see (4)) in the control plane based on RL. The key design of Γ is the utilization of the contention window (CW) in QoS-aware media access control (MAC), which exploits the concept of service differentiation MAC [15]. CW introduces a MAC back-off counter, called Failed Times (FT), to count the number of failures before winning the contention, which avoids the long media occupancy caused by a large CW. The threshold of FT sets the retry limit in MAC and can be considered as the CW size. According to Γ, we propose the value-redundancy filtering mechanism (Mechanism 1) and the load-balancing routing mechanism (Mechanism 2), where Mechanism 1 performs online estimation of flow values and implements an optimal in-network fusion strategy, and Mechanism 2 performs online analysis of flow distribution and adaptive optimization of path weights.
During a specific data-gathering round $r$ ($r \in \mathbb{Z}^{+}$), the monitoring data flow generated by node $i$ can be modeled as a finite time series $X_i^r = \{x_i^t\}$, $t \in T_r$, $T_r = \{t_r, t_r + 1, \ldots, t_r + N - 1\}$, $i \in V$, $N \in \mathbb{Z}^{+}$, where $t_r = (r - 1)N + 1$ denotes the sampling time instant at which round $r$ starts and $N$ denotes the learning queue length. The time span of each data-gathering round is $N\tau$, where $\tau$ is the sampling period.
Since the ARMA model captures the statistical characteristics of a time series, it can be used to mine the value redundancy of sampled data and to evaluate the value of data flows in real time. We therefore adopt ARMA to predict the value $\hat{x}$ of each sample $x$ and calculate the corresponding prediction error $\|x - \hat{x}\|_p$, where $\|\cdot\|_p$ denotes the normalized Frobenius norm [16], for example, $\|A\|_{p=2} = [\mathrm{Tr}(A^{*}A)]^{1/2}$. Note that if some important events occur, the distribution of monitoring data flows among nodes would become uneven. According to information entropy theory [17], greater variations in data flow values indicate a larger average amount of information contained in the data flows and a higher probability that important events are occurring, provided that there is no external interference. Thus, we define a value factor in (1) to estimate the underlying value of the data flow X generated by a node in a data-gathering round. The value factor is built from the mean of the prediction errors, an anti-interference factor designed to avoid misjudgment of values caused by environmental disturbance, and an indicator that equals 1 if the prediction error at a given time instant is significant and 0 otherwise. If the number of significant prediction errors in data flow X (i.e., the sum of the indicators over the round) reaches a preset threshold, then X is considered to have a high value level; otherwise, those significant prediction errors are considered to be the result of external interference. The anti-interference capability increases with the value of this threshold.
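The value evaluation described above can be illustrated with a minimal sketch. It is not the definition in (1): as a simplifying assumption it replaces the full ARMA model with a least-squares AR(1) predictor (the special case of ARMA with one autoregressive term and no moving-average term), and it treats an error as significant when it exceeds a hypothetical `factor` times the mean error. The names `factor` and `count_threshold` are illustrative only.

```python
from statistics import mean

def ar1_coefficient(series):
    """Least-squares AR(1) coefficient and mean of the series."""
    mu = mean(series)
    centered = [x - mu for x in series]
    num = sum(a * b for a, b in zip(centered[1:], centered[:-1]))
    den = sum(a * a for a in centered[:-1])
    return (num / den if den else 0.0), mu

def prediction_errors(series):
    """Absolute one-step-ahead prediction errors |x_t - x_hat_t|."""
    phi, mu = ar1_coefficient(series)
    return [abs(x - (mu + phi * (prev - mu)))
            for prev, x in zip(series[:-1], series[1:])]

def is_high_value(series, factor=2.0, count_threshold=2):
    """Hypothetical rule: an error is significant when it exceeds
    factor * mean error; the flow is high-value when the number of
    significant errors reaches count_threshold."""
    errors = prediction_errors(series)
    limit = factor * mean(errors)
    significant = sum(1 for e in errors if e > limit)
    return significant >= count_threshold
```

With these assumed settings, a constant series and a series with one isolated spike are both classified as low value (the single spike is attributed to interference), whereas a series with two spikes reaches the count threshold and is classified as high value.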
After the value of data flow X has been predicted, the historical forwarding record of X during the data-gathering round is extracted from the link statistics in the counter field of the switch's flow table, in order to analyze the link state information and calculate the BWU. Accordingly, the real-time state element for RL is obtained as in (2). The state element in (2) reflects the local status of flow value and flow allocation used for resource utilization in SDWSN. It combines the value factor with the BWU of the link. The BWU is determined by the traffic throughput of the node on the link in the current data-gathering round, which can be calculated as the incremental number of "transmitted bytes" from the node on the link in that round, and by the mean bandwidth of the links between the node and the nodes in its neighborhood. Then, the threshold of FT in CW is set as an inversely proportional function of the state element using the service differentiation based MAC retransmission protocol [15]. Therefore, the value of data flow X and its historical forwarding records can be mapped to the corresponding probability of channel access in the MAC layer. The adaptive optimization of CW can be formulated as an average reward Markov decision process (ARMDP) [18]. Accordingly, we design an ARMDP based RL mechanism to optimize CW for the purpose of finding the optimal energy-efficient policy. The RL mechanism is executed by CIM, which adaptively adjusts the channel access probability to inhibit the transmission of value-redundant loads and balance the distribution of transmission loads among nodes while guaranteeing QoS.
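The BWU calculation from flow-table counters can be sketched as follows. The sketch assumes that the BWU is the ratio of the per-round throughput to the mean bandwidth of the links to the node's neighbors; this ratio, the function name, and the parameter names are assumptions made for illustration.

```python
def link_bwu(tx_bytes_now, tx_bytes_prev, round_seconds, neighbor_bandwidths_bps):
    """Bandwidth utilization of one link during one data-gathering round.

    tx_bytes_now / tx_bytes_prev: "transmitted bytes" counter read at the end
    of the current and the previous round (hypothetical counter readings).
    neighbor_bandwidths_bps: bandwidths of the links to the node's neighbors.
    """
    # Throughput is the counter increment over the round, in bits per second.
    throughput_bps = 8 * (tx_bytes_now - tx_bytes_prev) / round_seconds
    mean_bandwidth = sum(neighbor_bandwidths_bps) / len(neighbor_bandwidths_bps)
    return throughput_bps / mean_bandwidth
```

For example, a counter that grows from 5000 to 15000 bytes over a 10 s round, on a node whose neighbor links offer 20 kbit/s and 12 kbit/s, gives a throughput of 8 kbit/s against a mean bandwidth of 16 kbit/s.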
In the RL mechanism, the dynamic environment is characterized by a 4-tuple (S, A, P, R). S is the set of network states, which is updated in each data-gathering round and consists of the value factors and BWU stored in the status table embedded in CIM, thereby providing a real-time observation of the environment for Q-learning. A is the action set produced by the PD module and injected into the "Match Field" of the flow table in the switch (see Figure 1). P : S × A × S → [0, 1] denotes the state transition probability, and the reward function R : S × A → ℝ indicates the environmental reward to the corresponding action for improving energy efficiency. The global reward is accumulated by CIM maximizing the local reward in each data-gathering round. For a given QoS, the local reward calculated by (4) increases with a higher value of the data flow and a lower value of the BWU. Conversely, if the current value of the data flow is low and/or the bandwidth is overutilized, then the RL mechanism generates a zero or negative reward as punishment. The CW optimization strategy is shown in Algorithm 1, where Γ optimizes the CW size |W| by maximizing the accumulated local reward and amending the criteria of action evaluation in an iterative manner. This update uses a learning factor and a discount factor, the state and the action of the current data-gathering round, and two tuning step sizes for updating the values of Ψ and of the Q-function [19], respectively.
The status table in CIM tracks how the operational environment evolves with time, where new states can be mined and new actions should be discovered. Policy Γ needs to be constantly updated to match the state-event pairs with the optimal actions. The RL mechanism uses a random strategy to fully explore the state space at the beginning and adopts a greedy strategy to ensure convergence later on. According to the Γ output by CIM, the controller executes the optimal action with the local maximum Q-value, where Δ|W| indicates the adjustable size of CW. After each iteration, Ψ is obtained by comparing the value between the current round and the previous round, and the QoS factor represents the QoS requirement of a given monitoring application. The value of the QoS factor can be adjusted by the information QoS setting module in the application plane and then fed into the "Match Rules" in the switch's flow table (see Figure 1) via the API to realize programmable network control. Accordingly, a node executing the action adjusts the size of CW based on a transfer function in which $\Gamma(a) = Q(s, a)/[\sum Q(s, a)]$ denotes the state assessment for action $a$ under state $s$ using Γ. Thus, the CW size |W| can be adaptively adjusted according to the dynamic environment status and the application QoS, with the variation amplitude given by Δ|W|. According to the MAC protocol for differentiated services [20], the probability of packet forwarding is inversely proportional to |W|.
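The explore-then-exploit behavior and the iterative Q-value update can be sketched with a small tabular learner. This is a simplified illustration rather than Algorithm 1: it uses the standard discounted Q-learning update instead of the average-reward (ARMDP) formulation, and the action set, the reward shape, and all parameter values are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

# Hypothetical actions: shrink, keep, or grow the contention window by one step.
ACTIONS = (-1, 0, 1)

def choose_action(q, state, epsilon, rng):
    """Explore at random with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard Q-learning update with learning factor alpha and discount gamma."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

def local_reward(value_is_high, bwu):
    """Hypothetical local reward: larger for high-value flows and low BWU."""
    return (1.0 if value_is_high else -1.0) - bwu

q_table = defaultdict(float)
q_update(q_table, "s0", 1, local_reward(True, 0.0), "s1")
greedy_action = choose_action(q_table, "s0", 0.0, random.Random(0))
```

Starting with a large `epsilon` and decreasing it over the data-gathering rounds reproduces the random-then-greedy strategy described above; with `epsilon = 0.0` the learner always picks the action with the largest Q-value.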
Based on the above analysis, we design Mechanism 1 and Mechanism 2 as follows: (i) Mechanism 1 reduces the total energy consumption in SDWSN by inhibiting the transmission of value-redundant loads. In Mechanism 1, the value factor is used as one important element to control |W|. According to (5), a low-value flow will be configured with a high medium access delay caused by the corresponding large number of retries set by the MAC back-off counter. This leads to a low forwarding probability. The low-value flow will be discarded when the medium access delay goes beyond a preset threshold of FT. Therefore, the probability of traffic forwarding can be adaptively controlled according to the flow value. Suppressing the transmission of low-value flows greatly decreases the amount of in-network traffic for data-gathering and thus achieves energy saving.
(ii) Mechanism 2 balances the energy distribution across SDWSN by minimizing the total weight of routing paths in G, that is, $\min\{\sum e_{ij}\}$, $\exists\, l_{ij} \in \mathrm{Tree}_{\mathrm{optim}} \subset L$, to construct an optimal routing tree, $\mathrm{Tree}_{\mathrm{optim}} = \{l_{ij}\}$, $l_{ij} \in L$, $i, j \in V$, where the average link BWU is used as the link weight for controlling the size of CW. According to (2) and (5), a link with a higher BWU will be configured with a larger CW size, leading to a lower probability of traffic forwarding. Therefore, the distribution of network traffic can be balanced by adaptively adjusting the link weights, thereby optimizing the routing selection in SDWSN.
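The construction of a minimum-weight routing tree can be sketched as follows. The sketch assumes that "minimizing the total weight of routing paths" can be approximated by a shortest-path tree rooted at the sink, computed with Dijkstra's algorithm over BWU link weights; the choice of Dijkstra's algorithm and the data layout are assumptions made for illustration.

```python
import heapq

def routing_tree(links, sink):
    """Return {node: next_hop} along minimum-weight paths towards the sink.

    links: dict mapping a directed link (u, v) to its weight (e.g., BWU).
    """
    # Index links by their head so the search can run backwards from the sink.
    incoming = {}
    for (u, v), weight in links.items():
        incoming.setdefault(v, []).append((u, weight))
    dist = {sink: 0.0}
    next_hop = {}
    heap = [(0.0, sink)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for u, weight in incoming.get(v, []):
            if d + weight < dist.get(u, float("inf")):
                dist[u] = d + weight
                next_hop[u] = v
                heapq.heappush(heap, (d + weight, u))
    return next_hop

example_links = {("a", "bs"): 0.9, ("a", "b"): 0.2, ("b", "bs"): 0.3}
example_tree = routing_tree(example_links, "bs")
```

In the example, the heavily utilized direct link from `a` to `bs` (weight 0.9) is avoided in favor of the two-hop path through `b` (total weight 0.5), which is the load-balancing effect described in (ii).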

Implementation of Flow-Table Based Policy in Data Plane
SDWSNs are characterized by the decoupled control and data planes. Although the energy-efficient mechanisms, Mechanism 1 and Mechanism 2, require a high computational complexity in the control plane, they are executed by the information processing module of the data plane at a low computational complexity. Based on Γ in (4), Mechanism 1 and Mechanism 2 are mapped to the parameter vector of the value-redundancy filter and the path matrix Π of the load-balancing routing mechanism, respectively. The value-redundancy filtering parameter vector is $(\|\Psi\|, \mathrm{QoS})_{2 \times 1}$, where Ψ is the value-redundancy filter threshold calculated in (4) for use in (5) to curb the CW size. Π consists of the identities of the nodes belonging to the optimal routing tree found by Mechanism 2. In the data plane, the specific implementation of the two mechanisms involves only lightweight vector products and numerical comparisons, which are both low-complexity matching operations, following the corresponding rules in the "Match Field" (i.e., Rule (1) to Rule (4) in Figure 2) of the flow table. Figure 2 shows the detailed implementation of the flow table at each switch. The flow table contains a prioritized list of rules to instruct the corresponding actions. Particularly, task scheduling has the highest priority, followed by value matching (i.e., value-redundancy filtering), and path matching (i.e., load-balancing routing) has the lowest priority. Each input flow is matched against the prioritized rules. When multiple rules match an input flow, the rule with the highest priority is selected first to execute the corresponding action set. If no rule matches an input flow, then the switch requests the controller to update its flow table, and the default-action set automatically forwards the flow to CIM in the controller for developing new energy-efficient policies.
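The prioritized matching described above can be sketched as a small lookup routine. The rule predicates, the flow fields (`task`, `value`, `next_hop`), the threshold 0.5, and the action names are hypothetical and are not taken from Figure 2; only the priority order (task scheduling, then value matching, then path matching) and the default action follow the text.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    priority: int                      # larger number means higher priority
    name: str
    matches: Callable[[dict], bool]    # predicate on the flow's header fields
    action: str

# Hypothetical node identities on the optimal routing tree (path matrix).
PATH = {"b", "bs"}

FLOW_TABLE = [
    Rule(1, "path matching", lambda f: f.get("next_hop") in PATH, "forward"),
    Rule(2, "value matching", lambda f: f.get("value", 1.0) < 0.5, "drop"),
    Rule(3, "task scheduling", lambda f: f.get("task", False), "schedule_task"),
]

def lookup(flow_table, flow):
    """Return the action of the highest-priority matching rule.

    If no rule matches, fall back to the default action of handing the
    flow to the controller.
    """
    hits = [rule for rule in flow_table if rule.matches(flow)]
    if not hits:
        return "send_to_controller"
    return max(hits, key=lambda rule: rule.priority).action
```

A low-value flow whose next hop lies on the routing tree matches both the value rule and the path rule; because value matching has the higher priority, the flow is dropped rather than forwarded.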
When the current flow matching process ends and the next flow arrives, if the newly arrived flow contains the same contents as the previous one, it is considered redundant. In this case, the flow table does not need to be updated for value-redundancy filtering or for the forwarding path. Therefore, cross-plane communications and task reconfiguration can be greatly reduced, thus improving the energy efficiency of application-oriented SDWSN. When the real-time status (e.g., QoS and throughput) of SDWSN changes notably, the value-redundancy filtering parameters and routing paths can be dynamically adjusted, thus improving the environmental adaptability of SDWSN.
In the proposed SDWSN-RL prototype, the control traffic from the controller to the data plane (i.e., downstream traffic) contains Packet-Out, Modify-State (configuration), and Read-State (request or query); the control traffic from the data plane to the controller (i.e., upstream traffic) contains Packet-In and Read-State (reply or report). The control traffic flow can be described as follows. Once a source host generates a query message, the controller responds with a reply message if the source host and the destination host are on the same island. Otherwise, the controller drops this query message. When network status changes, Packet-In event will be triggered by a request message in the data plane. Each switch sends a reply message containing the switch status to the controller via a secure channel supported by SOF. Meanwhile, Modify-State configuration messages are exchanged between the controller and switches via the secure channel as well. A Packet-Out message is generated by PD module and sent to switch to validate an entry in flow table. If no response is returned within a specified time, the potentially invalid entry will be deleted. The amount of Packet-In/Out messages for handling requests grows with the number of switches in the network.

Experiment Results
We perform experiments to evaluate the performance of the proposed RL based SDWSN prototype (SDWSN-RL) for environmental monitoring applications. The network simulator NS2 [21] is used to build the experiment environment. The parameter values used for the experiment setup are given in Table 1. We adopt the event radius (ER) model [22] to simulate the impulsive traffic triggered by temporally and spatially correlated monitoring events in a disk area. Following the ER model, the monitoring area of SDWSN is divided into an event gathering region, a data relaying region, and a decision making region. The first two regions belong to the data plane, and the third one belongs to the control plane. The monitoring center, that is, the BS, is placed at the top right corner of the monitoring area at the coordinate (128 m, 162 m). The event center is located at the coordinate (48 m, 82 m) inside the event gathering area. The arrival of events follows a Poisson distribution in the time domain. Note that all the experiment results in Figures 3-8 include the energy consumption of both data-gathering and control traffic. In the energy consumption calculation, we consider the energy consumed for data-gathering (including data fusion and processing) and the energy consumed for control overhead (including the control traffic for configuration, request-query, and reply-report). The energy consumption evaluation, including that in Figure 5, takes both components into account. Figure 3 compares the energy consumption rate of SDWSN-RL with that of a WSN without SDN (NonSD-WSN-RL). NonSD-WSN-RL, which uses hybrid energy-efficient distributed clustering routing (HEED) with a back-pressure mode [23], periodically computes the utility based on current queue gradients and decides the next hop for each flow accordingly.
The energy consumption rate is defined as the ratio of the energy consumption of SDWSN-RL (or NonSD-WSN-RL) to that of single-hop communication (without clustering or aggregation). Figure 3 shows that the average energy consumption rate of SDWSN-RL is much lower than that of NonSD-WSN-RL for each considered density of network nodes. The energy consumption rate of NonSD-WSN-RL also grows faster with the number of data-gathering rounds than that of SDWSN-RL. Furthermore, when the network node density increases from 0.7 to 1.2 times that value, NonSD-WSN-RL shows a much higher increase in energy consumption rate than SDWSN-RL. This is because the SDN-based controller can obtain real-time statistics on granular control and link status (which are not available in non-SDN networks) for use in energy-efficient routing. Routing decisions made by CIM in SDWSN-RL are translated into rules and lightweight actions in flow tables to realize a floodless service discovery mechanism, thus limiting the amount of control messages. Without the support of SDN, routing tables are created (or reconstructed) in a collaborative way based on local exchanges of neighborhood information, which may require many iterations before convergence. Figure 4 shows the experiment results in terms of the normalized link BWU of 10 randomly selected links. For performance comparison with SDWSN-RL, we include three classic data-gathering schemes, SDN-IC, SDV+GPRS, and SWAN, which are content-centric, position-aided, and agent-based, respectively. We calculate the normalized variance of BWU during a data-gathering round from the 10 × 1 vector Y, which contains the BWU of the 10 randomly selected links. Our calculations using the results in Figure 4 show that SDWSN-RL reduces the normalized variance of BWU by 8.9%, 12.8%, and 6.7% as compared to SDN-IC, SDV+GPRS, and SWAN, respectively, thus achieving improved load-balancing performance in SDWSN.
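A balance metric of this kind can be computed as sketched below. The exact normalization used for the results above is not reproduced here; as an assumption, the sketch divides the population variance of the BWU vector by its squared mean, and it also reports the spread between the highest and lowest BWU.

```python
from statistics import mean, pvariance

def normalized_variance(bwu):
    """Population variance of the BWU samples divided by their squared mean
    (an assumed normalization, chosen to make the metric scale-free)."""
    m = mean(bwu)
    return pvariance(bwu) / (m * m)

def bwu_gap(bwu):
    """Spread between the most and the least utilized link."""
    return max(bwu) - min(bwu)

balanced_links = [0.5] * 10
uneven_links = [0.2, 0.8] * 5
```

Under this assumed normalization, perfectly balanced links give a normalized variance of zero, and a lower value indicates a more even distribution of flows across links.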
The gap between the lowest and highest BWU across the 10 links of SDWSN-RL is 0.1357, which is the smallest among the five schemes. The results show that SDWSN-RL outperforms the other four schemes in terms of a more balanced flow distribution in SDWSN by optimizing the weight of each link in the routing tree. Figure 4 also includes NonSD-WSN-RL in the load-balancing performance comparison. We can see that the load-balancing performance of NonSD-WSN-RL is much worse than that of SDWSN-RL and the other software-defined schemes. This is because the load-balancing routing mechanism in SDWSN-RL utilizes global network information to construct optimal routing paths in a centralized manner. More specifically, SOF in SDWSN-RL provides a lightweight control protocol between the central controller and the switches in the data plane. The controller uses information in flow tables to calculate the load-balancing routes among all switches and sends the flow tables back to the switches to indicate the next hop towards each destination. SOF provides simple APIs at switches and allows the controller to program the switches through the APIs, which provide a flexible lookup mode for deploying routing protocols. The SDN controller can obtain information about granular control, network topology, and link statistics, which is used in the centralized load-balancing routing, while such information is unavailable or difficult to obtain in traditional WSNs without SDN. NonSD-WSN-RL relies on a distributed neighbor discovery approach, which is not efficient in load balancing. Moreover, frequent next-hop neighbor discoveries and data packet forwarding based on distributed communications would lead to a sharp increase in control traffic as the node density increases. Figure 5 plots the average survival rate of nodes versus the number of data-gathering rounds for the four considered schemes.
The survival rate of nodes in a network can be used to evaluate the total energy consumption of a data-gathering mechanism [24]. The lifetime of SDWSN is defined as the duration of normal network operations (e.g., data-gathering) while the survival rate of nodes is maintained above a threshold (SR_th ∈ [0.4, 0.9]). In Figure 5, we set SR_th = 0.45 and denote the network lifetime achieved by the SWAN, SDN-IC, SDV+GPRS, and SDWSN-RL schemes as R1, R2, R3, and R4, respectively. We can see that R1 < R2 < R3 < R4, with SDWSN-RL achieving the highest average node survival rate among the four schemes. This is mainly because the value-redundancy filtering and load-balancing routing of SDWSN-RL effectively inhibit the transmission of low-value flows and balance the distribution of loads across nodes, leading to a longer network lifetime of SDWSN as compared to the other three schemes. Figure 6 indicates the remaining energy level of a node after 80 data-gathering rounds, normalized with respect to its initial energy level (which is fixed at 2 mJ for all nodes), for 36 different nodes randomly selected in an annular area centered at the coordinate (48 m, 82 m) (i.e., the event center) with inner and outer radii of 10 meters and 20 meters, respectively. The experiment results show that for almost all the selected nodes, SDWSN-RL achieves the highest residual energy level among the four considered schemes. This would effectively prolong the lifetime of SDWSN. With each scheme, the normalized residual energy level varies across different nodes. A higher (lower) level of remaining energy is due to the smaller (larger) amount of data flows that the node has forwarded. Compared with the three existing schemes, SDWSN-RL offers a more balanced distribution of energy consumption across the network nodes. This is mainly due to the proposed load-balancing routing mechanism, which utilizes global network information to construct optimal routing paths in a centralized manner.
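The lifetime definition used above can be made concrete with a short sketch. It assumes that a node counts as alive while its residual energy is positive and that lifetime is measured as the number of consecutive data-gathering rounds, starting from the first, in which the survival rate stays at or above the threshold. The function names and the sample series are hypothetical.

```python
def survival_rate(residual_energy):
    """Fraction of nodes whose residual energy is still positive.

    ASSUMPTION: a node is considered alive while its energy is above zero.
    """
    alive = sum(1 for e in residual_energy if e > 0)
    return alive / len(residual_energy)


def network_lifetime(survival_rates, sr_th=0.45):
    """Count consecutive rounds (from the first) with survival rate >= sr_th.

    ASSUMPTION: lifetime is measured in data-gathering rounds and ends at
    the first round in which the survival rate drops below the threshold.
    """
    rounds = 0
    for rate in survival_rates:
        if rate < sr_th:
            break
        rounds += 1
    return rounds


# Hypothetical per-round survival rates (not measured data).
rates = [1.0, 0.9, 0.6, 0.44, 0.5]
print(network_lifetime(rates))  # → 3
```

With this definition, a scheme whose survival-rate curve stays above SR_th for more rounds has a longer lifetime, which is how the ordering R1 < R2 < R3 < R4 is read from the curves.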
Figure 7 plots the average energy consumed by the four considered schemes for forwarding a single bit of data (mJ/bit) while meeting the same QoS requirement, versus the number of sensor nodes deployed in SDWSN. We can see that the energy consumption per bit of SDWSN-RL increases with the number of sensor nodes at a much slower rate than that of the three existing schemes, so that SDWSN-RL consumes much less energy per bit than the existing schemes when the number of sensor nodes is large. This is because the application-oriented in-network fusion in the data plane of SDWSN-RL inhibits the transmission of value-redundant flows, and the flow value determined by (1) is not notably affected by the number of traffic sources (i.e., nodes in the event gathering area). In contrast, the other three schemes generate excessive traffic loads due to the large amount of local information exchange for executing distributed algorithms in the data plane and the large amount of control overhead for cross-layer interaction, which degrade the energy efficiency, especially for SDWSN with a large number of sensor nodes. Figure 8 shows the comparison of control traffic cost between our proposed SDWSN-RL and the other three schemes (SWAN, SDV+GPRS, and SDN-IC), where the control traffic cost is defined as the ratio of control overhead to network throughput, and the network throughput is defined as the rate of successful bit delivery from the IG module to the monitor center. In Figure 8, the update interval depends on a topology term and an event term, where the topology term denotes the change in network node density and the event term represents the variation in Poisson event arrival rate. Since topology and event changes are the two major factors influencing the network status, the update interval indicates the frequency of network status change.
When the WSN status in monitoring applications changes frequently (e.g., the redistribution of data traffic caused by changes in topology or event arrival rate), the control overhead will increase due to frequent reconfiguration processes.
We can see from Figure 8 that the control traffic cost decreases as the update interval increases for all the considered schemes. The control traffic cost of the proposed SDWSN-RL is much lower than that of the other three schemes for all considered values of the update interval. This is because the value-redundancy filtering in SDWSN-RL inhibits the transmission of value-redundant loads, thereby avoiding generating control traffic between CIM and IM when the changes in network topology and/or event arrival rate do not significantly affect the monitoring data value. Routing decisions are made by CIM in the controller and then translated into rules and actions to be deployed in flow tables. Network APIs are used to configure flow tables for routing, in conjunction with a floodless service discovery mechanism. After the controller has configured routing, data packets can be forwarded and processed in the data plane. Subsequent (follow-up) packets in a flow are forwarded in the data plane based on the configured routing in flow tables without any further participation from the control plane. Such reductions in control traffic free up radio resources for more data packets to be successfully delivered, thus further lowering the control traffic cost. In contrast, the other three schemes (i.e., SWAN, SDN-IC, and SDV+GPRS) adopt broadcast-based service discovery mechanisms, where distributed systems collaboratively create the routing table, so the amount of control messages grows while network throughput declines as the network node density or event arrival rate increases.
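The flow-table behavior described above, where only the first packet of a flow involves the controller, can be sketched as follows. This is a simplified illustration, not the SOF protocol itself: the class names, the reactive table-miss handling, and the single-field match on destination are all assumptions.

```python
class Controller:
    """Hypothetical stand-in for the controller that holds routing decisions."""

    def __init__(self, routes):
        self.routes = routes  # ASSUMPTION: maps destination -> next hop
        self.requests = 0     # counts control-plane interactions

    def request_rule(self, dst):
        self.requests += 1
        return self.routes[dst]


class Switch:
    """Hypothetical data-plane node with a destination-keyed flow table."""

    def __init__(self, controller):
        self.controller = controller
        self.flow_table = {}

    def forward(self, dst):
        next_hop = self.flow_table.get(dst)
        if next_hop is None:
            # Table miss: ask the controller once, then cache the rule.
            next_hop = self.controller.request_rule(dst)
            self.flow_table[dst] = next_hop
        return next_hop


ctrl = Controller({"sink": "n7"})
sw = Switch(ctrl)
hops = [sw.forward("sink") for _ in range(5)]
print(hops, ctrl.requests)  # five packets forwarded, one controller request
```

In this sketch the control-plane cost is paid once per flow rather than once per packet, which mirrors why follow-up packets add no control traffic once routing has been configured.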
Since the throughput is inversely proportional to the control traffic cost for a given control overhead, the results in Figure 8 also indicate that SDWSN-RL achieves the highest throughput among the four schemes considered, because it significantly reduces local control message exchanges, thereby freeing up radio resources for more data packets to be successfully delivered. The other three schemes (SWAN, SDN-IC, and SDV+GPRS) use broadcast-based service discovery mechanisms and distributed protocols, where switches need to wait for the wireless medium to be free before sending their packets and many data packets may have to be dropped due to the wait, thereby limiting the throughput. Moreover, the broadcast-based service discovery mechanisms require high volumes of control messages to be exchanged and incur high packet processing overhead.
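The relationship between control traffic cost and throughput stated above can be checked with a small numerical sketch. It assumes that control overhead is expressed as a rate in the same units as throughput (bits per second); the function names and the numbers are hypothetical.

```python
def throughput(delivered_bits, duration_s):
    """Rate of successful bit delivery (bits per second)."""
    return delivered_bits / duration_s


def control_traffic_cost(control_overhead, delivered_bits, duration_s):
    """Ratio of control overhead to network throughput.

    ASSUMPTION: control_overhead is given in bits per second.
    """
    return control_overhead / throughput(delivered_bits, duration_s)


# Hypothetical values: same control overhead, doubled throughput.
cost_low_tp = control_traffic_cost(100.0, 1000.0, 1.0)
cost_high_tp = control_traffic_cost(100.0, 2000.0, 1.0)
print(cost_low_tp, cost_high_tp)  # the cost halves when throughput doubles
```

For a fixed control overhead, a lower cost therefore implies a proportionally higher throughput, which is the inference drawn from Figure 8.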

Conclusion
In this paper, we have proposed a SOF-based SDWSN prototype for improving the energy efficiency and adaptability of WSNs in environmental monitoring applications, taking into account the inherent constraints of WSNs in terms of energy, radio resources, and computational capabilities, and the distributed data flows of monitoring applications. Experiment results have shown that the proposed SDWSN prototype can greatly improve energy efficiency by effectively inhibiting the transmission of value-redundant loads, reducing the amount of cross-plane communications, and enhancing the load balance in SDWSN.
In our future work, we will improve the scalability of control-plane mechanisms using decentralized coordination to overcome the bottleneck of a single logical controller and develop an adaptive anti-interference mechanism to improve the robustness of SDWSN for diverse monitoring applications in wireless environments with severe interference.