Resource Allocation in Cognitive Radio Wireless Sensor Networks with Energy Harvesting

The progress of science and technology and the expansion of the Internet of Things have made information transmission between communication infrastructure and wireless sensors increasingly convenient. For power-limited wireless sensors, lifetime can be extended through energy-harvesting techniques. In addition, based on cognitive radio, wireless sensors can use spectrum resources for which they are not licensed to complete certain information transmission tasks. After harvesting enough energy from the environment, the wireless sensors, working as secondary users (SUs), can lease spectrum resources from the primary user (PU) to finish their tasks, which brings additional transmission costs to themselves. To minimize the overall cost of the SUs and maximize the spectrum profit of the PU during the information transmission period, we formulated a differential game model, with the SUs as the game players, to solve the resource allocation problem in cognitive radio wireless sensor networks with energy harvesting. By solving the proposed resource allocation game model, we found the open loop Nash equilibrium solutions and feedback Nash equilibrium solutions for all SUs as the optimal control strategies. Finally, a series of numerical simulations was conducted to demonstrate the rationality and effectiveness of the game model.


Introduction
With the development of the Internet of Things (IoT), all things are expected to gain the ability to perceive their environment. Through the deep integration of the physical and digital worlds [1], the IoT is revolutionizing connection and interaction among all objects. With the IoT, physical equipment and the physical world are no longer inert but form a living, connected community. Wireless sensor networks (WSNs), as the nerve endings of the IoT, have attracted extensive attention and research interest in recent years [2], and both their theory and their applications have received growing attention [3].
WSNs are composed of a series of sensor nodes, which generally communicate with one another over a common frequency band. However, with the development of wireless communication, this band is increasingly crowded, and communication between sensor nodes suffers not only interference from their own nodes but also increasingly serious and uncontrollable interference from networks serving other applications. As the number of interconnected devices grows rapidly, sensor nodes of multiple applications with overlapping coverage areas will suffer even more severe interference. Over the past few years, cognitive radio networks have been proposed as an effective way to alleviate the spectrum scarcity problem [4]. Many researchers have studied cognitive radio transmission strategies that aim to improve energy efficiency, and have considered different channel models to improve the efficiency of energy collection [25,26]. Recently, spectrum-sharing-based energy-harvesting systems for cognitive radio networks have become a significant research direction. A novel energy cooperation transmission scheme was proposed for a cognitive spectrum-sharing-based D2D communication system in [27]. Li et al. [28] considered a spectrum-sharing scheme based on simultaneous wireless information and power transfer (SWIPT). Zhang et al. [29] proposed a new cooperative spectrum-sharing protocol with dynamic time-slot allocation based on energy harvesting. However, none of these works considers resource allocation in cognitive radio wireless sensor networks (CRWSNs) with energy harvesting.
In CRWSNs with energy harvesting, the wireless sensors can be regarded as secondary users (SUs), and the base station or infrastructure as the primary user (PU). The SUs can obtain spectrum from the PU. Before information transmission, the SUs must harvest enough energy; then, when the channels are not occupied by the PU, they use the harvested energy to transmit information. The spectrum available for transmission depends mainly on the spectrum leased from the PU, so the SUs should control their spectrum requirements to lower the spectrum lease cost while still completing their transmission tasks. In this paper, we investigate the optimal resource allocation problem for the PU and the SUs based on a differential game, in which the SUs control their spectrum requirements to minimize their cost.
Specifically, we propose a differential game-based resource allocation model for CRWSNs. The system state of the proposed CRWSN is the capacity of the spectrum resource that the PU wants to lease. All the wireless sensors, acting as SUs, control the amount of resource they lease from the PU to minimize their cost during information transmission. We obtain the open loop and feedback Nash equilibria for the SUs, and numerical results are given to demonstrate the correctness of the differential game analysis. The rest of the paper is organized as follows. Section 2 presents the system model and the differential game formulation. Section 3 analyzes the game in three parts: the open loop Nash equilibrium, the feedback Nash equilibrium under a finite horizon, and the feedback Nash equilibrium under an infinite horizon. Section 4 presents the numerical simulations and analysis. Finally, Section 5 concludes the paper and summarizes the main work.

System Model
As shown in Figure 1, we consider a CRWSN with one primary user (PU), one fusion center (FC), and $N$ secondary users (SUs), with $N \ge 2$. All the SUs share the same spectrum band with the PU and communicate with the FC over the PU's channel. We assume that each SU is equipped with energy-harvesting circuits, so that it can harvest energy from the ambient environment (e.g., solar, wind, and radio frequency) to extend its lifetime. The harvested energy is stored in a rechargeable battery with finite capacity and is used for information transmission to the FC. The SUs harvest energy while the PU's spectrum is busy. The spectrum status is updated by the PU: when the PU is not using the spectrum, it announces its status and leases the spectrum to the SUs. The SUs can observe the spectrum status through the available spectrum idle status map [30], which the PU can update dynamically in collaboration with the SUs. Using the leased spectrum and the harvested energy, the SUs transmit data directly to the FC, so their work mode can be regarded as a harvest-then-transmit mode.
Let $u_i(t)$ denote the spectrum leased by SU $i$ at time $t$, for $i \in \mathcal{N} = \{1, 2, \ldots, N\}$, where each SU controls the spectrum band it leases from the PU, and let $U_i$ be the set of admissible spectrum bands. As shown in Figure 2, the harvest-then-transmit mode consists of two phases: the energy-harvesting phase, in which the SUs harvest enough energy for information transmission, and the information transmission phase, in which the SUs use the harvested energy and the leased spectrum to complete their transmission tasks. Let $\beta_i$ denote the normalized channel idle period, so that $(1 - \beta_i)$ is the normalized channel busy period. When the channel is idle, $\beta_i$ is the time fraction the SUs use for information transmission; when it is busy, $(1 - \beta_i)$ is the time fraction they use for energy harvesting. The energy harvested during the $(1 - \beta_i)$ period is stored in the SUs' batteries and used for information transmission during the $\beta_i$ period. For example, the normalized idle period of the Disney TV channel is approximately 0.25 (25%), which means $\beta_i = 0.25$ for SUs operating in that channel. Let $\pi_p$ denote the price the PU charges for leasing spectrum to the SUs. As the seller, the PU can adjust its price continuously during the idle period to maximize its spectrum income, and the SUs adjust the amount of leased spectrum according to their needs and the price announced by the PU.
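The time split of the harvest-then-transmit mode can be sketched in a few lines of Python. This is a minimal illustration, assuming one frame of arbitrary length divided by $\beta_i$ as described above; the harvesting power `harvest_power` and the rule "stored energy = eta_i * harvest_power * harvest_time" are hypothetical and only show where $\beta_i$ enters. Here $\eta_i$ plays the role of the conversion efficiency used later in the cost model.

```python
from typing import Tuple


def harvest_then_transmit(beta_i: float, frame_length: float) -> Tuple[float, float]:
    """Split one frame into (harvest_time, transmit_time).

    beta_i is the normalized channel idle period: the SU transmits during the
    idle fraction beta_i and harvests during the busy fraction 1 - beta_i.
    """
    if not 0.0 <= beta_i <= 1.0:
        raise ValueError("beta_i must lie in [0, 1]")
    return (1.0 - beta_i) * frame_length, beta_i * frame_length


def harvested_energy(beta_i: float, frame_length: float,
                     eta_i: float, harvest_power: float) -> float:
    """Hypothetical energy model: eta_i * harvest_power * harvest_time."""
    harvest_time, _ = harvest_then_transmit(beta_i, frame_length)
    return eta_i * harvest_power * harvest_time


if __name__ == "__main__":
    # Disney TV channel example from the text: beta_i = 0.25.
    print(harvest_then_transmit(0.25, 1.0))       # (0.75, 0.25)
    print(harvested_energy(0.25, 1.0, 0.5, 2.0))  # 0.75
```

With $\beta_i = 0.25$, three quarters of each frame go to harvesting and one quarter to transmission.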

[Figure 2: harvest-then-transmit frame; busy period: energy harvest; idle period: information transmission.]

Game Formulation
We formulate the resource allocation problem in the CRWSN with energy harvesting as a differential game as follows:
• Players: the set of $N$ SUs in the proposed CRWSN are the players of the differential game;
• State: the system state of the proposed CRWSN is the capacity of the spectrum resource that the PU wants to lease;
• Strategy: the strategy of each SU is the spectrum resource it leases from the PU, so the strategy profile is $\{u_1, u_2, \ldots, u_N\}$ with $u_i \in U_i$.
We aim to minimize the overall cost of the SUs during the energy-harvesting and information transmission periods. The overall cost function of each SU is built from the following cost terms. $U_i^{sp}(t)$ is the cost of the spectrum leased from the PU; it is a linear function of the leased spectrum, $U_i^{sp}(t) = \beta_i \pi_p u_i(t)$: once the PU announces the spectrum price $\pi_p$, the SUs pay $\beta_i \pi_p u_i(t)$ to the PU for the leased spectrum. Because the battery capacity of the SUs is limited, each SU is equipped with an energy-harvesting circuit to harvest enough energy before information transmission. $U_i^{eh}(t)$ is the energy-harvesting cost of SU $i$ for obtaining enough energy for information transmission; it depends on $\eta_i$, the conversion efficiency of the harvested energy, and on $\varepsilon_i$, the QoS requirement of SU $i$, such as energy efficiency or spectrum efficiency. Under the harvest-then-transmit mode, there are thus two costs in the harvest-then-transmit process: the spectrum cost $U_i^{sp}(t)$ and the energy-harvesting cost $U_i^{eh}(t)$. In addition, $U_i^{dis}(t)$ is the discrepancy cost between the spectrum requirement and the available capacity of the spectrum resource. Without loss of generality, we express this discrepancy cost as a quadratic function, where $\omega_i^{dis}$ and the other $\omega$ coefficients are weighting parameters.
Note that this cost function does not include an interference cost, because the SUs already pay a spectrum cost for the spectrum leased from the PU.
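The cost terms can be evaluated directly once values are fixed. The Python sketch below computes the spectrum payment $U_i^{sp}(t) = \beta_i \pi_p u_i(t)$ exactly as stated above. Since the text only says that the discrepancy cost is quadratic, the form $\frac{\omega_i^{dis}}{2}(u_i(t) - x(t))^2$ is a hypothetical choice, and the energy-harvesting term $U_i^{eh}(t)$ is left out because its exact expression in $\eta_i$ and $\varepsilon_i$ is not stated in the text above; the numbers are illustrative.

```python
def spectrum_cost(beta_i: float, price: float, u_i: float) -> float:
    """Spectrum payment U_i^sp(t) = beta_i * pi_p * u_i(t), as stated in the text."""
    return beta_i * price * u_i


def discrepancy_cost(omega_dis: float, u_i: float, x: float) -> float:
    """Hypothetical quadratic discrepancy cost (omega_dis / 2) * (u_i - x)**2."""
    return 0.5 * omega_dis * (u_i - x) ** 2


def running_cost(beta_i: float, price: float, omega_dis: float,
                 u_i: float, x: float) -> float:
    """Instantaneous cost of one SU without the energy-harvesting term."""
    return spectrum_cost(beta_i, price, u_i) + discrepancy_cost(omega_dis, u_i, x)


if __name__ == "__main__":
    # beta_i = 0.25, pi_p = 2.0, u_i = 4.0, x = 5.0 (illustrative values)
    print(spectrum_cost(0.25, 2.0, 4.0))           # 2.0
    print(running_cost(0.25, 2.0, 1.0, 4.0, 5.0))  # 2.5
```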
At the beginning of each idle time slot, the SUs learn from the PU which spectrum resources are available and share them with the other SUs. To avoid interference among SUs, each spectrum resource can be allocated to only one SU. Also at the beginning of the idle time slot, the PU announces a unit price for the leased spectrum, and the SUs then decide their spectrum requirements based on this price and their own needs. Let $x(t)$ be the capacity of the spectrum resource at time $t$. In particular, $u_i(t) \in \mathbb{R}^+$ when $x(t) > 0$, and $u_i(t) = 0$ when $x(t) = 0$. The evolution of the spectrum capacity $x(t)$ is described by a differential equation, Equation (2), in which $\alpha_i$ is a negative constant representing the spectrum leasing efficiency of SU $i$ from the PU, and $\delta$ is the spectrum loss rate during spectrum leasing. Equation (2) shows that the dynamic variation of the spectrum capacity is mainly controlled by the control variables $u_i(t)$; that is, the evolution of the spectrum capacity is an ordinary differential equation in the variables $u_i(t)$.
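To see how the leased spectrum drives $x(t)$, the following Python sketch integrates the state equation with forward Euler steps. It assumes the linear form $\dot{x}(t) = \sum_{i=1}^{N} \alpha_i u_i(t) - \delta x(t)$, which is consistent with the roles of $\alpha_i$ (negative leasing efficiency) and $\delta$ (loss rate) described above but is our assumption rather than a transcription of Equation (2). Following the text, every SU leases nothing once $x(t) = 0$; the control rule passed in is hypothetical.

```python
from typing import Callable, List, Sequence


def simulate_capacity(x0: float, T: float, n: int, alpha: Sequence[float],
                      delta: float,
                      controls: Callable[[float, float], Sequence[float]]) -> List[float]:
    """Forward-Euler path of the spectrum capacity x(t) on n steps over [0, T].

    Assumed (hypothetical) dynamics: dx/dt = sum_i alpha_i * u_i(t) - delta * x(t).
    controls(t, x) returns the leased spectrum u_i for every SU.
    """
    h = T / n
    xs = [x0]
    for k in range(n):
        x = xs[-1]
        # As stated in the text, no spectrum is leased once the capacity is exhausted.
        u = controls(k * h, x) if x > 0 else [0.0] * len(alpha)
        dx = sum(a_i * u_i for a_i, u_i in zip(alpha, u)) - delta * x
        xs.append(max(0.0, x + h * dx))  # clip Euler overshoot below zero
    return xs


if __name__ == "__main__":
    alpha = [-0.2, -0.2, -0.2]  # illustrative values, negative as in the text
    path = simulate_capacity(5.0, 10.0, 1000, alpha, 0.1,
                             lambda t, x: [0.5 * x] * 3)
    print(path[0], path[-1])
```

Because every $\alpha_i$ is negative, any positive leasing pulls the capacity down faster than the loss rate $\delta$ alone would.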
Based on the above models, we formulate the optimal resource allocation problem in the proposed CRWSN. Assuming that the game is observed over the time interval $[t_0, T]$, the objective of SU $i$ can be written as the differential game in Equation (3), subject to the state dynamics in Equation (4). Here, $\Phi_i(x(T))$ is the terminal cost at the end of the game period, which is determined by the spectrum capacity $x(T)$ at the end of the time interval; $r$ is the discount rate, and $e^{-rt}$ is the discount factor. Each SU $i$ seeks the spectrum strategy that minimizes its overall cost over $[t_0, T]$. In the following section, the equilibria of the differential game are derived using Pontryagin's maximum principle and the Hamilton-Jacobi-Bellman (HJB) equation.
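A discounted objective of this kind can be evaluated numerically along any sampled trajectory. The Python sketch below is a minimal version, assuming the running cost is supplied as a function `running(x, u)` and that the terminal cost is discounted by $e^{-rT}$; both the function names and that discounting convention are assumptions for illustration. It applies the trapezoidal rule on the sample times.

```python
import math
from typing import Callable, Sequence


def discounted_cost(ts: Sequence[float], xs: Sequence[float], us: Sequence[float],
                    running: Callable[[float, float], float],
                    terminal: Callable[[float], float], r: float) -> float:
    """Trapezoidal estimate of int e^{-rt} g(x, u) dt + e^{-rT} * Phi(x(T))."""
    vals = [math.exp(-r * t) * running(x, u) for t, x, u in zip(ts, xs, us)]
    integral = sum(0.5 * (ts[k + 1] - ts[k]) * (vals[k] + vals[k + 1])
                   for k in range(len(ts) - 1))
    return integral + math.exp(-r * ts[-1]) * terminal(xs[-1])


if __name__ == "__main__":
    n, T, r = 2000, 2.0, 0.5
    ts = [T * k / n for k in range(n + 1)]
    ones = [1.0] * (n + 1)
    # Unit running cost and zero terminal cost: exact value (1 - e^{-rT}) / r.
    approx = discounted_cost(ts, ones, ones, lambda x, u: 1.0, lambda x: 0.0, r)
    print(approx, (1 - math.exp(-r * T)) / r)
```

For a unit running cost and no terminal cost the estimate should approach $(1 - e^{-rT})/r$, which gives a quick sanity check.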

Game Analysis
In this section, we try to find the equilibrium solution to the proposed differential game. Based on the spectrum price controlled by the PU, the SUs should control their spectrum requirements to minimize the overall cost given by Equations (3) and (4). Firstly, assuming the initial state of the spectrum capacity is known by all the SUs, we can find the open loop Nash equilibrium for each SU. Then, the feedback solutions to the proposed game are discussed based on Bellman dynamic programming, when the game players know the exact system state at time instant t.

Open Loop Nash Equilibrium
Definition 2. The Hamiltonian function of each SU in the proposed differential game over the time period $[t_0, T]$ is given as follows (Equation (6)), where $\Lambda_i(t)$ is the costate function, defined by:
Theorem 1. The allocated spectrum resource $u_i^*(t)$ provides an open loop Nash equilibrium to the proposed resource allocation game in Equations (3) and (4) if there exist costate functions $\Lambda_i(t)$ satisfying the following equations:
Considering the optimal control problem given by Equations (3) and (4), Pontryagin's maximum principle yields the Nash equilibrium solution of the optimal resource allocation problem for each SU.
Theorem 2. The optimal resource allocation strategy of SU $i$ is given by Equation (9), where $[x(t), \Lambda_i(t)]$ are the solutions to the following Riccati equations:
In Equation (9), the spectrum resource allocated to SU $i$ is a linear function of the system state $x(t)$ and is affected by the unit spectrum price $\pi_p$ controlled by the PU. Each SU should therefore decide how much spectrum to lease from the PU based on the available system capacity $x(t)$ at time $t$, taking the unit resource price $\pi_p$ into account.

Proof.
For the open loop equilibrium, Pontryagin's maximum principle provides the necessary conditions for the optimal strategies. The Hamiltonian function of each SU is given by Equation (6); taking its derivative with respect to $u_i(t)$ and setting it to zero yields the optimal strategy of the SUs under the open loop condition. Based on Equation (12), the spectrum resource can be allocated to the SUs over the game duration $[t_0, T]$.
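The open loop conditions form a two-point boundary-value problem: the state starts from $x(t_0)$ while each costate is pinned at $T$. The Python sketch below solves it for a hypothetical linear-quadratic instance, not for the paper's exact model: running cost $\beta_i\pi_p u_i + \frac{\omega_i}{2}(u_i - x)^2$ (energy-harvesting cost omitted), assumed state equation $\dot{x} = \sum_j \alpha_j u_j - \delta x$, hypothetical terminal cost $e^{-rT}\frac{q_i}{2}x(T)^2$, and $t_0 = 0$. With the present-value Hamiltonian $H_i = e^{-rt}[\beta_i\pi_p u_i + \frac{\omega_i}{2}(u_i - x)^2] + \Lambda_i(\sum_j \alpha_j u_j - \delta x)$, setting $\partial H_i/\partial u_i = 0$ gives $u_i^* = x - (\beta_i\pi_p + \alpha_i\Lambda_i e^{rt})/\omega_i$, which is linear in $x$ and shifted by $\pi_p$. Substituting it into $\dot{\Lambda}_i = -\partial H_i/\partial x$ gives $\dot{\Lambda}_i = (\delta - \alpha_i)\Lambda_i - \beta_i\pi_p e^{-rt}$ with $\Lambda_i(T) = e^{-rT} q_i x(T)$. The discretized terminal state is then an affine function of the guessed $x(T)$, so two trial runs fix the consistent value (a linear shooting method, chosen here for simplicity).

```python
import math
from typing import List, Sequence


def open_loop_equilibrium(x0: float, T: float, n: int, alpha: Sequence[float],
                          beta: Sequence[float], omega: Sequence[float],
                          q: Sequence[float], price: float, delta: float, r: float):
    """Linear shooting for the open loop conditions of a hypothetical LQ game.

    Returns the time grid ts, state x[k], controls u[i][k] and costates lam[i][k].
    """
    N, h = len(alpha), T / n
    ts = [k * h for k in range(n + 1)]

    def costates(x_T: float) -> List[List[float]]:
        lams = []
        for i in range(N):
            lam = [0.0] * (n + 1)
            lam[n] = math.exp(-r * T) * q[i] * x_T  # Lambda_i(T) = e^{-rT} q_i x(T)
            for k in range(n, 0, -1):  # Euler steps backward in time
                dlam = (delta - alpha[i]) * lam[k] - beta[i] * price * math.exp(-r * ts[k])
                lam[k - 1] = lam[k] - h * dlam
            lams.append(lam)
        return lams

    def forward(lams: List[List[float]]):
        x = [x0] + [0.0] * n
        u = [[0.0] * (n + 1) for _ in range(N)]
        for k in range(n + 1):
            for i in range(N):  # first-order condition dH_i/du_i = 0
                u[i][k] = x[k] - (beta[i] * price
                                  + alpha[i] * lams[i][k] * math.exp(r * ts[k])) / omega[i]
            if k < n:
                drift = sum(alpha[i] * u[i][k] for i in range(N)) - delta * x[k]
                x[k + 1] = x[k] + h * drift
        return x, u

    # The discretized map "guessed x(T) -> simulated x(T)" is affine,
    # so two trial runs determine its fixed point.
    f0 = forward(costates(0.0))[0][-1]
    slope = forward(costates(1.0))[0][-1] - f0
    x_T = f0 / (1.0 - slope)
    lams = costates(x_T)
    x, u = forward(lams)
    return ts, x, u, lams


if __name__ == "__main__":
    ts, x, u, lams = open_loop_equilibrium(
        x0=5.0, T=5.0, n=2000, alpha=[-0.2, -0.25, -0.3], beta=[0.25] * 3,
        omega=[1.0, 1.5, 2.0], q=[0.5] * 3, price=2.0, delta=0.1, r=0.05)
    print(x[0], x[-1], [ui[0] for ui in u])
```

The returned controls satisfy the first-order condition at every grid point, and the final costates agree with the assumed transversality condition.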

Feedback Nash Equilibrium under Finite Horizon
In the open loop Nash equilibrium, the system state is not observed by the game players; they know only the initial system state. The SUs therefore decide on the optimal allocated resource based only on the time instant $t$ and the initial system state $x(0)$. Next, we derive the feedback strategies for the case in which the system state is known to the SUs. Under the feedback condition, the optimal solutions of the proposed game depend on both the current time and the current system state. The following definitions are needed.
Definition 3. The spectrum resource $u_i^*(t, x)$ allocated from the PU to SU $i$ is optimal under the feedback condition if the following inequality holds for all control variables $u_i(t, x) \neq u_i^*(t, x)$ in the set of admissible spectrum bands $U_i$:
Theorem 3. The allocated spectrum resource $u_i^*(t, x)$ provides a feedback Nash equilibrium to the proposed resource allocation game in Equations (3) and (4) if there exist continuously differentiable functions $V_i(t, x)$ satisfying the following set of partial differential equations:
$$V_i(T, x(T)) = \Phi_i(x(T)), \qquad (15)$$
where $V_i(t, x)$ is the equilibrium payoff of SU $i$ at time $t \in [t_0, T]$ when the system state is $x$; it is called the value function of SU $i$ in the proposed resource allocation problem.
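The difference between the two information structures can be illustrated with a toy simulation. In the Python sketch below, a feedback rule $u_i(t, x) = k_i x + m_i$ with hypothetical coefficients re-reads the current state, while an open loop schedule $u_i(t)$ is computed once from the nominal path that starts at $x(0)$. After an unexpected jump in $x$, only the feedback rule reacts. The coefficients, the parameter values, and the linear state equation $\dot{x} = \sum_i \alpha_i u_i - \delta x$ are assumptions, and neither rule is claimed to be an equilibrium of the game.

```python
from typing import Callable, List, Optional, Sequence


def simulate(policy: Callable[[int, float], List[float]], x0: float, T: float,
             n: int, alpha: Sequence[float], delta: float,
             shock_step: Optional[int] = None, shock: float = 0.0):
    """Euler simulation of the assumed state equation dx/dt = sum alpha_i u_i - delta x.

    If shock_step is given, x jumps by `shock` just before the control at that
    step is chosen, modelling an unexpected change in the spectrum capacity.
    """
    h = T / n
    xs, us = [x0], []
    for k in range(n):
        x = xs[-1] + (shock if k == shock_step else 0.0)
        u = policy(k, x)
        us.append(u)
        xs.append(x + h * (sum(a * ui for a, ui in zip(alpha, u)) - delta * x))
    return xs, us


# Hypothetical parameters and linear feedback coefficients (not from Table 1).
alpha, delta = [-0.2, -0.3], 0.1
gains, offsets = [0.5, 0.4], [0.05, 0.02]


def feedback(k: int, x: float) -> List[float]:
    """Feedback rule u_i(t, x) = k_i * x + m_i: reads the current state."""
    return [g * x + m for g, m in zip(gains, offsets)]


# Open loop schedule u_i(t): computed once from x(0) along the shock-free path.
_, nominal_u = simulate(feedback, 5.0, 10.0, 1000, alpha, delta)


def open_loop(k: int, x: float) -> List[float]:
    """Open loop rule: ignores the observed state and replays the schedule."""
    return nominal_u[k]


if __name__ == "__main__":
    _, u_fb = simulate(feedback, 5.0, 10.0, 1000, alpha, delta, shock_step=500, shock=2.0)
    _, u_ol = simulate(open_loop, 5.0, 10.0, 1000, alpha, delta, shock_step=500, shock=2.0)
    print(u_fb[500], u_ol[500])  # only the feedback rule reacts to the jump
```

Without a shock both rules produce identical paths; after the jump the open loop schedule keeps leasing the planned amount, whereas the feedback rule immediately leases more.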

Definition 4.
The value function of each SU under feedback control is given as follows and satisfies the boundary condition below. Given another resource allocation strategy $u_i \neq u_i^*$ with the corresponding system state $x$, we have the following equations; integrating these expressions from $t_0$ to $T$, we obtain:
Performing the indicated minimization in Equations (14) and (15) yields:
Substituting Equation (18) into Equations (16) and (17) and solving them, one obtains the following, where $A_i(t)$, $B_i(t)$, and $C_i(t)$ satisfy the following differential equations:
Theorem 4. The optimal resource allocation strategy of SU $i$ under feedback control is given by the following, where $A_i(t)$ and $B_i(t)$ are the solutions of the differential equations given by Equations (20)-(22).
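Under a hypothetical linear-quadratic instance (running cost $\beta_i\pi_p u_i + \frac{\omega_i}{2}(u_i - x)^2$ with the energy-harvesting cost omitted, assumed dynamics $\dot{x} = \sum_j \alpha_j u_j - \delta x$, terminal cost $\Phi_i(x) = \frac{q_i}{2}x^2$), a quadratic ansatz reduces the HJB equations to ordinary differential equations for the coefficients. The Python sketch below writes $V_i(t,x) = e^{-rt}W_i(t,x)$ with $W_i = \frac{A_i(t)}{2}x^2 + B_i(t)x + C_i(t)$; this scaling is our choice and may differ from Equations (20)-(22). Minimizing over $u_i$ gives $u_i^* = x - (\beta_i\pi_p + \alpha_i(A_i x + B_i))/\omega_i$, and matching powers of $x$ in $rW_i - \partial_t W_i = \min_{u_i}\{\cdot\} = c_{2,i}x^2 + c_{1,i}x + c_{0,i}$ gives $A_i' = rA_i - 2c_{2,i}$, $B_i' = rB_i - c_{1,i}$, $C_i' = rC_i - c_{0,i}$, which the code integrates backward from $T$ with explicit Euler steps.

```python
def hjb_coefficients(A, B, alpha, beta, omega, price, delta):
    """Coefficients (c2, c1, c0) of min_u{g_i + W_x f} = c2 x^2 + c1 x + c0 per SU.

    Uses u_j = (1 - a_j) x - b_j from the first-order conditions, so the
    drift is f = (S - delta) x - Q (hypothetical LQ instance).
    """
    N = len(alpha)
    a = [alpha[i] * A[i] / omega[i] for i in range(N)]
    b = [(alpha[i] * B[i] + beta[i] * price) / omega[i] for i in range(N)]
    S = sum(alpha[j] * (1.0 - a[j]) for j in range(N))
    Q = sum(alpha[j] * b[j] for j in range(N))
    out = []
    for i in range(N):
        c2 = 0.5 * omega[i] * a[i] ** 2 + A[i] * (S - delta)
        c1 = (beta[i] * price * (1.0 - a[i]) + omega[i] * a[i] * b[i]
              + B[i] * (S - delta) - A[i] * Q)
        c0 = -beta[i] * price * b[i] + 0.5 * omega[i] * b[i] ** 2 - B[i] * Q
        out.append((c2, c1, c0))
    return out


def solve_finite_horizon(T, n, alpha, beta, omega, q, price, delta, r):
    """Integrate A' = rA - 2c2, B' = rB - c1, C' = rC - c0 backward from T.

    Terminal data A_i(T) = q_i, B_i(T) = C_i(T) = 0 match the hypothetical
    terminal cost q_i x^2 / 2. Returns [(t, A, B, C), ...] with t ascending.
    """
    N, h = len(alpha), T / n
    A, B, C = list(q), [0.0] * N, [0.0] * N
    path = [(T, list(A), list(B), list(C))]
    for k in range(n, 0, -1):
        coeffs = hjb_coefficients(A, B, alpha, beta, omega, price, delta)
        for i, (c2, c1, c0) in enumerate(coeffs):
            # Explicit Euler step from t_k back to t_{k-1}.
            A[i] -= h * (r * A[i] - 2.0 * c2)
            B[i] -= h * (r * B[i] - c1)
            C[i] -= h * (r * C[i] - c0)
        path.append(((k - 1) * h, list(A), list(B), list(C)))
    path.reverse()
    return path


def feedback_strategy(x, A, B, alpha, beta, omega, price):
    """u_i*(t, x) = x - (beta_i * pi + alpha_i * (A_i x + B_i)) / omega_i."""
    return [x - (beta[i] * price + alpha[i] * (A[i] * x + B[i])) / omega[i]
            for i in range(len(alpha))]


if __name__ == "__main__":
    alpha, beta = [-0.2, -0.25, -0.3], [0.25] * 3
    omega, q = [1.0, 1.5, 2.0], [0.5] * 3
    path = solve_finite_horizon(5.0, 5000, alpha, beta, omega, q, 2.0, 0.1, 0.05)
    _, A0, B0, _ = path[0]
    print(feedback_strategy(5.0, A0, B0, alpha, beta, omega, 2.0))
```

With $\alpha_i = 0$ the state is uncontrolled and $A_i(t) = q_i e^{-(r+2\delta)(T-t)}$, which is a convenient check of the backward integration.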

Feedback Nash Equilibrium under Infinite Horizon
We now turn the proposed game into an infinite-horizon autonomous game with a constant discount rate; in this subsection, the game is observed over an infinite time horizon. The objective function and system state function given by Equations (3) and (4) are both non-autonomous. The game problem in Equations (3) and (4) is changed to the following infinite-horizon problem, Equation (24), subject to Equation (25). Under the infinite horizon, the solutions are independent of the starting time and depend only on the system state at the starting time. A feedback solution of the infinite-horizon game in Equations (24) and (25) can be characterized as follows.
Theorem 5. The allocated spectrum resource $u_i^*(t, x)$ provides a feedback Nash equilibrium to the proposed resource allocation game in Equations (24) and (25) if there exist functions $W_i(t, x)$ satisfying the following set of partial differential equations:
Theorem 6. The optimal resource allocation strategy of SU $i$ is given by:
Proof. Performing the indicated minimization in Equation (27), we obtain the optimal strategy. Incorporating the solution $u_i^*$ into Equations (24) and (25) and solving the equations, we obtain the following, where $WA_i(x)$, $WB_i(x)$, and $WC_i(x)$ satisfy the following equations:
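For identical SUs ($\alpha_i = \alpha$, $\beta_i = \beta$, $\omega_i = \omega$) in the same hypothetical linear-quadratic instance (running cost $\beta\pi_p u_i + \frac{\omega}{2}(u_i - x)^2$, assumed dynamics $\dot{x} = \sum_j \alpha u_j - \delta x$), the stationary coefficients of $W(x) = \frac{A}{2}x^2 + Bx + C$ can be approximated by integrating the finite-horizon coefficient equations backward over a long horizon until they stop changing. This is a common heuristic for reaching the infinite-horizon solution and is not guaranteed to converge in general. For this instance the limit can also be worked out by hand, $A = 0$ and $B = \beta\pi_p/(r + \delta - N\alpha)$, which gives a check; these values belong to the assumed model, not to the paper's $WA_i$, $WB_i$, $WC_i$.

```python
from typing import Tuple


def stationary_coefficients(N: int, alpha: float, beta: float, omega: float,
                            price: float, delta: float, r: float, q: float = 0.5,
                            h: float = 0.01, tol: float = 1e-10,
                            max_steps: int = 1_000_000) -> Tuple[float, float, float]:
    """Stationary (A, B, C) for N identical SUs in a hypothetical LQ instance.

    W(x) = A x^2 / 2 + B x + C. Starting from terminal data A = q, B = C = 0,
    the coefficient equations are integrated backward until
    r * W = min_u{...} holds to within tol.
    """
    A, B, C = q, 0.0, 0.0
    for _ in range(max_steps):
        a = alpha * A / omega
        b = (alpha * B + beta * price) / omega
        S = N * alpha * (1.0 - a)      # drift is (S - delta) x - Q
        Q = N * alpha * b
        c2 = 0.5 * omega * a * a + A * (S - delta)
        c1 = beta * price * (1.0 - a) + omega * a * b + B * (S - delta) - A * Q
        c0 = -beta * price * b + 0.5 * omega * b * b - B * Q
        dA, dB, dC = r * A - 2.0 * c2, r * B - c1, r * C - c0
        if max(abs(dA), abs(dB), abs(dC)) < tol:
            return A, B, C
        A, B, C = A - h * dA, B - h * dB, C - h * dC
    raise RuntimeError("coefficients did not become stationary")


def stationary_strategy(x: float, A: float, B: float, alpha: float,
                        beta: float, omega: float, price: float) -> float:
    """Stationary feedback rule u*(x) = x - (beta * pi + alpha * (A x + B)) / omega."""
    return x - (beta * price + alpha * (A * x + B)) / omega


if __name__ == "__main__":
    A, B, C = stationary_coefficients(3, -0.2, 0.25, 1.0, 2.0, 0.1, 0.05)
    print(A, B, 0.25 * 2.0 / (0.05 + 0.1 + 3 * 0.2))
    print(stationary_strategy(5.0, A, B, -0.2, 0.25, 1.0, 2.0))
```

Because the stationary rule no longer depends on $t$, it matches the remark that infinite-horizon solutions depend only on the state.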

Numerical Simulations
In this section, a series of numerical simulations conducted in MATLAB R2016a shows how the optimal leasing strategy of each SU evolves over time under the differential game model formulated and solved above. We analyze both the open loop and the feedback Nash equilibrium solutions for the spectrum band resource leased by the SUs. In both the open loop and feedback cases, three SUs are simulated to show the dynamic evolution of the system strategy under the proposed differential game model. All the simulation parameters are listed in Table 1. Some parameters (the discount rate $r$, the unit price $\pi_p$ set by the PU for leasing its spectrum resource, and the spectrum loss rate $\delta$ during spectrum leasing) are the same for all three users; this special scenario is chosen to simplify the simulations.
Table 1. Simulation parameter settings of the differential game model.

First, we analyze the open loop and feedback solutions $u_i^*(t)$ of the differential game model formulated in this paper. Section 3 gives the expressions of the optimal resource allocation strategy of each SU under the open loop and feedback Nash equilibria. Using these expressions and the parameter settings, we obtain the strategy trajectories shown in Figures 3a and 4a. In both the open loop and the feedback cases, the resource allocation strategy $u_i^*(t)$ decreases gradually over time and stabilizes at a certain value. This agrees with the physical meaning of $u_i^*(t)$, the spectrum resource that SU $i$ leases from the PU at time $t$. In the information transmission stage, the SUs use the leased spectrum and the harvested energy to complete their transmission tasks. As time goes by, some tasks are finished or reach a steady state, so fewer spectrum resources are needed than at earlier moments and the spectrum leased from the PU decreases gradually. Once the transmission tasks are finished or stable, only a persistent amount of spectrum is needed to maintain the message transmission status, so the curve settles near a certain value.
Second, we analyze the variation of $x(t)$ over time under the optimal resource allocation strategy $u_i^*(t)$ in both the open loop and feedback cases. From Formula (4), and as with $u_i^*(t)$ above, $x(t)$ represents the spectrum resource capacity that the PU wants to lease out at time $t$, and its variation depends not only on the optimal strategy $u_i^*(t)$ but also on $x(t)$ itself. Using Formula (4), the optimal strategy $u_i^*(t)$, and the parameter settings, we obtain the trajectories of $x(t)$ shown in Figures 3b and 4b. The state $x(t)$ gradually decreases over time and stabilizes at a certain value, which agrees with the following observations. On the one hand, as the information transmission tasks near completion, the spectrum that the SUs want to lease from the PU may be reduced, so the PU may reduce the resource it leases out in order to maximize resource utilization. On the other hand, long-term leasing may reduce the spectrum available to the PU itself and thus lower its work efficiency; therefore, the spectrum that the PU leases out may decrease over time. The optimal solutions in the feedback case under an infinite horizon are also analyzed, as shown in Figure 5. The results in Figure 5 show that the system state changes over time, the optimal solutions change accordingly, and the users have more optimal solutions to choose from.

Conclusions
In this article, secondary users with an energy-harvesting function complete their information transmission tasks using their stored energy and the spectrum resource leased from the PU. Leasing spectrum resources increases the cost of the SUs. To minimize this cost, we first formulated a differential game model to describe the problem mathematically. Second, by solving the game model, we obtained the open loop and feedback Nash equilibrium solutions, which show that optimal resource allocation strategies for the SUs exist. Finally, numerical simulations were conducted to verify the correctness of the differential game model.
Author Contributions: H.X. conceived the main idea and the differential game theory model; all authors contributed to data analysis and wrote the paper.