Self-Organized Cognitive Sensor Networks: Distributed Channel Assignment for Pervasive Sensing

We study the channel assignment strategy in multichannel wireless sensor networks (WSNs) where macrocells and sensor nodes are overlaid. The WSNs dynamically access the licensed spectrum owned by the macrocells to provide pervasive sensing services. We formulate the channel assignment problem as a potential game which has at least one pure strategy Nash equilibrium (NE). To achieve the NE, we propose a stochastic learning-based algorithm which does not require the information of other players’ actions and the time-varying channel. Cluster heads as players in the game act as self-organized learning automata and adjust assignment strategies based on their own action-reward history. The convergence property of the proposed algorithm toward pure strategy NE points is shown theoretically and verified numerically. Simulation results demonstrate that the learning algorithm yields a 26% sensor node capacity improvement as compared to the random selection, and incurs less than 10% capacity loss compared to the exhaustive search.


Introduction
In wireless sensor networks [1], spatially distributed, lowpower, and low-cost sensor nodes are deployed in a geographical area to monitor the environment. The sensor nodes usually form clusters, and in each cluster there is an energyrich sensor node acting as the cluster head, while other sensor nodes are referred to as cluster members. A cluster head is a special sensor node with better cognitive radio (CR) [2,3] functionality and is responsible for the spectrum sensing and the channel assignment among its cluster members.
To enable the various kinds of services [4][5][6] provided by a pervasive sensing system, proper radio resource management [7] is important. Due to the spectrum scarcity and the ad hoc nature of sensor network deployment, it could be hard to assign licensed bands to sensor networks. Therefore, the CR technology has been considered as a promising solution to the channel assignment problem of sensor networks. CR technology enables dynamic spectrum access (DSA) of unlicensed users in distributed networks. The key idea for CR operation is to allow the active sensing of the dynamic radio environment so as to improve the spectrum utilization. Akan et al. [8] provided a survey on cognitive radio sensor networks. By utilizing the CR technology, the sensor networks are able to attain high data rate due to available spectrum holes. In addition, dynamic spectrum access helps mitigate the interference incurred by dense deployment of sensor nodes.
Despite the promising features of cognitive sensor networks, the deployment of such heterogeneous networks with sensor clusters underlying the same spectrum as macrocells and in the same geographical area brings new technical challenges. In particular, we are interested in the case of densely populated sensor networks where, due to extensive frequency 2 International Journal of Distributed Sensor Networks reuse, the cochannel interference (CCI) among sensor nodes and the cross-tier interference (between the macrocell and sensor networks) affect the system performance.
In the absence of a central controller, channel assignment in cognitive sensor networks is implemented in a distributed manner. In this paper, we consider the distributed channel assignment for self-organized cognitive sensor networks from a game-theoretic perspective. The main contributions of this paper are summarized as follows.
(i) We model the femtocell channel assignment problem as an ordinal potential game (OPG). The game considers time-varying channel availability as its external state.
(ii) We propose a fully decentralized channel assignment algorithm in which the channel is selected by each cluster head independently based on its action-reward history. The convergence property of the algorithm to a pure strategy NE point is verified theoretically as well as numerically.
This paper is organized as follows. We review the related work and compare it with our study in Section 2. In Section 3, the system model for a cognitive sensor network is presented. Section 4 describes the game-theoretic model of the distributed channel assignment problem. Section 5 presents the stochastic learning procedure carried out by the cluster heads. Finally, numerical results are given in Section 6, with the conclusion drawn in Section 7.
Notations. Normal letters represent scalar quantities; uppercase and lowercase boldface letters denote matrices and vectors, respectively. Given a finite set A, Δ(A) represents the set of all probability distributions over the elements of A. 1 {cond} is the indicator function which equals one if the condition cond is satisfied and zero otherwise.

Related Works
Distributed channel assignment has been extensively investigated for different networking applications where concurrent transmissions among neighboring wireless links exist.
In an interference avoidance scenario, different channels must be assigned to neighboring links. In femtocell networks [9], different methods have been proposed to assign different spectrum to adjacent femtocells. Examples include distributed random access [9], dynamic frequency planning [10], and clustering [11]. For sensor networks with multiple channels, graph theory-based methods have also been considered [12]. These methods can be viewed as variations of frequency planning and usually rely on negotiations among neighboring links. Graph theory is also a popular approach as the interference condition can be represented as nodes and edges in a graph. In sensor networks, Chowdhury et al. [13] proposed the dynamic channel allocation (DCA) and studied the related protocol design. Yu et al. [14] considered a game theory-based approach which takes into account both the network topology and transmission routing information. Channel selection for multicell orthogonal frequency division multiple access-(OFDMA-) based networks using graph framework was considered in [15].
On the other hand, in an interference mitigation scenario, mutual interference is tolerated. Channel assignment for cognitive sensor networks has been studied in [16]. Recently, self-organization of distributed agents based upon reinforcement learning (RL) mechanisms [17,18] has been shown to be effective in the literature. Multiagent Q-learning (MAQL) was applied to femtocell networks in [19,20]. MAQL involves the actions of other agents as the external state and thus requires the sharing of the knowledge of all agents' actions. The stochastic learning (SL), in contrast, updates the actions of users based on their individual action-reward history. Nie and Comaniciu [21] considered the channel selection in cognitive radio using interference mitigation game formulation. SL was also applied to the opportunistic spectrum access in cognitive radio networks [22] to achieve the Nash equilibrium (NE) strategy. However, fully distributed resource allocation in cognitive sensor networks has not been extensively investigated.

System Model
We consider a cognitive radio sensor network consisting of one MBS and sensor clusters under the coverage of the MBS. The method of sensor node clustering and cluster head selection [23] are also interesting topics but are out of the scope of this paper. Also we consider only the single-hop transmission and omit the multihop routing issue for ad hoc networks [24]. The sensors are deployed in an apartment block with a dual-stripe room layout, as shown in Figure 1. In our considered system, the medium access control (MAC) function in a cluster resembles that of cellular systems. The time domain is divided into frames, and a frame is further divided into time slots. In each frame, a cluster head allocates its cluster members (i.e., sensor nodes) in different time slots following a time division multiple access (TDMA) rule. For simplicity, we assume that in each slot each cluster head allocates one sensor node over one of the available channels. We emphasize that the proposed method can be easily generalized to the cases with multiple sensor nodes per slot. A sensor node is in idle mode unless the current time  Figure 2: Exemplary time slot allocation in a frame. In the first slot, cluster heads A and C assign channels for sensor nodes A1 and C1, respectively. slot is allocated for it. A cluster head is idle if none of its sensor nodes is transmitting data and active otherwise. We therefore introduce an active ratio, which is defined as the percentage of active clusters in a time slot. An exemplary time slot allocation is depicted in Figure 2.
The spectrum is divided into channels, and the channels may be licensed to different macrocells (a.k.a spectrum owners). By utilizing CR, the sensor nodes access the same frequency band as the macrocell does. Since the sensor nodes are in an energy-tight situation and operate with ultralow power, we assume that the transmission power of a macrocell user equipment (MUE) is much higher than that of the sensors. Thus, the uplink transmission of an MUE will block the nearby sensor nodes using the same spectrum. For crosstier interference mitigation, we define an avoiding region for each MUE. A channel is available to a cluster only if the channel is not assigned to an MUE whose avoiding region covers the cluster head. The channel availability for sensor clusters is expressed as a binary matrix X ∈ {0, 1} × , in which the element , equals one if channel is available to link and zero otherwise. The elements of X follow the Bernoulli distribution and can be described by a probability matrix Θ ∈ [0, 1] × , where the element , is the probability that , = 1.
Assuming perfect synchronization in time and frequency, let denote the power of sensor node , and let |ℎ , | 2 indicate the link gain between cluster head and sensor node . The interference received by cluster head from sensor node is given by where ( ) is the action (channel selection) of cluster head in frame . For notational brevity, we will hereafter discard the timing dependence of the action ( ) in occasions without ambiguity. Then, the signal-to-interference-andnoise ratio (SINR) at cluster head can be expressed as Consequently, the expected capacity for link in bits/s/Hz is given by Let a = ( 1 , . . . , ) be the channel assignment profile of all active clusters. The global objective of the system is to find the optimal channel selection profile a opt that maximizes the sum capacity. Formally, consider To reflect a practical cognitive sensor network, our system model incorporates the following considerations.
(1) The uplink resource allocation for MUE is timevarying during the learning period, and the channel availability statistics (i.e., Θ) are fixed but unknown to any sensor nodes.
(2) There is no centralized controller and the channel selection is performed independently by each cluster head.
(3) The number of sensor clusters in the system, , is unknown.
With these considerations, solving (4) is a challenging task, since the only available information for decision making at each individual player is its own action-reward history. Thus, a fully distributed channel selection scheme is proposed.

Game-Theoretic Model and Channel Selection
In this section, we present the game-theoretic formulation of the self-organized cognitive sensor network channel selection problem. Our objective is to devise for each cluster head a distributed channel assignment strategy that takes into account the effect of both the sensor-tier and cross-tier interference. We summarize our notations related to the game formulation in Table 1.

Problem Formulation and Game
Model. The channel selection problem described in the previous section can be modeled by a normal-form game with external state, expressed as a 4-tuple: where X is the external state (channel availability) space, N is the set of players (cluster heads), A is the set of actions International Journal of Distributed Sensor Networks An element of A ( ) ∈ A Action (channel selection) of player at slot − ( ) ∈ A Actions of players except for at slot P := Δ(A ) Set of probability distribution over A p ( ) ∈ P Mixed strategy of player at slot ( ) ∈ R Observed utility of player at slot u ( ) ∈ R |A | Estimated utility vector of player at slot ( , ) Learning rates of player (selections of channels) that player can take, and { } ∈N is the utility function of player that depends on his own action as well as on the actions of other players. Inspired by [21], the reward function is designed to consider the interference received (inward) and generated (outward) by each link. In this way, the cluster heads implicitly cooperate to reduce the interference generated toward other sensor nodes. We define the generalized SINR (gSINR) for player as̃= Then the instantaneous reward function of cluster head is designed as By the definition in (7), when the channel is available, the reward is given by Shannon's capacity formula where both inward and outward interference are accounted for. When the channel is not available, the reward is zero. Notice that the calculation of the reward function in (7) relies on the knowledge of other players' action. This leads to overhead due to the required information. The implementation is possible, and discussion on such protocol design can be found in [21]. The self-organization claimed in this paper is based on the fact that the action in each time instant is selected by each player independently and simultaneously.
For systems with the channel availability as the external state, the utility function is defined as the expected reward of player over the external state (i.e., channel availability X); that is, Furthermore, if the cluster heads are assumed to be selfish and rational players, they will compete to maximize their own individual utility. In fact, a selfish cluster head will not only maximize the capacity of its own user but also reduce the interference. Formally, the game G is expressed as 4.2. Analysis of Nash Equilibrium. With the utility function defined in (8), we show the existence of an NE point for the proposed game in the following proposition.

Proposition 1.
The game G is an ordinal potential game (OPG) which possesses at least one pure strategy NE.
Proof. Consider the function Φ : × ∈N A → R + : Now consider an improvement step made by cluster head that changes its action unilaterally from tŏ, so that Here we have used the fact that when cluster head changes its action, the effects are only on the interference that it receives ( → ) and generates ( → ). From (10) and (11), we obtain Therefore, G is an OPG with potential function Φ, and the existence of a pure strategy NE is always guaranteed [26] since it coincides with the local maxima of the potential function. This completes the proof.
Notice that the term ∑ =1 ∑ =1, ̸ = → in the potential function Φ denotes the summation of all mutual interference in the sensor network. Therefore, every NE point is the strategy profile, that is, a local maximum of the summed interference.

Stochastic Learning Procedure
Here, we discuss obtaining the NE via stochastic learning. As the channel state is time-varying and the action is selected International Journal of Distributed Sensor Networks 5 (1) Initially, set = 0. Set the channel assignment probability vector and utility estimation as , (0) = 1/ A ,̂, (−1) = 0, ∀ ∈ N, ∈ A .
(2) At the beginning of the th slot, each player selects an action ( ) according to the current channel assignment probability p ( ).
(3) In each slot, each BS transmits data. At the end of each slot, each BS receives the instantaneous reward ( ) specified by (15) depending on the precoding scheme. (4) All players update their channel assignment probability vector and utility estimation according to the rules: where and are the learning rates for action probability and utility estimation, respectively.
by each player simultaneously and independently in each play, previous algorithms that require complete information (e.g., better response dynamics [26]) may not be applicable here. Thus, we propose a decentralized stochastic learning-(SL-) based algorithm by which the BSs learn toward the equilibrium strategy profile from their individual actionreward history.
To facilitate the development of the SL-based channel selection algorithm, we extend the channel selection game into a mixed strategy form. Let p ( ) = [ ,1 ( ), . . . , , ( )] , for all ∈ N, be the channel selection probability vector for player , where , ( ) is the probability that player selects strategy ∈ A at slot . More precisely, using mixed strategies means that the channel assignment of cluster head is the outcome of a probabilistic experiment based on the probability vector p (imagine that each SU rolls a biased dice in each strategy update). The mixed-strategy extension of the utility function is defined upon × ∈N P , where P is the set of probability distributions over the action space of player . Let P( ) = [p 1 ( ), . . . , p ( )] be the mixed strategy profile of G, and let ( , P) be the expected reward function of player if he employs pure strategy while other players , for all ∈ N, ̸ = , employ a mixed strategy p ; that is, The proposed distributed channel assignment (DCA) algorithm for cognitive sensor networks is described in Algorithm 1.
In each play, the channel selection is based on a probability distribution over the set of channels. After each play, cluster head obtains the instantaneous reward and updates the mixed strategy (i.e., channel selection vector) p ( ) and utility estimationû ( ). Notably, the utility estimation serves as a reinforcement signal so that higher utility induces higher probability in the next play. Furthermore, the proposed learning algorithm is fully distributed, and the channel selection is solely based on individual action-reward experience without a centralized controller. In fact, the proposed algorithm belongs to the combined fully distributed payoff strategy reinforcement learning (CODIPAS-RL) [27]. The evolution of the mixed strategies is described as follows.

Proposition 2.
If the learning rates are sufficiently small, the sequence {P( )} converges to P * , which is the solution for the following ordinary differential equation (ODE): Proof. Please refer to [28,Section 4].
The ODE in (14) is actually the ODE of the replicator dynamics [29]. An intuitive interpretation is that the probability of taking an action increases if the utility is higher than the average utility over all possible actions and decreases otherwise.
The convergence property of the proposed algorithm is discussed in the following proposition.
Let Ψ(P) be the expected potential function; that is, For OPGs, Ψ(e , P − ) = Ψ(P)/ , is an increasing function of (e , P − ). Let By applying (15) and (18) where the last inequality holds since, given the condition in (18), , , and , , always have the same sign. Thus, Ψ is nondecreasing along the trajectories of the ODE, and asymptotically all the trajectories will be in the set {P ∈ P : Ψ(P)/ = 0}. From (15) and (19), the following is known: In other words, when starting from an interior point of the simplex of the mixed strategy space P, the sequence P( ) converges to a stationary point of the ODE in (15). By Proposition 3, we complete the proof.
While the SL-based learning algorithm converges to an NE point when the learning rates approach zero, smaller learning rates lead to a slower convergence. Therefore, the choice of the learning rates strikes a trade-off between accuracy and speed and may be determined by training in practice.

Numerical Results
For system-level simulations, we consider a cognitive sensor network deployed within the coverage of a cellular network. As in Figure 1, the simulation environment includes one macrocell covering one dual-stripe apartment block. The apartment block contains 40 single-floor apartments. There is one sensor cluster in each apartment. When a sensor cluster is active, its cluster head assigns one channel to cluster members randomly located in the same apartment. Without loss of generality, we consider the channel assignment in the first slot of each frame, in which for each active cluster there is one cluster member. The simulation parameters are listed in Table 2.

Convergence of the Proposed SL-Based Learning Algorithm.
We first study the time-evolving behaviors of the proposed stochastic learning method. Figure 3 shows the evolutions of the channel assignment probabilities (i.e., mixed strategy) using the proposed SL-based algorithm. We consider different learning rates and study the convergence behaviors. It is observed that, with equal initial probability,  the channel assignment probability converges to a pure strategy (i.e., the probability of choosing one strategy approaches one) in around 80 and 20 iterations for = 0.1 and = 0.5, respectively. As expected, larger learning rate results in faster convergence. Figure 3, the convergence toward pure strategy is observed for both = 0.1 and = 0.5. An intuitive question to ask is as follows: does the resulting strategy profile achieve the Nash equilibrium? In Figure 4, we verify the NE property by testing the unilateral deviation with a 25% active ratio and different learning rates. As can be seen from Figure 4(a), when = 0.1, a unilateral deviation results in lower utility for all players. In other words, the outcome of the learning algorithm is an NE point. On the other hand, when = 0.5, as shown in Figure 4(b), links number 4 and number 8 both achieve higher throughput by unilateral deviation, and thus the resulting strategy is no longer an NE point. These results reflect the trade-off between accuracy and convergence speed which we mentioned before.

Evolution of Actions.
During the learning procedure, the channel assignment is based on probabilistic experiments. When the channel assignment changes in the next frame, the switching between different channels brings overhead since the sensor node needs to be reconfigured. The evolution of actions for selected players is shown in Figure 5. As can be seen, while Figure 3 (Left) reveals that it takes around 80 iterations for all players to converge to pure strategies, the actions seldom change after about 60 iterations in the learning procedure. This suggests that channel switching, if at all happens, usually happens only in the beginning of the entire learning procedure. Actually, our proposed learning algorithm aims at learning the equilibrium strategy in the long run. The channel switching and the incurred sensor node reconfiguration are manageable overheads compared to the long operation time.

Different Active Ratios.
We further consider different active ratios and investigate the convergence behaviors under different levels of mutual interference. The results for active ratio of 50% and 75% are shown in Figure 6. We observe that the convergence toward pure strategy takes around 100 and 150 iterations for active ratio of 50% and 75%, respectively. Comparing the case of 25% active ratio in Figure 3 (Left), we see that it takes fewer iterations for densely active networks to converge than for sparsely active sensor networks.    While the altruistic utility function design reduces the mutual interference, we are also interested in the performance of Nash equilibrium strategy in terms of the throughput of each cluster as well as the whole system. Therefore, in Figure 7, we test the change in capacity under unilateral deviation from the NE strategy for all players. As depicted in Figure 7(a), there is no significant change in the average capacity per sensor link when only one player unilaterally deviates from its NE strategy. From Figure 7(b) we observe that, for all players, deviation from NE strategy decreases their own capacity.

Comparison with Other Methods.
We further compare the performance of the proposed channel selection scheme with two other approaches, namely, random allocation and exhaustive search, described as follows.
(i) In the random allocation scheme, each cluster head randomly selects a channel for its sensor node in each frame. Neither learning algorithm nor centralized controller is implemented.
(ii) In the exhaustive search scheme, it is assumed that there exists a centralized controller which knows all system information including the channel gains, the channel availability statistics, and the number of clusters. The channel assignment profile is determined by maximizing the expected sum capacity (i.e., solving (4)). The performance of different channel selection schemes is evaluated by the average capacity per sensor node, avg = (1/ ) ∑ =1 , and the fairness among sensor nodes. In the literature, fairness of resource allocation is usually quantified by Jain's fairness index (JFI) [30], which is defined as The value of JFI falls in the interval of [1/ , 1], and a higher JFI value indicates better fairness.
The simulation results of average capacity and JFI for different active ratios are summarized in Table 3. We observe that the exhaustive search method results in the best average capacity with the worst fairness. The random selection scheme, in contrast, has the lowest average capacity but good fairness due to its randomness nature. The proposed method shows well-balanced performance in terms of both average capacity and fairness. The results show the advantages of the proposed method; through the learning procedure toward equilibrium, the capacity of each player is considered and fewer players are sacrificed. If we examine the final channel selection profile, it is observed that, in the progress of convergence toward the NE point, the proposed learning algorithm allocates the mutually interfered users on different channels.

Conclusion
In this paper, we have studied the problem of distributed channel assignment in self-organized cognitive sensor networks with unknown channel and unknown number of clusters. We have presented a game-theoretic approach to distributively manage interference and enable the coexistence of sensor and macrocell operations in a scenario where sensor nodes operate in the same spectrum as a cellular system. We modeled channel assignment problem by means of an ordinal potential game. A decentralized stochastic learning algorithm has been proposed. Simulation results have demonstrated the convergence of the algorithm toward a pure strategy Nash equilibrium with sufficiently small learning rates. The proposed method outperforms the random selection scheme in terms of average capacity, while the performance loss compared to the exhaustive search is limited. In addition, its fairness level is comparable to that of the random selection and surpasses the exhaustive search scheme.