Coalition Formation Approaches for Cooperative Networks With SWIPT

This paper proposes three game-theoretic approaches for coalition formation in cooperative networks with simultaneous wireless information and power transfer. To improve the reception reliability of destinations with poor channel conditions, we first divide destinations in the network into two types: Type I and Type II. Type I destinations refer to the destinations with capability of successful information decoding and energy harvesting, which serve as relays to help other destinations. Type II destinations have poor connections to the source and hence compete to obtain help from Type I destinations. Accordingly, cooperative relaying strategies for the two types of destinations are proposed on the basis of coalition formation game. First, we propose to utilize the dynamic programming (DP) approach to obtain the optimal coalition structure in the network, though at the cost of heavy time and storage complexity. Then, two distributed hedonic coalition formation (DHCF) approaches are developed to generate coalition structures, which are more efficient than the DP approach. Simulation results show that all proposed approaches outperform the non-cooperative one (i.e., direct link transmission). The results also illustrate that the DP approach achieves the largest data rate and lowest outage probability for destinations, and the DHCF approaches achieve near-to-optimal performance.

a joint robust scheme for secure SWIPT in AF relay networks that considers a dual-functional desired receiver. In [5], the joint DF relay selection and power minimization problem was studied for a cooperative uplink network, showing significant power savings over a non-cooperative uplink. In [6], a distributed beamforming and power allocation algorithm was proposed for cooperative networks to increase the diversity of the system. The above works demonstrate that relay transmission significantly improves network performance. However, cooperative networks are often constrained by energy. Hence, energy harvesting (EH) has emerged as a sustainable and promising solution to prolong the lifetime of wireless networks [7].
EH is a technique to scavenge energy from the ambient environment, e.g., from solar, wind, or vibration sources. Apart from these conventional energy sources, the radio frequency (RF) signal has recently been recognized as a promising new source, enabling simultaneous wireless information and power transfer (SWIPT). The basic concept of SWIPT is that the EH receiver can simultaneously perform information decoding (ID) and EH [8]. In particular, a practical receiver either switches periodically between the ID and EH modes [9], or splits the received observations into two portions: one for ID and the other for EH [10]. The authors in [11] reviewed the security issues in various SWIPT scenarios and emphasized the challenges and opportunities of implementing SWIPT. Also, the EH receiver can operate in full-duplex mode to harvest energy, as proposed in [12]. Advances in SWIPT over the past decade have enabled wireless energy supply for low-powered devices. The authors in [13] proposed a network architecture for RF charging stations, and showed that a network with a practical node density can deliver sufficient power for mobile recharging. Hence, SWIPT is appealing to various wireless networks, e.g., wireless sensor networks [14], cognitive radio networks [15], multiple-input single-output networks [16], and cooperative networks [17]. For a literature review on SWIPT, refer to [18].
Besides, the network architecture in which relay nodes harvest energy from the received RF radiation is of great interest. In [19], a three-node cooperative network was studied, where the relay node harvests energy from RF radiation, thus improving the performance of the system. Also, the charging/discharging behavior of the battery has been well studied and modeled as a discrete Markov chain. In [20], the optimal time-switching scheme was studied in a dual-hop full-duplex relaying system with SWIPT, and the achievable throughput of three different communication modes was characterized. In [21], a wireless cooperative network with multiple source-destination pairs and an EH relay was considered, and four power allocation strategies were proposed to distribute the harvested energy. In [22], the authors developed a resource allocation scheme for jointly optimizing the BS transmit power, the receive power-splitting factors, and the relay transmit power. In [23], a distributed power splitting scheme for SWIPT in interference channels was designed to improve the network performance, where the EH relays adopt pure or mixed AF and DF. In [24], different strategies to select EH relays for data transmission were investigated in cooperative networks, where the random distances are captured by using stochastic geometry.
Both [23] and [24] employed game theory to design efficient relaying strategies for EH relays. Game theory is a powerful mathematical tool for designing fair, robust and efficient cooperation strategies [25], and it thus has many potential applications in wireless networks. For example, a Bayesian coalitional game model was developed to study cooperative packet delivery in wireless mobile networks [26]. In [27], the coalitional game was applied to model the cooperation among single-antenna transmitters by adopting the merge-and-split algorithm. In [28], a distributed coalition formation algorithm was developed to optimize the profits of network operators in green cellular networks. In [29], coalitional game theory and a pricing mechanism were applied in vehicular networks. Although coalitional game theory is a useful analytical tool, only a few studies have addressed its application in cooperative networks with SWIPT, and its use for coordinating the cooperation among EH relays remains to be explored. Thus, our work aims at developing efficient game-theoretic approaches for cooperative networks with SWIPT.

B. Main Contributions of this Paper
In this paper, we apply coalitional game theory in a cooperative network with SWIPT, in which there is one source and multiple destinations. To improve the network throughput, both centralized and decentralized approaches are proposed to devise efficient cooperation strategies for destinations. First, we model the cooperative network with SWIPT by grouping destinations into cooperative coalitions, on the basis of types and payoff of destinations. Specifically, destinations in the network are divided into two types, called Types I and II, according to whether they can decode the source messages successfully or not. Type I destinations utilize the harvested energy to relay source messages to Type II destinations in the same coalition. As such, the cooperative network with SWIPT can be modeled as a coalition formation game, where cooperation occurs among Types I and II destinations in the same coalition. To stimulate beneficial cooperation between two types of destinations, three payoff functions are defined to evaluate their gains in a coalition. In particular, payoff function I quantifies the amount of help a destination contributes or receives, which is modified into payoff function II by adopting a modified signal-to-noise ratio (SNR). Payoff function III indicates the data rate of destinations by the modified SNR.
Second, we propose three approaches based on the defined payoff functions, to form appropriate coalition structures for destinations in the network. Specifically, a dynamic programming (DP) approach is developed to generate the optimal coalition structure in a centralized manner, where destinations achieve the highest data rate and the lowest outage probability. However, it incurs heavy time and storage complexity, especially in large-scale networks. To reduce complexity, two distributed hedonic coalition formation (DHCF) approaches are proposed in a decentralized manner, where destinations decide individually which coalition to join. As such, the DHCF approaches generate coalition structures more efficiently and achieve near-to-optimal performance.
To summarize, the main contributions of this paper are as follows:
1) We model the cooperative network with SWIPT as a coalition formation game, where Type I destinations serve as relays to help Type II destinations.
2) We develop a DP approach to obtain the optimal coalition structure in our modeled scenario, where destinations achieve the largest data rate and the lowest outage probability.
3) We propose two DHCF approaches to generate coalition structures more efficiently, dramatically reducing complexity with only a small performance loss relative to the DP approach.
Our work concentrates on utilizing coalitional game theory to devise efficient cooperation strategies for destinations in cooperative networks with SWIPT. It differs from previous works that deal with the rate-energy tradeoff [9] [10] or power allocation [12] [21] in wireless networks with SWIPT. Although a few efforts have been made to explore the application of game theory in cooperative networks with SWIPT [23] [24], they only adopt dedicated relays for cooperative communications. Instead, our work dynamically selects relays from the destinations according to their channel conditions, i.e., Type I destinations. A cooperative network with selective EH relays is therefore more adaptive to changes in the wireless channels. Moreover, rather than the simple non-cooperative game modeled in [23], our work models the network as a coalition formation game for more efficient cooperation strategies. Besides, both centralized and decentralized approaches are proposed for coalition formation in our work, while only a decentralized approach is discussed in [24]. In summary, the novelty of this paper is that selective EH relays are adopted and various game-theoretic approaches for efficient cooperation are proposed in cooperative networks with SWIPT.

C. Organization of this Paper
The rest of this paper is organized as follows. In Section II, the cooperative network with SWIPT is modeled into a coalition formation game. In Section III, three payoff functions for coalition formation are developed. In Sections IV and V, three approaches are proposed to generate appropriate coalition structures in the network. In Section VI, the complexity and performance analysis of the proposed approaches are discussed. Simulation results are presented in Section VII and conclusions are drawn in Section VIII.

II. SYSTEM MODEL
Consider a multi-agent cooperative communication network with one source and multiple destinations. The source transmits messages to the destinations, all of which are equipped with a single antenna and operate in half-duplex mode. Destinations in the network have different channel connections to the source. The destinations with good channel conditions act as relays to help those with poor conditions. However, it is unfair to expect relays to help voluntarily, since relay transmission consumes their own power. Therefore, SWIPT is adopted to power the relay transmission. Each destination is equipped with EH circuits to harvest energy, when possible, from the source messages.
We introduce a disc $\mathcal{D}$ with radius $R_d$ to model the network topology, with the source located at the center of the disc and the destinations randomly deployed inside it. The source transmits messages with constant power $P$ at a fixed data transmission rate $R$. Let $d_k$ and $h_k$ be the distance and the fading channel gain between the source and the $k$-th destination$^1$. Then, the path loss model $d_k^*$ is

$$d_k^* = \begin{cases} d_k^{\alpha}, & d_k > d_0, \\ d_0^{\alpha}, & \text{otherwise}, \end{cases} \tag{1}$$

where $\alpha$ is the path loss exponent, and $d_0$ is the threshold distance. Here, it is assumed that wireless links suffer from large-scale path-loss effects. The path-loss model (1) ensures that the path loss is always larger than or equal to 1 for any distance. The path-loss attenuation is proportional to $d_k^{\alpha}$. When the distance between two nodes is larger than $d_0$, the far-field model applies. Specifically, each destination $k$ receives

$$y_k = \frac{h_k}{\sqrt{d_k^*}} \sqrt{P}\, s_k + n_k, \tag{2}$$

where $s_k$ is the normalized source message intended for destination $k$, and $n_k$ is the additive noise. Therefore, the data rate at destination $k$ for $s_k$ is

$$R_k = \frac{1}{2} \log_2\left(1 + \frac{\rho |h_k|^2}{d_k^*}\right). \tag{3}$$

In (3), $\rho = P / P_n$ is the transmit SNR, where $P_n$ is the noise power. Note that wireless channel conditions are random and uncertain, so reception reliability cannot be ensured for destinations with poor channel conditions. Specifically, destinations that are far away from the source or poorly connected to it cannot necessarily decode the messages successfully. According to whether $R_k$ is larger than $R$, destinations in the network can be categorized into the following two types:
• Type I: This type of destination can decode the source messages correctly, i.e., $|h_k|^2 \ge d_k^* (2^{2R} - 1) / \rho$. Naturally, Type I destinations are able to harvest energy from the observations after successful decoding, and the harvested energy is utilized to power the relay transmission.
• Type II: This type of destination cannot decode the source messages correctly, i.e., $|h_k|^2 < d_k^* (2^{2R} - 1) / \rho$. In this case, Type II destinations cannot decode their messages successfully, and are thus unable to harvest energy. To improve their transmission quality, they compete with each other to receive help from Type I destinations.

$^1$ In this paper, we assume that destination $k$ denotes the $k$-th destination in the network, which may be of Type I or Type II, whereas destination $i$ denotes a Type I destination and destination $j$ represents a Type II destination.

In our cooperative network, Type I destinations harvest energy to help Type II destinations. That is, the two types of destinations form cooperative groups (namely coalitions). However, if there is no proper coalition to join, a destination may choose not to participate in the cooperation, i.e., to remain a singleton. The notation for coalition formation is as follows. $N$ destinations form $K$ coalitions, represented by $S_m$, $1 \le m \le K$. Each coalition $S_m$ can be further separated into two subsets, denoted by $G_{S_m,1}$ and $G_{S_m,2}$. $G_{S_m,1}$ contains all Type I destinations in $S_m$, and $G_{S_m,2}$ is composed of all Type II destinations in $S_m$. One coalition formation example is shown in Fig. 1.
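The classification into Type I and Type II destinations can be illustrated with a minimal Python sketch. It assumes the bounded path-loss form and the half-rate expression $\frac{1}{2}\log_2(1+\mathrm{SNR})$ used in this section; the function names, default parameters, and the representation of a destination as a pair $(|h_k|^2, d_k)$ are hypothetical choices made only for illustration.

```python
import math

def path_loss(d, alpha=2.0, d0=1.0):
    """Assumed bounded path loss: d**alpha beyond the threshold d0, else d0**alpha."""
    return max(d, d0) ** alpha

def direct_rate(h_abs2, d, rho, alpha=2.0, d0=1.0):
    """Direct-link rate 0.5 * log2(1 + rho * |h|^2 / path_loss)."""
    return 0.5 * math.log2(1.0 + rho * h_abs2 / path_loss(d, alpha, d0))

def classify(destinations, rho, target_rate, alpha=2.0, d0=1.0):
    """Split destination indices into Type I (rate >= target) and Type II (rate < target).

    destinations: list of (|h_k|^2, d_k) pairs (hypothetical representation).
    """
    type1, type2 = [], []
    for k, (h_abs2, d) in enumerate(destinations):
        if direct_rate(h_abs2, d, rho, alpha, d0) >= target_rate:
            type1.append(k)
        else:
            type2.append(k)
    return type1, type2
```

With a transmit SNR of 100 and a target rate of 1, a destination at distance 1 is of Type I, while an otherwise identical destination at distance 10 falls to Type II.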
Consider the network with one source and $N$ destinations. Assume that $N_2$ destinations are of Type II, $0 \le N_2 \le N$. Furthermore, we assume that the number of Type II destinations receiving help from Type I destinations is $R$, $0 \le R \le N_2$. Thus, $(N + R)$ time slots are required for the transmissions, and the specific time structure is illustrated in Fig. 2. In this paper, we assume the use of orthogonal multiple access, where only one particular destination is served in each time slot. We divide the $(N + R)$ time slots into two phases: the first $N$ time slots form phase 1 and the remaining $R$ time slots form phase 2. The source and the relays transmit messages in phases 1 and 2, respectively, in a time division multiple access (TDMA) manner. During phase 1, the source transmits $N$ messages to the destinations individually. Due to the broadcast nature of the wireless channels, all destinations listen to all source transmissions, and perform EH if they decode successfully. As only one destination is served by the source in each time slot, there is no interference in phase 1. In each time slot of phase 2, the relays in the same coalition forward the same data to a Type II destination. Thus, in each time slot, the Type II destination receives a mixture of copies of the same data, each distorted differently by the relays' channels, and is not interfered with by other signals. Here, we do not consider interference from other communication networks. Type I destinations utilize the harvested energy for relay transmission. For convenience, and without loss of generality, we normalize each time slot to $T = 1$ in the rest of this paper. More details about the two phases are given as follows.

A. Phase 1
During each time slot of phase 1, the source transmits a message to the intended destination. Due to the broadcast nature of wireless channels, other destinations in the network may receive the message as well. Type I destinations can harvest energy from the observations intended for other destinations after successful decoding. Here, destinations adopt the time-switching strategy to harvest energy, and $\tau_i$, $0 \le \tau_i \le 1$, denotes the fraction of each time slot allocated to EH. Any destination $k$, $k \ne i$, during the first $(1 - \tau_k)$ portion of the $i$-th time slot, directs the following observation to the detection circuit:

$$y_k = \frac{h_k}{\sqrt{d_k^*}} \sqrt{P}\, s_i + n_k. \tag{4}$$

For simplicity, the channel state and the additive noise for each destination are assumed to remain unchanged during the entire transmission time. Therefore, the data rate $R_k$ at destination $k$ is

$$R_k = \frac{1 - \tau_k}{2} \log_2\left(1 + \frac{\rho |h_k|^2}{d_k^*}\right). \tag{5}$$

To ensure successful decoding of message $s_i$ at destination $k$, the supported data rate has to be greater than or equal to the target data rate $R$, i.e., $R_k \ge R$. Therefore, the threshold time-switching coefficient $\tau_k$ is set as

$$\tau_k = \max\left\{0,\; 1 - \frac{2R}{\log_2\left(1 + \rho |h_k|^2 / d_k^*\right)}\right\}. \tag{6}$$

The choice of $\tau_k$ is based on the strategy that destination $k$ first secures ID of message $s_i$ and then harvests as much energy as possible. When the wireless channel condition is poor, the entire $i$-th time slot is allocated to ID, i.e., $\tau_k = 0$. Hence, for Type II destinations, $\tau_k$ is always zero due to their poor received signal strength, and no energy can be harvested. In fact, Type II destinations could also harvest energy if they did not attempt to decode the source information in the first place. However, since the channel conditions of Type II destinations are poor, the energy they could harvest is small compared with that of Type I destinations. Thus, this paper does not consider SWIPT at Type II destinations.
For Type I destinations, $\tau_k > 0$, and the transmission power harvested in the $i$-th time slot on the basis of (6) is

$$P_k = \eta P \frac{|h_k|^2}{d_k^*} \tau_k, \tag{7}$$

where the constant $\eta \in [0, 1]$ represents the efficiency of harvesting and storing energy, assumed to be identical for all destinations. In some cases, if a Type I node can barely decode the source messages, $\tau_k$ is almost 0. In general, however, the channel condition between the source and Type I node $k$ is good, such that $\tau_k$ is larger than 0. Each Type I node is equipped with a single energy storage device (e.g., a battery or capacitor) to store the energy for relay transmission. In essence, the harvested energy is stored to power the relay transmission. Most of the available energy is used for relay transmission to satisfy the sensitivity requirements of the rectennas. Also, the distances between the relays and Type II destinations are generally small according to the adopted relay-selection strategy, i.e., coalition formation. We assume that each Type I destination can distinguish the message intended for itself, which is not used for energy harvesting. Therefore, Type I destination $k$ harvests the same amount of energy in each $i$-th time slot, $1 \le i \le N$, $i \ne k$. The amount of harvested energy can be larger with a higher density of nodes in the network or a longer time interval for harvesting. Based on (7), the total amount of harvested energy for destination $k$ is $(N - 1) P_k$.
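The decode-first, harvest-the-rest strategy of phase 1 can be sketched in Python as follows. This is an illustrative sketch only: it assumes the rate expression $\frac{1-\tau_k}{2}\log_2(1+\mathrm{SNR})$, a harvested power of the form $\eta P |h_k|^2 \tau_k / d_k^*$, and a normalized slot length $T = 1$; the function and parameter names are hypothetical.

```python
import math

def time_switching_fraction(snr, target_rate):
    """Largest EH fraction tau with (1 - tau)/2 * log2(1 + snr) >= target_rate.

    Returns 0.0 when the link cannot support the target rate with tau > 0
    (Type II destinations, or Type I destinations exactly at the threshold).
    """
    capacity = 0.5 * math.log2(1.0 + snr)
    if capacity <= target_rate:
        return 0.0
    return 1.0 - target_rate / capacity

def harvested_power(tau, p_source, h_abs2, loss, eta=0.5):
    """Assumed per-slot harvested power: eta * P * |h|^2 * tau / path_loss."""
    return eta * p_source * h_abs2 * tau / loss

def total_harvested_energy(n_destinations, per_slot_power):
    """Energy collected over the N - 1 slots not addressed to the destination itself."""
    return (n_destinations - 1) * per_slot_power
```

For example, a receive SNR of 15 gives a capacity of 2 under this model, so with a target rate of 1 half of each slot can be devoted to EH.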

B. Phase 2
Phase 2 is a cooperative phase for relay transmission. In each time slot of phase 2, Type II destinations receive messages forwarded by Type I destinations in the same coalition. Without loss of generality, we assume that each Type I destination offers equal help to the Type II destinations in the same coalition. That is, destination $i$, $i \in G_{S_m,1}$, equally splits its harvested energy into $|G_{S_m,2}|$ portions, where $|S|$ denotes the number of destinations in coalition $S$. Mathematically, on the basis of (7), the power that destination $i$ employs to transmit a message to destination $j$, $j \in G_{S_m,2}$, is

$$P_{ij} = \frac{(N - 1) P_i}{|G_{S_m,2}|}. \tag{8}$$

For relay transmission, global channel state information (CSI) is assumed to be available. In each time slot of phase 2, the Type I destinations in the same coalition relay message $s_j$ to Type II destination $j$. To transmit the message cooperatively, each qualified relay proceeds as follows. Each Type I destination $i$, $i \in G_{S_m,1}$, performs distributed beamforming and transmits $\frac{\mathrm{conj}(g_{ij})}{|g_{ij}|} \sqrt{P_{ij}}\, s_j$ to destination $j$ through independent channels.
Here, the factor $\frac{\mathrm{conj}(g_{ij})}{|g_{ij}|}$ aligns the phases of the relayed signals so that they do not combine destructively at the receiver. During the $j$-th time slot, destination $j$ observes the signal $y_j$ given in (9), where $\mathrm{conj}(g_{ij})$ is the complex conjugate of $g_{ij}$. In (9), $c_{ij}$ and $g_{ij}$ are the distance and channel gain between destinations $i$ and $j$, respectively. In addition, $y_j$ can be simplified as in (10). Afterwards, destination $j$ performs maximum ratio combining (MRC) on the signals received from the source and the relays in phases 1 and 2. The resulting SNR at destination $j$ is given by (11). For a Type I destination $i$, the relay contribution on the right-hand side of (11) is zero. Hence, the receive SNR for destination $i$ is given by (12).
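The effect of equal power splitting and coherent combining can be sketched in Python as follows. The expressions (9) to (11) are not reproduced in this text, so the SNR form below is an assumption: the phase-aligned relay signals add in amplitude as $|g_{ij}| \sqrt{P_{ij}}$ divided by the square root of the relay-link path loss, the squared sum is normalized by the noise power, and MRC adds the result to the direct-link SNR. All names are hypothetical.

```python
import math

def relay_power_share(total_energy, n_type2):
    """Equal split of a relay's harvested energy over the Type II members it helps (T = 1)."""
    return total_energy / n_type2

def combined_snr(direct_snr, relay_links, noise_power):
    """Assumed post-MRC SNR at a Type II destination.

    relay_links: list of (|g_ij|, P_ij, path_loss_ij) tuples, one per helping relay.
    With phase alignment, the relayed amplitudes add coherently before squaring.
    """
    amplitude = sum(g_abs * math.sqrt(p / loss) for g_abs, p, loss in relay_links)
    return direct_snr + amplitude ** 2 / noise_power
```

Under this assumed form, two relays with amplitudes 2 and 1 contribute $(2+1)^2 = 9$ rather than $4 + 1 = 5$, which is the gain that distributed beamforming is meant to capture.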

III. COALITION FORMATION GAMES
The cooperative relaying strategy, in which Type I destinations utilize the harvested energy to help Type II destinations, improves the network throughput and reception reliability. To coordinate the cooperation, game-theoretical approaches are explored in this paper. Specifically, coalitional game theory is applied to form efficient cooperation structure for destinations in the cooperative networks with SWIPT. Note that destinations in the same coalition need to communicate and exchange information, unavoidably bringing cooperation cost. The cost demonstrates the non-superadditive property of a game, such that the cooperative network can be modeled as a coalition formation game [30]. Some important definitions for a game are introduced as follows.
A game characterizes the interactions and decision-making process of players, who are the agents participating in the game. It can be classified into non-cooperative or cooperative game, wherein cooperation among players makes the game a cooperative one. Furthermore, if the players would join relatively small coalitions rather than the "grand coalition" due to cooperation costs, the game is called a coalition formation game. Typically, in our paper, destinations in the network are players in the modeled coalition formation game. Type II destinations compete to obtain help from type I destinations, and they join small coalitions for beneficial gains. In addition, the strategies are the possible actions that players may take, e.g., to join a possible coalition. The action is the event that a player takes, e.g., joining a coalition. More importantly, the payoff, which is also called the utility, depicts the desirability of a player to join a particular coalition. The payoff is a numerical value, and it measures the benefits or gains in a game. Next, we turn to some important definitions for coalition formation.
Definition 1: The coalition structure of set $A$, also called a partition, is defined as $\Gamma = \{S_1, S_2, ..., S_K\}$, where $S_m \subseteq A$ for all $m$, $S_m \cap S_n = \emptyset$ for $m \ne n$, and $\bigcup_{m=1}^{K} S_m = A$.
Definition 2: Given a particular coalition structure $\Gamma$, $S_\Gamma(k)$ is the coalition that destination $k$ belongs to, i.e., $S_\Gamma(k) = S_m$, $k \in S_m$ and $S_m \in \Gamma$.
Definition 3: A coalition structure $\Gamma = \{S_1, S_2, ..., S_K\}$ of set $A$ is said to be Nash-stable if no destination $k \in A$ would like to deviate unilaterally from its current coalition $S_\Gamma(k)$.
As described in Definitions 1 and 2, the coalition structure $\Gamma$ of set $A$ is a collection of exhaustive and disjoint coalitions in $A$. According to Definition 3, no single destination in a Nash-stable coalition structure can improve its payoff by moving to another coalition while the other destinations stay in their current coalitions. Here, the payoff refers to the amount of gain that a destination receives from its current coalition, denoted as $\phi$. Besides, the sum of the payoffs of all destinations in a coalition constitutes the coalition value, denoted as $v$.
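Definition 3 can be checked mechanically for a given partition and payoff rule, as in the following minimal Python sketch. It assumes that a deviating destination may either join any other existing coalition or leave to form a singleton; the function name and the `payoff(k, coalition)` callback are hypothetical.

```python
def is_nash_stable(partition, payoff):
    """Return True if no destination gains by unilaterally changing coalition.

    partition: list of disjoint sets of destination indices.
    payoff(k, coalition): payoff of destination k as a member of `coalition`
    (a frozenset that contains k).
    """
    for current in partition:
        for k in current:
            stay = payoff(k, frozenset(current))
            # Candidate moves: join any other coalition, or become a singleton.
            options = [frozenset(other) | {k}
                       for other in partition if other is not current]
            options.append(frozenset({k}))
            if any(payoff(k, option) > stay for option in options):
                return False
    return True
```

For instance, with a toy payoff equal to the coalition size, the partition {1, 2}, {3} is not Nash-stable because destination 3 prefers to join {1, 2}, whereas the grand coalition is.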
Since the main focus of this paper is generating efficient coalition structures to maximize the network throughput, it is important to employ a fair rule to define an appropriate payoff. In this paper, we utilize the receive SNR to define the coalition value, because the SNR determines the data transmission rate as well as the reception reliability in wireless communication networks. The SNR is a direct measure of the power of the signal relative to that of the noise, and corresponds to the achievable data rate of a destination. We consider that the performance of each link is characterized by its achievable SNR, which is regarded as a network-wide performance metric. For a destination $k$, $k \in S_m$, its allocated payoff is a function of its SNR, as expressed in (13). A fair payoff function ensures that destinations are encouraged to join appropriate coalitions, taking both the gains and the costs of cooperation into account. More specifically, three different payoff functions for destinations in a particular coalition are defined as follows.

A. Payoff Function I
Given a particular coalition $S_m$, Type II destinations in $G_{S_m,2}$ receive help from Type I destinations in $G_{S_m,1}$. Destinations evaluate whether to join a coalition based on their payoffs, and they are motivated to join the coalition with the highest payoff. To stimulate beneficial cooperation between the two types of destinations, payoff function I quantifies the help that a destination receives or contributes. Specifically, the payoff of destination $k$ in coalition $S_m$, $k \in S_m$, is defined in (14) on the basis of (11) and (12). The function (14) is designed to measure a player's benefit or gain. It is made up of three parts, and considers the different roles of Type I and Type II destinations in the game. Specifically, $E_1$ indicates the difference in the overall SNR with and without destination $k$. It stimulates a Type I destination to join the coalition $S_m$, i.e., to serve as a relay for other players. $E_2$ is the actual SNR that a destination experiences in a coalition. It is the incentive for a Type II destination to join $S_m$. $E_3$ is the cooperation cost of forming a coalition, since some system overhead is needed to coordinate distributed beamforming. $E_3$ is assumed to be related to the size of a coalition, with $\mu$ being the combination coefficient. If there is only one member in the coalition, no cost is incurred, i.e., $E_3 = 0$. For Type I destinations, $E_1$ is the main incentive to join a coalition, corresponding to the improved reception reliability of the network. Type II destinations join a coalition mainly for the increase in $E_2$, corresponding to the increased capacity gained from the relay transmission. Together, function (14) depicts the desirability for a player of joining a coalition, taking both the gains and the costs of cooperation into account.
Correspondingly, the coalition value of $S_m$ is the sum of the payoffs achieved by the members of this coalition, as given in (15).

B. Payoff Function II
Payoff function II is proposed on the basis of payoff function I. Payoff function I in (14) does not account for the fact that it is not always beneficial to increase the SNR of Type II destinations. Since the source communicates with destinations at the fixed data rate $R$, each destination's capacity only needs to be larger than or equal to the target data rate, i.e., $C_{j,S_m} \ge R$. As a result, when the capacity of a Type II destination is greater than $R$, there is no need to assign more help to this destination. Moreover, the receive SNR of a destination determines its capacity, i.e., $C = \frac{1}{2} \log_2(1 + \mathrm{SNR})$. Corresponding to the threshold capacity, there is a threshold for the SNR, denoted by $\tau$, $\tau = 2^{2R} - 1$. Consequently, when the SNR of a Type II destination exceeds $\tau$ with the help of some Type I destinations, there is no need for more Type I destinations to help it. Inspired by this observation, the SNR expression for destination $k \in S_m$ is modified as

$$\overline{\mathrm{SNR}}_{k,S_m} = \min\left\{\mathrm{SNR}_{k,S_m},\; \tau\right\}. \tag{16}$$

For a Type I destination, $\overline{\mathrm{SNR}}_{k,S_m}$ is always $\tau$. For a Type II destination, $\overline{\mathrm{SNR}}_{k,S_m}$ first increases as it receives help from Type I destinations, and reaches the peak value $\tau$ with enough help. Accordingly, the payoff function for destination $k$ to join coalition $S_m$ is defined in (17), and the coalition value of $S_m$ is defined in (18). To maximize the payoff on the basis of (17), Type I destinations seek to help the destinations that need help most. Type II destinations receive only as much help as is needed to meet the target data rate requirement. Consider a specific coalition $S_m$, where a Type II destination $j$ can decode its message successfully with the help of the Type I destinations in $S_m$. In this case, if another Type I destination $i$ joins this coalition to help destination $j$, it can increase neither its own payoff nor that of destination $j$. In contrast, destination $i$ can join another coalition to help other Type II destinations for a higher payoff.
Hence, the definitions in payoff function II avoid the situation that almost all Type I destinations choose to help one or two particular Type II destinations while other Type II destinations are neglected.
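The saturation effect behind payoff function II can be illustrated with a minimal Python sketch of the SNR threshold $\tau = 2^{2R} - 1$ and the capped SNR. The function names are hypothetical, and the sketch covers only the capping step, not the full payoff expression in (17).

```python
def snr_threshold(target_rate):
    """SNR needed to support target_rate under C = 0.5 * log2(1 + SNR)."""
    return 2.0 ** (2.0 * target_rate) - 1.0

def modified_snr(snr, target_rate):
    """Cap the SNR at the threshold: help beyond the target rate earns nothing."""
    return min(snr, snr_threshold(target_rate))
```

With a target rate of 1 the threshold is 3, so raising a Type II destination's SNR from 4 to 5 leaves its modified SNR unchanged at 3, and an additional relay has no incentive to join that coalition.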
According to [25], a coalitional game is said to be in characteristic form if the value of a coalition depends solely on its members, no matter how the players in other coalitions are structured. In (14) and (17), the payoff of each destination $k$ depends only on the members of its coalition $S_m$. Correspondingly, the coalition value relies only on the members of that coalition. Therefore, the considered coalitional game is in characteristic form under payoff functions I and II.

C. Payoff Function III
Payoff functions I and II consider both the gains and the costs of cooperation, and fit scenarios where destinations make individual decisions. In a centralized approach, the centralized entity makes global decisions, such that no negotiation process is involved among destinations. Therefore, the cooperation costs can be ignored in the payoff function for any centralized approach. Specifically, the payoff for destination $k$, $k \in S_m$, is defined in (19) as its modified SNR from (16). Here, we use the modified SNR to directly define the payoff, corresponding to the actual achievable data rate of destinations. The reason for using the sum of SNRs is two-fold. First, the SNR is an important metric for the robustness of transmissions. Second, the utility function is linear in the SNRs, and hence can easily be used to track the various properties of the addressed games, as shown in the following sections. Note that the payoff in (19) is bounded by $\tau$, according to (16). In particular, for a Type I destination, the payoff is always $\tau$. For a Type II destination, the payoff reaches $\tau$ once enough help is obtained. Therefore, destinations in the network cooperate only to meet the data transmission requirement, and they will not join a bigger coalition once the payoff reaches $\tau$. Accordingly, the coalition value is the sum of the receive SNRs of the destinations in a coalition, as given in (20).
This way, the coalition with maximum value $v_3$ guarantees the maximum achievable sum rate for the destinations in the coalition.
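As a rough illustration, the coalition value under payoff function III and the corresponding sum rate can be computed as in the following Python sketch. It assumes that each member's payoff is its SNR capped at $\tau = 2^{2R} - 1$ and that the rate is $\frac{1}{2}\log_2(1 + \mathrm{SNR})$; the function names and the list-of-SNRs input are hypothetical.

```python
import math

def coalition_value_3(snrs, target_rate):
    """Sum of member SNRs, each capped at tau = 2**(2R) - 1."""
    tau = 2.0 ** (2.0 * target_rate) - 1.0
    return sum(min(s, tau) for s in snrs)

def achievable_sum_rate(snrs, target_rate):
    """Sum of 0.5 * log2(1 + capped SNR) over the coalition members."""
    tau = 2.0 ** (2.0 * target_rate) - 1.0
    return sum(0.5 * math.log2(1.0 + min(s, tau)) for s in snrs)
```

For a coalition with SNRs 10, 1 and 3 and a target rate of 1, the capped SNRs are 3, 1 and 3, giving a coalition value of 7.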

IV. THE CENTRALIZED APPROACH
In this section, we present a DP approach, to solve the coalition formation problem in the centralized manner. The DP approach is developed to obtain the optimal structure for the cooperative networks with SWIPT, in which the sum rate of destinations can be maximized.

A. Overview of the DP Approach
We briefly review the DP approach [31], which is a powerful technique for solving a particular class of problems. It solves a complex problem by decomposing it into sub-problems and storing the solutions to the sub-problems. The DP approach was first applied in combinatorial auctions for winner determination [32], and has been widely applied to a broad class of optimal control problems [33] [34].
Typically, to obtain the optimal coalition structure of a set, a centralized approach enumerates all potential coalition structures. Then, the optimal coalition structure is the one with the highest sum of coalition values. However, the enumeration process runs in O(N N ), which is not feasible in practice. Fortunately, the coalition formation problem can be divided into smaller sub-problems, and the sub-problems can be divided in the same way. Then, to combine the solutions to the subproblems recursively, it yields the optimal solution to the original coalition formation problem.
Assume that there are N destinations in the network A, i.e, A = {1, 2, ..., N }. To obtain the optimal coalition structure of A, we first find the optimal partition by dividing it in two 2 , i.e., A 1 ∪ A 2 . Then we divide each sub-coalition likewise, and find the optimal partitions A 11 ∪ A 12 for A 1 , and A 21 ∪ A 22 for A 2 . This process is repeated until no sub-coalition can achieve a better value if divided into two. Then, by tracing the optimal partition using the backward induction, we can obtain the optimal coalition structure of A. That is, the DP approach executes in the "bottom-up" sequence, i.e., to compute from the smallest sub-coalitions to the larger ones. The maximal value of each smaller sub-coalition is stored, to give reference to the larger ones.
• Thirdly, we turn to the coalition at the top level, i.e., $A = \{1, 2, 3\}$.
In Stage II, the DP approach obtains the optimal coalition structure of the original network recursively. According to Stage I in Fig. 3, $\{2\} \cup \{1, 3\}$ is the optimal partition of $A$. Furthermore, $\{1, 3\} \cup \emptyset$ is the optimal partition of $\{1, 3\}$. Combining them, the optimal coalition structure is $\{2\} \cup \{1, 3\}$. The details of the DP approach for coalition formation are discussed in the next subsection.

B. The DP Approach for Coalition Formation
The DP approach obtains the optimal coalition structure for destinations in cooperative networks with SWIPT. The specific approach is summarized in Algorithm 1.
As shown in Algorithm 1, the input to the DP approach is the coalition value $v_3(S)$. Given $N$ destinations, there are $2^N$ possible coalitions $S$ (including the empty set), such that the number of inputs is $2^N$. Here, the coalition value (20) in payoff function III is utilized to globally compute the sum rate of destinations. The coalition structure with the maximal sum of values corresponds to the maximal sum rate of destinations. The optimal coalition structure is obtained in two stages: Stage I and Stage II.
In Stage I, the DP approach computes the optimal two-part partition with the maximal sum of values for each coalition $S \subseteq A$. The coalitions with sizes larger than 1 can be divided into two sub-coalitions by considering all possible cases of disjoint partitions. Stage I compares the value of each partition, and finds the optimal two-part partition of $S$. This process is executed in a bottom-up sequence, i.e., with the coalition size increasing from 1 to $N$. Note that for each coalition, the optimal two-part partition and the maximal sum of values are stored in functions $F$ and $G$, respectively. The maximal sums of values for smaller sub-coalitions are utilized directly in larger coalitions. Critically, these values are computed only once rather than repeatedly.
Stage II obtains the optimal coalition structure $\Gamma^*$ for $A$ recursively. First, $\Gamma = F(A)$ is the optimal two-part partition of $A$, as obtained in Stage I. Then, we look into each coalition in $\Gamma$ and replace it by its optimal partition. This operation is repeated until no coalition in $\Gamma$ can be improved by further partitioning. Combining the resulting coalitions, we obtain the optimal coalition structure $\Gamma^*$ for $A$.
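The two stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 1: the function name `dp_coalition_structure`, the callable `value` (a stand-in for $v_3$), and the toy coalition values are all hypothetical. The dictionaries `F` and `G` play the roles of the functions $F$ and $G$ in the text.

```python
from itertools import combinations

def dp_coalition_structure(agents, value):
    """Return (best_total, structure) for a coalition value function.

    `value` maps a frozenset of agents to a number (hypothetical stand-in for v_3).
    """
    agents = list(agents)
    F = {}  # F[S]: best two-part partition of S as (S1, S2); S2 is empty if S stays whole
    G = {}  # G[S]: maximal sum of values over all partitions of S
    # Stage I: bottom-up over coalition sizes 1..N.
    for size in range(1, len(agents) + 1):
        for combo in combinations(agents, size):
            S = frozenset(combo)
            best_val, best_split = value(S), (S, frozenset())
            members = sorted(S)
            anchor, rest = members[0], members[1:]
            # Keep `anchor` in S1 so each unordered split is visited once.
            for r in range(len(rest)):
                for sub in combinations(rest, r):
                    S1 = frozenset((anchor,) + sub)
                    S2 = S - S1
                    total = G[S1] + G[S2]  # stored results, never recomputed
                    if total > best_val:
                        best_val, best_split = total, (S1, S2)
            F[S], G[S] = best_split, best_val

    # Stage II: expand the stored splits recursively, starting from A.
    def expand(S):
        S1, S2 = F[S]
        return [S] if not S2 else expand(S1) + expand(S2)

    A = frozenset(agents)
    return G[A], expand(A)

# Toy coalition values for three destinations (made up for illustration only).
toy = {
    frozenset({1}): 1, frozenset({2}): 2, frozenset({3}): 1,
    frozenset({1, 2}): 2, frozenset({1, 3}): 5, frozenset({2, 3}): 2,
    frozenset({1, 2, 3}): 4,
}
total, structure = dp_coalition_structure([1, 2, 3], toy.__getitem__)
```

With these made-up values the sketch returns the two coalitions {1, 3} and {2}, which has the same shape as the example discussed for Fig. 3.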
The DP approach is guaranteed to find the optimal coalition structure with a complexity far below that of exhaustive enumeration. It runs in $O(3^N)$ time, and can be further simplified in execution. However, it is still complex in large-scale networks, since its complexity increases exponentially with the number of nodes in the network. Thus, we use the DP approach as a benchmark, and propose two decentralized approaches that are more practical in large-scale networks. We also study the tradeoff between the complexity and the performance of these approaches. The decentralized approaches are introduced in the next section.

V. THE DECENTRALIZED APPROACHES
In this section, two decentralized coalition formation approaches, i.e., DHCF approach I and DHCF approach II, are proposed to form efficient coalition structures in cooperative networks with SWIPT.

A. Definitions in DHCF Approaches
In DHCF approaches, destinations decide individually which coalition to join, on the basis of their payoffs in different coalitions. According to (14) and (17), the payoff of a destination depends solely on the members in its current coalition. Therefore, the network with cooperative destinations is modeled as a hedonic coalition formation game. In particular, DHCF approach I is based on payoff function I, and DHCF approach II is based on payoff function II. Note that destinations evaluate their preferences over coalitions according to their own or collective benefits. That is, destinations are motivated to join coalitions with highest payoff, with consideration to the network performance. Specifically, the preference relation for a destination to judge which coalition is preferable is defined as follows.
Definition 4 (The Preference Relation): The preference is a metric for a destination to evaluate the set of possible coalitions it may join. In particular, for a destination $k \in A$, the preference relation $S_m \prec_k S_n$ implies that destination $k$ strictly prefers to stay in $S_n$ rather than $S_m$, where $k \in S_m$ and $k \in S_n$. For this purpose, two criteria, C1 and C2, are used in (21) to quantify the preference. In (21), the payoff $\phi$ and the coalition value $v$ are either (14) and (15) in payoff function I, or (17) and (18) in payoff function II. The first criterion in (21), i.e., C1, indicates that a destination prefers a coalition where its expected payoff exceeds its current payoff. The second criterion, C2, guarantees that the sum of values for the new coalition structure is not reduced if the destination stays in $S_n$ instead of $S_m$. Here, essential information exchanges are required inside the coalition when evaluating C2. For a destination, C1 is the motivation for seeking a better payoff, while C2 constrains its actions from the perspective of global network performance. These two criteria work together to ensure the quality and convergence of the network. To evaluate (21), a destination first checks C1 and, if it is satisfied, then checks C2 using the four coalition values. However, the two criteria may conflict: when the payoff of a destination increases by moving from $S_m$ to $S_n$, the sum of values for the new coalition structure does not necessarily increase. Nevertheless, there exist some important situations in which both C1 and C2 are satisfied, as shown in the following.
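For orientation, one plausible way to write the two criteria is sketched below. It is reconstructed only from the verbal description in Definition 4 (a payoff comparison for C1, and a comparison of four coalition values for C2) and should not be read as the exact form of (21). In particular, the choice of $\geq$ in C2 follows the phrase "is not reduced" and is an assumption; a strict inequality is an equally possible reading.

```latex
S_m \prec_k S_n \;\Leftrightarrow\;
\begin{cases}
\text{C1:} & \phi_k(S_n) > \phi_k(S_m), \\
\text{C2:} & v(S_n) + v(S_m / k) \;\geq\; v(S_m) + v(S_n / k).
\end{cases}
```

Under this reading, the left-hand side of C2 is the value of the two affected coalitions after destination $k$ moves, and the right-hand side is their value before the move.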
Proposition 1: Consider the scenario in which a Type I destination $i$ evaluates its preference between $S_m$ and $S_n$, with $i \in S_m$ and $i \in S_n$, on the basis of payoff function II. $S_m$ contains both Type I and Type II destinations, with $|G_{S_m,1}| \geq 2$ and $|G_{S_m,2}| \geq 1$. $S_n$ contains a Type II destination $j$ besides destination $i$, i.e., $|S_n| = 2$. If each Type II destination $l \in G_{S_m,2}$ receives enough help from the Type I destinations in $S_m / i$, so that its SNR reaches $\tau$ without the help of destination $i$, then destination $i$ prefers $S_n$ to $S_m$.
Proof. Let $M$ denote the coalition size of $S_m$, i.e., $|S_m| = M$ with $M \geq 3$. Comparing the payoff that destination $i$ achieves in $S_m$ with the payoff that it achieves in $S_n$ gives the difference in payoff for destination $i$ joining $S_n$ instead of $S_m$, as in (24). Summing the payoffs of the destinations in $S_m$ and $S_m / i$, respectively, gives (25) and (26). Note that the first terms in (25) and (26) are not expanded into their specific forms, since their common parts cancel later. The coalition values of $S_n$ and $S_n / i$ are obtained similarly. Note that $G_{S_m/i,2} = G_{S_m,2}$, i.e., the Type II destinations in $S_m / i$ and $S_m$ are the same. Thus, the difference in the sum of values for destination $i$ joining $S_n$ instead of $S_m$ is given by (29). Here, we have $\mathrm{SNR}_{l,S_m} = \tau$, because enough help is already provided to the Type II destinations in $S_m$ by the Type I destinations in $S_m / i$. Consequently, (29) can be simplified to (30). As can be concluded from (24) and (30), the differences in both the payoff and the sum of values exceed zero for destination $i$ joining $S_n$ instead of $S_m$. That is, destination $i$ prefers to join $S_n$ rather than $S_m$, and thus the proposition is proven.
Proposition 1 is useful because the situation it describes arises frequently. If a Type I destination has little positive effect on the destinations in $S_m$, but a critical positive effect on the destinations in $S_n$, it prefers to join $S_n$ instead of $S_m$. In other words, a Type I destination is motivated to move to a new coalition if its help is more important to the destinations in the new coalition than to those in its current coalition.
Definition 5 (The Hedonic Shift Rule): Given a coalition structure $\Gamma = \{S_1, S_2, \ldots, S_K\}$ of set $A$, a destination $k \in A$ makes a hedonic shift based on the preference defined in (21). Destination $k$ decides to leave its current coalition $S_\Gamma(k) = S_m$ and to join another coalition $S_n \in \Gamma \cup \emptyset$, $S_n \neq S_m$, if and only if $S_m \prec_k S_n \cup \{k\}$ is satisfied. Mathematically,
$$\{S_m, S_n\} \rightarrow \{S_m / k,\; S_n \cup \{k\}\} \;\Leftrightarrow\; S_m \prec_k S_n \cup \{k\}.$$
The hedonic shift (denoted as '$\rightarrow$') means that destination $k$ leaves its current coalition $S_m$ to join another coalition $S_n$, given that the new coalition $S_n \cup \{k\}$ is strictly preferred over $S_m$ based on the preference in (21). Note that destinations make hedonic shifts only between existing coalitions, or form a singleton coalition. After the hedonic shift, the current coalition structure $\Gamma$ transforms into a new coalition structure $\Gamma' = \{\Gamma / \{S_m, S_n\}\} \cup \{S_m / k,\; S_n \cup \{k\}\}$.

B. The Design of DHCF Approaches
In this section, two DHCF approaches, which share the same algorithm framework, are proposed to form efficient coalition structures in cooperative networks with SWIPT, as shown in Algorithm 2. For $N$ destinations in the network, i.e., $A = \{1, 2, \ldots, N\}$, the initial coalition structure is $\Gamma_0 = \{\{1\}, \{2\}, \ldots, \{N\}\}$. During the iteration phase for coalition formation, each destination takes its turn to determine whether to stay in the current coalition or to join a new coalition. Any destination $k \in A$ first retrieves the current coalition structure $\Gamma$. Afterwards, it searches for potential hedonic shifts over the set $\{\Gamma / S_\Gamma(k)\}$ based on the preference in (21). Specifically, if destination $k$ is to perform a hedonic shift, it leaves the current coalition $S_\Gamma(k) = S_m$ and joins the new coalition $S_n$. Consequently, the current partition turns into $\{\Gamma / \{S_m, S_n\}\} \cup \{S_m / k,\; S_n \cup \{k\}\}$, and the new current coalition of destination $k$ is $S_n \cup \{k\}$. The iteration runs until no more hedonic shifts are possible for any destination. Finally, we obtain the coalition structure $\Gamma_f$, in which no destination prefers to leave its current coalition.
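The iteration described above can be sketched as follows. This is an illustrative sketch rather than Algorithm 2 itself: the names `dhcf`, `payoff`, `value`, and `max_rounds` are hypothetical, the two tests `c1` and `c2` are one reading of the criteria C1 and C2, and the strict inequality in `c2` is an assumption made here so that the total value rises with every shift. The callable `value` is assumed to return 0 for the empty set.

```python
def dhcf(agents, payoff, value, max_rounds=1000):
    """Hedonic-shift iteration starting from singletons.

    `payoff(k, S)` and `value(S)` are hypothetical callables supplied by the caller.
    """
    structure = [frozenset([k]) for k in agents]  # initial structure: all singletons
    for _ in range(max_rounds):  # safety cap on the number of rounds
        shifted = False
        for k in agents:  # each destination takes its turn
            S_m = next(S for S in structure if k in S)
            # Targets: every other existing coalition, or the empty set (go solo).
            candidates = [S for S in structure if S != S_m]
            if len(S_m) > 1:
                candidates.append(frozenset())
            for S_n in candidates:
                new_n = S_n | {k}
                rest_m = S_m - {k}
                c1 = payoff(k, new_n) > payoff(k, S_m)  # own payoff improves
                c2 = value(new_n) + value(rest_m) > value(S_m) + value(S_n)  # assumed strict
                if c1 and c2:
                    structure = [S for S in structure if S not in (S_m, S_n)]
                    structure += [S for S in (rest_m, new_n) if S]
                    shifted = True
                    break
        if not shifted:  # no destination wants to move: final structure reached
            break
    return structure

# Toy game (made up): destination 1 is Type I, destination 2 is Type II.
def toy_payoff(k, S):
    if k == 1:
        return 1.0 - 0.01 * (len(S) - 1)  # small cooperation cost for the relay
    return 1.0 if 1 in S else 0.3         # Type II is satisfied only with a relay

def toy_value(S):
    return sum(toy_payoff(k, S) for k in S)

final_structure = dhcf([1, 2], toy_payoff, toy_value)
```

In this toy game the Type II destination joins the Type I destination, and the relay then stays because leaving would lower the total value, so the second test blocks the move even though its own payoff would rise slightly.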

C. Stability Properties Analysis
In practice, we are concerned about the stability properties of the coalition structures generated by DHCF approaches. First, the convergence of the DHCF approaches is analyzed.
Proposition 2: Given any initial coalition structure $\Gamma_0$, the DHCF approaches always converge to a final coalition structure $\Gamma_f$.
Proof. The coalition formation process can be mapped into a series of hedonic shifts. Each destination determines whether to shift to a new coalition based on the hedonic shift rule. The process continues until no further shift exists for any destination $k \in A$. According to (21), the sum of values for the coalition structure is guaranteed to be nondecreasing after each hedonic shift. Mathematically, the DHCF approaches go through the following transformations in the coalition formation process:
$$\Gamma_0 \rightarrow \Gamma_1 \rightarrow \cdots \rightarrow \Gamma_m \rightarrow \cdots \rightarrow \Gamma_f, \qquad (31)$$
where the operator "$\rightarrow$" is the shift operator defined in Definition 5, and the index $m$ indicates the number of hedonic shifts already performed. Each hedonic shift leads to a new coalition structure with a gain in the sum of coalition values. Therefore, each adopted hedonic shift corresponds to an increment in the sum of coalition values, guaranteeing the uniqueness of each generated structure $\Gamma_m$. As the total number of coalition structures for a given set $A$ is finite, i.e., the Bell number, the total number of transformations is finite and at most equal to the Bell number. In conclusion, no matter how many transformations the DHCF approaches have previously taken, the sequence in (31) always terminates and converges to a final coalition structure, though not necessarily the optimal one.
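To give a sense of the bound used in the proof, the Bell number can be computed with the standard Bell triangle recurrence. The helper name `bell_number` is introduced here for illustration only.

```python
def bell_number(n):
    """Number of ways to partition an n-element set, via the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
    return row[0]

# Number of coalition structures for N = 3, 5, and 10 destinations.
counts = [bell_number(n) for n in (3, 5, 10)]
```

For three destinations there are 5 possible coalition structures, and for ten destinations there are 115975, so the bound is finite but grows quickly with $N$.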
Proposition 3: Any final coalition structure $\Gamma_f$ generated by the DHCF approaches is Nash-stable.
Proof. This proposition can be proven by contradiction. Assume that $\Gamma_f$ is the final coalition structure generated by the DHCF approaches, but is not Nash-stable. Then, according to Definition 3, there must be a destination $k \in A$ that has an incentive to deviate. That is, there exists a coalition $S_n \in \Gamma_f \cup \emptyset$ satisfying $S_{\Gamma_f}(k) \prec_k S_n \cup \{k\}$ for destination $k$. Destination $k$ is therefore motivated to perform a hedonic shift and to join the new coalition $S_n$. Consequently, the current coalition structure $\Gamma_f$ turns into a new structure $\Gamma_f' = \{\Gamma_f / \{S_n, S_{\Gamma_f}(k)\}\} \cup \{S_n \cup \{k\},\; S_{\Gamma_f}(k) / k\}$, which contradicts the assumption that $\Gamma_f$ is the final coalition structure. Therefore, this proposition is proven.
Propositions 2 and 3 demonstrate that whatever the initial coalition structure is, the network always converges into a Nash-stable coalition structure, even after some environmental changes.

VI. COMPLEXITY AND PERFORMANCE ANALYSIS
In this section, we examine the complexity and performance of the DP and DHCF approaches, with exhaustion as the benchmark. The detailed analysis is summarized in Table I.

A. Time Complexity
As shown in Table I, the time complexity of the exhaustion and DP approaches is $O(N^N)$ and $O(2^N) \sim O(3^N)$, respectively, as verified in [35]. Clearly, the DP approach runs significantly faster than exhaustion once $N$ exceeds 3. Nonetheless, the time complexity of the DP approach is still exponential in the number of destinations, which is not computationally manageable for large-scale networks.
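The $O(3^N)$ upper end can be motivated by a standard counting argument, given here as a sketch rather than as the derivation in [35]. If Stage I examines, for every coalition of size $s$, each of the $2^s$ ways of choosing a sub-coalition, the total number of (coalition, sub-coalition) pairs follows from the binomial theorem:

```latex
\sum_{s=0}^{N} \binom{N}{s}\, 2^{s} = (1 + 2)^{N} = 3^{N}.
```

For example, with $N = 10$ this gives $3^{10} = 59049$ pairs, compared with $N^N = 10^{10}$ for the exhaustion bound.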
In the DHCF approaches, the time complexity is $O(N^2) \sim O(N^t)$. Specifically, when the coalition structure is generated after only one iteration, i.e., each destination searches once but finds no possible hedonic shift, the complexity is $O(N^2)$. In the worst case, when the number of hedonic shifts equals the Bell number, the complexity is $O(N^t)$ with $N/2 < t < N$. Fortunately, the search process requires significantly fewer hedonic shifts than the worst case to converge to a stable coalition structure. Therefore, the DHCF approaches run faster than the DP approach.
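TABLE I
Property             Exhaustion    DP                         DHCF
Time complexity      O(N^N)        O(2^N) ~ O(3^N)            O(N^2) ~ O(N^t)
Storage complexity   O(cN)         2^(N+1) values             O(cN)
Optimality           Optimal       Optimal                    Sub-optimal
Stability            N/A           N/A                        Nash-stable
Anytime property     No            No                         Yes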

B. Storage Complexity
In the DP approach, two functions, i.e., $F$ and $G$, are kept in memory to be utilized recursively. For the case of $N$ destinations, there are $2^N$ possible coalitions, and thus $2^{N+1}$ values need to be stored. Such a memory requirement is acceptable for small $N$. However, it grows exponentially in $N$, posing a challenge to the centralized entity. In the exhaustion and DHCF approaches, no additional function is adopted, and the storage complexity depends only on the temporary variables in these approaches, i.e., $O(cN)$ with $c$ being a constant.

C. Stability and Optimality Properties
The coalition structures generated by the exhaustion and DP approaches are optimal, since all possible coalition structures are considered to yield the optimal one. In contrast, the coalition structures generated by the DHCF approaches are not guaranteed to be optimal, since the DHCF approaches search only for locally optimal solutions. This is also verified by the simulations in the next section. In addition, the coalition structures generated by the DHCF approaches are Nash-stable, as proved in Section V-C.
In case the radio environment changes, the stability of the network can be destroyed. The payoffs of destinations may have been changed, and thus the network needs to re-organize. Specifically, for the DHCF approaches, the destinations locally re-compute their payoffs, and judge whether to join a new coalition, based on the preference of (21). After a few periods, the network can converge to a stable structure. For the DP approach, the algorithm needs to be re-executed from the beginning. Fortunately, some previous computations can be utilized directly, and the network can re-organize into a new structure after a few periods.

D. Anytime Property
In the exhaustion and DP approaches, the coalition formation process is pre-executed. Specifically, the centralized entity collects all the CSI, and then computes the optimal coalition structure. This process may take a long time in large-scale networks. The coalition structure is generated only when there is enough time to run. If destinations are required to react within a limited time, these two centralized approaches are not practical. Also, if the centralized entity fails to make a decision due to a system error, the whole network crashes and no solution can be generated.
In contrast, the DHCF approaches can generate solutions at any time, even if they are executed incompletely or interrupted, which makes the network more robust. Each destination locally computes whether to leave the current coalition and join a new coalition in Phase 2. At each time, a destination only chooses among the existing coalitions. The computation is distributed, and each hedonic shift corresponds to an intermediate coalition structure. If there is not enough time, the coalition structure obtained after only one or several iterations is output, though without a guarantee of robust performance.
To conclude, since a moderate memory requirement is acceptable in the mechanism design, the DP approach is preferable to exhaustion, as it reaches the optimal solution much more quickly. In terms of complexity and practical factors for execution, the DHCF approaches are preferable to the DP approach, especially in large-scale networks, if optimal performance is not strictly required. Considering the tradeoff between network performance and complexity, the DP approach is suitable for small-scale networks, while the DHCF approaches are more practical in large-scale networks.

VII. SIMULATION RESULTS
In this section, simulation results based on $10^6$ Monte Carlo simulations are presented to illustrate the performance of the proposed approaches. For each simulation, the source is located at the center of a disc with a radius of 5 meters (m), and the destinations are randomly located inside the disc. That is, the EH distance is set to several meters so that nearby mobiles can harvest the radiated power. The source transmit power $P$ is set in the range of 0 to 40 dBm, with the carrier frequency, noise power, transceiver antenna gain, and energy harvesting efficiency set to 915 MHz, 0 dBm, 0 dBi, and 100%, respectively. The channels are assumed to be quasi-static Rayleigh fading, with the real and imaginary parts modelled by independent and identically distributed zero-mean Gaussian processes. Moreover, the path loss exponent is set as $\alpha = 3$, and the cost coefficient is set as $\mu = 0.01$. Specifically, the DP and DHCF approaches are evaluated in terms of the data rate and outage probability, with the non-cooperative approach (i.e., direct link transmission) as the benchmark.
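A single Monte Carlo drop of this setup could be generated along the following lines. This is a rough sketch under several assumptions that are not stated in the text: destinations are placed uniformly over the area of the disc, the fading has unit mean power, and the path loss follows a simple $d^{-\alpha}$ law with the distance clamped away from zero. The function names are hypothetical.

```python
import math
import random

def drop_destinations(n, radius=5.0, rng=random):
    """Place n destinations uniformly (assumed) in a disc centred on the source."""
    points = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())  # sqrt gives uniform density over area
        theta = 2.0 * math.pi * rng.random()
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

def rayleigh_power_gain(rng=random):
    """|h|^2 for h with i.i.d. zero-mean Gaussian real and imaginary parts."""
    sigma = math.sqrt(0.5)  # unit mean power (assumption)
    re, im = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
    return re * re + im * im

def direct_link_snr(p_dbm, noise_dbm, distance, alpha=3.0, rng=random):
    """Linear received SNR on the direct link; d^-alpha path loss is assumed."""
    p_over_n = 10 ** ((p_dbm - noise_dbm) / 10.0)
    d = max(distance, 1e-3)  # avoid division by zero at the source position
    return p_over_n * rayleigh_power_gain(rng) / d ** alpha

# One drop with N = 10 destinations at P = 22 dBm and 0 dBm noise power.
rng = random.Random(0)
positions = drop_destinations(10, rng=rng)
snrs = [direct_link_snr(22.0, 0.0, math.hypot(x, y), rng=rng) for x, y in positions]
```

Repeating such drops and averaging the resulting rates and outage events gives Monte Carlo estimates of the kind reported in this section. The exact channel and path loss model used for the reported results may differ from this sketch.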

A. Performance of the Proposed Approaches
First, a specific example of the coalition structures generated by DP, DHCF I, DHCF II and a non-cooperative approach (i.e., Non-cooperative for short), is presented in Fig. 4, to demonstrate the performance of the proposed approaches. Here, we set the target data rate R = 1 BPCU and P = 22 dBm. For simplicity, all h or g that model the channels between the source and destination or between destinations are assumed to be 1.
Consequently, destinations in the same dashed circle cooperate with each other, according to the coalition structures generated by the proposed approaches, as shown in Fig. 4. Specifically, in DHCF I, Type II destinations 5, 8, and 10 are left without any help, while Type II destinations 1 and 4 receive excessive help from Type I destinations 2, 3, 6, 7, and 9. In DHCF II, this unbalanced help allocation is avoided, such that destinations 5, 8, and 10 receive proper help from Type I destinations. In DP, the generated coalition structure guarantees the most efficient cooperation among destinations, whose achievable data rates are the highest, as shown in Table II. According to Table II, the average data rates of destinations achieved in DP, DHCF I, and DHCF II are higher than that in Non-cooperative, benefiting from the strategy of adopting EH relays. This indicates that the harvested energy for relay transmission contributes to the performance improvement, even though the harvested energy is small. The average data rate achieved in DP is the highest, approaching the target data rate R = 1 BPCU, and the average data rate in DHCF II is greater than that in DHCF I. In particular, consider the achievable data rates of the Type II destinations in the different approaches. Destinations 1 and 4 achieve the target data rate R = 1 BPCU in DP, DHCF I, and DHCF II. Destination 10 achieves R = 1 BPCU in both DHCF II and DP, whereas its data rate in DHCF I is R = 0.7103 BPCU. Moreover, destination 5 achieves a higher data rate of R = 0.8942 BPCU in DP, compared to R = 0.7407 BPCU in DHCF II. In summary, DP provides the best cooperation strategy for destinations in the network, followed by DHCF II and DHCF I.
Furthermore, it is important to evaluate the efficiency of the proposed approaches in different networks. The average data rate and outage probability of destinations are provided as follows, to illustrate the network throughput and reception reliability of the different approaches. In Fig. 5(a), the average data rate of destinations achieved by DP, DHCF I, DHCF II, and Non-cooperative is plotted versus $P$. Here, we set the noise power $P_n = 0$ dBm, and the ratio $P/P_n$, i.e., the SNR, reflects the transmission environment between the source and the destinations. The higher $P$ is, i.e., the higher the SNR, the better the transmission environment is. As can be observed from Fig. 5(a), DP achieves the best data rate performance for destinations, followed by DHCF II, DHCF I, and Non-cooperative. When $P$ increases, the average data rates of all these approaches increase, due to the better transmission environment at higher transmit power. Once $P$ is high enough, the data rate of destinations achieved in all approaches is $R$. This is expected, since all destinations become Type I at high transmit power. Also, it can be observed from Fig. 5(a) that the gap between the average data rates achieved in DHCF II and DP is small, demonstrating that DHCF II achieves near-optimal data rate performance for destinations. Besides, the data rate of destinations achieved in DHCF II is slightly higher than that in DHCF I, because DHCF II adopts the threshold SNR for fairer help allocation.
In Fig. 5(b), the outage probability of destinations achieved in DP, DHCF I, DHCF II, and Non-cooperative is presented. The average outage probability decreases as $P$ increases in all these approaches. This is because destinations in reliable transmission environments, i.e., at high transmit power, have a low probability of unsuccessful information transmission. As expected, DP yields the best outage probability performance for destinations, i.e., the best reception reliability of the network. Nonetheless, the outage probability performance of destinations achieved in DHCF II is slightly worse than that in DHCF I. A possible reason is that DHCF II tries to provide balanced help to more destinations. As a result, although the sum rate of destinations is higher in DHCF II than in DHCF I, DHCF II cannot ensure that more destinations reach the target data rate.

B. Performance of DHCF Approaches
Although DP achieves the highest data rate and lowest outage probability for destinations among the proposed approaches, it is computationally unmanageable for large-scale networks. For this reason, the influence of other parameters (e.g., the target data rate and the number of destinations) on the network performance is discussed only for DHCF I, DHCF II, and Non-cooperative.
As shown in Fig. 6(a), in the low transmit power region, destinations achieve the highest average data rate at different target data rates $R$ in DHCF II, followed by DHCF I and Non-cooperative, whereas destinations achieve the target data rate in all these approaches in the high transmit power region. Again, this verifies that destinations achieve a higher data rate in DHCF II, with its fairer help allocation, than in DHCF I. In Fig. 6(b), we focus on how the number of destinations $N$ influences the average data rate. When the number of destinations in the network increases, the difference in data rate performance among these approaches becomes larger. This observation demonstrates that DHCF II is preferable to DHCF I and Non-cooperative, especially in large-scale networks. In addition, for DHCF II, the more destinations there are in the network, the higher the average data rate is. This is expected, since more destinations in the network mean more opportunities for destinations to join appropriate coalitions. Specifically, with more destinations in the same disc, i.e., a higher density, the distance between neighboring Type I and Type II destinations is smaller, enabling better cooperation. It is worth pointing out that both DHCF I and DHCF II are implemented in a decentralized manner, which means that these performance gains over Non-cooperative can be obtained with little system overhead.
In Fig. 7, both DHCF I and DHCF II outperform Non-cooperative for different numbers of destinations $N$. It is interesting to observe that the outage probability performance of destinations in Non-cooperative does not change with $N$, since no cooperation is carried out in this benchmark. Besides, DHCF II performs better than DHCF I for large $N$ in terms of the outage probability, and the gap between these two approaches grows as $N$ increases. That is, DHCF II performs almost the same as or slightly worse than DHCF I for small $N$, e.g., $N = 5$ in Fig. 5(a). However, it outperforms DHCF I for large $N$, as shown in Fig. 7. A possible reason is that more appropriate coalitions can be formed with more destinations in the network, and therefore the probability of unsuccessful ID decreases. Moreover, with more destinations in the network, the number of Type I destinations may increase as well, and thus the fairer help allocation in DHCF II is more beneficial compared to DHCF I.

VIII. CONCLUSIONS
In this paper, we have proposed both centralized and decentralized approaches for coalition formation in cooperative networks with SWIPT. To be specific, the cooperation among two types of destinations was modeled as a coalition formation game, in which Type I destinations act as relays to help Type II destinations. Then, the DP approach was developed to generate the optimal coalition structure for destinations in a centralized manner. Next, we proposed two DHCF approaches that generate coalition structures more efficiently in a decentralized manner. Simulation results demonstrated that all the proposed approaches are superior to the non-cooperative one, and that the DP approach achieves the largest data rate and lowest outage probability. Besides, DHCF approach II performs better than DHCF approach I for large $N$, in terms of data rate and outage probability. Given the tradeoff between complexity and network performance, the DP approach is suitable for small-scale networks, while the DHCF approaches are more practical for large-scale networks.
It is worth pointing out that a scenario with quasi-static fading and low-mobility destinations is assumed for the simplified network model. The channels are assumed to remain static during the coalition formation process. In practical systems, the randomness of the channel connections to the source should be considered, which requires a new analytical investigation. This can be seen as one of the future research directions. In addition, it is an interesting direction to consider non-orthogonal multiple access (NOMA) in the considered model, which can improve both the network capacity and the spectral efficiency. NOMA superposes multiple users' messages in the power domain, and serves more than one user in each time slot. Nonetheless, strong co-channel interference exists in a network with NOMA. This means that additional optimization steps are required for resource allocation in order to combat the strong co-channel interference. However, this is beyond the scope of this paper.