Network Coding for Wireless Networks

Reference EPFL-ARTICLE-171957. doi:10.1155/2010/359475. Record created on 2011-12-16, modified on 2016-08-09.

J. Goseling et al., in the first paper of this special issue, "Lower bounds on the maximum energy benefit of network coding for wireless multiple unicast," investigate the benefit of using network coding for reducing energy consumption in wireless networks. The paper shows that the energy benefit of network coding in $d$-dimensional networks is at least $2d/\lfloor\sqrt{d}\rfloor$-fold compared to plain routing.
S. Zhang and S. C. Liew, in the second paper, "Application of physical-layer network coding in wireless networks," investigate the use of physical-layer network coding (PNC) for wireless networks. The idea of PNC is to exploit the inherent property of the radio channel that radio waves from different users superpose at the receiver antenna. This property can be used to carry out the addition operation needed in network coding and can be utilized to achieve a substantial increase in throughput compared to conventional network coding schemes.
In the third paper, "Joint channel-network coding for the Gaussian two-way two-relay network," P. Hu et al. investigate a two-way relay channel problem and consider five different network coding strategies made from a combination of basic ones such as Amplify-Forward (AF), Decode-Forward, and Decode-Amplify Forward. They have done extensive performance evaluations of these strategies for various relay channel environments.
B. Du and J. Zhang, in the fourth paper, "Parity-check network coding for multiple access relay channel (MARC) in wireless sensor cooperative communications," aim to design a parity-check network coding scheme for a two-source multiple access relay channel. The parity-check network code, they imply, is a multidimensional low-density parity-check (LDPC) code. Each user employs an LDPC code to

Introduction
Emerging applications in wireless networks, like environment monitoring in rural areas by ad hoc networks, require more and more resources. One of the most important limitations is battery life. Since battery technology is not keeping up with the increasing demand from resource-consuming applications, it is imperative that more efficient use is made of the available energy. There has been significant recent attention to the problem of minimizing energy consumption in networks. Some of the topics considered are minimum cost routing [1][2][3], power control algorithms [4][5][6], and cross-layer protocol design for energy minimization [7]. In this work, we are interested in the use of network coding [8][9][10][11][12][13][14] for reducing the energy consumption in wireless networks. We compare the reduction with traditional routing solutions. The contributions of this work are lower bounds on the energy reduction that can be achieved by using network coding for multiple unicast problems in wireless networks.
In recent years, there has been significant interest in network coding with the aim of reducing energy consumption in networks. More generally, network coding with a cost criterion has been considered. Much progress has been made in understanding the case of multicast traffic. In fact, it has been shown by Lun et al. that a minimum-cost network coding solution can be found in a distributed fashion in polynomial time [15]. The fact that the complexity of finding this solution is polynomial in time is surprising, since the corresponding routing problem is a Steiner tree problem that is known to be NP-complete [16].
Besides constructing minimum-cost coding solutions, it is also of interest to know what the benefits of network coding are compared to routing. In this work, we are interested in the energy benefit of network coding, which is the ratio of the minimum energy of a routing solution to the minimum energy of a network coding solution, maximized over all configurations. It has been shown by Goel and Khanna [17] that the energy benefit of network coding for multicast problems in wireless networks is upper bounded [...] general linear coding strategy, in which linear combinations of coded messages can be retransmitted. The motivation behind using decode-and-recombine codes is that it prevents information from spreading too much in the network, away from the path between source and destination, a heuristic introduced by Katti et al. [25]. The use of a decode-and-recombine strategy results in reduced complexity. However, an important question that has to be addressed is whether the use of decode-and-recombine codes leads to a higher energy consumption than is strictly necessary. We answer this question affirmatively. An upper bound of three on the energy benefit of decode-and-recombine codes has been given by Liu et al. [26]. One of the contributions of this work is to show that larger energy benefits can be obtained by considering also other types of codes.
This paper is organized as follows. In Section 2 we specify our model and problem statement more precisely. Our main results are presented in Section 3. Constructions of configurations that allow a large energy benefit for network coding and proofs of our results are given in Sections 4 and 5. In Section 6, finally, we discuss our work.

Model and Problem Statement
Let $V \subset \mathbb{R}^d$ be the nodes of a $d$-dimensional wireless network. We consider a wireless network model with broadcast, where all nodes within range $r$ of a transmitting node can receive, and nodes outside this range cannot. More precisely, given a transmission range $r$, a node $v$ is broadcasting to all nodes in the set

$$\{u \in V \mid \|u - v\| \le r\},$$

where $\|u - v\|$ denotes the Euclidean norm of $u - v$. The energy required to transmit one unit of information to all other nodes within range $r$ equals $c r^{\alpha}$, where $\alpha$ is the path loss exponent and $c$ is some constant. In analyzing the energy consumption of nodes, we will consider only the energy consumed by transmitting. Receiver energy consumption as well as energy consumed by processing are assumed to be negligible compared to transmitter energy consumption. In particular, note that little additional processing is required for network coding, compared to the processing that is performed in a traditional wireless protocol stack.

The traffic pattern that we consider is multiple unicast. All symbols are from the field $\mathbb{F}_2$, that is, they are bits and addition corresponds to the XOR operation. The source of each unicast session has a sequence of source symbols that need to be delivered to the corresponding destination. Let $M$ be the set of unicast sessions. We call $\{V, M, r\}$ a wireless multiple unicast configuration.
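The broadcast and energy model above can be made concrete with a small sketch. The function names `broadcast_set` and `transmission_energy`, the 3x3 grid, and the default `c = 1.0` are illustrative assumptions, not part of the paper.

```python
import math

def broadcast_set(nodes, v, r):
    """Nodes other than v within Euclidean distance r of v (hypothetical helper)."""
    # A tiny tolerance guards against floating-point error at the range boundary.
    return [u for u in nodes if u != v and math.dist(u, v) <= r + 1e-12]

def transmission_energy(r, alpha, c=1.0):
    """Energy c * r**alpha for broadcasting one unit of information at range r."""
    return c * r ** alpha

# Illustrative 3x3 integer grid in the plane (an assumption for this example).
grid = [(i, j) for i in range(3) for j in range(3)]
centre = (1, 1)
near = broadcast_set(grid, centre, r=1.0)            # the four axis neighbours
wide = broadcast_set(grid, centre, r=math.sqrt(2))   # all eight surrounding nodes
```

Increasing the range from 1 to $\sqrt{2}$ doubles the number of nodes reached here, while the cost per transmission grows by a factor $2^{\alpha/2}$; this is the trade-off behind optimizing $r$.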
We will compare energy consumption of routing and network coding. Our goal is to establish lower bounds on the maximum of the ratio of the minimum energy required by routing and network coding solutions, where the maximum is over all configurations. We will refer to this ratio as the energy benefit of network coding. Let $E_{\text{coding}}(V, M, r)$ and $E_{\text{routing}}(V, M, r)$ be the minimum energy required for network coding and routing solutions, respectively, for a configuration $\{V, M, r\}$. The energy consumption of a coding or routing scheme is defined as the time-average of the total energy spent by all nodes in the network to deliver one symbol for each unicast session. In analyzing coding schemes, we will ignore the energy consumption in an initial startup phase and consider only steady-state behavior.
Note that since energy consumption per transmission equals $c r^{\alpha}$, the transmission range $r$ is an important factor in the energy consumption. Therefore, it is of particular interest to optimize the transmission range such that energy consumption is minimized. In this work, we consider two different quantities: (1) $B_{\text{fixed}}$, denoting the energy benefit that can be obtained if the transmission range is given and fixed, and (2) $B_{\text{var}}$, denoting the energy benefit that can be obtained if one is allowed to optimize the transmission range. Note that the transmission range can be individually optimized for the routing and network coding scenarios. More precisely, the goal of this work is to establish lower bounds on

$$B_{\text{fixed}}(d) = \max_{V, M, r} \frac{E_{\text{routing}}(V, M, r)}{E_{\text{coding}}(V, M, r)},$$

where the maximization is over all node locations $V \subset \mathbb{R}^d$, multiple unicast sessions $M$, and transmission ranges $r$, with the transmission range equal for the routing and network coding solutions, and

$$B_{\text{var}}(d) = \max_{V, M} \frac{\min_r E_{\text{routing}}(V, M, r)}{\min_r E_{\text{coding}}(V, M, r)},$$

where the maximization is over all node locations $V \subset \mathbb{R}^d$ and multiple unicast sessions $M$, with the transmission range being optimized individually for the routing and network coding solutions. If no confusion can arise, we will omit dependency on $d$ in the notation for $B_{\text{fixed}}$ and $B_{\text{var}}$. Since in $B_{\text{fixed}}$, $r$ is equal for $E_{\text{routing}}$ and $E_{\text{coding}}$, the energy per transmission is equal in both, and the benefit is equal to the ratio of the number of transmissions required in routing and network coding solutions.
Since we are interested in energy consumption only, we can assume that all transmissions are scheduled sequentially and/or that there is no interference. All coding and routing schemes that we consider proceed in time slots or rounds. In each time slot, all nodes are allowed to transmit one or more messages. We assume that the length of the time slot is large enough to accommodate sequential transmission of all messages in that round. Coding operations will be based on messages received in previous time slots only. Finally, we assume that all nodes have complete knowledge of the network topology and the network code that is being used.
To conclude this section, we introduce here some of the notation that will be used in the remainder of the paper. The symbol transmitted by a node $v \in V$ in time slot $t$ is denoted by $x_t(v)$. If $v$ transmits more than one symbol in time slot $t$, these will be distinguished by a superscript, giving, for instance, $x^1_t(v)$ and $x^2_t(v)$. Unicast sessions are denoted by $m^i(u)$, with $i$ being an integer and $u$ a vector. We will see in Sections 4 and 5 that $u$ defines the location of the source and $i$ the relative location of the destination, that is, the direction of the session. In some cases $m^i(u)$ will be denoted as $m^i(u_1, u_2^d)$ or similar forms. The $t$th source symbol of a session $m^i(u)$ is denoted by $m^i_t(u)$. The source and destination of session $m^i(u)$ are denoted by $s^i(u)$ and $r^i(u)$, respectively.

Results
We provide lower bounds on $B_{\text{var}}$ and $B_{\text{fixed}}$.
Theorem 1, which is proved in Section 5, concerns $B_{\text{fixed}}$. It implies that $B_{\text{fixed}}$ is at least 2, 4, and 6 for 1-, 2-, and 3-dimensional networks, respectively. The result that $B_{\text{fixed}}$ is at least 2 in one-dimensional networks also follows from the results in [20]. The lower bound of 4 for 2-dimensional networks exceeds the previously known bound of 2.4 [21]. This new lower bound is of particular interest, since it exceeds the upper bound of 3 for decode-and-recombine type network codes [26]. Indeed, the code that we construct does not follow a decode-and-recombine strategy. This shows that energy can be saved by considering strategies other than decode-and-recombine. No lower bounds for three-dimensional networks have been previously established.
Before proving Theorem 1 in Section 5, we provide some intuition. The configuration used to prove Theorem 1 has nodes placed on a $d$-dimensional rectangular lattice, has connectivity $r = \sqrt{d}$, and is parameterized by an integer $K$ controlling the size of the network. The network is given in Figure 1 for $d = 2$ and $K = 5$. For $d = 2$, the result of Theorem 1 is obtained as follows. First consider the case of routing. Note that the minimum-energy solution is to route all packets along the shortest path between source and destination. Therefore, all nodes in the interior of the network will need to transmit four times. Now, for the case of network coding, we will show in Section 5 that it is possible to construct a network code in which each node in the interior of the network is transmitting only once in each time slot. Therefore, by considering large $K$ and neglecting the energy consumption at the borders of the network, the obtained energy benefit is 4.
In Section 5 we will consider the general case of arbitrary $d$. Again, the network coding solution will be such that each of the $K^d + O(K^{d-1})$ nodes in the interior of the network is transmitting only once in each time slot. In analyzing the routing solution, some care needs to be taken. Since $r = \sqrt{d}$, the number of hops that need to be taken on the shortest path between source and destination equals $K/\lfloor\sqrt{d}\rfloor$. By noting that the number of sessions is roughly equal to the number of nodes at the border of the network, that is, $2dK^{d-1} + O(K^{d-2})$, and ignoring all transmissions from nodes at the border of the network, we establish

$$B_{\text{fixed}}(d) \ge \frac{2d}{\lfloor\sqrt{d}\rfloor}.$$

Details of the configuration and a proof of Theorem 1 are given in Section 5.

The configuration and network code construction used for Theorem 1 are not useful for obtaining bounds on $B_{\text{var}}$. Since $r = \sqrt{d}$, the cost per transmission in the network coding scheme is $c d^{\alpha/2}$. One can verify, however, that the optimal transmission range under routing is $r = 1$. This requires $K$ hops per session, with the cost per transmission being equal to $c$. Using the network code described above and the optimal routing solution at $r = 1$ gives an energy ratio of

$$\frac{2d}{d^{\alpha/2}} = 2d^{1-\alpha/2},$$

which is at most 2, since $\alpha \ge 2$. Note that it was already shown in [20] that $B_{\text{var}}(1) \ge 2$ and in [21] that $B_{\text{var}}(2) \ge 2.4$.
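The two closed-form quantities in this discussion are easy to tabulate. The sketch below is only an illustration; the function names are hypothetical, and the formulas $2d/\lfloor\sqrt{d}\rfloor$ and $2d^{1-\alpha/2}$ are taken as the large-$K$ limits described in the text.

```python
import math

def fixed_range_bound(d):
    """Lower bound 2d / floor(sqrt(d)) on B_fixed(d) in the large-K limit."""
    return 2 * d / math.isqrt(d)

def rect_code_vs_unit_range_routing(d, alpha):
    """Ratio 2 * d**(1 - alpha/2): routing at r = 1 against the code at r = sqrt(d)."""
    return 2 * d ** (1 - alpha / 2)

# Bounds for d = 1, 2, 3 (the cases discussed in the text).
bounds = [fixed_range_bound(d) for d in (1, 2, 3)]

# For alpha >= 2 the variable-range ratio of this particular construction never exceeds 2.
ratios = [rect_code_vs_unit_range_routing(d, alpha)
          for d in (1, 2, 3) for alpha in (2.0, 3.0, 4.0)]
```

The second list shows why a different configuration is needed for $B_{\text{var}}$: the higher cost per transmission at range $\sqrt{d}$ cancels the gain in the number of transmissions.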
By considering a different configuration, we show that $B_{\text{var}}(2) \ge 3$.
Theorem 2. For 2-dimensional wireless networks, the ratio of the minimum energy consumption of routing solutions and the minimum energy consumption of network coding solutions, maximized over all node locations and multiple unicast sessions, with the transmission range optimized individually for the routing and network coding solutions, is at least 3, that is, $B_{\text{var}}(2) \ge 3$.
Here we provide an intuitive explanation of this result; details of the configuration and a proof of Theorem 2 are provided in Section 4. The result is established using a multiple unicast configuration on a subset of the 2-dimensional hexagonal lattice as depicted in Figure 2. The minimum cost routing solution on this network follows shortest paths for all sessions and will require all nodes in the interior of the network to transmit three times in order to deliver one symbol for each session. In Section 4, we construct a network code in which each node in the interior is only transmitting once per delivered symbol. By making the size of the network large, the influence of the borders becomes negligible. Hence, the energy benefit is 3.
Besides providing new lower bounds on the energy benefit of network coding, the network codes that are constructed in this paper are of interest by themselves. They might lead to insight into how to operate in networks with another structure. Finally, even though the case $d > 3$ is not of any practical relevance, the bounds as well as the code constructions might lead to a better insight for lower-dimensional networks.

An Efficient Code on the Hexagonal Lattice
In this section, we present a multiple unicast configuration in which the nodes form a subset of the hexagonal lattice. It will be shown that the energy benefit on this configuration is 3, proving Theorem 2. Since the code construction used here is less involved than the construction used to prove Theorem 1, we start with the proof of Theorem 2. This section is organized as follows. In Section 4.1 we present the configuration in more detail, after which we give the construction of the network code in Section 4.2. Section 4.3 is used to prove that the code is valid. Finally, in Section 4.4 we analyze the energy consumption of the network code and prove Theorem 2.

Configuration.
The size of the configuration is parameterized by a positive integer $K$. The nodes $V$ form a subset of the hexagonal lattice. We index nodes with a tuple $(v_1, v_2) \in \mathbb{N}^2$. $V$ is given by [...] The location of node $v \in V$ in $\mathbb{R}^2$ is given by $vG$, where [...] Let $\mathring{V}$ denote the interior of the network, that is, [...] The transmission range that we are interested in is $r = 1$. This leads to connectivity between the six nearest neighbours.
The nodes $V$ and the connectivity are depicted in [...] as depicted in Figure 4. Remember from Section 2 that session $m^j(i)$ has the sequence of source symbols $m^j_0(i), m^j_1(i), m^j_2(i), \ldots$ to be transferred.

Network Code.
The network code is such that in each time slot a new source symbol from each session is transmitted. Also, one symbol of each session is decoded by its destination in each time slot. After successfully decoding a symbol, it is retransmitted by the destination in the next time slot. Nodes at the border will, therefore, transmit twice in each time slot. Nodes in the interior of the network transmit only once. The symbol that they transmit is a linear combination of one symbol from each of the sessions for which the shortest path between source and destination includes that node. The operation of the network code is demonstrated in Figure 5 in which the transmissions of all nodes in the first four time slots are depicted. Different transmissions by the same node are separated by a comma. Note, moreover, that there is a startup phase, time slots 0 to 2, in which not all destinations are able to decode a symbol. From time slot 3 onwards, all destinations decode one symbol in every time slot. In analyzing the energy consumption of the coding scheme, we will ignore the startup phase.
The symbol transmitted at $t = 3$ by the node with the dotted border can be obtained by summing all transmissions from nodes with a dashed border in earlier time slots. Indeed, [...] This coding operation (i.e., in time slot $t$, a node transmits the sum of what was transmitted by its top-left neighbour in time slot $t - 2$, by its top-right neighbour in time slot $t - 1$, and so forth, as visualized in Figure 5) is performed by all nodes that are in the interior of the network. The idea behind the coding operation is to cancel, by means of the XOR operation, all symbols that should not be retransmitted. In (12), for instance, we have $m^1_1(3) + m^1_1(3) = 0$. The exact operation of the network code is made more precise in the remainder of this subsection. The coding operation for interior nodes is given in exact form in (17).
Nodes at the border of the network operate as follows. Let $0 < u_2 < K$. In time slot $t$, node $(0, u_2)$ transmits two symbols:

Left border: $x^1_t(0, u_2) = m^1_t(u_2)$ and $x^3_t(0, u_2) = m^3_{t-u_2}(K - u_2)$. (13)

Since $(0, u_2)$ is the source of session $m^1(u_2)$, it has source symbol $m^1_t(u_2)$ available. Also, $(0, u_2)$ is the destination for session $m^3(K - u_2)$. It remains to be shown that symbol $m^3_{t-u_2}(K - u_2)$ can be decoded by $(0, u_2)$ using the information obtained from its neighbours up to time slot $t$. For notational convenience, let

Left border: $x_t(0, u_2) = x^1_t(0, u_2) + x^3_t(0, u_2)$.
In a similar fashion, we have the following transmissions at the right and bottom borders of the network.
Nodes in the interior of the network transmit once in each time slot. Let $(u_1, u_2) \in \mathring{V}$. The coding operation it performs is given by

$$x_t(u_1, u_2) = x_{t-1}(u_1 - 1, u_2) + x_{t-2}(u_1 - 1, u_2 + 1) + x_{t-1}(u_1, u_2 + 1) + x_{t-3}(u_1, u_2) + x_{t-2}(u_1 + 1, u_2) + x_{t-2}(u_1, u_2 - 1) + x_{t-1}(u_1 + 1, u_2 - 1). \quad (17)$$

Validity of the Network Code.
We need to show that destinations can decode in time in order to retransmit the required symbols according to (13), (15), and (16). In order to do so, we first analyze how data propagates through the network. If we look at the nodes in the network that transmit linear combinations that contain a certain source symbol, we see that symbols propagate exactly along the shortest paths between source and destination. This is made more precise in the following two lemmas.

Lemma 1. Let $0 < u_2 < K$. Assume that the only nonzero source symbol transmitted in the network is $m^1_0(u_2)$ by node $(0, u_2)$ in time slot 0. Then, for all $t \ge 0$ and $(v_1, v_2) \in V$, $x_t(v_1, v_2)$ equals $m^1_0(u_2)$ if $v_1 = t$ and $v_2 = u_2$, and zero otherwise.

Proof. We use induction over time. The base case is time slot $t = 0$, for which it is readily verified that the statement is true. Now, for the induction step, suppose that the lemma holds for all $t'$ smaller than $t$. This implies that for all $\tau > 0$ [...] Hence, [...] which by the induction hypothesis is equal to $m^1_0(u_2)$ if $v_1 = t$ and $v_2 = u_2$ and zero otherwise.
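The propagation claim of Lemma 1 can be checked numerically by applying the interior rule (17) over $\mathbb{F}_2$ to a single injected bit. The sketch below rests on assumptions that are not stated in the surviving text: the interior is taken to be the nodes $(v_1, v_2)$ with $v_1, v_2 \ge 1$ and $v_1 + v_2 \le K - 1$ (which matches the count $(K-1)(K-2)/2$ used in the energy analysis), the destination is modelled as repeating what its left neighbour sent one slot earlier, and all other border nodes stay silent. The names `RULE_17`, `simulate_impulse`, and `active_transmissions` are hypothetical.

```python
from collections import defaultdict

# (offset, delay) pairs read off the interior rule (17): in slot t a node XORs
# what the neighbour at `offset` transmitted in slot t - delay.
RULE_17 = [
    ((-1, 0), 1), ((-1, 1), 2), ((0, 1), 1), ((0, 0), 3),
    ((1, 0), 2), ((0, -1), 2), ((1, -1), 1),
]

def simulate_impulse(K, u2, slots):
    """Return {(t, node): bit} for a single source bit sent by (0, u2) in slot 0."""
    # ASSUMPTION: interior nodes are (v1, v2) with v1, v2 >= 1 and v1 + v2 <= K - 1.
    interior = [(a, b) for a in range(1, K) for b in range(1, K) if a + b <= K - 1]
    source, dest = (0, u2), (K - u2, u2)
    x = defaultdict(int)              # entries never written read as 0
    x[(0, source)] = 1
    for t in range(1, slots):
        for a, b in interior:
            bit = 0
            for (da, db), delay in RULE_17:
                bit ^= x[(t - delay, (a + da, b + db))]
            x[(t, (a, b))] = bit
        # ASSUMPTION: the destination repeats what its left neighbour sent one
        # slot earlier (a stand-in for decode-and-retransmit); every other
        # border node is silent because all other source symbols are zero.
        x[(t, dest)] = x[(t - 1, (K - u2 - 1, u2))]
    return x

def active_transmissions(x):
    """Sorted list of (slot, node) pairs whose transmitted bit is 1."""
    return sorted(key for key, bit in x.items() if bit)

trace = active_transmissions(simulate_impulse(K=6, u2=2, slots=12))
```

Under these assumptions the only nonzero transmissions are those of node $(t, u_2)$ in slot $t$, so the bit travels along the straight path from source to destination and every off-path contribution cancels in pairs. The destination's retransmission matters: without it, the last interior node on the path would not cancel its own earlier transmission.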
We are now ready to prove that the destinations can correctly decode source symbols. We present the decoding procedure for nodes on the right border of the network. The decoding procedures at the other borders can be obtained by exploiting the symmetry of the system.

Lemma 3. Consider $(u_1, u_2)$, with $u_1 + u_2 = K$, $0 < u_2 < K$, that is, the destination of session $m^1(u_2)$. It can decode symbol $m^1_{t-u_1}(u_2)$ at the end of time slot $t - 1$ as [...]

Proof. From Lemma 2, (15), it follows that (22) equals [...]

Energy Consumption.
The energy consumption of the network coding scheme presented above is given in the following lemma.
Proof. From (13)-(17), we have that each of the $3(K - 1)$ nodes at the border that are source or destination are transmitting twice in each time slot. Each of the $(K - 1)(K - 2)/2$ internal nodes is transmitting once in each time slot. Since $r = 1$, the energy consumption per transmission is $c$. This gives a total energy consumption of

$$c\left(6(K - 1) + \frac{(K - 1)(K - 2)}{2}\right) = \frac{c\,(K - 1)(K + 10)}{2}.$$

Next, we give the minimum energy required by a routing solution.
Proof. Since we consider routing, we need to take the shortest path for each session. Since the energy consumption per hop equals $c r^{\alpha}$, the energy consumption under routing is minimized for $r = 1$. Now, we see that the number of transmissions required to deliver a symbol for the sessions $m^1(1), \ldots, m^1(K - 1)$ equals $K(K - 1)/2$. Adding the transmissions for sessions of type 2 and 3 gives a total energy consumption of

$$\frac{3c\,K(K - 1)}{2}.$$

Using the above two lemmas, we are able to prove Theorem 2.
Proof of Theorem 2. Remember that $B_{\text{var}}$ is defined as the maximum of $\min_r E_{\text{routing}}(V, M, r)/\min_r E_{\text{coding}}(V, M, r)$ over $V$ and $M$. Hence, $\min_r E_{\text{routing}}(V, M, r)/\min_r E_{\text{coding}}(V, M, r)$ for any specific $V$ and $M$ will provide a lower bound to $B_{\text{var}}$. In addition, any upper bound to $\min_r E_{\text{coding}}(V, M, r)$ will result in a lower bound to $B_{\text{var}}$. Hence, from Lemmas 4 and 5, we have

$$B_{\text{var}}(2) \ge \lim_{K \to \infty} \frac{3K(K - 1)/2}{(K - 1)(K + 10)/2} = \lim_{K \to \infty} \frac{3K}{K + 10} = 3.$$
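How quickly the ratio approaches 3 can be seen by evaluating the node counts from the two proofs for finite $K$. This is a small sketch under the counts stated in the text ($3(K-1)$ border nodes transmitting twice, $(K-1)(K-2)/2$ interior nodes transmitting once, and $K(K-1)/2$ routing transmissions per session type); the function name `hex_energy_ratio` is hypothetical.

```python
def hex_energy_ratio(K, c=1.0):
    """Routing energy divided by coding-scheme energy on the hexagonal configuration, r = 1."""
    # Coding scheme: border sources/destinations transmit twice, interior nodes once.
    coding = c * (2 * 3 * (K - 1) + (K - 1) * (K - 2) / 2)
    # Routing: three session types, each needing K(K - 1)/2 transmissions in total.
    routing = c * 3 * K * (K - 1) / 2
    return routing / coding

ratios = {K: hex_energy_ratio(K) for K in (10, 100, 1000, 10**6)}
```

The ratio simplifies to $3K/(K + 10)$, so the border overhead is still substantial for small networks (the ratio is 1.5 at $K = 10$) and the benefit of 3 is approached only as $K$ grows.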

An Efficient Code on the d-Dimensional Rectangular Lattice
In this section, we present a multiple unicast configuration in which the nodes are placed at integer coordinates in a $d$-dimensional space, that is, at the rectangular lattice.

Configuration.
The size of the configuration is parameterized by a positive integer $K$. We have [...] The interior of the network is given by [...] We will make use of [...], which corresponds to those nodes that are part of exactly one face of the network. The transmission range that will be used is $r = \sqrt{d}$. This transmission range induces a neighbourhood consisting of all neighbours within distance $\sqrt{d}$. The coding operation of our network code is based on only part of the neighbourhood, that is, it uses [...] Note that for $d \le 3$, $N_v$ corresponds to the complete neighbourhood of $v$. We will be using $\mathrm{dist}(u, v)$ [...] distance from $u$ to $v$. The network and its connectivity are depicted for $d = 2$ in Figure 6.

A source is located at each node $v$ that is part of exactly one face of the network. Therefore, there are $2d(K - 1)^{d-1}$ sessions. If $v_i = 0$, we denote the session corresponding to this source by $m^i(v_{\setminus i})$. Recall from Section 2 that $v_{\setminus i}$ denotes the $(d-1)$-dimensional vector obtained by removing the $i$th element from $v$. If $v_i = K$, we denote the session by $m^{d+i}(v_{\setminus i})$. The destination of each session is located at the other side of the network, that is, we have $r^i(v_{\setminus i}) = s^{d+i}(v_{\setminus i})$ and $r^{d+i}(v_{\setminus i}) = s^i(v_{\setminus i})$. The positions of sources and destinations are depicted for $d = 2$ in Figure 7.
It can be seen that $m^i(v_{\setminus i})$ and $m^{d+i}(v_{\setminus i})$ form oppositely directed sessions.

Network Code.
We introduce sets $\Theta_\delta \subset \{1, \ldots, 2d\}$, $0 \le \delta \le d$, which are defined recursively as follows: [...] where $\Delta$ denotes symmetric difference and $\Theta_\delta \pm 1 = \{\tau \pm 1 \mid \tau \in \Theta_\delta\}$. Note that irrespective of $d$ we have $1 \in \Theta_1$. As an example for $d = 2$ we have [...]

The scheme is very similar in flavour to the scheme presented in Section 4; its operation is demonstrated in Figure 8 in which, for $d = 2$ and $K = 3$, the transmissions of all nodes in the first four time slots are depicted. The operation of the scheme is such that in time slot $t$ sources transmit the $t$th source symbol and destinations decode the $(t - K)$th source symbol. Besides transmitting a new source symbol in each time slot, sources/destinations will also retransmit the symbol that has been decoded in that time slot, that is, they transmit two different symbols in each time slot. In the figure, different transmissions by the same node are separated by a comma. Nodes in the interior of the network transmit only once. The symbol that they transmit is a linear combination of one symbol from each of the sessions for which the shortest path between source and destination includes that node. The symbol transmitted at $t = 3$ by the node with the dotted border can be obtained by summing all transmissions from nodes with a dashed border in earlier time slots. This coding operation is performed by all nodes that are in the interior of the network. The exact operation of the network code is made more precise in the remainder of this subsection. The coding operation for interior nodes is given in exact form in (34).
Let $v$ be a node that is part of exactly one face of the network. Remember that this implies that there exists a unique $i$ such that $v_i \in \{0, K\}$. Node $v$ transmits [...] For notational convenience, let [...] The coding operation performed by an internal node is as follows: [...]

Validity of the Network Code.
The following result follows directly from the definition of the sets Θ δ , but is stated here as a lemma because of its importance in the remainder of the paper.
Lemma 7. Consider node $(0, u_2^d)$ on the border of the network. Assume that the only nonzero source symbol transmitted in the network is $m^1_0(u_2^d)$ by node $(0, u_2^d)$ in time slot 0. Then [...] for all $v \in V$ and $t \ge 0$.
Proof. We use induction over $t$. At time $t = 0$, the lemma holds, giving us our base case. Now suppose that the lemma holds for all time slots smaller than $t$. If $v$ is a border node, the lemma follows directly from (32)-(33). In the remainder we consider $u \in \mathring{V}$. From the induction hypothesis, it follows that for any $t' < t$ [...] If $u_1 = K - 1$, it follows from (32) and the induction hypothesis that [...] Now, at $t$ the coding operation performed by $u$ can be decomposed as [...] where [...] In the remainder, we show that [...] which proves the lemma, since by the induction hypothesis [...] For $w \ne u$, we have [...] where the second equality follows from Lemma 6, the third equality follows from (37)-(38), and the last equality holds because we work over $\mathbb{F}_2$.
For $w = u$, we have [...]

Proof. By linearity, time-invariance and symmetry of (34) together with Lemma 7.
We are now ready to prove that the destinations can correctly decode source symbols. We present the decoding procedure for nodes on the right border of the network, that is, for nodes of type $(K, u_2^d)$. The decoding procedures at the other borders can be obtained by exploiting the symmetry of the system. [...] From (32) it follows that [...] The proof of the lemma follows by adding the final expressions from (46), (49) and (50), observing that the outcome is [...]

Energy Consumption.
The energy consumption of the network coding scheme presented above provides an upper bound to min r E coding (V , M, r).
Proof. All transmissions are over distance $\sqrt{d}$ and cost $c d^{\alpha/2}$. The border nodes that are sources or destinations are transmitting twice. On each of the $2d$ sides of the network, there are $(K - 1)^{d-1}$ such nodes, giving $4d(K - 1)^{d-1}$ transmissions. In addition, there are $(K - 1)^d$ nodes in the interior that are all transmitting once.
Next, we give the minimum energy required by a routing solution.
Proof. Since the transmission range is equal to $\sqrt{d}$, a routing solution requires $K/\lfloor\sqrt{d}\rfloor$ transmissions per session. Moreover, there are $2d(K - 1)^{d-1}$ sessions.
Using the above two lemmas, we are able to prove Theorem 1.
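The counts in the two proofs above can be combined into a finite-$K$ ratio to see the approach to $2d/\lfloor\sqrt{d}\rfloor$. The sketch below assumes exactly those counts ($2d(K-1)^{d-1}$ sessions, border nodes transmitting twice, $(K-1)^d$ interior nodes transmitting once, $K/\lfloor\sqrt{d}\rfloor$ routing hops per session); the function name `rect_energy_ratio` is hypothetical.

```python
import math

def rect_energy_ratio(d, K, alpha=2.0, c=1.0):
    """Routing energy over coding-scheme energy on the d-dimensional lattice at r = sqrt(d)."""
    per_transmission = c * d ** (alpha / 2)      # cost of one broadcast at range sqrt(d)
    sessions = 2 * d * (K - 1) ** (d - 1)        # one source per face node
    coding = per_transmission * (2 * sessions + (K - 1) ** d)
    routing = per_transmission * sessions * K / math.isqrt(d)
    return routing / coding

limits = {d: rect_energy_ratio(d, K=10**6) for d in (1, 2, 3)}
```

Because both schemes use the same range, the cost per transmission cancels and the ratio equals $2dK/(\lfloor\sqrt{d}\rfloor (4d + K - 1))$, which tends to 2, 4, and 6 for $d = 1, 2, 3$.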

Discussion
We have given several constructions of energy-efficient network codes. These constructions serve to show that compared to plain routing, network coding has the potential of reducing energy consumption in wireless networks. Since we have provided only codes that are based on a centralized design, it remains to be shown in future work if and how this potential can be exploited using practical codes. Moreover, it would also be of interest to consider the energy benefit in topologies in which the nodes are not positioned at a lattice, for instance, random networks. In this work we have provided lower bounds on the energy benefit of network coding for wireless multiple unicast. Another open problem is to find upper bounds on the benefit.

[1] S. Chen and K. Nahrstedt, "An overview of quality of service routing for next-generation high-speed networks: problems and solutions," IEEE Network, vol. 12, no.

Introduction
One of the biggest challenges in wireless communication is how to deal with the interference at the receiver when signals from multiple sources arrive simultaneously. In the radio channel of the physical-layer of wireless networks, data are transmitted through electromagnetic (EM) waves in a broadcast manner. The interference between these EM waves causes the data to be scrambled. To overcome its negative impact, most schemes attempt to find ways to either reduce or avoid interference through receiver design or transmission scheduling [1]. For example, in 802.11 networks, the carrier-sensing mechanism allows at most one source to transmit or receive at any time within a carrier-sensing range. This is obviously inefficient when multiple nodes have data to transmit.
While interference causes throughput degradation on wireless networks in general, its negative effect for multihop ad hoc networks is particularly significant. For example, in 802.11 networks, the theoretical throughput of a multihop flow in a linear network is less than 1/4 of the single-hop case due to the "self-interference" effect, in which packets of the same flow but at different hops collide with each other [2,3].
Instead of treating interference as a nuisance to be avoided, we can actually embrace interference to improve throughput performance with the "right mechanism". To do so in a multihop network, the following goals must be met.
(1) A relay node must be able to convert simultaneously received signals into interpretable output signals to be relayed to their final destinations.
(2) A destination must be able to extract the information addressed to it from the relayed signals.
The capability of network coding to combine and extract information through simple Galois field GF($2^n$) additions [4,5] provides a potential approach to meet such goals. However, network coding arithmetic is generally only applied on bits that have already been correctly received. That is, when the EM waves from multiple sources overlap and mutually interfere, network coding cannot be used to resolve the data at the receiver. So, goal (1) above cannot be met.

EURASIP Journal on Wireless Communications and Networking
This paper proposes the application of network coding directly within the radio channel at the physical layer. We call this scheme Physical-layer Network Coding (PNC). The main idea of PNC is to create an apparatus similar to that of network coding, but at the physical layer that deals with EM signal reception and modulation. Through a proper modulation-and-demodulation technique at the relay nodes, additions of EM signals can be mapped to GF($2^n$) additions of digital bit streams, so that the interference becomes part of the arithmetic operation in network coding. The basic idea of PNC was first put forth in our conference paper [6]. Going beyond [6], this paper addresses a number of practical issues of applying PNC in wireless networks. In particular, we evaluate the performance of PNC based on specific scheduling algorithms for 1D and 2D regular networks that make use of PNC. (The PNC scheduling schemes in this paper can be easily extended to more general networks as in [6].) Compared to traditional transmission and straightforward network coding, our analytical results show that PNC can improve the network throughput by a factor of 2 and 1.5, respectively, for the 1D network, and by a factor of 3 and 2, respectively, for the 2D network.
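The idea of mapping signal addition to GF(2) addition can be illustrated with a toy two-way relay exchange. This is a minimal sketch under strong assumptions that are not taken from the paper: BPSK modulation, a noiseless channel with unit gains, and perfect symbol and phase synchronisation between the two end nodes. The function names are hypothetical, and the threshold mapping shown is one simple choice rather than the specific scheme evaluated in the paper.

```python
def bpsk(bit):
    """Map bit 0 to +1.0 and bit 1 to -1.0 (assumed modulation)."""
    return 1.0 - 2.0 * bit

def pnc_relay_map(received):
    """Map a noiseless superimposed sample in {-2, 0, +2} to the XOR of the two bits."""
    # Equal bits add to +2 or -2 (XOR = 0); different bits cancel to 0 (XOR = 1).
    return 1 if abs(received) < 1.0 else 0

def two_way_exchange(bits_a, bits_b):
    """Two-slot exchange between end nodes A and B through a relay."""
    # Slot 1: both end nodes transmit at once; the relay observes the sum.
    superimposed = [bpsk(a) + bpsk(b) for a, b in zip(bits_a, bits_b)]
    relay_bits = [pnc_relay_map(y) for y in superimposed]
    # Slot 2: the relay broadcasts the XOR stream; each end removes its own bits.
    recovered_at_a = [r ^ a for r, a in zip(relay_bits, bits_a)]
    recovered_at_b = [r ^ b for r, b in zip(relay_bits, bits_b)]
    return relay_bits, recovered_at_a, recovered_at_b

bits_a = [0, 0, 1, 1, 0]
bits_b = [0, 1, 0, 1, 1]
relay_bits, at_a, at_b = two_way_exchange(bits_a, bits_b)
```

In this toy setting the relay never learns the individual bits, only their XOR, yet each end node recovers the other's data in two slots. Noise, fading, and synchronisation errors, which a practical design must handle, are deliberately left out.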

Related Work.
In 2006, we proposed PNC in [6] as demodulation mappings based on different modulation schemes. A similar idea was also published independently in [7] at the same time by another group. After that, a large body of work from other researchers on PNC began to appear. The work can be roughly divided into three categories.
In the first category, PNC is regarded as a modulation-demodulation technique. Many new PNC mapping schemes have been proposed since [6]. For example, [8] proposed a scheme based on Tomlinson-Harashima precoding. Following [6], [9] proposed a simple relay strategy called analog network coding (ANC), in which the relay amplifies and forwards the received superimposed signal without any processing. Analog network coding turns out to be similar to a scheme proposed earlier by researchers in the satellite communication community [10]. In [11], a number of memoryless relay functions, including PNC mapping and the BER-optimal function, were identified and analyzed assuming phase synchronization between the signals of the transmitters. In [12], we observed that there is a one-to-one correspondence between a relay function and a specific PNC scheme under the general definition of memoryless PNC. Besides giving a precise definition of memoryless PNC, which distinguishes it from the traditional straightforward network coding (SNC), [12] also gave a number of new PNC schemes. Reference [13] proposed a new PNC scheme in which the relay maps a group of constellation points to one signal according to the phase difference of the two end nodes' signals. The mechanism also takes care of the phase difference between the two end nodes implicitly.
In the second category, PNC and channel coding are studied jointly. In [14][15][16], PNC was combined with Lattice code or LDPC code. It was proved that the capacity of the two-way relay channel can be approached in high SNR and low SNR. In [14][15][16], channel coding and PNC mapping are performed independently (i.e., successively). In [17], we proposed a novel scheme which treats channel coding and PNC in an integrated manner. We show that joint channel-PNC decoding can outperform the previous schemes significantly.
In the third category, the focus is on the performance impact and significance of PNC in large-scale wireless networks. For one-dimensional wireless networks, [18] showed that PNC can improve the capacity by a fixed factor, although it does not change the scaling law. For two-dimensional wireless networks, [19] showed that PNC can increase capacity by a factor of 2.5 for rectangular networks and a factor of 2 for hexagonal networks. However, the result in [18] is based on a rough scheduling scheme built on traditional network coding rather than physical-layer network coding (the special properties of PNC are ignored). Our paper also discusses the application of PNC in large-scale wireless networks. It differs from [18] in that we construct an explicit PNC scheduling algorithm (specially designed for PNC), upon which all our results are established. Compared with [19], we consider the many-to-many scenario with multiple sources and destinations, while [19] considered only the one-to-many scenario with one source.
The rest of this paper is organized as follows. Section 2 overviews the basic idea of PNC with a linear 3-node multihop network. Sections 3 and 4 investigate the application of PNC in the 1D regular linear network and the 2D regular grid network, respectively. Section 5 concludes the paper.

Illustrating Example: A Three-Node Wireless Linear Network
Consider the three-node linear network in Figure 1. $N_1$ (Node 1) and $N_3$ (Node 3) are nodes that exchange information, but they are out of each other's transmission range. $N_2$ (Node 2) is the relay node between them. This three-node wireless network is a basic unit for cooperative transmission and has previously been investigated extensively [20][21][22][23][24][25]. In cooperative transmission, the relay node $N_2$ can choose different transmission strategies, such as Amplify-and-Forward or Decode-and-Forward [22], according to the signal-to-noise ratio (SNR). This paper focuses on the Decode-and-Forward strategy. We consider frame-based communication in which a time slot is defined as the time required for the transmission of one fixed-size frame. Each node is equipped with an omnidirectional antenna, and the channel is half duplex, so that transmission and reception at a particular node must occur in different time slots. Slow fading is assumed throughout this paper for ease of synchronization.
Before introducing the PNC transmission scheme, we first describe the traditional transmission scheduling scheme and the "straightforward" network-coding scheme for mutual exchange of a frame in the three-node network [20,25].

Traditional Transmission Scheduling Scheme.
In traditional networks, interference is usually avoided by prohibiting the overlapping of signals from $N_1$ and $N_3$ to $N_2$ in the same time slot. A possible transmission schedule is given in Figure 2. Let $S_i$ denote the frame initiated by $N_i$. $N_1$ first sends $S_1$ to $N_2$, and then $N_2$ relays $S_1$ to $N_3$. After that, $N_3$ sends $S_3$ in the reverse direction. A total of four time slots are needed for the exchange of two frames in opposite directions.

Straightforward Network Coding Scheme.
References [20,25] outline the straightforward way of applying network coding in the three-node wireless network. Figure 3 illustrates the idea. First, $N_1$ sends $S_1$ to $N_2$, and then $N_3$ sends frame $S_3$ to $N_2$. After receiving $S_1$ and $S_3$, $N_2$ encodes frame $S_2$ as follows:

$$S_2 = S_1 \oplus S_3, \tag{1}$$

where $\oplus$ denotes the bitwise exclusive-OR operation applied over the entire frames $S_1$ and $S_3$. $N_2$ then broadcasts $S_2$ to both $N_1$ and $N_3$. When $N_1$ receives $S_2$, it extracts $S_3$ from $S_2$ using the local information $S_1$, as follows:

$$S_1 \oplus S_2 = S_1 \oplus (S_1 \oplus S_3) = S_3. \tag{2}$$

Similarly, $N_3$ can extract $S_1$. A total of three time slots are needed, for a throughput improvement of 33% over the traditional transmission scheduling scheme.
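The three-slot exchange can be illustrated with a few lines of Python. This is only a minimal sketch: the helper `xor_frames` and the 4-byte payloads are hypothetical and stand in for the frames $S_1$ and $S_3$; nothing here models the radio channel.

```python
def xor_frames(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical 4-byte payloads standing in for the frames S1 and S3.
s1 = bytes([0x12, 0x34, 0x56, 0x78])
s3 = bytes([0xAB, 0xCD, 0xEF, 0x01])

# Slot 1: N1 -> N2 carries S1.  Slot 2: N3 -> N2 carries S3.
# Slot 3: N2 broadcasts the coded frame S2 = S1 xor S3.
s2 = xor_frames(s1, s3)

# Each end node removes its own frame to recover the other one.
s3_at_n1 = xor_frames(s1, s2)
s1_at_n3 = xor_frames(s3, s2)

# Four slots (traditional) versus three slots (straightforward network coding).
slots_traditional, slots_snc = 4, 3
gain = slots_traditional / slots_snc - 1  # relative throughput improvement

print(s3_at_n1 == s3, s1_at_n3 == s1, round(gain, 2))
```

The relative gain of 4/3 − 1 corresponds to the 33% improvement quoted in the text.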

Physical-Layer Network Coding (PNC)
We now introduce PNC as shown in Figure 4. Let us assume the use of BPSK modulation at all the nodes. We further assume symbol-level time and carrier-phase synchronization, and the use of power control, so that the frames from $N_1$ and $N_3$ arrive at $N_2$ with the same phase and amplitude. (Power control can be achieved in a slow fading channel with current techniques. Additional discussion about carrier-phase and symbol-time synchronization can be found in [26].) The combined bandpass signal received by $N_2$ during one symbol period is

$$r_2(t) = s_1(t) + s_3(t) = a_1\cos(\omega t) + a_3\cos(\omega t) = (a_1 + a_3)\cos(\omega t), \tag{3}$$

where $s_i(t)$, $i = 1$ or $3$, is the bandpass signal transmitted by $N_i$, $r_2(t)$ is the bandpass signal received by $N_2$ during one symbol period, $a_i$ is the BPSK modulated information bit of $N_i$, and $\omega$ is the carrier frequency. Then, $N_2$ will obtain a baseband signal $a_1 + a_3$.
Note that $N_2$ cannot extract the individual information transmitted by $N_1$ and $N_3$, that is, $a_1$ and $a_3$, from the combined signal $a_1 + a_3$. However, $N_2$ is just a relay node. As long as $N_2$ can transmit the necessary information to $N_1$ and $N_3$ for extraction of $a_1$ and $a_3$ over there, the end-to-end delivery of information will be successful. For this, all we need is a special modulation/demodulation mapping scheme, referred to as PNC mapping in this paper, to obtain the equivalence of GF(2) summation of bits from $N_1$ and $N_3$ at the physical layer. Table 1 illustrates the idea of PNC mapping. In Table 1, $s_j \in \{0, 1\}$ is a variable representing the data bit of $N_j$ and $a_j \in \{-1, 1\}$ is a variable representing the BPSK modulated bit of $s_j$ such that $a_j = 2s_j - 1$.
With reference to Table 1, $N_2$ obtains the information bit

$$s_2 = s_1 \oplus s_3. \tag{4}$$

It then transmits

$$s_2(t) = a_2\cos(\omega t), \tag{5}$$

where $a_2 = 2s_2 - 1$ is the BPSK modulated bit of $s_2$.

The BER analysis in [6] shows that the end-to-end BER for the three schemes is similar when the per-hop BER is low (the BER is less than $10^{-5}$ at 10 dB). Ignoring the slight BER difference, we have the following conclusion. For a frame exchange, PNC requires two time slots, traditional scheduling (e.g., 802.11) requires four, and straightforward network coding requires three. Therefore, PNC can improve the system throughput of the three-node wireless network by 100% and 50% relative to traditional transmission scheduling and straightforward network coding, respectively.
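The PNC mapping can be made concrete by enumerating all four bit combinations. The sketch below assumes a noiseless, perfectly synchronized channel, so that the relay observes exactly $a_1 + a_3 \in \{-2, 0, 2\}$; the function names `bpsk` and `pnc_map` are illustrative and do not come from the paper.

```python
from itertools import product


def bpsk(bit: int) -> int:
    """Map a data bit s in {0, 1} to a BPSK symbol a = 2s - 1 in {-1, +1}."""
    return 2 * bit - 1


def pnc_map(baseband_sum: int) -> int:
    """Map the superposed value a1 + a3 in {-2, 0, 2} to the bit s1 xor s3."""
    return 1 if baseband_sum == 0 else 0


rows = []
for s1, s3 in product((0, 1), repeat=2):
    received = bpsk(s1) + bpsk(s3)  # noiseless superposition seen by N2
    s2 = pnc_map(received)          # bit that N2 relays
    a2 = bpsk(s2)                   # BPSK symbol broadcast by N2
    rows.append((s1, s3, received, s2, a2))
    print(s1, s3, received, s2, a2)
```

In every row the relayed bit equals `s1 ^ s3`, so each end node can recover the other node's bit by XORing the relayed bit with its own, exactly as in straightforward network coding but with the two uplink slots merged into one.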

Applying PNC in Regular 1D Networks
Our discussion so far has focused only on the simple 3-node network with one bidirectional flow. In this section, we discuss the application of PNC in 1D regular networks. There are two reasons for this discussion. First, the schemes proposed for regular networks still work in random networks, and the analytical results for regular networks also provide some insight into applying PNC in random networks. Second, regular networks also find applications in the real world. For example, APs (access points) positioned along a highway form a regular linear chain in a vehicular network.

Regular Linear Network with One Bidirectional Flow.
Consider a regular linear network of N nodes with equal spacing between adjacent nodes. Label the nodes as node 1, node 2, ..., node N successively, with nodes 1 and N being the two source/destination nodes. Figure 5 shows a network with N = 5. Suppose that node 1 is to transmit frames $X_1, X_2, \ldots$ to node N, and node N is to transmit frames $Y_1, Y_2, \ldots$ to node 1. We divide the time slots into two types: odd slots and even slots. In the odd time slots, the odd-numbered nodes transmit and the even-numbered nodes receive. In the even time slots, the even-numbered nodes transmit and the odd-numbered nodes receive.

Figure 5 shows the sequence of frames being transmitted by the nodes in a 5-node network. In slot 1, node 1 transmits $X_1$ to node 2 and node 5 transmits $Y_1$ to node 4 at the same time. In slot 2, node 2 and node 4 transmit $X_1$ and $Y_1$ to node 3 simultaneously; node 2 and node 4 also store a copy of $X_1$ and $Y_1$ in their buffers, respectively. In slot 3, node 1 transmits $X_2$ to node 2, node 5 transmits $Y_2$ to node 4, and node 3 broadcasts $X_1 \oplus Y_1$ simultaneously; node 3 stores a copy of $X_1 \oplus Y_1$ in its buffer. Adding the stored $X_1$ to $X_2 \oplus X_1 \oplus Y_1$ received with PNC detection, node 2 can obtain $Y_1 \oplus X_2$. Node 4 can obtain $Y_2 \oplus X_1$ similarly. In slot 4, node 2 and node 4 broadcast $Y_1 \oplus X_2$ and $Y_2 \oplus X_1$, respectively. In this way, node 5 receives a copy of $X_1$ and node 1 receives $Y_1$ in slot 4. Also, in slot 4, node 3 obtains $Y_2 \oplus X_2$ by adding its stored packet $X_1 \oplus Y_1$ to the frame $(Y_1 \oplus X_2) \oplus (Y_2 \oplus X_1)$ received with PNC detection.

With reference to Figure 5, we see that a relay node forwards two frames, one in each direction, every two time slots. So, the throughput is 0.5 frame/time slot in each direction. Due to the half-duplex assumption, this is the maximum possible throughput we can achieve.
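The first four slots of the 5-node schedule can be replayed in a short simulation. The sketch below is an idealization, not the authors' implementation: PNC detection is modeled as a noiseless XOR of the frames sent by the two neighbors, frames are small integers, and the function name `simulate_chain` and its arguments are hypothetical. Each relay applies the rule used in the walkthrough above: the next frame to send is the received frame XORed with the stored copy of the frame it sent last.

```python
def simulate_chain(x, y, n_nodes=5, n_slots=4):
    """Replay the alternating odd/even schedule on a chain with ideal PNC detection.

    x, y: lists of integer frames sent by node 1 and node n_nodes.
    Returns (tx, nxt): tx[(slot, node)] is the frame broadcast by `node` in
    `slot`; nxt[node] is the frame a relay has prepared for its next turn.
    """
    relays = range(2, n_nodes)
    stored = {i: 0 for i in relays}   # copy of the last frame each relay sent
    nxt = {i: None for i in relays}   # frame each relay will send next
    tx = {}
    for slot in range(1, n_slots + 1):
        k = (slot - 1) // 2           # index of the end-node frame for this slot
        for node in range(1, n_nodes + 1):
            if node % 2 != slot % 2:
                continue              # this node listens in this slot
            if node == 1:
                tx[(slot, node)] = x[k]
            elif node == n_nodes:
                tx[(slot, node)] = y[k]
            elif nxt[node] is not None:
                tx[(slot, node)] = stored[node] = nxt[node]
        for node in relays:
            if node % 2 == slot % 2:
                continue              # half duplex: a transmitting relay cannot receive
            heard = [tx[(slot, nb)] for nb in (node - 1, node + 1) if (slot, nb) in tx]
            if heard:
                received = 0
                for frame in heard:
                    received ^= frame  # ideal PNC detection: XOR of overlapping frames
                nxt[node] = received ^ stored[node]
    return tx, nxt


# Hypothetical 4-bit frames X1, X2 and Y1, Y2.
x = [0b1010, 0b0110]
y = [0b1100, 0b0011]
tx, nxt = simulate_chain(x, y)

y1_at_node1 = tx[(4, 2)] ^ x[1]   # node 1 removes its own X2
x1_at_node5 = tx[(4, 4)] ^ y[1]   # node 5 removes its own Y2
print(y1_at_node1 == y[0], x1_at_node5 == x[0])
```

The sketch only covers the four slots that the text walks through; how the end nodes combine their own frames with already received ones in later slots is not specified in this section, so the simulation stops there.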
As detailed above, when applying PNC on the linear network, each node transmits and receives alternately in successive time slots; and when a node transmits, its adjacent nodes receive, and vice versa (see Figure 5). Let us investigate the signal-to-interference ratio (SIR) given this transmission pattern to make sure that the interference is not excessive. Consider the worst-case scenario of an infinite chain. We note the following characteristics of PNC from a receiving node's point of view.
(a) The interfering nodes are symmetric on both sides.
(b) The simultaneous signals received from the two adjacent nodes do not interfere due to the nature of PNC.
(c) The nodes that are two hops away are also receiving at the same time, and therefore will not interfere with the node.
Therefore, the two nearest interfering nodes are three hops away. We have the following SIR:

$$\mathrm{SIR} = \frac{P_0/d^{\alpha}}{2\sum_{l=1}^{\infty} P_0/[(2l+1)d]^{\alpha}}, \tag{6}$$

where $P_0$ is the common transmitting power of the nodes, $d$ is the distance between adjacent nodes, and $\alpha$ is the path-loss exponent. (In a regular network, a trivial result of power control is that every node uses the same transmission power if the distances between adjacent nodes are constant.) According to [27], $\alpha = 2$ for free space, $\alpha = 2.7$ to $3.5$ for urban cellular networks, and $\alpha = 4$ to $6$ for in-building transmission. We calculate the SIR for different $\alpha$ and the results are shown in Table 2. As can be seen, when $\alpha \geq 3$ (this is typical in wireless networks), the SIR is no less than 10 dB and the impact of the interference on BER is negligible for BPSK based on [28] (the capture threshold is often set to 10 dB in wireless networks [3]). More generally, a thorough treatment should take into account the actual modulation scheme used, the difference between the effects of interference and noise, and whether or not channel coding is used. However, we can conclude that as far as the SIR is concerned, PNC is not worse than traditional scheduling (see Section 4) when generalized to the N-node linear network. (In this paper, we assume that channel coding [17] is properly used at all the nodes and that the packets can be correctly decoded to avoid error propagation once the targeted SIR is achieved. Reference [17] provides and investigates a hop-to-hop channel coding scheme for PNC.)
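The dependence of the SIR on the path-loss exponent can be checked numerically. The sketch below assumes the geometry described above: the desired signal comes from an adjacent node at distance $d$, and the interferers sit at odd hop distances 3, 5, 7, ... on both sides, all transmitting with power $P_0$, so that $P_0$ and $d$ cancel. The function name `sir_1d_db` and the truncation of the infinite sum are choices made for this illustration only.

```python
import math


def sir_1d_db(alpha: float, terms: int = 100_000) -> float:
    """SIR in dB for the infinite chain, with P0 and d normalized to 1.

    Interferers are assumed at hop distances 2l + 1 (l = 1, 2, ...) on both
    sides of the receiver; the infinite sum is truncated after `terms` terms.
    """
    interference = 2.0 * sum((2 * l + 1) ** (-alpha) for l in range(1, terms + 1))
    return 10.0 * math.log10(1.0 / interference)


for alpha in (2.0, 3.0, 4.0):
    print(alpha, round(sir_1d_db(alpha), 1))
```

Under these assumptions the SIR grows quickly with $\alpha$: it is only a few dB in free space ($\alpha = 2$), close to 10 dB for $\alpha = 3$, and above 15 dB for $\alpha = 4$.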

Regular Linear Network with Multiple Flows.
Part A considers only one bidirectional flow. Here we consider a general setting in which there are K unidirectional flows in the N-node linear network. Note that this generalization includes the scenario in which there is a combination of unidirectional and bidirectional flows in the network, since each bidirectional flow can be considered as two unidirectional flows.
To allow PNC to be applied, we compose bidirectional flows out of the K unidirectional flows by matching pairs of unidirectional flows in opposite directions. The bidirectional flows can then make use of PNC for transmission, while the remaining unmatched unidirectional flows make use of the traditional strategy of multihop data transmission.
The optimal way to compose the bidirectional flows and schedule the transmission of the links in the flows is a tough problem. Here we consider a simple heuristic which is asymptotically optimal for the regular N-node linear network when N goes to infinity as shown in Part C. For simplicity, we assume that all flows have equal traffic.
We define the following terms with respect to the linear network. Let us label the nodes from left to right by 1 to N sequentially. Let $(s_i, d_i)$ denote the source-destination pair of flow $i$. For a right-bound flow, $s_i < d_i$; for a left-bound flow, $s_i > d_i$. Let $F$ denote the overall set of flows, and let $F_R \subseteq F$ be the set of right-bound flows and $F_L \subseteq F$ be the set of left-bound flows.
Two right-bound (left-bound) flows $i$ and $j$ are said to be nonoverlapping if $d_i < s_j$ or $d_j < s_i$ ($s_i < d_j$ or $s_j < d_i$). A right packing (left packing) is a set of nonoverlapping right-bound flows (left-bound flows). A dual packing consists of a right packing and a left packing. Figure 6 shows an example of a dual packing. Flows 2 and 3 form a right packing, and Flow 1 forms a left packing. Note that some of the nodes are traversed by both a right-bound flow and a left-bound flow. Let us call these nodes the common nodes, and the other nodes the noncommon nodes. A sequence of adjacent common nodes, flanked by but not including two noncommon nodes at two ends (an ellipse in Figure 6), forms a PNC unit, and we can use the PNC mechanism for transporting the bidirectional traffic over it. A sequence of adjacent noncommon nodes, together with the two common nodes flanking them (a rectangle in Figure 6), may or may not have traffic flowing over them. When there is traffic, the traffic is in one direction only, and the traditional multihop communication technique can be used to carry the unidirectional traffic. Essentially, by forming a dual packing, we also form many "virtual" bidirectional flows (each corresponding to a PNC unit) on which PNC can be applied.
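The decomposition of a dual packing into PNC units can be sketched as follows. The flows used here are hypothetical and are not those of Figure 6. As an assumption of this sketch, a PNC unit is detected as a maximal run of consecutive links that carry traffic in both directions, rather than as a run of common nodes; this keeps two adjacent units (for example nodes 1 to 3 and nodes 4 to 6) separate when the link between them carries traffic in one direction only.

```python
def links_of(flow):
    """Set of links (i, i + 1) traversed by a flow given as (source, destination)."""
    s, d = flow
    lo, hi = min(s, d), max(s, d)
    return {(i, i + 1) for i in range(lo, hi)}


def is_packing(flows):
    """True if the flows are pairwise nonoverlapping (they share no node)."""
    spans = sorted((min(s, d), max(s, d)) for s, d in flows)
    return all(prev[1] < cur[0] for prev, cur in zip(spans, spans[1:]))


def pnc_units(right_packing, left_packing):
    """Node lists of the maximal runs of links used in both directions."""
    right_links = set().union(*(links_of(f) for f in right_packing))
    left_links = set().union(*(links_of(f) for f in left_packing))
    units = []
    for a, b in sorted(right_links & left_links):
        if units and units[-1][-1] == a:
            units[-1].append(b)    # extend the current unit by one hop
        else:
            units.append([a, b])   # start a new unit
    return units


# Hypothetical dual packing on a 9-node line.
right = [(1, 4), (6, 9)]   # right-bound flows (s < d)
left = [(8, 3)]            # left-bound flow (s > d)
print(is_packing(right), is_packing(left), pnc_units(right, left))
```

For this example the units are nodes 3 and 4 (a trivial two-node unit) and nodes 6 to 8; the remaining stretches carry traffic in one direction at most and would be served by conventional multihop forwarding.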
Our heuristic, shown in Algorithm 1, consists of a method of forming dual packings from the K unidirectional flows.
The dual packings yield a set of "virtual" bidirectional flows, each corresponding to a PNC unit. Scheduling can then be performed as follows. Let us refer to the time needed for all the K unidirectional flows to transfer one packet from source to destination as one frame. Each link (hop) of a flow is allocated one time slot for transmission within a frame. A frame is further divided into two intervals, as follows.
(1) The first interval is dedicated to the PNC units (i.e., ellipses). Note that if there are M dual packings, 2M time slots are needed in the worst case; in the worst case, different dual packings use different time slots to transmit, and 2 time slots are needed for each dual packing (Two caveats are in order. The first is that according to our construction, there could be "trivial" PNC units with two nodes only. In this case, the PNC mechanism is not needed, and each node gets to transmit directly to the other node. Regardless of whether the PNC unit is trivial or not, two time slots are needed for the bidirectional flows. The second caveat is that there could be two PNC units in the same dual packing next to each other. For example, suppose nodes 1, 2, and 3 form a PNC unit, and nodes 4, 5, 6 form another. To avoid conflict, the scheduling of the transmissions on these two PNC units should be such that nodes 1, 3, 4, and 6 transmit in one time slot while nodes 2 and 5 transmit in another time slot. Again, two time slots are needed.).
(2) The second interval is dedicated to the nonPNC units (i.e., rectangles). The nodes of all rectangles of all dual packings are scheduled to transmit using the conventional scheme. The number of time slots needed in the second interval depends on both the number and the lengths of the rectangles. As will be shown in Part C, it can be ignored compared to the time slots needed in the first interval as N goes to infinity.

Throughput of 1D Network with PNC.
We now show that the packing and scheduling strategies presented in Part B can allow the upper-bound capacity of 1D network to be approached when the number of nodes N goes to infinity. Furthermore, compared with the conventional schemes discussed in [29], PNC can achieve a constant factor of throughput improvement.
We first detail the system model. To avoid edge effects, we consider a "large" circle instead of a line. The N nodes are uniformly distributed over the circle with a constant distance between adjacent nodes. Without loss of generality, let the distance between two adjacent nodes be a unit distance. Each transmission is over only one unit distance (i.e., a node only transmits to its two adjacent nodes). Consider the receiver of a link. We assume that simultaneous transmission by another link whose transmitter is two or more hops away from the receiver of the first link will not cause a collision to the first link. In our model, N/2 nodes are randomly chosen as the source nodes. The remaining N/2 nodes are the potential destination nodes. For each source node, a unique destination node is chosen among the N/2 potential destination nodes with equal probability. We assume matching without replacement in that the destination node chosen for a source node will not be put back to the pool before the destination node of another source is chosen. The route for a source-destination pair is also predetermined in a random way (note: there are two routes from a source to its destination, one in the clockwise direction and the other in the counterclockwise direction).
The analytical results for the traditional transmission scheme and the straightforward network coding scheme in our circular model are similar to those for the 1D linear network in [29] when N goes to infinity. Using a similar approach, it is not difficult to obtain the respective per-flow throughputs in our circular network as

$$\mathrm{Th}_{\mathrm{trad}} = \frac{2}{N}, \qquad \mathrm{Th}_{\mathrm{SNC}} = \frac{8}{3N}, \tag{7}$$

where unit link bandwidth is assumed.
Let us now focus on the PNC throughput. We will show that PNC can achieve the per-flow throughput 4/N − ε for any small positive value ε as N goes to infinity. Let us first provide further details to the scheduling strategy presented in Part B.
The packing and scheduling are as follows. For packing, we first unwrap the circle to a noncircular linear network by randomly selecting the source node of a clockwise flow, labeled s, on the circle as the start point of the linear network. The adjacent node of the selected source node in the counterclockwise direction on the circle, labeled e, will serve as the end point of the linear network. Next, we obtain one packing of the clockwise flows according to the packing algorithm in Part B. It is possible that the last selected flow crosses the start point. In that case, we cut the flow into two subflows by performing the cut between the start point and the end point, and only consider the first subflow in the aforementioned packing. After forming the above clockwise unidirectional packing, we form a matching counterclockwise unidirectional packing by choosing e as the start point and s as the end point. If there is an existing counterclockwise flow with e as its source node, we start with this flow in the unidirectional packing. If not, we choose the next flow with the source node closest to e in the counterclockwise direction in our packing.
For "traffic balance", after getting the first dual packing as above, for the next dual packing, we will start with forming the counterclockwise unidirectional packing first (i.e., s and e will be defined with respect to the counterclockwise packing) before constructing the matching clockwise packing. Repeating the above procedure allows us to form a series of dual packings.
The scheduling of transmissions is the same as that in Part B except that here we also have to consider the transmission across the two subflows cut as above, if any. We assume the traffic from the destination of a preceding subflow to the source of its corresponding subflow is transmitted using the conventional scheme in the second interval.
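With the above packing and scheduling strategies, we have the following theorem on the per-flow throughput of the 1D circular network when N goes to infinity.

Theorem 1. With the above packing and scheduling strategies, PNC can achieve a per-flow throughput of 4/N − ε in the 1D circular network for any small positive value ε as N goes to infinity, where 4/N is the upper bound of the per-flow throughput.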
Sketch of Proof. A sketch of the proof for Theorem 1 is provided here and a detailed proof is given in the Appendix.
With the help of the max-flow min-cut theorem, the upper bound of the per-flow throughput for our 1D circular network can be shown to be 4/N. That this upper bound can be approached with the application of the aforementioned PNC packing and scheduling strategies is argued as follows.
Consider the original N/2 unidirectional flows. With PNC packing and scheduling, these flows have been decomposed into PNC units and non-PNC units for transmission in the first and second intervals. For each round of first and second intervals (i.e., for each frame), one packet is transported from the source to the destination of each flow. We can show that the number of time slots needed in the first interval for all the flows is at most $(1 + \varepsilon_1)N/4$, where the small positive quantity $\varepsilon_1$ goes to zero as N goes to infinity. The number of time slots needed in the second interval, on the other hand, is at most $\varepsilon_2 N$, where the small positive quantity $\varepsilon_2$ goes to zero as N goes to infinity. Then we can obtain the per-flow throughput with PNC: $1/(N/4 + \varepsilon_1 N/4 + \varepsilon_2 N) = (1 - \varepsilon)\,4/N$, where $\varepsilon = (\varepsilon_1 + 4\varepsilon_2)/(1 + \varepsilon_1 + 4\varepsilon_2)$ goes to zero as N goes to infinity.

A corollary of Theorem 1 is that PNC can improve the throughput of the 1D network by a factor of 2 and 1.5 relative to the traditional transmission scheme and the SNC scheme (7), respectively.
A notable fact is that PNC can approach the capacity with minimum energy. Recall that PNC exchanges one packet between the two end nodes within two time slots, during which each of the n nodes on the chain transmits once with energy E t and receives once with energy E r . And a total energy n(E t +E r ) is used. In fact, n(E t +E r ) is the lower bound of energy to exchange one packet. For one exchange, the two end nodes must transmit once to send their message and must receive once to obtain their needed message; the n − 2 relay nodes must receive once and transmit once to finish one relay. Therefore, the energy of n(E t + E r ) is necessary.

Applying PNC in 2D Grid Network
Section 3 focused on the 1D regular network. This section investigates the application of PNC in a 2D regular grid network. We assume the same transmission protocol as in Section 3.

2D Grid Network with One Bidirectional Flow in Each Line.
Figure 7 shows the grid network under consideration, in which N nodes are uniformly located at the cross points as shown. In this part, we first consider the case in which each line (horizontal or vertical) on the grid has one and only one bidirectional flow. Specifically, the two end nodes in each line, node 1 and node $\sqrt{N}$, exchange information through the relay nodes in between.
The flows transmit with the following PNC schedule. Consider the horizontal lines (a similar schedule applies to the vertical lines). The first two time slots are dedicated to transmissions on lines 1, J + 1, 2J + 1, ...; the next two time slots are dedicated to transmissions on lines 2, J + 2, 2J + 2, ...; and so on. The separation J must be large enough for an acceptable SIR. Within a group of simultaneously active lines, to limit interference, when the odd nodes transmit on one active line, the even nodes transmit on its two adjacent active lines, as shown in the example of Figure 7.
Let us investigate the SIR of this transmission pattern given a J. Consider the worst-case scenario in which N goes to infinity. For a given receiver, the interference from the nodes within the same line is

$$I_1 = 2\sum_{l=1}^{\infty} \frac{P_0}{[(2l+1)d]^{\alpha}},$$

where $P_0$, $l$, $d = 1$, and $\alpha$ are defined similarly as in Section 3.1. Without loss of generality, suppose that the receiver is an even node. The interference from the other active lines whose odd nodes are transmitting is

$$I_2 = 4\sum_{m=1}^{\infty}\sum_{l=1}^{\infty} \frac{P_0}{\left[\sqrt{(2l-1)^2 + (2mJ)^2}\,d\right]^{\alpha}},$$

and the interference from the other active lines whose even nodes are transmitting is

$$I_3 = 2\sum_{m=1}^{\infty} \frac{P_0}{[(2m-1)Jd]^{\alpha}} + 4\sum_{m=1}^{\infty}\sum_{l=1}^{\infty} \frac{P_0}{\left[\sqrt{(2l)^2 + ((2m-1)J)^2}\,d\right]^{\alpha}}.$$

Thus, the overall SIR is given by

$$\mathrm{SIR} = \frac{P_0/d^{\alpha}}{I_1 + I_2 + I_3}. \tag{9}$$

For a typical value of $\alpha = 4$, the SIR in (9) is about 13.5 dB, 12.3 dB, and 10.0 dB for J equal to 5, 4, and 3, respectively. With an assumed 10 dB target, J = 3 is enough to guarantee successful transmission.

Here we apply a simple routing scheme, as in [29]. For a source-destination pair at positions $(x_s, y_s)$ and $(x_d, y_d)$, the data will first be forwarded vertically to the node at $(x_s, y_d)$ before being forwarded horizontally to the destination. The horizontal and vertical transmissions are separated into two different time intervals. For horizontal (or vertical) transmissions, the scheduling within each line (column) is the same as that in Section 3.2, and the scheduling among different lines (columns) is the same as in Part A.
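The SIR of the one-flow-per-line schedule with line separation J can be evaluated numerically. The sketch below is an illustration under stated assumptions, not the authors' computation: the receiver is an even node with unit spacing and unit transmit power; on its own line the odd nodes transmit, so the interferers there are at horizontal offsets 3, 5, 7, ...; active lines at odd multiples of J use the opposite phase (even nodes transmit) and those at even multiples of J use the same phase (odd nodes transmit). The function name `sir_2d_db` and the truncation limits `l_max` and `m_max` are choices made for this example.

```python
import math


def sir_2d_db(J: int, alpha: float = 4.0, l_max: int = 400, m_max: int = 60) -> float:
    """SIR in dB at an even receiver in an infinite grid (P0 = 1, d = 1).

    The infinite sums are truncated: horizontal offsets up to 2 * l_max and
    active lines up to m_max * J above and below the receiver's line.
    """
    # Same line: interferers at odd offsets 3, 5, 7, ... on both sides.
    interference = 2.0 * sum((2 * l + 1) ** (-alpha) for l in range(1, l_max + 1))
    for m in range(1, m_max + 1):
        y = m * J  # vertical distance of the m-th active line (above and below)
        if m % 2 == 1:
            offsets = range(0, 2 * l_max + 1, 2)   # even nodes transmit
        else:
            offsets = range(1, 2 * l_max + 1, 2)   # odd nodes transmit
        for x in offsets:
            copies = 2 if x == 0 else 4            # mirror images in x and y
            interference += copies * (x * x + y * y) ** (-alpha / 2.0)
    return 10.0 * math.log10(1.0 / interference)


for J in (3, 4, 5):
    print(J, round(sir_2d_db(J), 1))
```

With $\alpha = 4$ this geometry should give roughly 10.0 dB, 12.3 dB, and 13.5 dB for J = 3, 4, and 5, in line with the values quoted in the text.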

2D Grid
When N goes to infinity, the number of nodes in each line or column, $\sqrt{N}$, also goes to infinity, and the per-flow PNC throughput in each line or column will approach $4/\sqrt{N}$, as argued in Section 3. Since the horizontal and vertical transmissions are scheduled in different time intervals, and in each interval every Jth line (column) transmits simultaneously, the per-flow throughput of PNC in the 2D grid network can approach

$$\mathrm{Th}_{\mathrm{PNC}} = \frac{1}{2}\cdot\frac{1}{J}\cdot\frac{4}{\sqrt{N}} = \frac{2}{J\sqrt{N}}. \tag{10}$$

For comparison purposes, let us look at the per-flow throughput under the traditional transmission strategy and under the straightforward network coding strategy. With the routing/scheduling strategy and the corresponding throughput analysis in [29], we can show that the traditional transmission scheme and the SNC scheme can achieve the following throughputs, respectively:

$$\mathrm{Th}_{\mathrm{trad}} = \frac{2}{9\sqrt{N}}, \qquad \mathrm{Th}_{\mathrm{SNC}} = \frac{1}{3\sqrt{N}}. \tag{11}$$

In the 2D grid network, the nodes are more tightly packed than in the 1D network, and the interfering nodes must be kept at least 3 hops away, that is, Δ = 2, to obtain an SIR of no less than 10 dB (note: in the 1D network, Δ could be 1 for an SIR of about 10 dB). When Δ = 2, we can verify that throughputs better than (11) cannot be achieved. In other words, the throughput in (11) is also the upper bound for the traditional transmission scheme and the SNC scheme under all possible schedulings.
Therefore, setting J = 3 in (10), we conclude that PNC can achieve a throughput improvement factor of 3 and 2 relative to the traditional transmission scheme and the SNC scheme, respectively. Note that the improvement factors under the 2D network are larger than those under the 1D network, which are 2 and 1.5, respectively (see Section 3).

Conclusion
This paper has introduced a novel scheme called Physical-layer Network Coding (PNC) that significantly enhances the throughput performance of multihop wireless networks. Instead of avoiding interference caused by simultaneous electromagnetic waves transmitted from multiple sources, PNC embraces interference to effect network-coding operation directly from physical-layer signal modulation and demodulation. With PNC, signal scrambling due to interference, which causes packet collisions in the MAC layer protocol of traditional wireless networks (e.g., IEEE 802.11), can be eliminated.
We have proposed explicit scheduling algorithms for PNC in 1D and 2D regular networks with multiple random flows. It is shown that PNC can potentially achieve 100% and 50% throughput increases compared with traditional transmission and straightforward network coding, respectively, in the 1D regular linear network. The throughput improvements are even larger in the 2D regular network: 200% and 100%, respectively. In particular, PNC can allow the upper-bound throughput of the 1D regular network to be approached as the number of nodes goes to infinity.

A. Proof of Theorem 1
This appendix proves Theorem 1 in three steps. First, the fact that 4/N is the upper bound for the throughput of the 1D circular linear network can be argued as follows. Let us consider the number of time slots needed so that each flow can transport one packet from its source to its destination. Due to the half-duplex constraint, there can be at most N/2 transmitting nodes in a time slot. In general, each transmitting node can transmit to at most two of its adjacent nodes simultaneously. Hence, in total, there can be at most N one-hop transmissions in a time slot.

Next, we prove that the number of time slots needed in the second interval is negligible compared to N, denoted by $\varepsilon_2 N$, where $\varepsilon_2$ is a small positive quantity that goes to zero as N goes to infinity. The total one-hop transmissions in the second interval can be divided into two parts: the one-hop transmissions in the rectangles and the one-hop transmissions between subflows (created when we unwrap the circular network into a linear network).
Let us first consider the rectangles. As shown in Figure 8, within a dual packing, the rectangles do not overlap. Furthermore, the two end nodes in a rectangle must be either a source or destination node of some flow. As a proof technique, let us artificially divide the rectangles into two groups according to the dual packings containing them. Recall that the dual packings are formed successively in our packing algorithm. Consider the first (1 − ε 3 ) fraction of all flows (including the original flows and the generated subflows) that are included successively into the dual packings. The first group of rectangles arises from these flows. The second group of rectangles belongs to the remaining ε 3 fraction of the flows. We set ε 3 such that As discussed in Section 3.2, when we perform packing on the circular network by unwrapping it to a linear network, it is possible for a flow to be cut into two subflows. Each clockwise unidirectional packing contains at least one flow that does not generate subflows (a flow cannot have more than N hops). As a corollary, if the clockwise packing contains a flow that has been cut into two subflows, then the packing must contain at least two flows to start with. One of these subflows will be relegated to a future packing exercise. So, each clockwise packing reduces the number of remaining flows to be packed by at least one. For the matching counterclockwise packing, at most one flow will be cut into two subflows. Thus, the matching counterclockwise packing does not increase the number of remaining counterclockwise flow. Recall from the discussion in Section 3.2 that for "traffic balance" successive dual packings will start with clockwise and counterclockwise packings in an alternate manner. Thus, successive dual packings will reduce the numbers of remaining clockwise and counterclockwise flows by at least one alternately.
In the beginning, there are N/2 original flows (N/4 of which are clockwise and N/4 of which are counterclockwise). From the argument in the previous paragraph, there are altogether at most N/2 dual packings. Each dual packing generates at most two extra flows for the flow pool (because of the cut between s and e). Thus, altogether at most N extra flows can be generated. Hence, the total number of flows (including the original flows and the subflows) is at most 3N/2.
In general, since the two end nodes of a rectangle must be either a source or a destination of some flow, the number of rectangles in a dual packing is no more than the number of flows in that dual packing (note: some nonend nodes within a rectangle could also be sources or destinations; thus the "no more than" rather than "equal to"). Therefore, the number of rectangles in the first group is no more than (1 − ε 3 )N. For these rectangles, as shown in Lemma 2 at the end of this appendix, the number of nodes in each group-1 rectangle is no more than (1 − ε 4 ) log(N) + ε 4 N w.h.p., where ε 4 is a small positive quantity that goes to zero when N goes to infinity. Similarly, the number of rectangles in the second group is upper bounded by ε 3 N. As a trivial bound, we will upperbound the number of nodes in each group-2 rectangle by N. Note that each node will at most transmit once within a rectangle (group-1 or group-2) for traffic forwarding. Thus, the total number of one-hop transmissions needed for the rectangles is upper bounded by Now, consider the transmissions across subflows. A one-hop transmission is needed for two adjacent subflows generated by the cut when we unwrap the circular network to a corresponding linear network. In other words, there is a one-hop transmission whenever there is an extra subflow, which is upper bounded by N/2 according to the above argument. Thus, the total number of one-hop transmissions between all adjacent subflows is upper bounded by Putting things together, the total one-hop transmissions in the second interval is upper bounded by T 1 + T 2 . Since we determine the start and end nodes of each dual packing in a uniformly random way and pack each unidirectional packing in a uniformly random way, the one-hop transmissions in the rectangles are also uniformly distributed among all the N nodes along the circle. With the traditional transmission scheme, there are N/2 one-hop transmissions in each time slot. 
Therefore, the time slots needed in the second interval are upper bounded by k 2 = ε 2 N, where ε 2 is determined by ε 3 , ε 4 , and N. It is easy to show that ε 2 will go to zero as N goes to infinity.
Finally, we prove that the number of time slots needed in the first interval is less than (1 + ε 1 )N/4. In a unidirectional packing, a residual node is an idle node through which no packet passes (i.e., none of the flows of the unidirectional packing passes through the node). Thus, the number of nodes through which one packet passes in one unidirectional packing is N minus the number of residual nodes. Consider a dual packing to which group-1 rectangles belong. According to Lemma 1, stated immediately after this proof, the number of residual nodes in each of the unidirectional packings of the dual packing is less than log(N) w.h.p. That is, the number of nonresidual nodes in a unidirectional packing is more than N − log(N) w.h.p., and the number of nonresidual nodes in both unidirectional packings of the dual packing is more than 2(N − log N). That is, the traffic handled by each dual packing (in terms of packet flows across all nodes in the dual packing) is more than 2(N − log N). Now, consider an arbitrary node in the network. According to our model, it is either the source or the destination of some flow. The packet of that flow passes through it with probability 1. For the other N/2 − 1 original flows, a packet passes through the node with probability 1/2. By the Chernoff-Hoeffding theorem, the number of packets that go through each node is 1/2 · (N/2 − 1) + 1 w.h.p. Considering all N nodes, the number of packets passing through them is (1/2 · (N/2 − 1) + 1)N. Note that this is the total traffic, which is more than the traffic in the dual packings to which group-1 rectangles belong.
Therefore, the number of dual packings to which the group-1 rectangles belong is upper bounded by Similar to the argument for group-1 rectangles, for the flows containing the group-2 rectangles, there are at most ε 3 N flows which will generate at most ε 3 N unidirectional packings, that is, ε 3 N/2 dual packings. Then we can obtain that the total number of dual packings is no more than with high probability, where ε 1 is determined by ε 3 and N.
It is easy to verify that ε 1 goes to zero as N goes to infinity.
Since each packing needs at most two time slots, the time slots needed for the first interval are at most k 1 = (1+ε 1 )N/4. With the help of k 1 and k 2 , we can obtain the lower bound of the per-flow throughput as where ε can be obtained from ε 1 , ε 2 , and N, and it goes to zero as N goes to infinity. Theorem 1 is thus proved.

Lemma 1. For any clockwise (counterclockwise) unidirectional packing contained in the dual packings to which group-1 rectangles belong, the number of residual nodes is less than log(N) w.h.p.
Proof. Let P denote the set of dual packings to which group-1 rectangles belong. Let us focus on one clockwise unidirectional packing p in P. The proof for the counterclockwise case is similar. Let P c be the clockwise packings in P. Let m denote the number of clockwise flows in P c . According to our way of partitioning the rectangles into the two groups, Recall that in our traffic model, we randomly select N/2 nodes to be sources and N/2 nodes to be destinations. In other words, any node among the N nodes is either a source or a destination. This applies to any residual node in p as well. In particular, a residual node in p is either (1) a destination node (of a clockwise or counter-clockwise flow), (2) a source node of a counter-clockwise flow, or (3) a source node of a clockwise flow. In case 3, since the residual node is a residual node in p, it must be a source node of a clockwise flow already packed (i.e., already belong to P c ) prior to packing p.
For a unidirectional packing, consider the first flow from the start point s. Suppose this flow ends at node i. Let us consider the probability of node (i + 1) being a residual node with respect to this unidirectional packing. Due to the randomness of our packing procedure and our random selection of sources and destinations for flows, node (i + 1) is a destination node with probability p 1 = 1/2; it is a source node of a counter-clockwise flow with probability p 2 = 1/4 w.h.p.; and it is a source node of a prepacked clockwise flow with probability p 3 ≤ (1 − ε 3 )/4 w.h.p. Then the probability that node (i + 1) is a residual node given that node i is not a residual node is In our notation above, the 1 in P(1 | 0) refers to the fact that we have found one residual node thus far, and the 0 refers to the fact that no residual node had been found before it. Given that node (i + 1) is a residual node, the probability that node (i + 2) is also a residual node is P(2 | 1) ≤ P(1 | 0) (due to sampling without replacement). The probability of a sequence of l or more residual nodes is given by When l = log(N), as N goes to infinity, the above probability is exp(− log(N)/4), which approaches zero. Thus, Lemma 1 is proved.

Lemma 2. For group-1 rectangles, the number of nodes in each rectangle is no more than 2log(N) with probability 1−ε 4 , where ε 4 is a small positive quantity that goes to zero when N goes to infinity.
Proof. With respect to Figure 8 and the explanation in its caption, let N r , N g , N b denote the number of red, green, and blue nodes in a dual packing, respectively. By Lemma 1,

Introduction
The relay channel, which models communication between a source node and a destination node with the help of a relay node, was introduced by van der Meulen in [1]. Based on this channel model, Cover and El Gamal developed the coding strategies known as decode-forward (DF) and compress-forward (CF) in [2]. These techniques have become standard building blocks for cooperative and relaying networks, which have been extensively studied in the literature (e.g., [3,4]). For many applications, communication is inherently two-way. A typical example is the telephone service. In fact, the study of the two-way channel is not new and can be traced back to Shannon's work in 1961 [5]. However, the model of the two-way relay channel, though natural, did not attract much attention. Recently, probably due to the advent of network coding [6] in the last decade, there has been growing interest in this model. The application of DF and CF to the two-way relay channel was considered in [7]. The half-duplex case was studied in [8,9]. The results in [10] showed that feedback is beneficial only in a two-way transmission. Network coding for the two-way relay channel was studied in [11,12]. Physical-layer network coding based on lattices was considered recently in [13], and shown to be within 0.5 bit of the capacity in some special cases [14].
All the aforementioned works are for one relaying node. It is easy to envisage that in real systems, more than one relay can be used. Schein in [15] started the investigation of the network with one source-destination pair and two parallel relays in between. This model was further studied in [16] under the assumption of half-duplex relay operations. For one-way multiple-relay networks in general, cooperative strategies were proposed and studied in [17]. We remark that a notable feature that does not exist in the single-relay case is that the multiple relays can act as a virtual antenna array so that beamforming gain can be reaped at the receiver. In this paper, we follow this line of research and consider two-way communications. Two relays are assumed, for this simple model already captures the essential features of the more general multiple-relay case. We are interested in knowing how different techniques can be used to construct transmission strategies for the two-way two-relay network and how they perform under different channel conditions. In particular, we apply the idea of network coding to both the physical layer and the network layer. Besides, channel coding techniques for the multiple access channel (MAC) and the multi-input multi-output (MIMO) channel are also employed. Several transmission strategies are thus constructed and their achievable rate regions are derived.
We remark that the channel model that we consider in this paper is also called the restricted two-way two-relay channel [7]. This means that the signal from a source node depends only on the message to be transmitted, but not on the received signal at the source. Besides, our results are obtained under the half-duplex assumption, which is realistic for practical systems. Each node is assumed to transmit one half of the time and receive during the other half of the time. The performance of our proposed strategies can be further improved if the ratio of transmission time and receiving time is optimized. We do not consider this more general case, since it complicates the analysis but provides no new insights. This paper is organized as follows. Our network model is described in Section 2. Some basic coding techniques are reviewed in Section 3. Based on these coding techniques, several transmission strategies are devised in Section 4. Their performance at high signal-to-noise ratio regime is analyzed in Section 5. The rate regions of these strategies are compared under some typical channel realizations in Section 6. The conclusion is drawn in Section 7.

Channel Model and Notations
The two-way two-relay (TWTR) network consists of four nodes: two terminals A and B, and two parallel relays 1 and 2 (see Figure 1). Terminals A and B want to exchange messages with the help of the two relays. We assume there is no direct link between the two terminals or between the two relays. Furthermore, all of the nodes are half-duplex. The total communication time, 2N, is divided into two stages, each of which consists of N time slots. In the first stage, the terminals send signals and the relays receive. In the second stage, the relays send signals and the terminals receive. The solid arrows in Figure 1 correspond to stage 1 and the dashed arrows correspond to stage 2.
Suppose that terminals A and B are equipped with n antennas, whereas each of relays 1 and 2 has only one antenna. For i ∈ {A, B} and j ∈ {1, 2}, we use X i (t) ∈ R n to denote the transmit signal from node i, and Z j (t) ∈ R to denote independently and identically distributed (i.i.d.) Gaussian noise with distribution N (0, σ 2 ). The channel is assumed static and the channel gain from node i to j is denoted by an n-dimensional column vector h i j . We assume channel reciprocity holds so that h i j = h ji . In the first stage, the outputs of the network at time t = 1, 2, . . . , N, are given by
$$Y_1(t) = h_{A1}^T X_A(t) + h_{B1}^T X_B(t) + Z_1(t), \qquad (1)$$
$$Y_2(t) = h_{A2}^T X_A(t) + h_{B2}^T X_B(t) + Z_2(t). \qquad (2)$$
Figure 1: Model of two-way two-relay network. The labels of the arrows indicate the corresponding link gains.
In the second stage, for t = N + 1, N + 2, . . . , 2N, the outputs at the terminal nodes are where . We assume that the link gains h A1 , h A2 , h B1 , and h B2 are time-invariant and known to all nodes. We have the following power constraints in each stage: for i ∈ {A, B}, and for j ∈ {1, 2}, where P A , P B , P 1 , and P 2 denote the power constraints on terminals A and B and relays 1 and 2, respectively. Let R A and R B be the data rates of terminal A and B, respectively. In a period consisting of 2N channel symbols (N symbols for each phase), terminal A wants to send one of the 2 2NRA symbols to terminal B, and terminal B wants to send one of the 2 2NRB symbols to terminal A. A (2 2NRA , 2 2NRB , 2N) code for the TWTR network consists of two message sets two relay functions and two decoding functions For i = A, B, terminal i transmits the codeword f i (m i ) in stage one, where m i is the message to be transmitted. For j = 1, 2, relay j applies the function φ j to its received signal and transmits the resulting signal in the second stage. Let the received signals at terminals A and B be Y N A and Y N B , respectively. In this paper, we will use a superscript "Y N " to indicate a sequence of length N. So Y N A and Y N B are sequences of length N, with each component equal to a vector in R n . After the second stage, terminal i decodes the message from the other source node by g i . We note that the decoding function g i uses the message from source terminal i as input as well. We say that a decoding error A rate pair (R A , R B ) is said to be achievable if there exists a sequence of (2 2NRA , 2 2NRB , 2N) codes, satisfying the power constraints in (5) and (6), with P 2N e → 0 as N → ∞. Although the terminals are equipped with n antennas, the transmitted signals from the terminals are essentially 2 dimensional. To see this, we observe that the first term in the right hand side of (1), namely, h T A1 X A (t), is a projection of X A (t) in the direction of h A1 . 
Any signal component of X A (t) orthogonal to h A1 will not be picked up by relay 1. Likewise, from (2), we see that any signal component of X A (t) orthogonal to h A2 will not be sensed by relay 2. There is no loss of generality if we assume that the signals transmitted from the terminals take the following form: We consider the 2-dimensional vector λ i (t) as the input to the channel at node i. The power constraint in (5) can be written as for i ∈ {A, B}.
Notations. We will treat the 2 × 1 random vectors λ A and λ B as input signals at terminals A and B, respectively, and let K A and K B denote their corresponding 2×2 covariance matrices. For i ∈ {A, B} and j ∈ {1, 2}, let be the signal to noise ratio of the signal received at relay j from terminal i. Shannon's capacity formula is denoted by . Also, for n × n matrices, we let C n (X) ≜ 0.25 log 2 det(I n + X), where I n denotes the n × n identity matrix. The factor of 0.25 before the log function, instead of the factor of 0.5 in the original capacity formula, is due to the fact that the total transmission time is divided into two stages of equal length. All logarithms in this paper are in base 2. The set of non-negative real numbers is denoted by R + . The Gaussian distribution with mean zero and covariance matrix K is denoted by N (0, K).

Review of Coding Techniques and Capacity Regions from Information Theory
The proposed transmission strategies are based on a host of existing coding techniques and capacity results. A review of them is given in this section.

Physical-Layer Network Coding.
Wireless channels are inherently additive: the received signal is a linear combination of the transmitted signals. This fact is exploited for the two-way relay channel in [18][19][20][21]. Consider the following single-antenna two-way network with two sources and one relay in between. There is no direct link between the two sources, and the exchange of data is done via the relay node in the middle. Let x i (t) be the transmitted signal from source i, for i = 1, 2. The transmission is divided into two phases. In the first phase, the relay receives $y(t) = x_1(t) + x_2(t) + z(t)$, where z(t) is an additive noise. For simplicity, it is assumed that both link gains from the sources to the relay are equal to one. In the second phase, the relay amplifies the received signal y(t), and transmits a scaled version ζ y(t) of y(t), where ζ is a scalar chosen so that the power requirement is met. Since source 1 knows x 1 (t), the component ζx 1 (t) within the received signal at source 1 can be treated as known interference, and hence be subtracted. Similarly, source 2 can subtract ζx 2 (t) from the received signal. Decoding is then based on the signal after interference subtraction.
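The two phases described above can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the paper: unit link gains in both phases, real-valued signals, a relay that normalizes its empirical transmit power, and sources that know ζ. The function name `af_two_way_relay` and its parameters are hypothetical.

```python
import numpy as np

def af_two_way_relay(x1, x2, relay_power, noise_std, rng):
    """Simulate one amplify-forward exchange with unit link gains (illustrative)."""
    # Phase 1: the relay observes the superposition of both signals plus noise.
    y = x1 + x2 + noise_std * rng.standard_normal(x1.shape)
    # Choose zeta so that the relay's empirical transmit power equals relay_power.
    zeta = np.sqrt(relay_power / np.mean(y ** 2))
    # Phase 2: each source receives zeta * y plus its own receiver noise.
    r1 = zeta * y + noise_std * rng.standard_normal(x1.shape)
    r2 = zeta * y + noise_std * rng.standard_normal(x2.shape)
    # Each source subtracts its own, known contribution (assumes zeta is known).
    residual_at_1 = r1 - zeta * x1  # zeta * x2 plus noise
    residual_at_2 = r2 - zeta * x2  # zeta * x1 plus noise
    return zeta, residual_at_1, residual_at_2

rng = np.random.default_rng(0)
x1 = rng.choice([-1.0, 1.0], size=1000)
x2 = rng.choice([-1.0, 1.0], size=1000)
zeta, res1, res2 = af_two_way_relay(x1, x2, relay_power=1.0, noise_std=0.1, rng=rng)
```

After subtraction, each source is left with a scaled copy of the other source's signal corrupted by the relay noise and its own receiver noise, which is what the decoder then operates on.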

Multiplexed Coding.
Multiplexed coding [22] is a useful coding technique for multi-user scenarios in which some user knows the message of another user a priori. Consider the two-way relay channel as in the previous paragraph. Node 1 wants to send message m 1 to node 2 via the relay node, and node 2 wants to send message m 2 to node 1 via the relay node. For i = 1, 2, let n i be the number of bits used to represent message m i . The transmission of the nodes is divided into two phases. In the first phase, the two source nodes transmit. Suppose that the relay node is able to decode m 1 and m 2 . For the encoder at the relay, we generate a 2 n1 ×2 n2 array of codewords. Each codeword is independently drawn according to the Gaussian distribution such that the total power of each codeword is less than or equal to P. In the second phase, the relay node sends the codeword in the (m 1 , m 2 )-entry of this array. Suppose that the received signal at source node i is corrupted by additive white Gaussian noise with variance σ 2 i , for i = 1, 2. At source 1, since m 1 is known, the decoder knows that one of the 2 n2 codewords in the row corresponding to m 1 was transmitted. Out of these 2 n2 codewords, it selects one according to the maximum-likelihood criterion. By the channel coding theorem for the point-to-point Gaussian channel, source 1 can decode reliably at a rate of 0.5 log(1+P/σ 2 1 ). Likewise, by considering the columns in the array of codewords, source 2 can decode at a rate of 0.5 log(1 + P/σ 2 2 ). Multiplexed coding can be implemented using concepts from network coding. We assume, without loss of generality, that n 2 ≥ n 1 . We identify the 2 n2 possible messages from source node 2 with the vectors in the n 2 -dimensional vector space over the finite field of size 2, F n2 2 , and identify the 2 n1 messages from source node 1 with a subspace of F n2 2 of dimension n 1 , say V 1 . 
We generate 2 n2 Gaussian codewords independently, one for each vector in F n2 2 . To send messages m 1 and m 2 in the second phase, the relay node transmits the codeword corresponding to m 1 + m 2 , where the addition is performed using arithmetic in F n2 2 . The output of the decoder at node 1 is a vector in F n2 2 . We subtract from it the vector in V 1 corresponding to m 1 . If there is no decoding error, this gives the vector corresponding to m 2 , and the value of m 2 is recovered. Now let us consider node 2. Since m 2 is known a priori, node 2 is certain that the signal transmitted from the relay is associated with one of the vectors in the affine space m 2 + V 1 . The message m 1 can be estimated by comparing the likelihood function of the 2 n1 codewords associated with m 2 + V 1 . It can be seen that the maximal data rate is the same as in the array approach mentioned in the previous paragraph, but the size of the codebook at the relay reduces from 2 n2+n1 to 2 n2 .
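The index arithmetic of the network-coding implementation can be sketched as follows. The sketch covers only the finite-field bookkeeping, not the Gaussian codebook or the likelihood comparison. It assumes, for illustration only, that vectors in F_2^{n2} are stored as integers, that addition is bitwise XOR, and that V 1 is the subspace of vectors whose upper n2 − n1 bits are zero. All function names are hypothetical.

```python
N1, N2 = 2, 4  # message lengths in bits, with N2 >= N1 (illustrative values)

def embed(m1):
    """Map an N1-bit message to its vector in V1 (upper N2 - N1 bits are zero)."""
    assert 0 <= m1 < 2 ** N1
    return m1

def relay_index(m1, m2):
    """Index of the codeword sent by the relay: m1 + m2 in F_2^{N2} (bitwise XOR)."""
    return embed(m1) ^ m2

def decode_at_node1(index, m1):
    """Node 1 subtracts its own vector; in F_2 subtraction is again XOR."""
    return index ^ embed(m1)

def candidates_at_node2(m2):
    """Affine space m2 + V1: the only indices node 2 needs to compare."""
    return [m2 ^ embed(v) for v in range(2 ** N1)]

def decode_at_node2(index, m2):
    """Given an error-free index, removing m2 leaves the vector of m1 in V1."""
    return index ^ m2

idx = relay_index(m1=3, m2=9)
recovered_m2 = decode_at_node1(idx, m1=3)
recovered_m1 = decode_at_node2(idx, m2=9)
```

In this representation the relay needs one codeword per index, that is 2^{N2} codewords, while node 2 only has to compare the 2^{N1} candidates returned by `candidates_at_node2`.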

Capacity Region for MIMO Channel.
Consider a MIMO channel with n T transmit antennas and n R receive antennas, with the link gain matrix denoted by a real n R × n T matrix H. The channel output equals
$$Y = HX + Z, \qquad (15)$$
where X is the n T -dimensional channel input and Z is an n R -dimensional zero-mean colored Gaussian noise vector with covariance matrix K Z . Without loss of information, we whiten the noise by pre-multiplying both sides of (15) by $K_Z^{-1/2}$. The transformed channel output is thus
$$K_Z^{-1/2} Y = K_Z^{-1/2} H X + K_Z^{-1/2} Z. \qquad (16)$$
The covariance matrix of the noise vector $K_Z^{-1/2} Z$ is now the n R × n R identity matrix. By the capacity formula for MIMO channel with white Gaussian noise [23], the capacity for the MIMO channel in (15) is given by where K X denotes the n T × n T covariance matrix of X. Using the identity
$$\det(I_n + AB) = \det(I_m + BA), \qquad (18)$$
which holds for any n × m matrix A and m × n matrix B, we rewrite (17) as
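The whitening step and the determinant identity can be checked numerically. The sketch below is not the paper's derivation. It assumes a fixed input covariance K X and uses the standard Gaussian log-det rate expression 0.5 log2 det(I + G K X G^T) for the whitened channel G = K_Z^{-1/2} H. The matrices are random placeholders and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_t, n_r = 2, 3                      # transmit and receive antennas (illustrative)
H = rng.standard_normal((n_r, n_t))  # link gain matrix
B = rng.standard_normal((n_r, n_r))
K_Z = B @ B.T + np.eye(n_r)          # a symmetric positive definite noise covariance
K_X = np.eye(n_t)                    # assumed input covariance

# Symmetric inverse square root of K_Z via its eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(K_Z)
K_Z_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# After whitening, the noise covariance becomes the identity.
whitened_cov = K_Z_inv_sqrt @ K_Z @ K_Z_inv_sqrt.T

# Rate of the whitened channel for the fixed K_X, written in two equivalent ways.
G = K_Z_inv_sqrt @ H
det_receive_side = np.linalg.det(np.eye(n_r) + G @ K_X @ G.T)
det_transmit_side = np.linalg.det(np.eye(n_t) + K_X @ H.T @ np.linalg.inv(K_Z) @ H)
rate_bits = 0.5 * np.log2(det_receive_side)
```

The two determinants agree because det(I + AB) = det(I + BA) with A = G and B = K X G^T, so the rate can be evaluated with an n T × n T matrix instead of an n R × n R one.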

Capacity Region for Multiple-Access Channel (MAC).
The channel output of the two-user single-antenna Gaussian multiple-access channel is given by $y = x_1 + x_2 + z$, where x i is the signal from user i, for i = 1, 2, and z is an additive white Gaussian noise with variance σ 2 . Each of the two users wants to send some bits to the common receiver.
Suppose that the power of user i is limited to P i , for i = 1, 2.
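The rate pair (R 1 , R 2 ), where R i is the data rate of user i, is achievable in the above 2-user MAC if and only if it belongs to the set of pairs (R 1 , R 2 ) of non-negative real numbers satisfying
$$R_1 \le 0.5 \log(1 + P_1/\sigma^2), \qquad (22)$$
$$R_2 \le 0.5 \log(1 + P_2/\sigma^2), \qquad (23)$$
$$R_1 + R_2 \le 0.5 \log(1 + (P_1 + P_2)/\sigma^2). \qquad (24)$$
We refer the reader to [24] for more details on the optimal coding scheme for MAC.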
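As a concrete check of the MAC rate region, the following sketch tests whether a rate pair lies in the standard pentagon-shaped capacity region of the two-user Gaussian MAC, assuming the usual individual-rate and sum-rate bounds with base-2 logarithms. The function name `in_mac_region` is hypothetical.

```python
import math

def in_mac_region(r1, r2, p1, p2, noise_var):
    """Return True if (r1, r2) satisfies the two-user Gaussian MAC bounds."""
    def capacity(snr):
        # Point-to-point Gaussian capacity in bits per channel use.
        return 0.5 * math.log2(1.0 + snr)

    return (
        r1 >= 0.0
        and r2 >= 0.0
        and r1 <= capacity(p1 / noise_var)
        and r2 <= capacity(p2 / noise_var)
        and r1 + r2 <= capacity((p1 + p2) / noise_var)
    )

# With P1 = P2 = 3 and unit noise variance, each user alone can reach 1 bit,
# while the sum rate is limited to 0.5 * log2(7), roughly 1.40 bits.
inside = in_mac_region(1.0, 0.4, p1=3.0, p2=3.0, noise_var=1.0)
outside = in_mac_region(1.0, 1.0, p1=3.0, p2=3.0, noise_var=1.0)
```

The second pair fails only the sum-rate bound, which is the constraint that gives the region its pentagon shape.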

Channel-Network Coding Strategies
We develop five transmission schemes for the TWTR network.
In the first scheme (AF), the received signals at both relay nodes are amplified and forwarded back to terminals A and B. In the second and third schemes (HLC, HMC), one of the relays employs the amplify-forward strategy, while the other decodes the messages from terminals A and B. In the fourth scheme (DF), both relays decode the messages from terminals A and B. In the last scheme (PDF), another mixture of the decode-forward and amplify-forward strategies is described.

Amplify Forward (AF).
In this strategy, relay node j ( j ∈ {1, 2}) buffers the signal received in the first stage, and amplifies it by a factor of ζ j . The amplified signal is then transmitted in the second stage. At the end of the second stage, each terminal, which knows its own transmitted signal, subtracts the corresponding term and obtains the desired message from the residual signal.
By putting (25) into (3), we can write the received signal at terminal A as Here, we have replaced X A (t) and X B (t) by their 2-dimensional representations H A λ A (t) and H B λ B (t). Since terminal A knows its own input λ A (t) as well as the link gains and amplifying factors, the signal component containing λ A (t) as a factor can be subtracted from Y A (t). The residual signal is The message from terminal B can then be decoded using a decoding algorithm for point-to-point MIMO channel. The received signal at terminal B is treated similarly. where ζ 1 , ζ 2 ∈ R and K A and K B are 2 × 2 covariance matrices, such that the following power constraints: are satisfied.
Proof. The residual signal (27) at terminal A can be written as H T af H B λ B (t) plus a noise vector with covariance matrix N A af . The residual signal at terminal B equals H af H A λ A (t) plus a noise vector with covariance matrix N B af . Therefore, after self-signal subtraction, the resultant channels can be considered MIMO channels with two transmit antennas and n receive antennas. From (19), we obtain the rate constraints in (28). The inequalities in (30) are the power constraints for terminals A and B, and those in (31) are the power constraints for relays 1 and 2.

Hybrid Decode-Amplify Forward with Linear Combination (HLC).
In this strategy, relay 1 decodes the messages from terminals A and B, while relay 2 employs the amplify-forward strategy. In order to obtain beamforming gain, after decoding the two messages, relay 1 reconstructs the codewords corresponding to the decoded messages and sends a linear combination of them in the second stage.
In the first stage, relay 1 and terminals A and B form a multiple-access channel with relay 1 as the destination node. We use the optimal encoding scheme for MAC at terminals A and B, and the optimal decoding scheme at relay 1. In the second stage, relay 1 decodes and reconstructs X A (t) and X B (t), and then transmits a linear combination for some z A and z B ∈ R n . Relay 2 amplifies Y 2 (t) by a scalar factor ζ and transmits X 2 (t) = ζY 2 (t).
At terminal A, after subtracting the signal component that involves X A (t), we get At terminal B, the residual signal after subtraction is The decoding is done by using decoding method for MIMO channel. where z A , z B ∈ R n , ζ ∈ R, and K A and K B are 2 × 2 covariance matrices such that the following power constraints: are satisfied.
In (35), the product of a real number x and a set A is defined as xA ≜ {xa : a ∈ A}.
Proof. From the rate constraints for MAC channel in (22)-(24), we have the rate constraints for relay 1 in (35). We multiply by a factor of one half because the first phase only occupies half of the total transmission time.
The conditions in (36) and (37) are derived from the capacity formula for MIMO channel with colored noise in (19). The inequalities in (39) are the power constraints for sources A and B. The inequalities in (40) and (41) are the power constraints for relays 1 and 2, respectively.
The parameters z A , z B , K A , and K B can be obtained by running an optimization algorithm. For example, we can aim at maximizing a weighted sum of the rates R A and R B .

Hybrid Decode-Amplify Forward with Multiplexed Coding (HMC).
As in the previous strategy, relay 1 decodes and forwards the messages from A and B, and relay 2 amplifies and transmits the received signal. However, in this strategy, relay 1 re-encodes the messages into a new codeword to be sent out in the second stage. Terminals A and B decode the desired messages based on multiplexed coding.
Proof. The proof is by a random coding argument, and we sketch it below. More details can be found in [25].
Our objective is to show that any rate pair (R A , R B ) that satisfies the condition in the theorem is achievable. For i = A, B, terminal i randomly generates a Gaussian codebook with 2 2NRi codewords with length N, satisfying the power constraint in (5). Label the codewords by X N i (m i ), for m i ∈ M i . For relay 1, we generate a 2 2NRA × 2 2NRB array of Gaussian codewords of length N and power P 1 . The codeword in row m A and column m B is denoted by X N 1 (m A , m B ), and satisfies the power constraint in (6).
After the first stage, relay 1 is required to decode both messages from terminals A and B. This can be accomplished with arbitrarily small probability of error if the rate constraints for MAC in (22) to (24) are satisfied. This corresponds to the rate constraint in (42). Let the estimated messages from A and B be $\hat{m}_A$ and $\hat{m}_B$.
In the second stage, relay 1 transmits $X_1^N(\hat{m}_A, \hat{m}_B)$. Relay 2 amplifies its received signal and transmits ζY 2 (t). From (41), the amplified signal has average power no more than P 2 .
After subtracting the term ζh A2 h T A2 X A (t), which is known to terminal A, the residual signal at terminal A is Note that terminal A knows its message m A , and $\hat{m}_A = m_A$ with probability arbitrarily close to one if (42) is satisfied. The idea of multiplexed coding can then be used. In (48), the covariance matrix of the signal in square brackets is given by G B hmc in (46), and the covariance of the noise term is given by N A hmc . Applying the capacity expression, we obtain the rate constraint in (44). In a similar manner, we obtain (43).

Decode Forward (DF).
In the DF strategy, terminal node i (i ∈ {A, B}) splits the message m i into two parts: the common part m ic and the private part m ip . The two common messages are transmitted via both relay nodes. The private message m Ap is decoded by relay 1 only, and can be interpreted as going through the path from terminal A to relay 1 to terminal B. Symmetrically, the private message m Bp is decoded by relay 2 only, and can be interpreted as going through the path from terminal B to relay 2 to terminal A. After the first stage, relay 1 decodes the common messages of both terminals and the private message of terminal A. Relay 2 decodes the common messages of both terminals and the private message of terminal B. The encoding and decoding schemes in the first stage are similar to those developed by Han and Kobayashi for the interference channel (IC) in [26]. Since both relays have access to the common messages, the channel in the second stage can be considered a multiple access channel with common information. Furthermore, since terminals A and B each know their own message, we can further improve the rate region by the idea of multiplexed coding.
We have the following characterization of the rate region for the DF strategy: Theorem 4. For i ∈ {A, B}, let R ip and R ic be the rates of the private and common messages, respectively, from terminal i. Let Γ j denote P j /σ 2 for j = 1, 2, and let K Ac , K Ap , K Bc , and K Bp denote 2 × 2 covariance matrices, and Tr for some nonnegative α j and α j .
Details of the DF coding scheme and the proof of Theorem 4 are given in the Appendix.

Partial Decode Forward (PDF).
In the PDF strategy, both relays decode the message of terminal A. Each relay then subtracts the reconstructed signal of terminal A from the received signal. Call the resulting signal the residual signal. The message of terminal A is re-encoded into a new codeword, and linearly combined with the residual signal. This linear combination is then transmitted in the second stage. Since both relays know the message of terminal A, the two relays can jointly re-encode the message of terminal A using some encoding scheme for a MIMO channel with two transmit antennas and n receive antennas.

Theorem 5. A rate pair (R A , R B ) is achievable by the PDF strategy if it satisfies
where and ζ j ∈ R and K A , K B , K R are 2 × 2 covariance matrices such that the following power constraints hold
Proof. The two relays treat the signal originating from terminal B as noise, and decode the message of terminal A. The rate requirement in (58) guarantees that the message of terminal A can be decoded with arbitrarily small probability of error at both relays. Let the decoded message of terminal A be denoted by $\hat{m}_A$.
At the relays, we employ two Gaussian codebooks for the re-encoding of the message from terminal A. For each message m A , we generate two correlated codewords U 1,mA (t) and U 2,mA (t), with mean zero and each pair of symbols at any t distributed according to a 2 × 2 covariance matrix K R . At relay j, the decoded message $\hat{m}_A$ is re-encoded into $U_{j,\hat{m}_A}(t)$, which is a codeword with power K R ( j, j). In the second stage, relay j transmits for some amplifying factor ζ j . The inequality in (63) ensures that the power constraint is satisfied at the relays. At the end of stage 2, terminal A subtracts the signal component that involves U 1,mA and U 2,mA from its received signal and obtains From the capacity formula for MIMO channel (19), terminal A can recover the message from terminal B reliably if (60) is satisfied.

For the decoding in terminal B, we subtract all terms involving X B (t), and get This is equivalent to a MIMO channel with link gain matrix H B and colored noise. Recall that K R is the covariance matrix of the encoded signal. By the capacity formula of MIMO channel (19), we obtain the rate constraint in (59).

Remark 1.
We note that the matrices N i af , N i hlc , N i hmc and N i pdf , for i = A, B, are invertible. Indeed, by checking that v T Nv is strictly positive for all non-zero v ∈ R n , we see that the matrix is positive definite, and hence invertible.

Performance in High SNR Regime
In this section, we compare the performance of the five strategies described in the previous section in the high Signal-to-Noise Ratio (SNR) regime.
For fixed powers and link gains, let C sum (σ 2 ) denote the sum rate R A + R B as a function of the noise variance σ 2 . We use the multiplexing gain (also called degrees of freedom) [27], defined by

$$M \triangleq \lim_{\sigma^2 \to 0} \frac{C_{\mathrm{sum}}(\sigma^2)}{\frac{1}{2}\log(\sigma^{-2})},$$

as the performance measure at high SNR. At high SNR, that is, when σ 2 is very small, we can approximate the sum rate by (M/2) log(σ −2 ) if the multiplexing gain is equal to M.

Consider the multiplexing gain of the AF scheme. When the sum rate R A + R B is maximized subject to the rate constraints (28) in Theorem 1, the equalities in (28) hold. We can assume without loss of generality that We first suppose that the covariance matrices K A and K B , and the amplifying constants ζ 1 and ζ 2 , are fixed. Note that if the power constraint in (31) holds, then it continues to hold if σ 2 becomes smaller. Therefore, when σ 2 → 0, the power constraints in (30) and (31) are satisfied.
Each of the expressions in (68) and (69) can be written in the form where M is a 2 × 2 matrix that equals in (73) can be expanded as a polynomial in σ −2 , with the degree equal to the rank of M. Therefore, the limit depends only on the rank of the matrix M, and equals 0, 0.5, or 1, if the rank of M is 0, 1, or 2, respectively. The problem of determining the multiplexing gain now reduces to determining the rank of the matrices in (71) and (72).
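The dependence on the rank alone can be seen from a short worked step. The following is a sketch under the assumption that the 2 × 2 matrix in question is symmetric positive semidefinite with nonzero eigenvalues λ 1 , . . . , λ r (this holds for matrices of the form H K H T , but is an assumption as far as this passage is concerned); the normalization by log σ −2 is chosen here for illustration, and the values 0, 0.5, and 1 quoted above correspond to r/2.

```latex
\log\det\!\left(I + \sigma^{-2} M\right)
  = \sum_{i=1}^{r} \log\!\left(1 + \sigma^{-2}\lambda_i\right),
\qquad
\lim_{\sigma^{2}\to 0}
  \frac{\log\det\!\left(I + \sigma^{-2} M\right)}{\log \sigma^{-2}}
  = r = \operatorname{rank}(M).
```

Each of the r terms grows like log σ −2 as σ 2 → 0, while the zero eigenvalues contribute nothing, so only the rank survives in the limit.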
Recall that the rank function satisfies the following properties [28, page 13]: (i) if A and C are square invertible matrices, then rank(ABC) = rank(B) for every matrix B for which the matrix products are well defined; (ii) for every m × n matrix A, we have rank(A T A) = rank(A). Consider the matrix in (72). After replacing H af by its definition, we can express the matrix in (72) as where Z denotes the diagonal matrix diag(ζ 2 1 , ζ 2 2 ). We assume that H A and H B have full rank. This assumption holds with probability one if the link gains are drawn from a continuous probability distribution such as Rayleigh. We also assume that Z, K A , and K B are of full rank. This assumption does not incur any loss of generality, because they are design parameters that we can choose: we can perturb them infinitesimally so that the resulting matrices have rank two, while the value on the right-hand side of (69) changes negligibly. By property (i), and the fact that Since the above argument holds for all invertible K A and K B , and positive ζ 1 and ζ 2 , we conclude that the multiplexing gain of the AF strategy is equal to 2.

For HLC and HMC, relay 1 is required to decode the messages of the terminals, and in both schemes the sum rate is subject to the sum rate constraint of the MAC channel in the first phase. The multiplexing gains of both the HLC and HMC strategies are limited by Similarly, the multiplexing gain of DF is also limited by the decoding of messages at the relays. The rate constraints (50) and (51) imply that it is no more than 0.5.

The multiplexing gain of the PDF scheme lies between the multiplexing gains of AF and DF. The transmission from terminal B to terminal A can be considered AF, while the transmission from terminal A to terminal B in the other direction is limited by the message decoding after stage 1. From (58), we get and from (60), we have provided that H A has full rank.
Therefore, its maximal multiplexing gain is 1.5. We summarize the performance of the five schemes at high SNR in Table 1. We can see that the AF strategy has the highest multiplexing gain. It is well known that the maximal multiplexing gain of the Gaussian MIMO channel with two transmit antennas and two receive antennas is equal to two [23]. Hence, at high SNR, the AF strategy behaves like a transmission scheme achieving full multiplexing gain in the MIMO channel with two transmit antennas and two receive antennas.

Numerical Examples
We compare the information rates achievable by the proposed strategies in Section 4 with the cut-set outer bound in [29]. Since the derivation is straightforward, we state the outer bound without proof. For i, j ∈ {1, 2}, and k ∈ {A, B}, let

Theorem 6 (Outer bound). A rate pair (R A , R B ) is achievable in the TWTR network only if it satisfies
for some real number ρ between 0 and 1, and 2 × 2 covariance matrices K A and K B such that Tr(

We select several typical channel realizations and show the corresponding achievable rate regions in Figure 3 to Figure 8. To simplify the calculation, we consider the single antenna case where n = 1. The power constraint is set to P = 1 and the noise variance is set to σ 2 = 1. In Figure 3, we plot the rate regions when all link gains are large (the link gain is 10 for all links). As mentioned in the previous section, the AF strategy has the largest multiplexing gain in the high SNR regime. We can see in Figure 3 that the AF strategy achieves the largest sum rate.
In Figures 4 and 5, we consider the case where relay 1 has larger link gains than relay 2. In Figure 4, the link gains h A1 and h B1 are the same. In this case, HMC dominates all other strategies. In Figure 5, the two link gains, h A1 and h B1 , are not equal. In this case, HLC dominates HMC. HLC performs better in this asymmetric case because of its ability to adjust power between signals and utilize the beamforming gain.
When both relays are close to one of the terminals, PDF has the best performance, as can be seen in Figure 6. The reason is that both relays are able to decode reliably the message from the closer terminal, and then they cooperatively forward the message to the other terminal using MIMO techniques. Figures 7 and 8 present two scenarios in which DF dominates all other transmission strategies. We remark that DF is quite flexible in that it has many tunable parameters. The case where both h A1 and h B2 are relatively large is shown in Figure 7. Another case, where h A1 and h A2 are larger than h B1 and h B2 , is shown in Figure 8. In both cases, DF is much better than the other strategies.
We summarize the numerical results in Table 2. The table is not meant to be a precise description of the relative merits of the schemes; instead, it provides a rough guideline for selecting a suitable scheme. In the table, "G" means "the channel condition is good" and "B" means "the channel condition is bad." We say that a channel is good if its link gain is at least two to three times the link gain of a bad channel. When all the link gains are large, we should use AF. When one pair of opposite links of the network is good and the other pair is weak, DF provides larger throughput. If one of the relays is good but the other relay is bad, HMC or HLC should be used. The PDF scheme is the best one when one of the sources has large link gains but the other does not.
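The guideline can be written down as a small lookup. The following Python sketch only illustrates the rules stated above: the function name `suggest_scheme`, the boolean good/bad inputs, and the reading of "opposite links" as the pairs (h A1 , h B2 ) and (h A2 , h B1 ) are assumptions made for the example and are not taken from Table 2 itself.

```python
def suggest_scheme(a1_good, a2_good, b1_good, b2_good):
    """Return a rough scheme suggestion from good/bad flags of the four links.

    Arguments are booleans for the links h_A1, h_A2, h_B1, h_B2 (hypothetical
    encoding). Returns None when the guideline gives no recommendation.
    """
    pattern = (a1_good, a2_good, b1_good, b2_good)
    if all(pattern):
        return "AF"                      # all link gains large
    if pattern in ((True, False, False, True), (False, True, True, False)):
        return "DF"                      # one pair of opposite links good
    if pattern in ((True, False, True, False), (False, True, False, True)):
        return "HMC or HLC"              # one relay good, the other bad
    if pattern in ((True, True, False, False), (False, False, True, True)):
        return "PDF"                     # one terminal has large link gains
    return None


if __name__ == "__main__":
    print(suggest_scheme(True, True, True, True))
    print(suggest_scheme(True, False, True, False))
```

Patterns not covered by the guideline (for example, all links bad) deliberately return `None` rather than a guess.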

Conclusion
We have devised several transmission strategies for the TWTR network, each of which is derived from a mix-and-match of several basic building blocks, such as the amplify-forward strategy, the decode-forward strategy, and physical-layer network coding. We can see from the numerical examples that no single transmission strategy dominates all other strategies under all channel realizations. In other words, the transmission strategy should be tailor-made for a given environment. In this paper, we have investigated the pros and cons of different building blocks and demonstrated how they can be used to construct transmission strategies for the TWTR network. We believe that the idea can be applied to other relay networks as well.
While in this paper we consider only the case of two relays, the ideas of our proposed schemes can be applied to the case with more than two relays. In particular, AF and PDF can be directly implemented without any change. As for DF, HMC, and HLC, the design may be more complicated, since we have to determine which relay decodes which source's message. Nevertheless, the underlying idea remains the same.
In our work, we have assumed that the channels are static. When link gains are time varying, our result reveals that a static strategy can only be suboptimal. To fully exploit the available capacity of the network, adaptive strategies that can switch between several modes are needed. How to determine a good strategy based on channel state information is an open problem. It is especially difficult if the switching is based on local information only, and we leave it for future work.

Proof of Theorem 4
The following information-theoretic argument shows that any rate pair (R A , R B ) satisfying the conditions in Theorem 4 is achievable. By (56) and (57), with very high probability the power constraints on node A and node B are satisfied. There is a common codebook for relay 1 and relay 2. We generate an array of codewords with $2^{2NR_{Ac}}$ rows and $2^{2NR_{Bc}}$ columns. The codewords have length N and each component is drawn independently from N (0, 1). Each of them is drawn independently with each component generated from N (0, α 1 P 1 ). Let X N 1 (m Ac , m Bc , m Ap ) be the linear combination Since α 1 +α 1 is strictly less than 1, X N 1 (m Ac , m Bc , m Ap ) satisfies the power constraint of node 1 with very high probability.
For relay 2, we generate $2^{2N(R_{Bc}+R_{Bp}+R_{Ac})}$ codewords, labeled by for m Bp ∈ M Bp , m Bc ∈ M Bc , m Ac ∈ M Ac . The components of each codeword are generated independently from The codeword X N 2 (m Ac , m Bc , m Bp ) satisfies the power constraint of node 2 by the hypothesis that α 2 + α 2 < 1.

Decoding: For i = 1, 2, the channel output at relay i is (A.6) The receiver at relay 1 treats the signal component as noise, and tries to decode m Ac , m Bc and m Ap . It reduces to a MAC with two users but three independent messages: two messages from node A and one message from node B. In order to decode these three messages reliably, we need the requirement in (50). Likewise, we have the requirement in (51) for correct decoding at node 2.
Relay 2 treats the signal component h T A2 H A W A (m Ap )(t) as noise, and tries to decode m Ac , m Bc and m Bp . This can be done with arbitrarily small error if the condition in (51) holds.
In the second stage, terminal A receives where I is the mutual information function. This gives the conditions in (54) and (55).
Similarly, we have the conditions in (52) and (53).

Introduction
The demand for ubiquitous communications has motivated the deployment of a variety of wireless devices and technologies that accommodate ad hoc communications. In large numbers, such devices, despite their different sizes, processing constraints, and levels of affordability, form a Wireless Sensor Network (WSN). The WSN cooperatively monitors the physical world and enables sharing of computing capabilities, bandwidth, and energy resources, offering more integrated and essential information than any single sensor node. The WSN is generally built as a hierarchical structure by placing a sparse network of access points connected by a high-bandwidth network within a random homogeneous ad hoc network, in which wireless relay nodes serve exclusively as forwarders [1], as in Figure 1. In addition, the hierarchical sensor network with an access point and a single forwarding node can be modeled as a Multiple Access Relay Channel (MARC), which is a multisource extension of the well-known single-user relay channel [2]. With dedicated relay nodes, cooperative communications [3][4][5][6][7] among WSN nodes exploit the broadcast characteristics and inherent spatial diversity to form a large transmit and/or receive antenna array (also known as Multiple Input Multiple Output, MIMO). Collaborative clusters are able to achieve spatial diversity as well as rate multiplexing by making "negotiations" among neighboring nodes to fully utilize the rich wireless propagation environments across multiple protocol layers. This offers numerous opportunities to improve network performance in terms of throughput [2], reliability [8][9][10], longevity, and flexibility. The most important element in cooperative communications is the coding protocol responsible for interaction between cooperative nodes. Over the past few years, several coding strategies have been deployed for cooperative communications.
Distributed space-time coding was originally proposed for MIMO systems [6]; nevertheless, synchronization among cooperative nodes is an unavoidable problem when the space-time coding strategy is brought into cooperative communications. Lately, as networks grow, traditional relay schemes have become increasingly bandwidth-inefficient. To break through the bandwidth bottleneck, network coding [11], a technique originally developed for routing in lossless wireline networks, has recently been applied to wireless relay networks. Traditional relaying [12][13][14] entails a loss in spectral efficiency that can be mitigated through network coding in cooperative communications, owing to its information-theoretic basis and inherently cooperative nature. However, certain fundamental aspects of wireless communication, such as interference, fading, and mobility, make the problem of applying network coding to cooperative communications particularly challenging.
The application of a cooperative network coding strategy is motivated by the natural association between network coding and cooperative communications: network coding employs intermediate nodes to combine packets [15][16][17][18][19][20][21][22]. Some approaches with practical advantages have been established to introduce network coding strategies into relay cases. In a two-way relay channel, the relay node combines received messages via network coding and broadcasts them to the sources on either side [15,16]. Such a strategy has been demonstrated to reduce the number of time slots required to exchange a packet from 4 to 2, which yields a significant gain in throughput. A recently developed idea based on joint network coding with channel coding or source coding [17,18,21,22] suggests that network coding is a generalization of source coding and channel coding [23]. Effros et al. [19] used network information theory to show that joint design of source, channel, and network coding in end-to-end transmission could yield much better performance, especially when separation between source, channel, and network coding does not hold in the underlying network.
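The packet-combining step at the relay can be illustrated with a bitwise XOR, a common digital form of network coding. The Python sketch below is a toy under stated assumptions: the helper name `xor_bytes` and the packet contents are hypothetical, and the example does not model time slots, fading, or channel coding.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Each source holds its own packet and wants the other's.
packet_a = b"hello"
packet_b = b"world"

# The relay combines both received packets into one coded packet
# and broadcasts it to the two sources.
coded = xor_bytes(packet_a, packet_b)

# Each source removes its own packet to recover the other one.
recovered_at_a = xor_bytes(coded, packet_a)
recovered_at_b = xor_bytes(coded, packet_b)

if __name__ == "__main__":
    print(recovered_at_a, recovered_at_b)
```

A single broadcast of the coded packet serves both sources, which is where the saving in transmissions comes from.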
In essence, the contribution of this paper is to employ network coding with additional parity-check bits, generated at the relay node from the two sources' information bits, with acceptable (linear) complexity. The extra parity-check bits are designed as side information to fill the mutual information gap between Source-Destination and Relay-Destination transmissions and hence approach the MARC "Cut-Set" bound, which is not addressed in most previous work. Specifically, this paper constructs a multidimensional LDPC code to realize the network coding in a cooperative pair of nodes, as the graphical description of LDPC codes can flexibly bridge distributed processing and can be customized to emulate a random coding scheme of any rate. Although density evolution (DE) is highly precise, its complexity poses a significant challenge for the design of a multidimensional LDPC decoder. Our work concentrates on practical implementation and describes the behavior of the constituent decoders by Extrinsic Information Transfer (EXIT) charts with a modified Gaussian approximation, which reduces the infinite-dimensional problem of tracking densities to a one-dimensional problem of tracking means that is readily addressed with linear programming tools.

The remainder of the paper is organized as follows. Section 2 describes a MARC model as well as system settings. In Section 3, we analyze the achievable sum-rate with information theory as the motivation for coding design and propose a network-coding cooperative transmit strategy with multidimensional LDPC codes. Section 4 focuses on the optimization of the multidimensional LDPC code profile using the modified Gaussian approximation and EXIT charts as a linear-constraint optimization. Finally, simulations are conducted and discussed to demonstrate the effectiveness of the network-coded cooperative strategy.

System Model and Coding Strategy
This section briefly introduces the two-source MARC model used throughout the paper and LDPC code preliminaries as the basis of the paper.

System Model.
To describe the network coding strategy in full, we formulate our system as a MARC, a model for network topologies in which multiple sources communicate with a single Destination in the presence of a Relay node. The system consists of two Sources (S 1 , S 2 ), one Relay (R), and one Destination (D), as in Figure 2. This MARC model has a symmetric positioning of S 1 and S 2 with respect to R and D. The relay moves along the line connecting D with the origin, whose length is normalized to 1. The distance between S and R is set to d. Path loss is proportional to 1/d 2 . The channels between nodes are mutually independent. Perfect global channel knowledge is assumed at all nodes.
Since radio terminals cannot transmit and receive simultaneously in the same frequency band, most cooperative strategies are based on the half-duplex mode [24]. The nodes are allocated orthogonal channels by TDMA. S 1 and S 2 are assumed to send messages with no priority. One block transmission is separated into two consecutive time slots, normalized to t 1 + t 2 = 1. Furthermore, the block length of each source is N (for brevity and clarity, the block lengths of S 1 and S 2 are taken to be equal and their messages independent of each other), and each block is further divided into two subblocks with codewords of length t 1 N and t 2 N for the transmissions in the two slots.
We use X, Y to represent the signals sent and received. In particular, x i j , i, j ∈ {1, 2} denotes the signals sent by S 1 , S 2 . The subscript i identifies S 1 and S 2 , and the subscript j indexes the two consecutive channels. x 3 is the signal sent by R, and y r is the signal received by R. The variables y d 1 and y d 2 are the signals received by D in the consecutive channels. Specifically, in time slot t 1 , S 1 and S 2 broadcast their messages x 11 and x 21 to R and D. In time slot t 2 , R forwards the network-coded message x 3 , and S 1 and S 2 send the messages x 12 and x 22 (new or old) to D, as in Figure 2. The equivalent baseband transmission model is shown in (1):

Rayleigh flat fading is adopted to model these links. Specifically, h i j are channel coefficients capturing the effects of path loss, shadowing, and fading, modeled by independent circularly symmetric complex Gaussian random variables with zero mean and variance σ 2 i j . Furthermore, w i , i = r, d 1 , and d 2 account for noise and other additive interference at the receiver, modeled as independent, zero-mean additive white Gaussian noise with variance σ 2 .
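The two-slot transmission described above can be sketched as a small simulation. The Python example below is a sketch under stated assumptions, not a reproduction of equation (1): the superposition of the source signals at each receiver, the reuse of the same fading coefficients in both slots, the dictionary keys for the channel coefficients, and the function names are all hypothetical choices made for illustration.

```python
import random


def complex_gaussian(variance, rng):
    """Circularly symmetric complex Gaussian sample with the given variance."""
    s = (variance / 2) ** 0.5          # variance split over real and imaginary parts
    return complex(rng.gauss(0, s), rng.gauss(0, s))


def marc_block(x11, x21, x12, x22, x3, h, noise_var, rng):
    """One block of the assumed two-slot MARC model.

    h maps the (hypothetical) link names 's1r', 's2r', 's1d', 's2d', 'rd'
    to complex channel coefficients.
    """
    # Slot t1: both sources broadcast; R and D listen.
    y_r = h["s1r"] * x11 + h["s2r"] * x21 + complex_gaussian(noise_var, rng)
    y_d1 = h["s1d"] * x11 + h["s2d"] * x21 + complex_gaussian(noise_var, rng)
    # Slot t2: sources and relay transmit; only D listens.
    y_d2 = (h["s1d"] * x12 + h["s2d"] * x22 + h["rd"] * x3
            + complex_gaussian(noise_var, rng))
    return y_r, y_d1, y_d2


if __name__ == "__main__":
    rng = random.Random(1)
    links = ["s1r", "s2r", "s1d", "s2d", "rd"]
    h = {name: complex_gaussian(1.0, rng) for name in links}  # Rayleigh fading
    print(marc_block(1, -1, 1, 1, -1, h, noise_var=0.1, rng=rng))
```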

Power Control.
The transmit power of each node, where i = 1, 2, 3 denotes S 1 , S 2 , and R, respectively, is constrained by

LDPC Codes.
The cooperative coding scheme adopts LDPC codes. A binary LDPC code is represented by a sparse binary parity-check matrix H k×n , which corresponds to a bipartite graph with n variable nodes (corresponding to the n columns) and k check nodes (corresponding to the k rows). An attractive property of LDPC codes is that they can be designed graphically through the bipartite graph, which naturally matches the network topology for cooperation. An LDPC code is described by its variable and check node degree distributions (λ(x), ρ(x)), where λ i (ρ i ) represents the fraction of edges connected to a variable (check) node with degree i. The rate of the code is given in terms of (λ(x), ρ(x)):
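In the standard edge-perspective convention, the design rate of an LDPC ensemble is 1 − (Σ ρ i /i)/(Σ λ i /i). The Python sketch below evaluates this expression, on the assumption that it matches the rate formula intended here; the function name `design_rate` and the dictionary representation of the degree distributions are hypothetical.

```python
from fractions import Fraction


def design_rate(lam, rho):
    """Design rate from edge-perspective degree distributions.

    lam and rho map a node degree i to the fraction of edges attached to
    variable (lam) or check (rho) nodes of that degree.
    """
    int_lam = sum(Fraction(frac) / deg for deg, frac in lam.items())
    int_rho = sum(Fraction(frac) / deg for deg, frac in rho.items())
    return 1 - int_rho / int_lam


if __name__ == "__main__":
    # A (3,6)-regular code: every variable node has degree 3,
    # every check node has degree 6.
    print(design_rate({3: 1}, {6: 1}))
```

Exact fractions are used so that regular ensembles give exact rates rather than floating-point approximations.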

Parity-Check Network Code Design
There are two particular highlights of our cooperative strategy. The first is the cooperative design of side information at the relay node to fill the gap in mutual information between the SR and SD channels (in the MARC model, the relay lies between S 1 , S 2 , and D, so the SR channel is subject to less path loss than the SD channel). The second is the network coding procedure that combines the extra check bits for one-slot transmission. The aim of the first is to approach the MARC DF "Cut-Set" bound, and the aim of the second is to ensure BER QoS when network coding is extended to a wireless fading environment. This section addresses these two challenges.

Achievable Rates.
This subsection analyzes the parity-check network coding cooperative strategy, extended from decode-and-forward in the MARC, using information theory, as a guide for the coding design described below.
The key element in the proposed strategy is that the relay node forwards redundant bits as side information for both S 1 and S 2 to D, which is based on the idea of "channel coding with side information." This process is the dual of "source coding with side information" [25,26]. Channel coding with side information appends extra check bits to codewords; this is a "binning" process that assigns the codewords to different bins and enlarges the minimum distance between codewords within a bin. At the receiver, the side information provides the bin index of the message, and the decoding process then chooses the closest codeword in the bin with that index. Application of this idea in a traditional three-node relay network can be found in [8,9]. This paper applies the binning approach to a MARC with network coding. In addition, the extra check bits are generated with the goal of approaching the MARC DF capacity. The resulting network-coded strategy is capable of balancing spatial diversity and multiplexing.
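The binning idea can be made concrete with a very small code. The Python sketch below is an illustration under stated assumptions rather than the scheme of this paper: it uses the parity-check matrix of the (7,4) Hamming code to compute three extra check bits (the bin index) for every 7-bit word, and assumes that the bin index reaches the receiver without error. Names such as `decode_with_side_info` are hypothetical.

```python
from itertools import product

# Parity-check matrix of the (7,4) Hamming code; its three checks
# play the role of the extra check bits (the bin index).
H = [
    (1, 0, 1, 0, 1, 0, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]


def syndrome(word):
    """Extra check bits of a 7-bit word, computed over the binary field."""
    return tuple(sum(h * b for h, b in zip(row, word)) % 2 for row in H)


def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))


# Partition all 2^7 words into 2^3 bins of 2^4 words each.
BINS = {}
for w in product((0, 1), repeat=7):
    BINS.setdefault(syndrome(w), []).append(w)


def decode_with_side_info(received, bin_index):
    """Pick the word closest to `received` inside the indicated bin."""
    return min(BINS[bin_index], key=lambda w: hamming_distance(w, received))


if __name__ == "__main__":
    sent = (1, 0, 1, 1, 0, 0, 1)
    noisy = (1, 0, 0, 1, 0, 0, 1)          # one bit flipped on the direct link
    print(decode_with_side_info(noisy, syndrome(sent)) == sent)
```

Without the bin index, all 128 words are candidates and neighbouring words differ in a single bit; inside a bin, any two words differ in at least three bits, so a single bit error can be corrected.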
Usually, the information-theoretic view deals with achievable rates. In the MARC scenario, we consider the sum-rate, which conveys more intuition. For the decode-and-forward strategy in a general multiple-source half-duplex relay channel, the bounds on all combinations of the rate tuples for reliable detection at R and D are as follows [1]: The first terms in min(·) of (4) represent the maximum rate at which R can decode the messages x 11 and x 21 , and the maximum rate at which D can decode x 12 and x 22 in the presence of x 3 . The second terms in min(·) of (4) represent the maximum rate at which D can decode the messages x 11 and x 21 , and the maximum rate at which D can decode all three messages x 12 , x 22 , and x 3 . The cooperative strategy in this study employs network coding in the sense of cooperation between S 1 , S 2 , and R to achieve the MARC capacity in (4). The detailed protocol is as follows.

Figure 3: The cooperative strategy based on parity-check network coding.
Then, S 1 (S 2 ) broadcasts C t1 S1 (C t1 S2 ) to R and D. D receives the data and waits for decoding at the end of the block transmission.
To achieve maximum throughput, S 1 and S 2 broadcast messages at the sum-rate (5). R is able to decode x 11 , x 21 with an arbitrarily low error probability, since R t1 S1R + R t1 S2R equals the capacity of the SR channels. According to the geometric configuration in Figure 2, intuitively, I(X 11 , X 21 ; Y r ) > I(X 11 , X 21 ; Y d ); the physical channel of SD is more attenuated by the path loss than that of the SR channel. Consequently, although D also receives x 11 and x 21 , it is unable to uniquely decode them and requires extra bits t 1 N(I(X 11 , X 21 ; Y r ) − I(X 11 , X 21 ; Y d )) to make x 11 and x 21 decodable.

In Time Slot t 2 : Relay Node Operation.
R sends these extra bits, t 1 N(I(X 11 , X 21 ; Y r ) − I(X 11 , X 21 ; Y d )), to D at the rate Specifically, after decoding the codewords from S i , R estimates C t1 S1 and C t1 S2 , cooperatively uses these codewords to generate extra check bits for both S 1 and S 2 , and then combines them with network coding to produce k net = t 1 N(I(X 11 , X 21 ; Y r ) − I(X 11 , X 21 ; Y d1 )) extra check bits. This process is the "network coding." For transmission, the k net bits are encapsulated in R's LDPC codeword C t2 R and sent to D. Hence, the extra check bits together with the S 1 and S 2 codes C t1 S1 and C t1 S2 construct the cooperative multidimensional LDPC code C SR , as illustrated in Figure 3. The elements in the blue rectangle construct the cooperative code C SR with respect to the information in time slot t 1 . The procedure in the red rectangle is the network coding, which produces and combines extra bits for both S 1 and S 2 . In particular, the k net check bits encapsulated in the codeword C t2 R sent to D capture the RD channel's fading characteristics and provide an effective extrinsic message at D to realize spatial diversity.
From the perspective of information theory, C SR is cooperatively encoded by S 1 , S 2 , and R on the grounds of coding with side information. "Binning" is performed by the extra network-coded check bits (or syndromes) in R's message, generated from C t1 Si , i ∈ {1, 2}, to perform decoding of $x_{11}, x_{21} \in \{1, 2, \ldots, 2^{t_1 N I(X_{11},X_{21};Y_r)}\}$ by restricting them into $2^{t_1 N I(X_{11},X_{21};Y_{d_1})}$ bins, each of size $2^{t_1 N (I(X_{11},X_{21};Y_r) - I(X_{11},X_{21};Y_{d_1}))}$. From Figure 4, the "binning" process of R partitions the space of codewords of S 1 and S 2 , enlarging their minimum distances to make the sources' messages decodable.
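The bookkeeping of the binning step is simple arithmetic on exponents. The Python sketch below works through it for made-up numbers; the function name `binning_parameters` and the example values of t 1 , N, and the two mutual informations are assumptions chosen only to illustrate the relationships stated above.

```python
from fractions import Fraction


def binning_parameters(t1, block_len, info_sr, info_sd):
    """Exponents (in bits) of the binning construction.

    info_sr and info_sd stand for I(X11, X21; Yr) and I(X11, X21; Yd1)
    in bits per channel use (hypothetical inputs).
    """
    if info_sr <= info_sd:
        raise ValueError("binning needs the SR link to carry more than the SD link")
    log2_messages = t1 * block_len * info_sr        # all message pairs
    log2_bins = t1 * block_len * info_sd            # number of bins
    k_net = t1 * block_len * (info_sr - info_sd)    # extra check bits = log2(bin size)
    return {"log2_messages": log2_messages, "log2_bins": log2_bins, "k_net": k_net}


if __name__ == "__main__":
    params = binning_parameters(Fraction(7, 10), 1000,
                                Fraction(6, 5), Fraction(4, 5))
    print(params)
```

The number of bins times the bin size equals the number of message pairs, so the two exponents always add up to the first one.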

Source Nodes Operations.
In time slot t 2 , S 1 (S 2 ) sends a message to D independently because R is in the half-duplex mode. According to the channel status, S 1 and S 2 can choose to send new or old information using the independent codebook C t2 S1 (C t2 S2 ) in the t 2 time slot. The sum-rate inherited from the DF rate region in (4) is

Source transmissions in the t 2 slot are isolated from the operation of R, as in Figure 3, which illustrates the operation in time slot t 2 with independent information transmissions by S 1 , S 2 , and R. Thus, we treat the codebook C t2 S1 (C t2 S2 ) as a single LDPC code and choose a suitable LDPC codebook to satisfy the rate constraint in (7).
At the end of one block transmission, D successively decodes C t2 R , C t2 S1 , and C t2 S2 . Then, the extra check bits k net are obtained for joint decoding of C SR with C t1 S1 and C t1 S2 . The network coding cooperative strategy is summarized in Table 1.
In the cooperative protocol described above, the MARC DF capacity in (4) is approached via the rate allocation scheme in (5) through (7). In particular, if I(X 11 , X 21 ; Y r ) > I(X 11 , X 21 ; Y d ), the information of S 1 and S 2 in time slot t 1 is delivered to D at the rate I(X 11 , X 21 ; Y r ), resulting in a rate gain from the cooperation between S 1 , S 2 , and R.
Strictly speaking, however, the network coding performed here is not exactly the same as network-layer coding, which mainly focuses on routing problems and packet-level combination. Here, we borrow the core idea of network-layer coding to combine the extra check bits in R, which improves the bandwidth efficiency because R's extra check bits for both S 1 and S 2 are transmitted in one slot.

Parameters in the Cooperative Protocol.
The achievable sum-rate is a function of three parameters: d, t 1 , and β i . The parameters d and t 1 are defined in Section 2.1, and β i is the fraction of its transmission that S i allocates to the old messages in t 2 . For the symmetric geometry, we set β 1 = β 2 = β. The achievable sum-rate in (4) can be evaluated from these three parameters with the AWGN channel capacity. The outer bound of the sum-rate is the maximum of (8) subject to the value of t i β i , i ∈ {1, 2}.
The received signal-to-noise ratio for R and D is listed with the channel gains as is the input signal-to-noise ratio (SNR), where power P is constrained within 10 dB in both time slots using (2), and σ 2 is the variance of noise at the receivers of R and D, which are assumed to be equal.
The rates of (8) are plotted in Figure 5. Note that, when d is around 0.5, the sum-rate is at its maximum. The best β as a function of d resembles a step function. When R is physically closer to the source (d < 0.3), β = 1 is optimal, which means that the old information takes up all source transmissions in time slot t 2 . This could be attributed to the path loss of the RD channel, so S 1 and S 2 send the same information again to fill the gap. The other extreme is when R is physically closer to D (d > 0.7), where β = 0, which means that the sources send new information in time slot t 2 . However, R must successfully decode the sources' information.
To obtain the time partition factor t 1 , the sum-rate in (7) can be evaluated at the corner point of the capacity region, which means that the two terms of min(·) in (7) are equal. In Figure 5, we evaluate t 1 by setting the two terms mentioned above to be equal, for different values of d. When d = 0.5 and t 1 = 0.7, the MARC DF sum-rate achieves its maximum value, which means that R's transmission takes up the t 2 = 0.3 slot, so that each source has 0.7/2 + 0.3 = 0.65 degrees of freedom.
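Equating the two terms of a min(·) of this kind gives a closed form for the time split. The Python sketch below assumes that the two terms have the structure described earlier for the DF bound, namely t 1 A + (1 − t 1 ) B for decoding via the relay and t 1 D + (1 − t 1 ) C for decoding at the destination; this structure, the argument names, and the example values are assumptions made for illustration, since the rate expressions themselves are not reproduced here.

```python
from fractions import Fraction


def corner_time_fraction(rate_sr, rate_sd, rate_d2_given_relay, rate_d2_all):
    """Solve t1*A + (1-t1)*B = t1*D + (1-t1)*C for t1.

    A = rate_sr              (R decodes x11, x21 in slot t1)
    B = rate_d2_given_relay  (D decodes x12, x22 given x3 in slot t2)
    D = rate_sd              (D decodes x11, x21 in slot t1)
    C = rate_d2_all          (D decodes x12, x22, x3 in slot t2)
    """
    num = rate_d2_all - rate_d2_given_relay       # C - B
    den = (rate_sr - rate_sd) + num               # (A - D) + (C - B)
    if den <= 0:
        raise ValueError("no corner point for these rates")
    return num / den


if __name__ == "__main__":
    t1 = corner_time_fraction(Fraction(13, 10), Fraction(1),
                              Fraction(1), Fraction(17, 10))
    print(t1, 1 - t1)
```

The result is a valid time fraction whenever the relay link outperforms the direct link in slot t 1 (A ≥ D) and the relay's signal adds rate in slot t 2 (C ≥ B).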

Cooperative Design Framework.
This subsection describes the network-coded cooperative framework that realizes the above achievable rates. Specifically, the layered structure is constructed with multidimensional LDPC constituent codes corresponding to S 1 and S 2 , as in Figure 6. This coding strategy is based on a half-duplex TDD mode, so that the operation of R only cooperates with the source transmissions in time slot t 1 . In time slot t 2 , S 1 , S 2 , and R send their information independently.

Table 1 (network coding cooperative strategy; recoverable entries only):
Time slot | S 1 , S 2 | D
t 1 | Broadcast C t1 S1 and C t1 S2 | Receives C t1 S1 and C t1 S2 and stores them for decoding
t 2 | Send C t2 S1 and C t2 S2 to D |
The parity-check matrix H SR of the cooperative codeword C SR is constructed from three LDPC constituent codes as in Figure 6, comprising the sub-LDPC parity-check matrices H S1 and H S2 and the network code parity-check matrix H net . H S1 (H S2 ) is employed by S 1 (S 2 ) to encode the message x 11 (x 21 ) locally; thus, H S1 (H S2 ) is a complete parity-check matrix. H S1 (H S2 ) has n 1 (n 2 ) variable nodes and k 1 (k 2 ) check nodes. The source's codeword C t1 S1 (C t1 S2 ) is constrained to satisfy k 1 (k 2 ) checks.
In addition, the parity checks k 1 and k 2 do not interact: neither set checks the other's variable nodes, since the independent sources S 1 and S 2 cannot produce checks for unknown information bits.
The extra check nodes k net have the same variable nodes as the check nodes of S 1 and S 2 ; otherwise, they cannot provide any checks for the codewords of S 1 and S 2 . Therefore, in H net , the variable nodes n 1 and n 2 are sequentially arranged as information bits and are enforced to satisfy k net check bits. Hence, H net has k net rows and (n 1 + n 2 + k net ) columns, as H net : k net ×(n 1 +n 2 +k net ). Above all, the network coding procedure uses H net to merge the extra check bits.
Random linear codes are capacity-approaching for the Gaussian channel under maximum likelihood decoding. Therefore, the extra checks are randomly connected to the set of variable nodes n 1 + n 2 in H net . However, if H net is constructed in a completely random way, encoder and decoder implementations become very difficult as the code size grows due to the pseudorandom interconnection and the large memory required. Structured LDPC codes would be a good option to facilitate implementation without compromising performance. Therefore, H net is constructed in the partial dual-diagonal form so that most parity check bits can be obtained via back-substitution. Partial dualdiagonal form is merely in the k net portion, as illustrated in Figure 6, and the remainders are still randomly constructed.
Linear-time encoding can be achieved by using the near-triangular parity portion. The extra check bits b 1 , b 2 , . . . , b knet are generated by a direct encoding procedure. The additions in this procedure are over the binary field; b 0 is an additional variable used to calculate the extra check bits b 1 , b 2 , . . . , b knet .
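As an illustration of the back-substitution idea, the following Python sketch encodes extra check bits under a simplifying assumption: the parity part of H net is a pure dual-diagonal (staircase) block, so that row i reads A_i · x + b_{i-1} + b_i = 0 over GF(2). The matrix `A` (standing in for the randomly connected part), the function names, and the convention that no parity bit precedes the first row are assumptions made for this example; the paper's exact encoding equations, including the role of b 0 , are not reproduced here.

```python
def encode_extra_checks(A, x):
    """Compute extra check bits by back-substitution over GF(2).

    A: list of k_net rows (lists of 0/1), the assumed randomly connected part of H_net.
    x: list of 0/1 bits, the concatenated codewords of both sources.
    Assumes a pure dual-diagonal parity part: b_i = b_{i-1} + A_i . x (mod 2).
    """
    bits = []
    previous = 0  # assumption: no parity bit precedes the first row
    for row in A:
        syndrome = sum(a & xi for a, xi in zip(row, x)) % 2
        previous ^= syndrome  # one XOR per row, so encoding is linear in k_net
        bits.append(previous)
    return bits


def satisfies_checks(A, x, bits):
    """Verify every row of the assumed H_net = [A | dual-diagonal] against (x, bits)."""
    for i, row in enumerate(A):
        total = sum(a & xi for a, xi in zip(row, x)) % 2
        total ^= bits[i]
        if i > 0:
            total ^= bits[i - 1]
        if total != 0:
            return False
    return True


A = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
x = [1, 0, 1, 1]
b = encode_extra_checks(A, x)
print(b, satisfies_checks(A, x, b))
```

Each check bit costs one pass over a sparse row plus one XOR with the previous bit, which is the reason a dual-diagonal parity part permits linear-time encoding.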
As mentioned above, the cooperative LDPC code C SR satisfies the parity-check constraints imposed by H SR . Moreover, once the k net extra check bits are obtained via optimization cooperatively conducted with H S1 and H S2 , the quasidiagonal part of H net is determined. Hence the parity-check matrix H SR can be simplified by removing the columns of the quasidiagonal part, and the optimization is then performed on this simplified H SR instead.
In H SR , variable nodes have two types of checks: their own checks and extra checks offered by network coding. Accordingly, each variable node in H SR has two types of variable node degrees, expressed by λ SR i, j : the sub-LDPC degree i, i ≥ 2 (in H S1 or H S2 ), and the extra degree j, j ≥ 0 (in H net ). Assuming that 0 < η 1 (η 2 ) < 1 is the ratio of the edges in H S1 (H S2 ) to the edges in H SR , the variable node degree distributions γ S1 (x) (γ S2 (x)) of H S1 (H S2 ) in terms of λ SR i, j are given by (12). The relationship of (12) is used for cooperative code profile optimization in the next section.
Next, we give the kernel constraint of the cooperative design, which determines how the extra check bits are connected to the variable node set in the cooperative code C SR . Since the extra checks are appended to the sub-LDPC codes C t1 S1 and C t1 S2 , which have the same set of variable nodes as C SR , the degree of the C SR variable nodes turns out to be greater than that of the same set of variable nodes in the sub-LDPC codes C t1 S1 and C t1 S2 . However, due to the random construction, the extra checks connected to one specific variable node cannot be determined; in other words, it is impossible to list exactly which variable node receives the extra checks. Under this circumstance, we derive the relationship between C t1 S1 , C t1 S2 , and C SR in terms of the number of variable nodes of a specific degree i, denoted by $N_i = (\lambda_i / i) \cdot E$, where E is the total number of edges of the parity-check matrix concerned.

Theorem 1. If the cooperative code C SR has a maximum degree d v,SR and a total number of edges E SR and, similarly, the two sub-LDPC codes C t1 S1 and C t1 S2 have maximum degrees d v,S1 and d v,S2 and total edges E S1 and E S2 , respectively, then the relationships in (13) hold.

The proof of Theorem 1 is in the appendix. Theorem 1 ensures that the network-coded messages from the relay node, sent independently through the fading channel as the extra check bits, offer spatial diversity gain to the cooperative strategy. The number of extra check bits is determined by (14).

Cooperative Code Profile Optimization
Next, the challenge to the construction of H SR lies in finding the optimal code profile of C SR , including optimal profiles of sub-LDPC constituent codes C t1 S1 , C t1 S2 together with extra check bits.
In engineering, optimization has always been a difficult problem due to its computational complexity, particularly for cost-constrained hardware. Therefore, our main interest in this section is to restrict the optimization algorithm to linear programming. We use the Gaussian approximation and Extrinsic Information Transfer (EXIT) charts as the tools for formulating the linear program that yields a C SR code profile in the cooperative framework illustrated in Figure 6.

EURASIP Journal on Wireless Communications and Networking
Generally, the optimization of an LDPC code profile can be done in two different ways. One is to fix the noise variance and maximize the information transmission rate to search for the optimal degree distributions (λ(x), ρ(x)). The other is to fix the rate and find the (λ(x), ρ(x)) that yields the largest noise threshold. The cooperative strategy discussed in this paper prefers bandwidth efficiency to noise threshold. The information transmission rate is defined as the ratio of information bits sent by the sources to all bits transmitted for the concerned message (source messages in time slot t 1 ), which gives (15). Equation (15) can be expressed in terms of the degree distribution as (16). The optimization algorithm maximizes the rate R SR to obtain the degree distribution of H SR .
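For reference, the design rate of an LDPC ensemble is commonly written in terms of the edge-perspective degree distributions as R = 1 − (Σ_d ρ_d / d) / (Σ_d λ_d / d). The short Python sketch below evaluates this standard expression; it is not the paper's rate expression. The dictionary representation and the name `design_rate` are illustrative assumptions; for the cooperative code, the variable node degree would be the total degree i + j.

```python
def design_rate(lam, rho):
    """Design rate from edge-perspective degree distributions.

    lam: dict mapping variable node degree -> fraction of edges (sums to 1).
    rho: dict mapping check node degree -> fraction of edges (sums to 1).
    """
    integral_lam = sum(frac / deg for deg, frac in lam.items())
    integral_rho = sum(frac / deg for deg, frac in rho.items())
    return 1.0 - integral_rho / integral_lam


# (3,6)-regular ensemble, then the same check profile with one extra check per variable node.
print(design_rate({3: 1.0}, {6: 1.0}))
print(design_rate({4: 1.0}, {6: 1.0}))
```

With ρ(x) held fixed, the rate grows with Σ λ_d / d, which is why maximizing the rate reduces to a linear objective in the variable node degree fractions.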
It is difficult to obtain (λ(x), ρ(x)) in one operational procedure, so we fix ρ(x) to obtain λ(x) and then obtain ρ(x) with λ(x) fixed, given the maximum number of iterations. With a constant ρ(x), maximizing the rate is equivalent to maximizing $\sum_{i \ge 2,\, j \ge 0} \lambda^{SR}_{i,j} / (i + j)$. EXIT [27] provides a computationally simple tool for predicting the asymptotic convergence behavior of iterative coding schemes by tracking trajectories of extrinsic information exchange between variable nodes and check nodes in the bipartite graph. The operations of the variable and check nodes are referred to as the variable-node decoder (VND) and the check-node decoder (CND), respectively. We also use mutual information as the surrogate to analyze and optimize LDPC codes by matching the EXIT functions of the constituent decoders (VND, CND) based on the area property of the functions.

Figure 7 illustrates iterative joint decoding of VND and CND in H SR . Specifically, C t1 S1 and C t1 S2 are received by D in time slot t 1 , while C t2 R is received by D in time slot t 2 . Channel S 1 D captures its own fading factor via C t1 S1 ; channel S 2 D captures its own fading factor via C t1 S2 ; channel RD captures its own fading factor via C t2 R . These three codewords are used to cooperatively decode x 11 and x 21 . Each sub-LDPC code is associated with a coupled VND-CND decoder pair. The network code acts as the interleaving function between the two CNDs, supplying extra extrinsic information.
EXIT charts compute two curves, the VND curve and the CND curve, corresponding to the steps of each decoder's density evolution. With the VND curve, I A is interpreted as the mutual information between the VND "input" LLR message and the transmitted symbol of the check node at iteration l. I E is interpreted as the mutual information between the VND "output" LLR message and the transmitted symbol of the variable node at iteration l. With the CND function, the interpretations of I E and I A are opposite.
Gaussian approximation is an effective way to track the means of the log likelihood ratio (LLR) message, which is assumed to be symmetrically Gaussian distributed [28].
Even with an irregular LDPC code [27], the Gaussian approximation can still be precise after a few modifications; that is, the distribution of the variable node LLR message is treated as a mixture of Gaussians. The corresponding VND EXIT function is given by (17), where J(x) is defined by (18). The corresponding CND EXIT function is given by (19), where ϕ i (x) is defined by (20). The decoding process is expected to converge progressively after each decoding iteration. Therefore, we require $I_{Ev}(I_{Ec}(I_A)) > I_A$ for all $I_A \in [0, 1]$ to ensure successful decoding. This is equivalent to $I_{Ev}(I_A) > I_{Ec}^{-1}(I_A)$. The decoding process is thus predicted to converge if and only if the VND curve lies strictly above the reversed-axis CND curve.
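To make the convergence condition concrete, the Python sketch below evaluates it for a regular (d_v, d_c) code over a BPSK AWGN channel. It uses the widely cited EXIT approximations from the literature, I_Ev = J(sqrt((d_v − 1) J^{-1}(I_A)^2 + σ_ch^2)) and I_Ec ≈ 1 − J(sqrt(d_c − 1) J^{-1}(1 − I_A)), rather than the paper's expressions (17)-(20), and it computes J by numerical integration. The function names, the integration settings, and the grid are assumptions of this sketch.

```python
import math


def J(sigma, steps=400):
    """Mutual information between a bit and a consistent Gaussian LLR (mean sigma^2/2)."""
    if sigma <= 0.0:
        return 0.0
    mean = sigma * sigma / 2.0
    start = mean - 10.0 * sigma
    h = 20.0 * sigma / steps
    total = 0.0
    for k in range(steps):
        xi = start + (k + 0.5) * h  # midpoint rule over mean +/- 10 sigma
        pdf = math.exp(-((xi - mean) ** 2) / (2.0 * sigma * sigma))
        pdf /= sigma * math.sqrt(2.0 * math.pi)
        total += pdf * math.log1p(math.exp(-xi)) / math.log(2.0) * h
    return 1.0 - total


def J_inv(info, upper=40.0, iterations=40):
    """Invert the monotone function J by bisection."""
    lower = 0.0
    for _ in range(iterations):
        mid = (lower + upper) / 2.0
        if J(mid) < info:
            lower = mid
        else:
            upper = mid
    return (lower + upper) / 2.0


def vnd(i_a, dv, sigma_ch):
    """VND EXIT curve for variable node degree dv."""
    return J(math.sqrt((dv - 1) * J_inv(i_a) ** 2 + sigma_ch ** 2))


def cnd(i_a, dc):
    """Approximate CND EXIT curve for check node degree dc."""
    return 1.0 - J(math.sqrt(dc - 1) * J_inv(1.0 - i_a))


def predicted_to_converge(dv, dc, noise_std, grid=19):
    """Check I_Ev(I_Ec(I_A)) > I_A on a grid inside (0, 1)."""
    sigma_ch = 2.0 / noise_std  # channel LLR standard deviation for BPSK over AWGN
    for k in range(1, grid + 1):
        i_a = k / (grid + 1)
        if vnd(cnd(i_a, dc), dv, sigma_ch) <= i_a:
            return False
    return True


print(predicted_to_converge(3, 6, 0.7), predicted_to_converge(3, 6, 1.2))
```

For an irregular code, the VND curve would be the λ-weighted mixture of the per-degree curves, which is what keeps the convergence condition linear in the degree fractions.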
Next we formulate the constraints used in the optimization to obtain the code profiles of C SR , C t1 S1 , and C t1 S2 : (λ SR i, j , ρ SR ), (γ S1 , ρ S1 ), and (γ S2 , ρ S2 ). In this paper, for simplicity while still revealing the insights of the cooperative design, we let nodes S 1 and S 2 be completely symmetric; that is, C t1 S1 and C t1 S2 are equal, and thus in simulations we can treat them as one LDPC code C t1 S1 (or C t1 S2 ).

First, (13) in Section 3 is the kernel constraint. Specifically, if S 1 and S 2 are completely symmetric, C t1 S1 and C t1 S2 have an equal number of extra checks and an equal number of bipartite graph edges, which yields (21). Equation (21) poses a constraint that ensures the cooperative design of C SR together with C t1 S1 and C t1 S2 .

Moreover, rate constraints are imposed to further restrict the cooperative design. The cooperative code C SR has more check bits than either source code, so the cooperative code rate should be lower than the rates of both S 1 and S 2 , as expressed in (22).

As mentioned above, using EXIT, the VND curve must lie strictly above the reversed-axis CND curve to ensure the convergence of the belief propagation decoding algorithm. This requires that all the constituent codes satisfy the condition $I_{Ev}(I_A) > I_{Ec}^{-1}(I_A)$, with the additional irregular LDPC modification, for all I belonging to a discrete, fine grid over (0, 1). For S 1 , the condition in (23) reads

$$\sum_{i} \gamma^{S1}_{i}\, I^{S1}_{EV,i}(I_{AV}, I_{ch}) > \bigl(I^{S1}_{EC}\bigr)^{-1}(I_{AC}) + \delta .$$

Clearly, the degree distribution λ(x) of a complete parity-check matrix sums to 1, whether it occurs in H S1 and H S2 or in H SR , as stated in (24).

The above four constraints (21)-(24) formulate the cooperative design and optimization and maintain linearity in the variable node degree distribution λ(x). Linear optimization with respect to λ(x) yields a good C SR profile λ(x) with a fixed and concentrated check node degree ρ(x). Meanwhile, with the variable node degree distribution λ(x) fixed, similar optimization principles hold for the check node degree distribution ρ(x).

Simulations and Results
This section validates the performance of parity-check network coding in MARC via numerical simulations. These simulations focus on two goals: (1) demonstrating that the cooperative framework produces a good cooperative code C SR profile as well as a single code C t1 S1 or C t1 S2 profile and (2) investigating the BER performance of the cooperative code C SR under different channel settings compared with C t1 S1 or C t1 S2 .

EXIT Charts of Code Profile.
EXIT charts of C SR and C t1 S1 (or C t1 S2 ) are shown in Figure 8. With the rate R t1 S1R = 0.5 (R t1 S2R = 0.5) and SNR = 1.2 dB, the EXIT curves of the VND and CND are obtained. The curve of CND is strictly lower than the reversed-axis VND curve. Figure 8(b) shows the EXIT chart of the cooperative code C SR , obtained subject to the EXIT chart of C t1 S1 (or C t1 S2 ) in Figure 8(a) with the linear optimization algorithm described in Section 4. The VND and CND curves approach each other asymptotically as the code rate increases. However, a comparison of the two subfigures shows that the gap between the VND and CND curves of C SR is greater than that of C t1 S1 (or C t1 S2 ) because more check bits work on the same set of variable nodes for code C SR .

Table 2 lists the optimal degree distributions at the C t1 S1 (or C t1 S2 ) code rates of 0.3, 0.4, 0.5, and 0.6. In each rate column, the C t1 S1 (or C t1 S2 ) code profile is in the left subcolumn, while the cooperative C SR code profile is in the right subcolumn. The distributions satisfy the constraints of (21)-(24), in particular the cooperative design constraints. Moreover, Figure 9 shows the maximum C SR transmit rates and the related C t1 S1 (or C t1 S2 ) transmit rates obtained by the linear optimization algorithm in Section 4. Here, we assume that S 1 and S 2 have the same transmit rate (this is not necessary). We also plot the MARC decode-and-forward "Cut-Set" capacity in the same figure for comparison with the proposed parity-check network coding strategy. These results show that the achievable rate of the cooperative strategy is approximately 0.5 dB below the MARC capacity. To better illustrate the cooperative strategy, the direct link transmission capacity without a relay, that is, I(X 11 , X 21 ; Y d1 ), is also plotted.

BER Performance.
Next, we use an optimized C SR code profile (λ(x), ρ(x)) from Table 2 to analyze performance in terms of bit error rate (BER) in the AWGN and Rayleigh fading channels, respectively. In simulations, the soft decision information from the demodulator is input into the decoder. The parameters used in simulations are listed in Table 3. The time partition parameter t 1 = 0.7 (obtained as in Section 5.2) is chosen to maximize the network coding capacity of the DF MARC model. The codeword length is 10^4.

In a decode-and-forward cooperative strategy, R needs to decode the information from the sources correctly. This requires the entire codeword to be correctly received. Therefore, the codes should have excellent frame error ratios (FERs). To ensure the FER performance of LDPC codes, short cycles in the parity-check matrix must be removed. The parity-check matrices of C SR , C t1 S1 , and C t1 S2 are randomly constructed according to λ(x) and ρ(x), and cycles of length 4 in the bipartite graph are then detected and removed.

Figure 10 shows the BER curves against the SNR at different C t1 S1 (or C t1 S2 ) code rates. With the help of the cooperative mechanism, the BER performance improves considerably: because R is near D, the relay provides a spatial diversity gain of almost 1.2 dB at low SNR in the AWGN channel at a C t1 S1 (or C t1 S2 ) rate of 0.5. In the AWGN channel at a C t1 S1 (or C t1 S2 ) rate of 2/3, the performance still improves by 1 dB. In such circumstances, the direct link between S and D cannot offer a physical-layer QoS that satisfies the service, but with cooperative relaying, transmission becomes feasible again. The simulations demonstrate that this cooperative strategy has improved reliability, especially at low SNR. In the Rayleigh channel, as shown in Figures 11 and 12, the average diversity gain is larger than 2 dB.
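The length-4 cycle check mentioned above can be sketched as follows. Two variable nodes (columns) that share two or more check nodes (rows) form a cycle of length 4 in the bipartite graph, so it suffices to compare column supports pairwise. This Python sketch only detects such cycles; how edges are rewired to remove them is not specified in the text, and the function name and matrix representation are assumptions.

```python
def find_length4_cycles(H):
    """Return (col_a, col_b, shared_rows) for every column pair sharing >= 2 checks.

    H: parity-check matrix as a list of rows, each a list of 0/1 entries.
    """
    supports = [
        {r for r, value in enumerate(column) if value}
        for column in zip(*H)
    ]
    cycles = []
    for a in range(len(supports)):
        for b in range(a + 1, len(supports)):
            shared = supports[a] & supports[b]
            if len(shared) >= 2:
                cycles.append((a, b, sorted(shared)))
    return cycles


H_bad = [[1, 1, 0, 0],
         [1, 1, 1, 0],
         [0, 0, 1, 1]]
H_ok = [[1, 1, 0],
        [0, 1, 1],
        [1, 0, 1]]
print(find_length4_cycles(H_bad), find_length4_cycles(H_ok))
```

The pairwise comparison is quadratic in the number of columns; for a codeword length of 10^4 a sparse representation of the column supports keeps this manageable.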
Two modulation schemes, BPSK and QPSK, are plotted to compare the performance. The BPSK scheme is shown to have a lower BER performance than the QPSK. In addition, the presented network-coded cooperative strategy has better BER performance under the fast fading channel than under the slow fading channel. We conclude that in a fast fading or mobile environment, employing a relay node can indeed provide effective spatial diversity.
We also investigate the effects of the relay position factor d on BER performance. BER curves for different settings of d are plotted in Figure 13. The comparisons show that increasing d improves the BER versus SNR performance. This is because the path loss of the RD channel decreases, which makes C t2 R easier to decode, so that the extra check bits are decoded with a lower error rate. However, from Figure 5 in Section 3.1, the achievable sum-rate is optimal when d = 0.5, and the slope of the curve for d > 0.5 is larger than that for d < 0.5. In other words, the achievable sum-rate at d = 0.8 is less than that at d = 0.2. Likewise, we also give the BER performance with different settings of the relay position factor d under Rayleigh slow fading and Rayleigh fast fading channels in Figure 14 and Figure 15. As a result, the spatial diversity and multiplexing can be balanced by the factor d in the parity-check network coding cooperative strategy.

Figure 11: The BER performance under the Rayleigh Fading channel with BPSK, d = 0.5, and R t1 S1R = R t1 S2R = 0.5. (Curves: C t1 S1 and C t1 S2 BER and C SR BER, each under fast fading and slow fading; vertical axis: bit error rate.)

Figure 12: The BER performance under the Rayleigh Fading channel with QPSK, d = 0.5, and R t1 S1R = R t1 S2R = 0.5. (Curves: C t1 S1 and C t1 S2 BER and C SR BER, each under fast fading and slow fading; vertical axis: bit error rate.)

Besides, we also investigate the effect on BER of different numbers of extra check bits under AWGN and Rayleigh fading channels in Figures 16 to 18. The results confirm that the more extra check bits are sent, the better the BER performance, since the rate of the cooperative code C SR is reduced. Thereby, the spatial gain obtained by sending more extra check bits comes at the cost of the throughput of the whole system. As a result, the spatial diversity and multiplexing can be balanced by maximizing the rate of the cooperative strategy to obtain the optimal relay position d and the optimal number of extra check bits.

Figures 16 to 18 (legend entries): C t1 S1 and C t1 S2 BER; C SR BER with k net = 0.3k 1 = 0.3k 2 , k net = 0.5k 1 = 0.5k 2 , and k net = 0.8k 1 = 0.8k 2 ; vertical axis: bit error rate.

Conclusion
This study investigated a cooperative strategy based on parity-check network coding. The relative performance improvement of the scheme lies in a decode-and-forward strategy at the relay node. In particular, this study has revealed that a successful design should (1) employ the most effective extra check bits to make full use of the information contained in x 3 to help decode the messages from S 1 and S 2 and (2) perform linear network coding with the extra check bits. Specifically, we provide an implementation of parity-check network coding based on a layered multidimensional LDPC code and a corresponding belief propagation decoding algorithm. The parity-check network coding for both sources removes the bandwidth loss that occurs in relaying, coming within only 0.5 dB of the MARC DF "Cut-Set" capacity, while the parity-check bits ensure an attractive spatial diversity for cooperative communication. In the future, we would like to extend the proposed scheme to correlated multiple source nodes and conduct further research on network coding in GF(q) fields.

Proof of Theorem 1. In the cooperative code C SR , the number of variable nodes of degree d has two parts: N d, j=0 , the number of variable nodes of degree d without extra checks in C t1 S1 (or C t1 S2 ), and N i<d, j≠0, i+ j=d , the number of variable nodes that turn into degree d after extra checks are added in C t1 S1 (or C t1 S2 ). For the maximum degree d v,SR in the cooperative code C SR , let d v,S = max(d v,S1 , d v,S2 ) and d v,SR ≥ d v,S . N dv,S, j=0 represents the number of variable nodes in C t1 S1 (or C t1 S2 ). Clearly, for the maximum degree d v,SR , Hence, N dv,SR ≥ N dv,S, j=0 is tenable, and based on Next, considering degree d v,SR − 1, the number of variable nodes with degrees larger than d v,SR − 1 is where (N dv,S, j=0 + N dv,S−1, j=0 ) is the number of variable nodes with degrees larger than d v,SR − 1 in C t1 S1 (or C t1 S2 ). Thus, based Then, for all degrees in the descending sequence in C SR , it is confirmed that Using the expression in terms of degree Therefore, the relationships of (13) hold under the cooperative constructions.

Introduction
Recently, more research attention has been directed towards wireless sensor networks. Once deployed, sensors are expected to operate for extended periods of time, and it is impractical to physically reach all sensors. However, it is quite often necessary to update the software running on those sensors or add new functionality to the sensors [1][2][3]. Reprogramming the network needs to reliably disseminate large data objects (50-100 KB) to every sensor in the network with energy efficiency [2].
Protocols for reliably disseminating large data objects in WSNs have been developed over the years. Protocols in [1][2][3][4] achieve data dissemination reliability through different mechanisms such as hop-by-hop recovery and NACK or ACK mechanisms, while another requirement of disseminating large objects in WSNs, energy efficiency, has not been well studied.
In WSNs, energy consumption is a critical issue and sleep scheduling has been well studied as a conservative approach to minimize the energy consumption due to idle listening [5,6]. Though sleep scheduling can save energy, sensors in sleep mode cannot receive data packets. In addition, due to the unreliability of wireless communication, a sensor may not receive the packet successfully even when it is in active mode [7]. Hence, a data packet may be transmitted several times in order to be disseminated to all sensors, which wastes energy and increases the delay of the whole data dissemination process. In other words, the data dissemination process consists of sending native data packets and recovering "wanted" packets that each sensor has not received due to sleep scheduling and/or link unreliability. In order to complete the data dissemination process in a timely manner and achieve energy efficiency, it is crucial to assure that the maximum number of "wanted" packets at all sensors can be recovered at each time slot.
Recently, network coding has become a promising approach to improve the system throughput in wireless networks. Network coding with XORs operation in wireless broadcast has been studied in [8], which shows the advantage of the proposed XORs coding scheme over traditional wireless broadcast in bandwidth efficiency through simulations and theoretical analysis. In XORs coding, a coded packet carries both the coding vector information and the encoded data. Thus, upon receiving a coded packet, the receiver knows which packets are encoded together and how to decode the packet with the packets available at the receiver. The work in [9] has proved that the optimal XORs encoding decision for wireless broadcast, which decides the coding vector of each coded packet, is an NP-hard problem. Heuristic algorithms for the encoding decision problem in wireless broadcast and multicast are proposed in [9,10]. However, the proposed encoding decision approach can only be applied to the scenario where all receivers remain active during the whole time period of recovery. Such an approach cannot be applied to WSNs with sleep scheduling because different sets of active sensors may be available at different time slots.
In this paper, given the sleep scheduling information at the sensors, we aim to determine an effective XORs encoding strategy such that the minimum number of transmissions is required in order for each sensor in the network to successfully receive the whole set of disseminated data packets. Thus, energy consumption can be reduced and the data dissemination process can be accomplished in a timely manner. To achieve such an objective, it is important to maximize the expected number of active sensors that can decode out one "wanted" data packet at each time slot in the recovery process, which is the focus of this paper. The contribution of the proposed work is summarized as follows.
(i) The proposed work takes both link unreliability and sleep scheduling into consideration and proposes an XORs encoding decision algorithm to maximize the expected number of active sensors that can decode out one native packet in their "wanted" data packet sets at each time slot in the recovery process.
(ii) We analyze the impact of each link's packet loss probability and each sensor's sleep probability at each time slot on the network coding gain, which is an extension of the analysis given in [8].
(iii) We also study the effectiveness of sleep scheduling on energy saving, which is offset by the total number of active time slots consumed in the data dissemination process. A threshold is derived to decide whether the current sleep scheduling is effective on energy saving or not. The simulation results also confirm the accuracy of our analysis.
The rest of the paper is organized as follows. Related work is reviewed in Section 2. Section 3 introduces the system architecture and data dissemination schemes. The problem description and its complexity are presented in Section 4. Section 5 describes the algorithm design. Theoretical analysis is given in Section 6. Section 7 gives the simulation results. Finally, we conclude the paper in Section 8.

Related Work
In this section, we review the related work on network coding in WSNs. Network coding was originally proposed in information theory [11] and has recently become a promising approach to improve the system throughput in wireless networks [11][12][13][14][15][16]. Adaptive network coding is proposed in [17] to reduce traffic in the process of software updates, where a linear network coding technique is used. As the computation ability and the memory at sensor nodes are very limited, the complexity of linear encoding and decoding introduces extra overhead. Thus, it is more appropriate to use XORs operation in WSNs since both encoding and decoding operations are much simpler. In fact, XORs coding has been widely used in wireless networks to reduce the complexity of linear network coding [8,10,18,19]. COPE, proposed in [18], improves the throughput of unicast with XORs coding. By exploiting the broadcast nature of the wireless medium, each node buffers overheard packets for a short time and notifies its neighbors which packets it has heard. When a node transmits a packet, it uses its knowledge of what its neighbors have heard to perform opportunistic coding: it XORs multiple packets and transmits them as a single packet while ensuring that each intended next hop has enough information to decode the encoded packet.
Network coding with XORs operation in wireless broadcast has also been studied in [8], which shows the advantage of the proposed network coding scheme over traditional wireless broadcast in bandwidth efficiency through simulations and theoretical analysis. However, encoding decision has not been given in [8]. The work in [9] has proved that optimal XORs encoding decision problem for wireless broadcast is an NP-hard problem.
Several heuristic algorithms for encoding decision in wireless broadcast and multicast have been proposed in [9,10]. With the knowledge of the "wanted" packet set at each receiver, an auxiliary graph is constructed. The encoding decision during the recovery process is then converted to a clique partition problem in the auxiliary graph. However, the proposed encoding decision algorithms can only be applied to the scenario where all receivers remain active during the time period of recovery. Such an approach cannot be applied to WSNs since different sets of active sensors may be available at different time slots. Thus, encoding decision in WSNs with sleep scheduling cannot be converted into finding a minimum clique partition in the graph.
The work in [20] proposes a retransmission scheme, which only uses reception estimation to determine the coding set selection. However, the reception estimation at the source node may not be accurate enough; consequently, some receivers may not be able to decode useful information from the coded packet and more retransmissions will be needed. In addition, the coding decision based on reception estimation does not consider the impact of sleep scheduling, which affects the decoding probability at the receivers in low duty-cycled WSNs.
Figure 1: Hierarchical architecture (legend: cluster head, member sensor).
In this paper, we propose to use XORs coding in data dissemination in a large scale WSN which is organized as a multihop cluster hierarchy [21]. A multihop cluster hierarchical architecture consists of multiple layers as shown in Figure 1. In the lowest layer, all the nodes in the network are grouped into clusters. In addition, besides being a member of a cluster, a node may act as a cluster head in a lower-layer cluster, for example, p 2 in the figure. Within each cluster, the cluster head communicates with its member sensors in a one-hop fashion [22]. We also assume that each sensor is aware of its one-hop neighbors' sleep scheduling and the reliability of the wireless links between the sensor and its neighbors. This can be easily accomplished by one-hop information exchange and link loss inference [23].

System Architecture and Data Dissemination with Network Coding
Our data dissemination process is conducted at each cluster head so as to make sure that all the sensors finally obtain the updating packets. In a multihop cluster hierarchy, if a cluster head in an intermediate layer starts to transmit immediately after receiving one fresh packet, the gain of network coding cannot be fully utilized. On the other hand, if a cluster head waits to transmit until it has received all packets from the cluster head in the upper layer, it will waste bandwidth and introduce extra delay. In order to balance bandwidth efficiency and network coding gain, we propose to use a threshold α to determine when the current cluster head starts to transmit the packets to its member nodes. Specifically, after obtaining αM fresh native packets, where 0 < α ≤ 1 and M is the number of native packets available at its upper-layer cluster head, each cluster head applies the XORs coding scheme to transmit the packets to its member nodes. In the simulation part, we will study the impact of the threshold α on the delay and energy consumption.
In the rest of the paper, we focus on how a cluster head encodes the packets and transmits them to its member sensors. The coding decision at other cluster heads can use the same approach.
As we mentioned earlier, the data dissemination process consists of sending native data packets and recovering "wanted" packets for each receiver. We now give an example to show that network coding can indeed recover "wanted" packets for all neighbors more efficiently.
Suppose that four packets d 1 , d 2 , d 3 , and d 4 need to be transmitted to sensors p 1 , p 2 , p 3 and p 4 as shown in Figure 2. The sleep scheduling at each receiver is given in Figure 2(a) where 1 denotes that this sensor is active at the current time slot, otherwise, it is in sleep mode. For the sake of simplicity, in this example, we assume that no packet is lost due to unreliable wireless communication, which means that a sensor can receive a packet successfully when it is in active mode. We also assume that an active sensor can only transmit or receive one packet at each time slot [5]. We show that different data dissemination approaches will lead to different finishing time of data dissemination.
(i) Without network coding, 4 native packets are sent first, followed by native packets that recover the "wanted" packets at the sensors. Figure 2(b) gives the "wanted" data packet set at each sensor after the 4 native packets are sent out. Without network coding, it takes 10 time slots to finish the data dissemination process, as shown in Figure 2(c).

(ii) With network coding, 4 native packets can be sent first, followed by encoded packets that recover the "wanted" packets at the sensors. Assume that our coding strategy at each time slot is to maximize the number of active receivers that can decode the encoded packet. For example, at time t 5 , if d 1 ⊕ d 2 is sent, all four receivers can obtain a "wanted" packet by XORing the coded packet with the native packet they already hold. Eventually, it takes 8 time slots to finish the data dissemination process, as shown in Figure 2(d). Under such a data dissemination approach, as all native packets are sent first, the available packets at the sensors are most diversified. Thus, the best network coding gain can be achieved. This, however, means that each sensor needs to buffer all received native packets in order to decode out "wanted" packets, which might not be feasible in a WSN due to the limited memory at sensors.

(iii) An alternative approach is to divide the data dissemination process into several batches, where in each batch M native packets are sent, followed by the recovery process [24]. Once all M native packets are received by all sensors in the cluster, the cluster head proceeds to transmit the following batch of packets. The data dissemination is accomplished when all batches of packets are obtained by all sensor nodes in the network. In Figure 2(e), we send two native packets at first, followed by encoded packets to recover the "wanted" packets of the first batch at the sensors, and then send the last two native packets, followed by encoded packets to recover the "wanted" packets of the second batch at the sensors. It takes 9 time slots to finish the data dissemination process.

Figure 2(a): Sleep scheduling at sensors (1: active; 0: sleep mode).

      t1  t2  t3  t4  t5  t6  t7  t8  t9  t10
p 1    0   1   0   1   1   1   1   1   0   1  · · ·
p 2    1   0   0   1   1   1   0   1   1   0  · · ·
p 3    0   1   0   0   1   0   1   1   0   1  · · ·
p 4    0   1   1   0   1   1   0   0   1   1  · · ·
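The slot counts in approaches (i) and (ii) can be checked with a small simulation. The Python sketch below uses the sleep schedule of Figure 2(a), assumes (as in the example) lossless links and one packet per slot, and at every recovery slot greedily picks the packet or XOR combination that lets the most active sensors obtain one "wanted" packet. The row-to-sensor assignment, the exhaustive search over combinations, and the tie-breaking order are assumptions of this sketch, not the paper's algorithm.

```python
from itertools import combinations

# Rows: sensors p1..p4; columns: slots t1..t10 (1 = active, 0 = sleep).
SCHEDULE = [
    [0, 1, 0, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1, 1, 0, 0, 1, 1],
]


def slots_to_finish(schedule, num_packets, use_coding):
    """Return the number of slots needed, or None if the schedule runs out."""
    num_slots = len(schedule[0])
    everything = set(range(num_packets))
    have = [set() for _ in schedule]
    # Native phase: packet k is broadcast in slot k; only active sensors receive it.
    for k in range(num_packets):
        for i, row in enumerate(schedule):
            if row[k]:
                have[i].add(k)
    max_size = num_packets if use_coding else 1
    candidates = [set(c) for size in range(1, max_size + 1)
                  for c in combinations(range(num_packets), size)]
    # Recovery phase: one (possibly coded) packet per slot.
    for t in range(num_packets, num_slots):
        if all(h == everything for h in have):
            return t

        def decoders(combo):
            # An active sensor decodes iff exactly one packet of the combo is missing.
            return [i for i, row in enumerate(schedule)
                    if row[t] and len(combo - have[i]) == 1]

        best = max(candidates, key=lambda c: len(decoders(c)))
        for i in decoders(best):
            have[i] |= best
    return num_slots if all(h == everything for h in have) else None


print(slots_to_finish(SCHEDULE, 4, use_coding=True))
print(slots_to_finish(SCHEDULE, 4, use_coding=False))
```

Under these assumptions the sketch finishes in 8 slots with coding and 10 slots without, in line with the counts stated for Figures 2(d) and 2(c).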
We now discuss how the cluster head maintains the "wanted" packet set of each member sensor. After sending out a packet, the cluster head needs to collect the "wanted" packet set of each member sensor. To reduce ACK implosion, only the active receivers that have successfully received a packet in the current time slot and can obtain or decode one "wanted" packet from it send an ACK message to the cluster head. From these ACKs, the cluster head can derive the "wanted" packet set of each active receiver.
Given the "wanted" packet set of each receiver at each time slot in the recovery process, the following section introduces an encoding decision that aims to maximize the expected number of active sensors that can decode one "wanted" packet in the current time slot.

Problem Description and Complexity
In this section, we first describe the encoding decision problem, which decides which native packets should be encoded at each time slot $t$ in the recovery process so that the expected number of active sensors that can decode one "wanted" native packet at time slot $t$ is maximized. We limit our discussion to the recovery process of one data dissemination batch in one cluster; the same discussion applies to the other batches and clusters.
Suppose that $D = \{d_1, d_2, \ldots, d_M\}$ is the set of data packets in a batch that need to be disseminated to all the sensors in a cluster. Let $P_t = \{p_{i_1}, p_{i_2}, \ldots, p_{i_l}\}$ be the set of active member sensors in the cluster at the $t$th time slot. At each time slot, the cluster head can obtain its neighbor sensors' "wanted" packet sets from the ACK feedback. Let $r_{i,j}$ be 1 if packet $d_j \in D$ is not available at active sensor $p_i$ at the current time slot, and 0 otherwise. Let $R(p_i) = \{d_j \mid r_{i,j} = 1\}$, for $p_i \in P_t$, be the "wanted" data packet set of active sensor $p_i$ at the current time slot $t$, as shown in Figure 2(b). Assume that $l_i$ is the probability that sensor $p_i$ cannot successfully receive a packet from the cluster head when $p_i$ is in active mode.
Let $a_j$ be 1 if native packet $d_j \in D$ is combined in the current encoded packet, and 0 otherwise. Let $c_{i,j}$ be 1 if active sensor $p_i$ can decode one "wanted" native packet $d_j \in R(p_i)$ from the current encoded packet, and 0 otherwise. Considering unreliable wireless communication, the probability that an active sensor $p_i$ successfully obtains one "wanted" packet at the current time slot is $\sum_{j=1}^{M} c_{i,j}(1 - l_i)$. Thus, at the current time slot, the expected number of sensors that can decode one "wanted" packet is $\sum_{i:\, p_i \in P_t} \sum_{j=1}^{M} c_{i,j}(1 - l_i)$, which needs to be maximized in order to save energy.
Take Figure 2(d) as an example again. After $t_4$, the cluster head starts to recover the "wanted" packets at its member sensors. At $t_5$, if the cluster head sends the encoded packet $d_1 \oplus d_2$, then in an ideal condition where no packet is lost, active receivers $p_1$, $p_2$, $p_3$, $p_4$ can each decode one "wanted" packet by computing $d_1 \oplus (d_1 \oplus d_2)$ or $d_2 \oplus (d_1 \oplus d_2)$. Now assume that $l_1 = 0.1$, $l_2 = 0.2$, $l_3 = 0.3$, $l_4 = 0.15$ in a practical wireless network, so that, because of unreliable wireless communication, the probability of successfully receiving a packet at $p_1$, $p_2$, $p_3$, and $p_4$ is 0.9, 0.8, 0.7, and 0.85, respectively. The expected number of active receivers that can decode one "wanted" packet after receiving the current encoded packet $d_1 \oplus d_2$ is then $0.9 \cdot 1 + 0.8 \cdot 1 + 0.7 \cdot 1 + 0.85 \cdot 1 = 3.25$, which is the maximum at the current time slot. Thus, the cluster head sends out $d_1 \oplus d_2$ at the current time slot. In this paper, such an encoding decision problem using XORs coding is referred to as the network coding based data dissemination (NCDD) problem.
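The encoding decision at $t_5$ can be reproduced by exhaustive search over all XOR combinations. This is a minimal sketch, not the method proposed in the paper. The "wanted" sets below are an assumption inferred from the worked example (Figure 2(b) is not reproduced in the text), every sensor is assumed to hold all packets it does not want, and `expected_decoders` and `best_encoding` are hypothetical helper names.

```python
from itertools import combinations

# "Wanted" sets at t5: an assumption inferred from the worked example.
wanted = {1: {1, 3}, 2: {2, 3}, 3: {1, 3, 4}, 4: {1, 4}}
loss = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.15}  # l_i from the example
packets = [1, 2, 3, 4]


def expected_decoders(coded, wanted, loss):
    """Sum of (1 - l_i) over sensors that want exactly one packet in `coded`."""
    return sum(1 - loss[i] for i, r in wanted.items() if len(r & coded) == 1)


def best_encoding(packets, wanted, loss):
    """Try every nonempty subset of packets and keep the best expected value."""
    candidates = (set(c) for k in range(1, len(packets) + 1)
                  for c in combinations(packets, k))
    return max(candidates, key=lambda c: expected_decoders(c, wanted, loss))


choice = best_encoding(packets, wanted, loss)
print(sorted(choice), round(expected_decoders(choice, wanted, loss), 2))
```

With these inputs the search selects $d_1 \oplus d_2$ with expected value 3.25. The search is exponential in the batch size, so it only serves as a reference for small examples.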

Problem Formulation.
The NCDD problem at time slot $t$ in the recovery process can be formally formulated as an optimization problem over the variables $a_j$ and $c_{i,j}$ defined above. In this formulation, the objective represents the expected number of active receivers that can decode one "wanted" data packet from the encoded packet at the current time slot. Equations (2) and (6) ensure that each receiver can decode at most one "wanted" native packet from the encoded packet. Equations (3) and (4) give the two requirements for active receiver $p_i$ to decode one "wanted" packet $d_j$: (a) packet $d_j$ is in $p_i$'s "wanted" packet set and $d_j$ participates in the encoded packet; (b) all other native packets combined in the encoded packet have already been successfully received by receiver $p_i$. Equation (5) guarantees that if packet $d_j$ is available at all active receivers at the current time slot $t$, $d_j$ is not combined into the encoded packet.
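One way to write an integer program that matches this description is sketched below. It is a reconstruction under assumptions, not necessarily the authors' exact formulation: the equation numbers follow the references in the text, and the particular linear form of constraints (3) to (5) is a choice made here using the symbols $a_j$, $c_{i,j}$, $r_{i,j}$, and $l_i$ already defined.

```latex
\begin{align}
\max_{a,\,c}\quad & \sum_{i:\,p_i \in P_t} \sum_{j=1}^{M} c_{i,j}\,(1 - l_i) \tag{1}\\
\text{s.t.}\quad & \sum_{j=1}^{M} c_{i,j} \le 1, && \forall i:\ p_i \in P_t, \tag{2}\\
& c_{i,j} \le r_{i,j}\, a_j, && \forall i,\ \forall j, \tag{3}\\
& c_{i,j} \le 1 - r_{i,k}\, a_k, && \forall i,\ \forall j,\ \forall k \ne j, \tag{4}\\
& a_j \le \sum_{i:\,p_i \in P_t} r_{i,j}, && \forall j, \tag{5}\\
& a_j \in \{0,1\},\quad c_{i,j} \in \{0,1\}, && \forall i,\ \forall j. \tag{6}
\end{align}
```

Because $r_{i,j}$ is known data, every constraint is linear in the binary variables: (3) forces a decoded packet to be both wanted and included, and (4) forbids decoding when another wanted packet is included in the same XOR.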

Theorem 1. The NCDD problem is NP-hard.
Proof. We prove the theorem by a reduction from the MAXIMUM ONE-IN-THREE SAT problem, which is a well-known NP-hard problem in the strong sense.
MAXIMUM ONE-IN-THREE SAT: We are given a set $U = \{u_1, u_2, \ldots, u_M\}$ of $M$ boolean variables and a collection $C = \{c_1, c_2, \ldots, c_n\}$ of clauses, each with exactly three literals. Each clause is a boolean formula that is true if and only if exactly one of its three literals is true. Without loss of generality, we assume that the three literals in $c_i$ are $\{u_{i_1}, u_{i_2}, u_{i_3}\}$. The objective of MAXIMUM ONE-IN-THREE SAT is to find a truth assignment that maximizes the number of true clauses. We use $\mathrm{OPT}_s$ to denote the optimal value of this problem.
Given an instance of MAXIMUM ONE-IN-THREE SAT, we can construct an instance of the decision version of the NCDD problem in polynomial time as follows. Let there be $M$ data packets to be disseminated from the cluster head to $n$ receiver nodes. If $u_j = 1$, packet $d_j$ participates in the encoding; otherwise, $d_j$ does not. For each clause $c_i$, if $u_j$ is a literal of $c_i$, then $d_j$ is a "wanted" packet at $p_i$. In other words, each sensor $p_i$ has lost exactly three packets and has all other packets. Let the probability that an active sensor successfully receives a packet be 100%. Then our objective is to maximize $\sum_{i:\, p_i \in P_t} \sum_{j=1}^{M} c_{i,j}$. For a given encoded packet, $p_i$ can decode a new native packet if and only if exactly one native packet in $R(p_i)$ is encoded into it. The problem is to find an encoding strategy that maximizes the number of receivers that can decode one "wanted" packet from the encoded packet. We use $\mathrm{OPT}_p$ to denote the optimal value of this objective.
(i) Suppose that there is a truth assignment for MAXIMUM ONE-IN-THREE SAT that makes the maximum number of clauses true. If $c_i$ is true, exactly one of $\{u_{i_1}, u_{i_2}, u_{i_3}\}$ is assigned true. Without loss of generality, we assume that $u_{i_2}$ is true while $u_{i_1}$ and $u_{i_3}$ are both false. According to the construction of the instance, only $d_{i_2}$ participates in the encoding, while neither $d_{i_1}$ nor $d_{i_3}$ does. In other words, only one lost packet of $p_i$ participates in the encoding and $p_i$ has all other packets involved in the encoding; thus, $p_i$ can decode one "wanted" native packet $d_{i_2}$. Therefore, for every clause that is true in the MAXIMUM ONE-IN-THREE SAT problem, there is a receiver that can obtain a "wanted" native packet. Hence, we have $\mathrm{OPT}_s \le \mathrm{OPT}_p$.
(ii) Suppose that there is an encoding strategy such that the maximum number of receivers can decode a new native packet. Assume that $p_i$ can decode a new native packet $d_{i_2}$ from the encoded one. According to the decoding condition, the other two "wanted" packets $d_{i_1}$ and $d_{i_3}$ must not be encoded into it; that is, $u_{i_1}$ and $u_{i_3}$ are both assigned false while $u_{i_2}$ is assigned true. Under this assignment, $c_i$ is also true. So, we have $\mathrm{OPT}_p \le \mathrm{OPT}_s$.
The above analysis shows that $\mathrm{OPT}_p = \mathrm{OPT}_s$. Thus, the NCDD problem is NP-hard.

Algorithm for NCDD Problem
In this section, we first introduce an auxiliary graph in which each vertex is assigned a weight. We then show that the NCDD problem can be converted into the problem of finding a maximum weight clique in the auxiliary graph, based on which we develop a heuristic algorithm for the NCDD problem.

Model Design.
At any $t$th time slot, let $R(p_i) \subseteq D$ be the set of packets "wanted" by $p_i$ and $H(p_i) \subseteq D$ be the set of packets received by $p_i$. We can construct an auxiliary graph $G(V, E)$, similar to [9], where $V = \{v_{i,j} \mid d_j \in R(p_i) \text{ and } p_i \in P_t\}$; that is, every "wanted" packet of each active sensor has a vertex in $G$.
Consider two receivers $p_{i_1}$ and $p_{i_2}$. If they have lost the same packet $d_j$, then both can recover $d_j$ if only native packet $d_j$ is encoded at the current time slot. We use a link $e \in E$ between $v_{i_1,j}$ and $v_{i_2,j}$ to denote such recoverability. If $d_{j_1}$ is a "wanted" packet of $p_{i_1}$ and $d_{j_1} \in H(p_{i_2})$, while $d_{j_2}$ is a "wanted" packet of $p_{i_2}$ and $d_{j_2} \in H(p_{i_1})$, then $p_{i_1}$ can recover $d_{j_1}$ and $p_{i_2}$ can recover $d_{j_2}$ when they receive $d_{j_1} \oplus d_{j_2}$. We use a link $e \in E$ between $v_{i_1,j_1}$ and $v_{i_2,j_2}$ to denote such recoverability.
For a clique $Q = \{v_{i_1,j_1}, v_{i_2,j_2}, \ldots, v_{i_k,j_k}\}$ in the graph, let $P' = \{p_i \mid v_{i,j} \in Q, 1 \le j \le M\}$ be the sensors that have "wanted" packets in $Q$ and $D' = \{d_j \mid v_{i,j} \in Q, 1 \le i \le n\}$ be the set of "wanted" packets of those sensors in $Q$. Suppose that there are $m$ packets in $D'$. For any vertex $v_{i,j} \in Q$, according to the edge assignment of $G$, $p_i$ must have already successfully obtained the packets in $D' - \{d_j\}$ but still requires packet $d_j$. Thus, if $d_{j_1} \oplus d_{j_2} \oplus \cdots \oplus d_{j_m}$, where $d_{j_1}, d_{j_2}, \ldots, d_{j_m} \in D'$, is encoded and sent at the $t$th time slot, each sensor in $P'$ will be able to decode one "wanted" packet, provided that the encoded packet is successfully received by all sensors in $P'$.
To account for the unreliability of wireless communication, we assign weight $w_{i,j} = 1 - l_i$ to vertex $v_{i,j}$ for every $v_{i,j} \in V$. Then the weight of clique $Q$, defined as $C(Q) = \sum_{v_{i,j} \in Q} w_{i,j}$, is equal to the expected number of active sensors that can successfully decode one "wanted" packet if all packets in $D'$ are encoded together.
Thus, the NCDD problem, which aims to maximize the expected number of active sensors that can decode one "wanted" packet, is converted into finding a maximum weight clique in graph $G$. For example, after all 4 native packets are sent, the graph in Figure 3 is constructed from the "wanted" packet sets in Figure 2(b), and the encoding decision for the recovery process at $t_5$ is converted into finding a maximum weight clique in this graph. As shown in Figure 3, the clique $\{v_{1,1}, v_{2,2}, v_{3,1}, v_{4,1}\}$ has the maximum weight $0.9 + 0.8 + 0.7 + 0.85 = 3.25$. After the encoded packet $d_1 \oplus d_2$ is sent, active receivers $p_1$, $p_2$, $p_3$, $p_4$ can decode $d_1$, $d_2$, $d_1$, $d_1$, respectively, if all of them successfully receive $d_1 \oplus d_2$.
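The construction of the auxiliary graph can be illustrated on the running example. The sketch below rests on assumptions: the "wanted" sets are inferred from the worked example rather than read from Figure 2(b), each sensor is assumed to hold every packet it does not want, and two vertices of the same sensor are left unconnected because a sensor decodes at most one packet per slot. The helper names `adjacent` and `is_clique` are hypothetical.

```python
from itertools import combinations

# Assumed state after the four native packets (inferred from the example).
wanted = {1: {1, 3}, 2: {2, 3}, 3: {1, 3, 4}, 4: {1, 4}}
all_packets = {1, 2, 3, 4}
has = {i: all_packets - r for i, r in wanted.items()}
loss = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.15}

# One vertex (i, j) per "wanted" packet d_j of sensor p_i, weighted by 1 - l_i.
vertices = [(i, j) for i in sorted(wanted) for j in sorted(wanted[i])]
weight = {v: 1 - loss[v[0]] for v in vertices}


def adjacent(u, v):
    """Edge rule: same missing packet, or each sensor holds the other's packet."""
    (i1, j1), (i2, j2) = u, v
    if i1 == i2:
        return False  # a sensor decodes at most one packet per slot
    if j1 == j2:
        return True   # both sensors miss the same packet
    return j1 in has[i2] and j2 in has[i1]


def is_clique(q):
    """True if every pair of vertices in q is connected."""
    return all(adjacent(u, v) for u, v in combinations(q, 2))


q = [(1, 1), (2, 2), (3, 1), (4, 1)]
print(len(vertices), is_clique(q), round(sum(weight[v] for v in q), 2))
```

Under these assumptions the graph has 9 vertices, and the vertex set named in the text is a clique of weight 3.25.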

Algorithm Design.
Assume that the total number of vertices in $G(V, E)$ is $N$. We first sort all vertices into nonincreasing order of $w_{i,j}$. For simplicity of presentation, we abuse the notation slightly and assign a unique id $v_k$ to each vertex in $G$, using a one-dimensional subscript instead of the two-dimensional one. Correspondingly, we use $w_k$ to denote the weight of $v_k$. For the example given in Figure 3, the relabeled vertex set is $V = \{\ldots, v_9\,(v_{3,4})\}$. Without loss of generality, we assume that $w_1 \ge w_2 \ge \cdots \ge w_N$. Let $Q_i$ be the clique with maximum weight in the subgraph that contains only the vertices of $S_i = \{v_i, v_{i+1}, \ldots, v_N\}$, and let $C(Q_i)$ be the weight of clique $Q_i$. In other words, $Q_i$ is the maximum weight clique that the algorithm has found in the subgraph consisting of vertices $\{v_i, v_{i+1}, \ldots, v_N\}$. The algorithm starts with $i = N$ and iteratively considers more vertices until all vertices in $G$ are considered. The algorithm stops when $Q_1$ is found.
When we consider vertex $v_{i-1}$, there are two cases. If $Q_i \cup \{v_{i-1}\}$ is also a clique, then $Q_{i-1} = Q_i \cup \{v_{i-1}\}$ and $C(Q_{i-1}) = C(Q_i) + w_{i-1}$. Otherwise, if $Q_i \cup \{v_{i-1}\}$ is not a clique, we need to find a clique $Q_{i-1}$ that includes $v_{i-1}$ in the subgraph consisting of $S_{i-1} = \{v_{i-1}, v_i, \ldots, v_N\}$. To do so, we add to $Q_{i-1}$ the vertex $v_j$ whose index is the smallest in $S_{i-1}$ and update $S_{i-1}$, that is, $S_{i-1} = S_{i-1} \cap N(v_j)$, where $N(v_j)$ denotes the set of neighbors of $v_j$ in $G$. If $S_{i-1}$ is still not $\emptyset$, we add another vertex whose index is the smallest in $S_{i-1}$ into the clique $Q_{i-1}$. We repeat this process until there is no vertex left in $S_{i-1}$, that is, $S_{i-1} = \emptyset$. By comparing the weight of the clique $Q_i$, which does not include $v_{i-1}$, with the weight of the clique that includes $v_{i-1}$, the clique $Q_{i-1}$ with maximum weight in the subgraph consisting of $\{v_{i-1}, v_i, \ldots, v_N\}$ is set to the one with the larger weight. The details are given in Algorithm 1. When the algorithm finishes, $Q_1$ contains all vertices of the maximum weight clique found. All native packets involved in $Q_1$ are encoded together and sent out at the current time slot.
Algorithm 1: Maximum weight clique algorithm.
We now show how to find the maximum weight clique of the graph shown in Figure 3. Assume that $Q_2$ has been found, which consists of $\{v_2, v_4, v_6\}$. Next, we consider $Q_1$. Since $\{v_1\} \cup Q_2$ is not a clique, we need to find a clique that includes vertex $v_1$ in the subgraph consisting of $S_1 = \{v_1, v_2, \ldots, v_9\}$. The corresponding steps are given in Algorithm 2, where $v_k\,(v_{i_1,j_1})$ in $V$ denotes that the unique id $v_k$ replaces the original vertex $v_{i_1,j_1}$ in the algorithm. After this clique is found, we compare it with $Q_2$, which has weight $C(Q_2) = 2.55$. Since its weight is larger than $C(Q_2)$, the clique $Q_1 = \{v_1, v_3, v_5, v_7\}$ is the maximum weight clique found in graph $G$. The vertices in $Q_1$ indicate that $p_1$, $p_3$, and $p_4$ lost packet $d_1$ and $p_2$ lost packet $d_2$. The encoding decision is therefore to send $d_1 \oplus d_2$.
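The procedure described above can be sketched as follows. This is an illustrative reading of the prose, not a transcription of Algorithm 1, which is not reproduced in the text. Assumptions: the example state is inferred from the worked example, ties in weight keep the previously found clique, and vertices of equal weight are ordered by their $(i, j)$ subscripts. The function name `heuristic_max_weight_clique` is hypothetical.

```python
# Assumed example state (inferred from the worked example around Figure 3).
wanted = {1: {1, 3}, 2: {2, 3}, 3: {1, 3, 4}, 4: {1, 4}}
has = {i: {1, 2, 3, 4} - r for i, r in wanted.items()}
loss = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.15}


def adjacent(u, v):
    """Edge rule of the auxiliary graph; no edge between vertices of one sensor."""
    (i1, j1), (i2, j2) = u, v
    if i1 == i2:
        return False
    return j1 == j2 or (j1 in has[i2] and j2 in has[i1])


def heuristic_max_weight_clique(vertices, weight):
    """Greedy sketch of the described procedure (not an exact solver)."""
    order = sorted(vertices, key=lambda v: (-weight[v], v))  # nonincreasing weight
    n = len(order)
    nbrs = [{b for b in range(n) if b != a and adjacent(order[a], order[b])}
            for a in range(n)]

    def total(q):
        return sum(weight[order[k]] for k in q)

    best = [n - 1]                      # Q_N = {v_N}
    for i in range(n - 2, -1, -1):      # consider v_{N-1}, ..., v_1
        if all(k in nbrs[i] for k in best):
            best = [i] + best           # the previous clique plus v is a clique
            continue
        # Otherwise grow a clique around the new vertex, smallest index first.
        clique, s = [i], {k for k in nbrs[i] if k > i}
        while s:
            j = min(s)
            clique.append(j)
            s &= nbrs[j]
        if total(clique) > total(best):
            best = clique
    return [order[k] for k in best]


vertices = [(i, j) for i in wanted for j in wanted[i]]
weight = {v: 1 - loss[v[0]] for v in vertices}
q1 = heuristic_max_weight_clique(vertices, weight)
print(sorted(q1), sorted({j for _, j in q1}))
```

On the assumed example the sketch passes through the intermediate clique of weight 2.55 and ends with the vertices for $p_1$, $p_3$, $p_4$ missing $d_1$ and $p_2$ missing $d_2$, so the packets to encode are $d_1$ and $d_2$.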

Analysis
In this section, we first analyze the impact of packet loss probability and sleep probability on network coding gain. Then, we derive a threshold to decide whether the current sleep scheduling can save energy compared with no sleep scheduling. We limit the analysis to one cluster in the multihop cluster hierarchy.

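Impact of Packet Loss Probability and Sleep Probability on Network Coding Gain.
Suppose that $N_a$ is the number of transmissions that the data dissemination process requires without coding and $N_b$ is the number of transmissions required with XORs coding. Assume that the probability that receiver $p_i$ is in sleep mode at each time slot is $s_i$, and that $l_i$ is the probability that receiver $p_i$ cannot successfully receive a packet even when it is in active mode, because of unreliable wireless communication. We have the following two lemmas.

Lemma 1. The total number of transmissions without coding for transmitting sufficiently large $M$ packets to $n$ receivers is
$$N_a = M \sum_{\emptyset \ne S \subseteq \{1,\ldots,n\}} \frac{(-1)^{|S|+1}}{1 - \prod_{i \in S} \left[1 - (1 - l_i)(1 - s_i)\right]}. \tag{8}$$
Proof. See Appendix A.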

Lemma 2. The total number of transmissions with XORs coding for transmitting sufficiently large $M$ packets to $n$ receivers is
$$N_b = \frac{M}{1 - \max_{i \in \{1,2,\ldots,n\}} \{1 - (1 - l_i)(1 - s_i)\}}. \tag{9}$$
Proof. See Appendix B.
With the analytical results for $N_a$ and $N_b$, we can define the analytical network coding gain as $\gamma = (N_a - N_b)/N_a$. Take two receivers as an example, and assume that $l_1 = 0.1$, $l_2 = 0.25$, $s_1 = 0.15$, $s_2 = 0.05$ and that $M$ is sufficiently large. According to (8) and (9), we obtain $N_a = 1.6382M$ and $N_b = 1.4035M$. Then, the analytical network coding gain is $\gamma = 0.1433$. From Lemmas 1 and 2, we can also obtain the following corollary.

Corollary 1. For two receivers, the network coding gain $\gamma$ is maximum when $l_1 = l_2$.
Proof. See Appendix C.
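The numbers in this two-receiver example can be reproduced with a few lines of code. The sketch below makes two assumptions that should be read as such: the uncoded count is taken to be the expected number of broadcasts until every receiver has the packet, written by inclusion-exclusion over receiver subsets, and the gain is taken to be $(N_a - N_b)/N_a$. Both choices reproduce the quoted values. All function names are hypothetical.

```python
from itertools import combinations
from math import prod


def effective_loss(l, s):
    """Probability that a sensor misses a transmission: lossy link or asleep."""
    return 1 - (1 - l) * (1 - s)


def n_without_coding(e):
    """Expected transmissions per packet until every receiver has it.

    Inclusion-exclusion over receiver subsets; assumed here to be the form
    behind the N_a expression.
    """
    return sum((-1) ** (k + 1) / (1 - prod(sub))
               for k in range(1, len(e) + 1)
               for sub in combinations(e, k))


def n_with_coding(e):
    """Per-packet transmissions with XOR coding: limited by the worst receiver."""
    return 1 / (1 - max(e))


e = [effective_loss(0.1, 0.15), effective_loss(0.25, 0.05)]
na, nb = n_without_coding(e), n_with_coding(e)
gain = (na - nb) / na
print(f"{na:.4f} {nb:.4f} {gain:.4f}")
```

The printed per-packet values agree with $N_a = 1.6382M$, $N_b = 1.4035M$, and $\gamma = 0.1433$ after rounding to four decimals.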

Algorithm 2: The steps of finding $Q_1$.

Impact of Sleep Probability on Energy Consumption.
Although sleep scheduling can save the energy consumed by idle listening, sensors in sleep mode cannot receive data packets, which forces retransmissions and may consume more energy. If sensor $p_i$ is active at the $t$th time slot, we say that the $t$th time slot is an active time slot for sensor $p_i$. A sensor consumes energy only in its active time slots. Thus, we use the total number of active time slots consumed by the sensors to successfully receive the whole set of packets as the energy consumption of data dissemination. We define a threshold as follows:
$$\varepsilon = \sum_{i=1}^{n} (1 - s_i) - \frac{n}{l_{\min}} \min_{i \in \{1,2,\ldots,n\}} \{(1 - l_i)(1 - s_i)\}, \tag{11}$$
where $l_{\min} = 1 - \max_{i \in \{1,2,\ldots,n\}} \{l_i\}$. Then, we have the following lemma.

Lemma 3. In XORs coding, if $\varepsilon < 0$, the current sleep scheduling can save energy consumed by idle listening; otherwise, the current sleep scheduling makes no contribution to energy saving.
Proof. See Appendix D.
Take two receivers with $l_1 = 0.23$, $s_1 = 0.15$, $l_2 = 0.27$, $s_2 = 0.18$ as an example. According to (11), we have $\varepsilon > 0$. Thus, the energy saved by sleep scheduling is offset by the additional retransmissions, and the cluster head should wake up more sensors. An interesting problem is how to design an optimal sleep schedule such that its energy saving is not offset by additional retransmissions; this is beyond the scope of this paper.
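The sign of the threshold in this example can be checked numerically. The sketch below assumes that the threshold has the form $\varepsilon = \sum_i (1 - s_i) - (n / l_{\min}) \min_i \{(1 - l_i)(1 - s_i)\}$, which is the reading consistent with the derivation in Appendix D; the function name `sleep_threshold` is hypothetical.

```python
def sleep_threshold(l, s):
    """epsilon = sum(1 - s_i) - (n / l_min) * min_i (1 - l_i)(1 - s_i)."""
    n = len(l)
    l_min = 1 - max(l)  # success probability of the worst link
    worst = min((1 - li) * (1 - si) for li, si in zip(l, s))
    return sum(1 - si for si in s) - n / l_min * worst


eps = sleep_threshold([0.23, 0.27], [0.15, 0.18])
print(round(eps, 2), "sleep scheduling saves energy" if eps < 0 else "no saving")
```

For the two receivers above the value is positive (about 0.03), in line with the statement that sleep scheduling does not save energy in this configuration.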

Simulation Results
In this section, we demonstrate the effectiveness of our dissemination schemes through simulations using a C++ simulator. In our simulations, a multihop cluster hierarchical WSN is randomly generated with a fixed number of sensors unless otherwise specified. We group the packets to be sent into batches of $M$ packets each. The recovery process with network coding starts after every $M$ native packets are transmitted. In a cluster, we randomly generate sensor $p_i$'s sleep schedule according to its sleeping probability $s_i$.
To demonstrate the advantage of our coding scheme, we introduce two baseline algorithms: dissemination without coding and dissemination with random coding. Dissemination without coding transmits a randomly chosen native "wanted" packet at each time slot until all receivers obtain their "wanted" data packets, while dissemination with random coding transmits a randomly generated XORs packet at each time slot until all receivers obtain their "wanted" packets.
In the simulation, we are interested in evaluating the performance of our coding schemes from the following perspectives.

Comparison with Different Data Dissemination Schemes.
The effectiveness of our coding scheme in maximizing the expected number of sensors that can obtain one "wanted" packet in one time slot is demonstrated by comparison with dissemination without coding and dissemination with random coding.
We evaluate the performance of our algorithm by varying the number of active sensors within a cluster at one time slot in the range [10, 40] for $M = 50$ and $l_i = 0.2$. As shown in Figure 4, the number of active sensors that can obtain one "wanted" packet with our coding scheme is considerably larger than with dissemination without coding or with random coding.
For the dissemination of one batch within a cluster, we also compare the total number of transmissions required by our coding scheme with that of the two baseline algorithms. We vary the number of packets to be sent in the range [60, 100] for $n = 10$, $s_i = 0.3$, $l_i = 0.2$. As shown in Figure 5, the total number of transmissions required in one batch dissemination by our coding scheme is considerably smaller than that required by dissemination without coding or with random coding. Hence, for data dissemination with a large set of packets, our XORs coding scheme efficiently decreases the number of transmissions required, and thus more energy can be saved.

Network Coding Gain Comparison with Analytical Results.
We demonstrate the effectiveness of the proposed network coding algorithm by comparing the network coding gain obtained through simulation with the analytical network coding gain. As shown in Figure 6(a), the network coding gain obtained by our simulation follows the same trend as the analytical results. In addition, the maximum network coding gain is achieved when $l_1 = l_2 = 1 - (1 - 0.2)(1 - 0.3) = 0.44$ in both our simulation results and the analytical results, which verifies Corollary 1. When $l_1 = l_2$, the "wanted" packets at one receiver are most likely available at the other receiver; thus, coding opportunities are plentiful and the maximum network coding gain is achieved.
We also extend the simulation to 10 receivers in a cluster. The loss probability of $p_1$ is varied along the x-axis for $M = 100$; $s_i = 0.2$ for $1 \le i \le 5$; $s_i = 0.3$ for $6 \le i \le 10$; and $l_2 = l_3 = l_1 + 0.02$, $l_4 = l_5 = l_1 + 0.04$, $l_6 = l_7 = l_1 + 0.06$, $l_8 = l_1 + 0.08$, $l_9 = l_{10} = l_1 + 0.1$. As shown in Figure 6(b), the simulation results are very close to the analytical results. In addition, Figure 6 verifies that network coding indeed reduces the number of transmissions required.
In Figure 7, we vary the sleep probability at the sensors. Results similar to those in Figure 6 are observed, and the network coding gain obtained through simulation is quite close to the analytical results.

The Impact of Sleep Scheduling on Energy Saving.
We now study the impact of sleep scheduling on the energy consumption. Our simulation is conducted within one cluster. We use the total number of active time slots consumed to denote the energy consumption of the data dissemination process.
Suppose that XORs coding is applied. Let $\eta_s$ be the total number of active time slots consumed for data dissemination with sleep scheduling and $\eta_{ns}$ be the total number of active time slots consumed for data dissemination without sleep scheduling. The energy saving of XORs coding with sleep scheduling over that without sleep scheduling is computed from $\eta_s$ and $\eta_{ns}$. For data dissemination without coding, we can define the energy saving with sleep scheduling over that without sleep scheduling in a similar way.
As shown in Figure 8, the simulation results are very close to the analytical results.
For our XORs coding, the figure shows that the energy consumption with sleep scheduling is less than that without sleep scheduling when $s_2$ is less than 0.15. When $s_2 = 0.15$, the energy consumption with sleep scheduling is equal to that without sleep scheduling. When $s_2$ is larger than 0.15, sleep scheduling makes no contribution to energy saving and even incurs more energy consumption than no sleep scheduling. This result is plausible: when more sensors are asleep, more retransmissions are required, which imposes more energy consumption. In this case, the energy saved by sleep scheduling is offset by the additional retransmissions, which means that the threshold $\varepsilon > 0$ and the cluster head should wake up more sensors to receive packets in order to save energy.

The Impact of Threshold α on the Delay and the Total Number of Transmissions Required.
We now study the impact of threshold $\alpha$ on the delay of the data dissemination process in a multihop cluster hierarchical WSN. The threshold $\alpha$ is varied in the range [0.2, 1.0] for $M = 30, 40, 50$. Figure 9 gives the delay required for data dissemination when the number of layers is 5 and 6, respectively. We can see that the delay increases with the threshold $\alpha$. This is because, as $\alpha$ increases, the cluster heads need to wait longer before they can transmit their available packets to their members. Thus, the cluster heads in lower layers can do nothing for a long time. Specifically, when $\alpha = 1$, each cluster head cannot transmit its available packets until it has received all $M$ packets. In this case, concurrent transmissions are not allowed even if there is no collision between them, which increases the delay. From Figure 9, we can also see that the delay increases with the number of layers, because the number of receivers increases with the number of layers.
We further study the impact of the threshold $\alpha$ on the total number of transmissions required in a multihop cluster hierarchical WSN. The threshold $\alpha$ is again varied in the range [0.2, 1.0] for $M = 30, 40, 50$. As shown in Figure 10, the total number of transmissions required decreases with the threshold $\alpha$. When $\alpha$ is small, the cluster heads transmit the packets to their members more quickly. Therefore, the number of fresh packets available at the cluster heads is small, and the network coding gain cannot be fully utilized. Hence, the total number of transmissions required is larger than with a larger threshold $\alpha$.

Conclusion
This paper studies data dissemination in wireless sensor networks with network coding to achieve energy efficiency. In order to quickly complete the whole process of data dissemination, at each time slot in the recovery process, we aim to transmit an encoded packet such that the expected number of active sensors that can decode out one "wanted" packet is maximized. A maximum weight clique model is proposed here to achieve such an objective. We further study the impact of packet loss probability and sleep probability on network coding gain. We also analyze the impact of sleep probability on energy saving gain and derive a threshold which can be used to decide whether the current sleep scheduling is effective on energy saving or not. The simulation results verify the work proposed in the paper.

A. Proof of Lemma 1
However, in the data dissemination process, receiver sensor $p_i$ may be in sleep mode, in which case it cannot receive a packet. Therefore, the probability that sensor $p_i$ successfully receives the packet at any time slot is $(1 - l_i)(1 - s_i)$. In other words, the probability that sensor $p_i$ loses the packet is $1 - (1 - l_i)(1 - s_i)$. Thus, considering sleep scheduling, the total number of transmissions required without coding is $N_a$ as given in (8).

B. Proof of Lemma 2
From [8], we know that the total number of transmissions with XORs coding to successfully deliver sufficiently large $M$ packets to $n$ receivers is $M/(1 - \max_{i \in \{1,2,\ldots,n\}} \{l_i\})$, where each receiver stays in active mode during the data transmission process. As in Appendix A, with sleep scheduling the probability that sensor $p_i$ cannot successfully receive the packet becomes $1 - (1 - l_i)(1 - s_i)$. Thus, the total number of transmissions required with XORs coding to transmit sufficiently large $M$ packets to $n$ receivers is
$$N_b = \frac{M}{1 - \max_{i \in \{1,2,\ldots,n\}} \{1 - (1 - l_i)(1 - s_i)\}}.$$

C. Proof of Corollary 1
Without loss of generality, suppose that $l_1 \ge l_2$ and $l_2 = \beta l_1$, $0 \le \beta \le 1$. Substituting into $\gamma$, define a function $f(\beta) = \gamma$ with $\beta$ as the variable. It can be shown that $f(\beta)$ is an increasing function. Thus, $f(\beta)$ attains its maximum at $\beta = 1$; that is, the network coding gain $\gamma$ is maximum when $l_1 = l_2$, which proves Corollary 1.

D. Proof of Lemma 3
From the analysis in the previous section, the total number of active time slots consumed for data dissemination with XORs coding is
$$\eta_s = \frac{M \sum_{i=1}^{n} (1 - s_i)}{\min_{i \in \{1,2,\ldots,n\}} \{(1 - l_i)(1 - s_i)\}},$$
where $s_i$ is the probability that sensor $p_i$ is in sleep mode at each time slot. However, if there is no sleep scheduling at the sensors, that is, $s_i = 0$, the total number of transmissions for disseminating sufficiently large $M$ packets to $n$ receivers with XORs coding is $M / l_{\min}$, where $l_{\min} = 1 - \max_{i \in \{1,2,\ldots,n\}} \{l_i\}$. Since no sensors are in sleep mode, the total number of active time slots consumed for disseminating packets with XORs coding is
$$\eta_{ns} = \frac{nM}{l_{\min}}.$$
From the above formulation, sleep scheduling contributes to saving the energy consumed by idle listening only if $\eta_s < \eta_{ns}$; otherwise, the retransmissions caused by sleep scheduling impose more energy consumption. The condition $\eta_s < \eta_{ns}$ becomes
$$\frac{\sum_{i=1}^{n} (1 - s_i)}{\min_{i \in \{1,2,\ldots,n\}} \{(1 - l_i)(1 - s_i)\}} < \frac{n}{l_{\min}},$$
that is,
$$\sum_{i=1}^{n} (1 - s_i) < \frac{n}{l_{\min}} \min_{i \in \{1,2,\ldots,n\}} \{(1 - l_i)(1 - s_i)\}. \tag{D.8}$$
Thus, if (D.8) is satisfied, the current sleep scheduling contributes to saving energy compared with no sleep scheduling.

Introduction
In this paper, we are concerned with designing feedback-based adaptive network coding schemes that can deliver high throughputs and low decoding delays in packet erasure networks. We first present some background on existing work and emphasize that the notion of delay and the choice of a suitable network coding strategy are highly entangled with the underlying application.

Motivation and Background.
Consider a broadcast packet-based transmission from one source to many destinations where erasures can occur in the links between the source and destinations. Two main throughput optimal schemes to deal with such erasures are fountain codes [1] and random linear network codes (RLNC) [2]. In the latter scheme, for example, the source transmits random linear mixtures of all the packets to be delivered. It is well-known that if the random coefficients are chosen from a finite field with a sufficiently large size, each coded packet will almost surely become linearly independent of all previously received coded packets and hence, innovative for every destination [2]. The scheme is therefore almost surely throughput optimal. Another benefit of fountain codes and RLNC is that they do not require feedback about erasures in individual links in order to operate.
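The effect of the field size on innovation can be made concrete with a standard counting result: $k$ vectors drawn uniformly at random from $\mathrm{GF}(q)^k$ are linearly independent with probability $\prod_{j=1}^{k} (1 - q^{-j})$. The sketch below evaluates this product for a few field sizes. The block size of 32 and the function name `full_rank_probability` are illustrative assumptions, not values taken from the paper.

```python
def full_rank_probability(k, q):
    """Probability that k uniformly random vectors in GF(q)^k are independent."""
    p = 1.0
    for j in range(1, k + 1):
        p *= 1 - q ** (-j)
    return p


# Compare a binary field with larger fields for a block of 32 packets.
for q in (2, 16, 256):
    print(q, round(full_rank_probability(32, q), 4))
```

For the binary field the probability stays below 0.3, whereas for a field of size 256 it exceeds 0.99, which is the sense in which a sufficiently large field makes coded packets innovative almost surely.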
However, in these schemes, throughput optimality comes at the cost of large decoding delays, as the receiver needs, in general, to collect all coded packets in a block before being able to decode. Despite this drawback, there are applications that are insensitive to such delays. Consider, for example, a simple software update (file download). The update only starts to work when the whole file is downloaded. In this case, the main desired properties are throughput optimality and a short mean completion time, and there is often little or no incentive to aim for partial "premature" decoding. The completion time performance of RLNC for rateless file download applications has been considered in [3], where the mean completion time of RLNC is shown to be much shorter than that of scheduling. Reference [4] considers time division duplex systems with large round-trip link latencies and proposes solutions for the number of coded packet transmissions before waiting for acknowledgement on the received number of degrees of freedom.
There are applications where partial decoding can crucially influence the end user's experience. Consider, for example, broadcasting a continuous stream of video or audio in live or playback modes. Even though fountain codes and RLNC are throughput optimal, having to wait for the entire coded block to arrive can result in unacceptable delays in the application layer. However, partial decoding of packets out of their natural temporal order does not necessarily translate into the low delivery delays desired by the application layer. The authors in [5,6] have proposed feedback-based throughput-optimal schemes to deal with the transmitter queue size, as well as decoding and delivery delays at the destinations. When the traffic load approaches system capacity, their methods are shown to behave "gracefully" and meet the delay performance benchmark of single-receiver automatic repeat request (ARQ) schemes.
There is yet another set of applications for which partial decoding is beneficial and can result in lower delays irrespective of the order in which packets are being decoded. Consider, for example, a wireless sensor network in which there is a fusion/command center together with numerous sensors/agents scattered in a region. Each sensor/agent has to execute or process one or more complex commands. Each command and its associated data is dispatched from the center in a packet. For coordination purposes, each agent needs to know its own and other agents' commands. Therefore, commands are broadcast to everyone in the network. In this application, in-order processing/execution of commands may not be a real issue. However, fast command execution may be crucial and therefore, it is imperative that innovative packets arrive and get decoded at the destinations as quickly as possible regardless of their order. As another example, consider emergency operations in a large geographical region where emergency-related updates of the map of the area need to be dispatched to all emergency crew members. In such situations too, updates of different parts of the map can be decoded in any order and still be useful for handling the emergency.
Finally, some applications may be designed in such a way that they are insensitive to in-order delivery. This can be particularly useful where the transport medium is unreliable. In such a case, it may be natural to use multiple-description source coding techniques [7], in which every decoded packet brings new information to the destination, irrespective of its order. In light of the emergency applications described above, one can perform multiple-description coding for map updates, so that updates of different subregions can be divided into multiple packets and each packet can provide an improved view of one region in a truly order-insensitive fashion.

Contributions.
In this paper, we are inspired by the last set of order-insensitive packet delivery applications and hence focus on designing network coding schemes that, with the help of feedback, can deliver innovative packets in any order to the destination and also guarantee fast decoding of such packets. As a first step towards this goal, we limit ourselves to broadcast erasure channels, but emphasize that the ideas can be extended to other more complicated scenarios. We also consider the class of instantaneously decodable network coding schemes, in which each coded transmission contains at most one new source packet that a receiver has not decoded yet. The rationale is that in an order-insensitive application, any innovative packet that cannot be decoded immediately incurs a unit of delay. Obviously, one other source of delay is when a coded packet does not contain any new information for a receiver and hence is not innovative. A similar definition of the decoding delay was first considered in [8], where the authors presented a number of heuristic algorithms to reduce order-insensitive decoding delay. In this context, our main contributions are the following.
(i) In Section 1.1, we have motivated the problem in light of possible applications in sensor and ad hoc networks. To the best of our knowledge, such application-dependent classification of network coding delays did not previously exist in the literature.
(ii) In Section 3.1, we present a systematic framework for the minimization of decoding delay in each transmission subject to the instantaneous decodability constraint. We show that this problem can be cast into a special integer linear programming (ILP) framework, where instantaneously decodable packet transmission corresponds to a set packing problem [9] on an appropriately defined set structure.
(iii) In Section 3.2, we provide a customized and efficient method for finding the optimal solution to the set packing problem (which is in general NP-hard). Our numerical results in Section 6 show that for a reasonably sized number of receivers, the optimum solution(s) can be found in a time that is linearly proportional to the total number of packets.
(iv) In Section 4, we discuss decoding delay minimization for an important class of erasure channels with memory, which can occur in wireless communication systems due to deep fades and shadowing [10]. We show that the general set packing framework in Section 3 can be easily modified to account for the erasure memory. Our results in Section 6 reveal that by adapting network coding decisions based on channel erasure conditions, significant improvements in delay are possible compared to when decisions are taken irrespective of channel states.
(v) In Section 5, we provide a number of heuristic variations of the optimal search for finding (possibly suboptimal) solutions faster, if needed. Our results in Section 6 show that such heuristics work very well and often provide solutions that are very close to those of the optimal search algorithm. Moreover, they improve on the random opportunistic method proposed in [8].

Network Model
Consider a single source that wants to broadcast some data to N receivers, denoted by R i for i = 1, . . . , N. The data to be broadcast is divided into K packets, denoted by m j for j = 1, . . . , K. Time is slotted and the source can transmit one (possibly coded) packet per slot. A packet erasure link L i connects the source to each individual receiver R i . Erasures in different links can be independent or correlated with each other. Different erasures in a single link can be independent (memoryless) or correlated with each other (with memory) over time.
For memoryless erasures, an erasure in link L i can occur with a probability of p e,i in each packet transmission round independent of previous erasures.
For correlated erasures, we consider the well-known Gilbert-Elliott channel (GEC) [11], which is a Markov model with a good and a bad state. If the channel is in the good state, packets can be successfully received, while in the bad state packets are lost (e.g., due to deep fades or shadowing in the channel). The probability of moving from the good state G to the bad state B in link $L_i$ is

$$b_i \triangleq \Pr(C_{i,\ell} = B \mid C_{i,\ell-1} = G),$$

and the probability of moving from the bad state B to the good state G is $g_i \triangleq \Pr(C_{i,\ell} = G \mid C_{i,\ell-1} = B)$, where $\ell$ is the time slot index. Steady-state probabilities are given by

$$P_{G,i} \triangleq \Pr(C_{i,\ell} = G) = \frac{g_i}{b_i + g_i}, \qquad P_{B,i} \triangleq \Pr(C_{i,\ell} = B) = \frac{b_i}{b_i + g_i}.$$

Following [12], we define the memory content of the GEC in link $L_i$ as $0 \le \mu_i = 1 - b_i - g_i < 1$, which signifies the persistence of the channel in remaining in the same state. A small $\mu$ means a channel with little memory and a large $\mu$ means a channel with large memory.
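The two-state model above can be illustrated with a minimal Python sketch. The function names (`gec_steady_state`, `gec_memory`, `simulate_gec`) are hypothetical and not part of the original work; the sketch only assumes the transition probabilities b and g defined above and starts the chain from its steady-state distribution.

```python
import random


def gec_steady_state(b, g):
    """Return (P_G, P_B) for a two-state chain with Pr(G->B) = b and Pr(B->G) = g."""
    return g / (b + g), b / (b + g)


def gec_memory(b, g):
    """Memory content mu = 1 - b - g of the channel."""
    return 1.0 - b - g


def simulate_gec(b, g, num_slots, rng):
    """Simulate the channel state ('G' or 'B') in each of num_slots time slots."""
    p_good, _ = gec_steady_state(b, g)
    # Draw the initial state from the steady-state distribution (an assumption).
    state = "G" if rng.random() < p_good else "B"
    states = []
    for _ in range(num_slots):
        states.append(state)
        if state == "G":
            state = "B" if rng.random() < b else "G"
        else:
            state = "G" if rng.random() < g else "B"
    return states
```

With b = 0.1 and g = 0.3, for instance, the chain spends about three quarters of the time in the good state and has memory content 0.6, so erasures tend to arrive in bursts rather than independently.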
Before transmission of the next packet, the source collects error-free and delay-free 1-bit feedback from each destination indicating if the packet was successfully received or not. A successful reception generates an acknowledgement (ACK) and an erasure generates a negative acknowledgement (NAK). This feedback is used for optimizing network coding decisions at the source for the next packet transmission round, as described in future sections.
In this work, we consider linear network coding [2] in which coded packets are formed by taking linear combinations of the original source packets. Packets are vectors of fixed size over a finite field F q . The coefficient vector used for linear network coding is sent in the packet header so that each destination can at some point recover the original packets. Since in this paper we are only dealing with instantaneously decodable packet transmission, it suffices to consider linear network coding over F 2 . That is, coded packets are formed using binary XOR of the original source packets. Thus, network coding is performed in a similar manner as in [13].
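As a rough illustration of coding over F 2, the following Python sketch forms a coded packet as the XOR of selected source packets and lets a receiver recover the single packet it is missing. The helper names (`xor_packets`, `encode`, `decode`) and the use of a set of packet indices as the header are assumptions made for this example only.

```python
def xor_packets(packets):
    """XOR equal-length byte strings together (addition over F_2)."""
    length = len(packets[0])
    if any(len(p) != length for p in packets):
        raise ValueError("all packets must have the same length")
    out = bytearray(length)
    for packet in packets:
        for idx, byte in enumerate(packet):
            out[idx] ^= byte
    return bytes(out)


def encode(source, indices):
    """Return (header, payload); the header lists which source packets were combined."""
    return frozenset(indices), xor_packets([source[j] for j in indices])


def decode(header, payload, known):
    """Recover the one unknown packet in the combination, or return None.

    known maps packet index -> packet contents already decoded by the receiver.
    """
    unknown = [j for j in header if j not in known]
    if len(unknown) != 1:
        return None  # either nothing new, or not instantaneously decodable
    recovered = xor_packets([payload] + [known[j] for j in header if j in known])
    return unknown[0], recovered
```

A receiver that already holds packet 1 can strip it from the combination of packets 1 and 3 and obtain packet 3 immediately, whereas a receiver that holds neither cannot decode anything from that transmission.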

Definition 1.
A transmitted packet is instantaneously decodable for receiver R i if it is a linear combination of source packets containing at most one source packet that R i has not decoded yet. A scheme is called instantaneously decodable if all transmissions have this property for all receivers.
Definition 2. At the end of transmission round $\ell$ in an instantaneously decodable scheme, the knowledge of receiver R i is the set consisting of all packets that the receiver has decoded so far. The receiver can therefore compute any linear combination of the packets that it has decoded for decoding future packets.

Definition 3.
In an instantaneously decodable scheme, a coded packet is called non-innovative for receiver R i if it only contains source packets that the receiver has decoded so far. Otherwise, the packet is innovative.

Definition 4.
A scheme is called rate or throughput optimal if all transmissions are innovative for the entire set of receivers.
Definition 5. In time slot $\ell$, receiver R i experiences one unit of delay if it successfully receives a packet that is either noninnovative or not instantaneously decodable. If we impose instantaneous decodability on the scheme, a delay can only occur if the received packet is not innovative.
Note that in the last definition, we do not count channel inflicted delays due to erasures. The delay only counts "algorithmic" overhead delays when we are not able to provide innovative and instantaneously decodable packets to a receiver.
As an example, if the knowledge of R 1 is {m 1 , m 2 , m 3 }, receiving m 1 ⊕ m 2 will cause R 1 to experience one unit of delay, whereas m 1 ⊕m 2 ⊕m 5 is innovative and instantaneously decodable, hence does not incur any delay.
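The delay rule in Definition 5 and the example above can be checked with a small Python sketch. The function name `classify` and the representation of a coded packet as the set of indices of the source packets it combines are assumptions of this example.

```python
def classify(packet, knowledge):
    """Classify a successfully received coded packet for one receiver.

    packet: set of indices of the source packets XORed together.
    knowledge: set of indices the receiver has already decoded.
    Returns (innovative, instantaneously_decodable, delay_units).
    """
    unknown = set(packet) - set(knowledge)
    innovative = len(unknown) >= 1
    instantaneously_decodable = len(unknown) <= 1
    delay = 0 if (innovative and instantaneously_decodable) else 1
    return innovative, instantaneously_decodable, delay
```

For a receiver with knowledge {1, 2, 3}, `classify({1, 2}, {1, 2, 3})` reports one unit of delay, while `classify({1, 2, 5}, {1, 2, 3})` reports none, matching the example in the text. Erased packets are not passed to this function at all, since channel-inflicted delays are not counted.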
We note that a packet that is not transmitted yet or transmitted but not received by any receiver can be transmitted in an uncoded manner at any transmission slot without incurring any algorithmic delay. In fact, this is how the transmission starts: by sending m 1 uncoded, for example.
A zero-delay scheme would require all packets to be both innovative and instantaneously decodable to all receivers. Thus zero-delay implies rate optimality, but not vice versa. As the authors show in [8, Theorem 1] for the case of N = 2 and N = 3 receivers, there exists an offline algorithm that is both rate optimal and delay-free. For N ≥ 4 the authors prove that a zero-delay algorithm does not exist. By offline we mean that the algorithm needs to know future realizations of erasures in broadcast links. In contrast, an online algorithm decides on what to send in the next time slot based on the information received in the past and in the current slot. In this paper, we focus on designing online algorithms.

Problem Formulation Based on Integer Linear Programming.
Instantaneous decodability can be naturally cast into the framework of integer optimization. To this end, let us fix the packet transmission round to $\ell$ and consider the knowledge of all receivers, which is also available at the source because of the feedback. The state of the entire system at time index $\ell$ (in terms of packets that are still needed by the receivers) can be described by an $N \times K$ binary receiver-packet incidence matrix $A$ with elements

$$a_{ij} = \begin{cases} 1 & \text{if receiver } R_i \text{ still needs packet } m_j, \\ 0 & \text{otherwise.} \end{cases}$$

Columns of matrix $A$ are denoted by $a_1$ to $a_K$. We assume that packets received by all receivers are removed from the receiver-packet incidence matrix. Hence, $A$ does not contain any all-zero columns.
Example 1. Consider $N = 2$ receivers and $K = 3$ packets. Before the transmission begins, the receiver-packet incidence matrix $A$ is an all-one $2 \times 3$ matrix. If we send packet $m_1$ in the first transmission round $\ell = 1$ and assuming that only receiver $R_2$ successfully receives it, $A$ will become

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$$

If we send packet $m_2$ in the next transmission round $\ell = 2$ and assuming that only receiver $R_1$ successfully receives it, $A$ will then be

$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$$

The condition of instantaneous decodability means that at any transmission round we cannot choose more than one packet which is still unknown to a receiver $R_i$. In the example above, at $\ell = 3$, we cannot send $m_1 \oplus m_3$ because it contains more than one packet unknown to $R_1$.

Let $x$ represent a binary decision vector of length $K$ that determines which packets are being coded together. The transmitted packet consists of the binary XOR of the source packets for which $x_j = 1$. More formally, we can define the instantaneous decodability constraint for all receivers as $Ax \le 1_N$, where $1_N$ represents an all-one vector of length $N$ and the inequality is examined on an element-by-element basis (note that although $x$ is a binary or Boolean vector, $Ax$ is calculated in the real domain; hence, $Ax \le 1_N$ is in fact a pseudo-Boolean constraint). This condition ensures that a transmitted coded packet contains at most one unknown source packet for each receiver.

A vector $x$ is called infeasible if it does not satisfy the instantaneous decodability condition. In other words, $x$ is called infeasible if and only if there exists at least one $p$ for which $b_p > 1$ in

$$b = Ax.$$

A vector $x$ is called a solution if and only if it satisfies $Ax \le 1_N$. In the rest of this paper, "$Ax \le 1_N$" and "$x$ is a solution" are used interchangeably.

Now consider sets $M_1, \ldots, M_K \subset \{R_1, \ldots, R_N\}$, where $M_j$ is the nonempty set of receivers that still need source packet $m_j$. Note that these sets can be easily determined by looking at the columns of matrix $A$.
The "importance" of packet m j can be, for example, taken to be the size of set M j , which is the number of receivers that still need m j .
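The bookkeeping of Example 1 can be traced with a short Python sketch: it checks the instantaneous decodability condition row by row, updates the incidence matrix from the ACK/NAK feedback, and reads the sets M j off the columns. All function names are hypothetical, receivers and packets are indexed from 0, and columns that become all-zero are not removed in this simplified version.

```python
def is_solution(A, x):
    """True if A x <= 1 element-wise, with the products taken over the integers."""
    return all(sum(a * xj for a, xj in zip(row, x)) <= 1 for row in A)


def update_incidence(A, x, received):
    """Return the incidence matrix after sending x (assumed to be a solution).

    received[i] is True if receiver i acknowledged the transmission.
    """
    new_A = [row[:] for row in A]
    for i, got_it in enumerate(received):
        if got_it:
            for j, xj in enumerate(x):
                if xj == 1:
                    new_A[i][j] = 0  # receiver i no longer needs packet j
    return new_A


def needing_sets(A):
    """M_j for every packet j: the set of receivers whose row has a 1 in column j."""
    return [{i for i, row in enumerate(A) if row[j] == 1} for j in range(len(A[0]))]


# Example 1: two receivers, three packets, all packets initially needed.
A = [[1, 1, 1], [1, 1, 1]]
A = update_incidence(A, [1, 0, 0], [False, True])  # m1 sent, only R2 receives
A = update_incidence(A, [0, 1, 0], [True, False])  # m2 sent, only R1 receives
```

After these two rounds `A` equals `[[1, 0, 1], [0, 1, 1]]`, so `is_solution(A, [1, 0, 1])` is false (m1 ⊕ m3 contains two packets unknown to R1) while `is_solution(A, [1, 1, 0])` is true.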
We now formally describe the optimization procedure that should be performed at the transmitter. Maximizing the number of receivers for which a transmission is innovative, subject to the constraint of instantaneous decodability, can be posed as the following (binary-valued) integer linear program (ILP):

$$\max_{x} \; w^T x \quad \text{subject to} \quad Ax \le 1_N, \quad x \in \{0, 1\}^K, \tag{4}$$

where $w^T = (|M_1|, \ldots, |M_K|)$. This is a standard problem in combinatorial optimization, usually called set packing [9]. Here the universe is the set of all receivers and we need to find disjoint (due to the instantaneous decodability condition) subsets $M_j$ with the largest total size. In the (most desirable) case when equality holds in $Ax \le 1_N$ for every receiver, we also speak of a set partition. This is equivalent to a zero-delay transmission.
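For very small instances, the program in (4) can be solved by plain enumeration, which is useful as a reference when testing faster methods. The Python sketch below is such a brute-force baseline, not the search method proposed in this paper; the function name is hypothetical and the default weights are the column sums |M j|.

```python
from itertools import product


def solve_set_packing_brute_force(A, w=None):
    """Enumerate all binary x, keep those with A x <= 1, and maximize w^T x.

    Returns (best_value, list_of_optimal_x). Exponential in the number of
    packets, so only suitable for tiny examples.
    """
    num_packets = len(A[0])
    if w is None:
        # Default importance: number of receivers that still need each packet.
        w = [sum(row[j] for row in A) for j in range(num_packets)]
    best_value, best = -1, []
    for x in product((0, 1), repeat=num_packets):
        if any(sum(a * xj for a, xj in zip(row, x)) > 1 for row in A):
            continue  # violates instantaneous decodability
        value = sum(wj * xj for wj, xj in zip(w, x))
        if value > best_value:
            best_value, best = value, [x]
        elif value == best_value:
            best.append(x)
    return best_value, best
```

For the matrix reached at the end of Example 1, `solve_set_packing_brute_force([[1, 0, 1], [0, 1, 1]])` returns the value 2 with two optimal vectors, (0, 0, 1) and (1, 1, 0). Both serve every receiver, so each is a set partition and corresponds to a zero-delay transmission.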
In Section 4, we will consider other measures of packet importance and discuss the role of w in tailoring the optimization problem according to the application requirements or channel conditions, such as memory in erasure links.
We assume that elements of w, which signify packet importance, are all positive. If one has already found a solution such as x 1 with objective value w T x 1 = v 1 , then changing any element of x 1 from 1 to 0 gives a vector x 0 that can only result in a w T x 0 = v 0 strictly smaller than v 1 . We say that given solution x 1 , x 0 is clearly suboptimal and hence can be discarded in an algorithm that searches for the optimal solution(s).

Efficient Search Methods for Finding the Optimal Solution of (4).
It is well known that the set packing problem is NP-hard [9]. Here, we present an efficient ILP solver designed to take advantage of the specific problem structure. Later, we will see that for many practical situations of interest, our method performs well empirically. Based on this framework, we will also present some heuristics in Section 5 to deal with more complicated and time-consuming problem instances.
We begin presenting our method by first defining constrained and unconstrained variables.
Definition 6. Two binary-valued variables are said to be constrained if they cannot be simultaneously 1 in a solution. Formally, x i and x j are constrained if for any x satisfying Ax ≤ 1 N , x i + x j ≤ 1 (again, note that the addition of variables takes place in the real domain). We also say that x j is constrained to x i and vice versa. It can be proven that x i and x j are constrained if and only if there exists at least one row index p in A for which a pi = a pj = 1.

Definition 7.
The set of all variables constrained to $x_i$ is called the constrained set of $x_i$ and is denoted by $C_i$. That is,

$$C_i = \{x_j : j \ne i \text{ and } a_{pi} = a_{pj} = 1 \text{ for some row index } p\}.$$

If $x_i$ and $x_j$ are not constrained to each other ($x_i \notin C_j$ and $x_j \notin C_i$), then columns $a_i$ and $a_j$ in $A$ cannot have nonzero elements in the same row position. That is, for each row index $p$, $a_{pi} = 1 \Rightarrow a_{pj} = 0$ and $a_{pj} = 1 \Rightarrow a_{pi} = 0$.

Figure 1: A schematic of Algorithm 1 with greedy pruning for finding the optimal network coding solution of (4). Note that the algorithm is recursive as it calls P k−ku−ks−1 and P k−ku−1 within itself.

Definition 8.
A variable x i is said to be unconstrained if C i = ∅. The set of all unconstrained variables is denoted by U and is referred to as the unconstrained set.
If x i is an unconstrained variable, then for each row index p, a pi = 1 ⇒ a pj = 0 for all j ≠ i (otherwise, x i and x j would become constrained).
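Definitions 6 to 8 translate directly into a scan over the rows of A: any two columns that both have a 1 in some row are constrained to each other. The Python sketch below computes all constrained sets and the unconstrained set in this way. The function name and the small matrix used to exercise it are hypothetical (this is not the matrix of Example 2), and indices start from 0.

```python
def constrained_sets(A):
    """Return (C, U): C[j] is the constrained set of column j, U the unconstrained set."""
    num_packets = len(A[0])
    C = {j: set() for j in range(num_packets)}
    for row in A:
        needed = [j for j, a in enumerate(row) if a == 1]
        # Every pair of packets needed by the same receiver is constrained.
        for j in needed:
            C[j].update(i for i in needed if i != j)
    U = {j for j, c in C.items() if not c}
    return C, U


# Hypothetical 4 x 4 incidence matrix used only for illustration.
A_demo = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
C_demo, U_demo = constrained_sets(A_demo)
```

Here columns 0 and 1 share row 0 and columns 1 and 2 share row 1, while column 3 shares no row with any other column, so it is the only unconstrained variable.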

Example 2. Consider the following receiver-packet incidence matrix
One can easily verify the relations defined above. For example, variables x 1 and x 3 are constrained because for p = 1, a p1 = a p3 = 1. Variables x 1 and x 4 are not constrained to each other because columns a 1 and a 4 do not have a nonzero element in the same row position. Variable x 6 is unconstrained because no other column has a nonzero element in rows 6 or 7. In summary, To design an efficient search algorithm, one needs to efficiently prune the parameter space and reduce the problem size. We make the following observations for pruning of the parameter space.
(1) Unconstrained variables must be set to 1. In other words, setting those variables to 0 does not contribute to the optimal solution (note that the elements in w are positive). In the above example, x 5 and x 6 must be set to 1 because no other variable is constrained to them (we will make this statement formal in the optimality proof of the algorithm in the appendix).
(2) If a constrained variable is set to 1 all members of its constrained set must be set to 0. In the above example, setting x 1 = 1 forces x 2 and x 3 to zero.

(3) At a given step, the parameter space can be pruned most by resolving the variable with the largest constrained set.
Application of the third observation in a search algorithm results in greedy pruning of the parameter space. We note that greedy pruning is only optimal for a given step of the algorithm and is not guaranteed to result in the optimal reduction of the overall complexity of the search. We now make a final remark before presenting the search algorithm. In particular, we have observed that finding constrained sets for each variable in each step of the algorithm can be somewhat time consuming. A very effective alternative is to first sort matrix A, column-wise, in descending order of the number of 1's in each column. Setting the "most important" head variable x 1 (with the highest |M 1 |) to 1 is likely to result in the largest constrained set (because it potentially overlaps with many other variables) and hence, many variables will be resolved in the next recursion. We will refer to the approach based on finding the largest constrained set as the greedy pruning search strategy and to the alternative approach as the sorted pruning search strategy.
The greedy pruning search strategy is shown in Figure 1, which with appropriate modifications can also represent the sorted pruning variation. Let P k denote the problem of size k whose input is an N × k receiver-packet incidence matrix A k and whose output is a set of solutions of the form x of length k which satisfy the instantaneous decodability condition A k x ≤ 1 N . The algorithms can be described as shown in Algorithm 1.
In the appendix, we prove by structural induction that Algorithm 1 is guaranteed to return all optimal solutions of (4). However, we note that not every solution returned by Algorithm 1 is optimal. The nonoptimal solutions can be easily discarded by testing against the objective function (4) at the end of the algorithm. We also note that in Algorithm 1, we can simply remove those packets received by every receiver from the problem. If there are K 0 such variables, we can start step (1) above from k = K − K 0 instead of K. The Matlab code for both the greedy and sorted pruning algorithms can be found at http://users.rsise.anu.edu.au/∼parastoo/netcod/.
We conclude this section by a brief note on the computational complexity of Algorithm 1. Let us denote the number of recursions required to solve the problem of size k by C k . According to Algorithm 1, this problem is always broken into two smaller problems of size k − k u − k s − 1 and k − k u − 1. Therefore, one can find the number of recursions required to solve P k by recursively computing C k = C k−ku−ks−1 + C k−ku−1 . The recursion stops when one reaches a problem of size 1 (only one packet to transmit) where C 1 = 1.
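The branching structure behind the recursion count can be sketched in Python as follows. This is a simplified reading of the greedy pruning strategy based on the three pruning observations and on the two subproblems named in the text; it is not the authors' Matlab implementation, all names are hypothetical, and the counter simply records how many times the recursive function is entered in this sketch.

```python
def constrained_to(A, j, cols):
    """Columns in cols, other than j, that share a nonzero row with column j."""
    return {i for i in cols if i != j and any(row[i] and row[j] for row in A)}


def search(A, cols, counter):
    """Return candidate solutions (frozensets of chosen columns) for the subproblem on cols."""
    counter[0] += 1
    cons = {j: constrained_to(A, j, cols) for j in cols}
    unconstrained = {j for j in cols if not cons[j]}  # observation (1): set to 1
    rest = set(cols) - unconstrained
    if not rest:
        return [frozenset(unconstrained)]
    # Observation (3): resolve the variable with the largest constrained set.
    s = max(sorted(rest), key=lambda j: len(cons[j]))
    with_s = search(A, rest - {s} - cons[s], counter)  # x_s = 1, observation (2)
    without_s = search(A, rest - {s}, counter)         # x_s = 0
    return ([sol | unconstrained | {s} for sol in with_s]
            + [sol | unconstrained for sol in without_s])


def optimal_solutions(A, w):
    """Run the search, then keep only the candidates with the largest w^T x."""
    counter = [0]
    candidates = set(search(A, set(range(len(A[0]))), counter))
    best = max(sum(w[j] for j in sol) for sol in candidates)
    winners = {sol for sol in candidates if sum(w[j] for j in sol) == best}
    return best, winners, counter[0]
```

As noted in the text, not every candidate returned by the recursion is optimal, which is why `optimal_solutions` filters the candidates against the objective at the end. For the final matrix of Example 1 with weights (1, 1, 2), the sketch enters the recursive function three times and returns the two optimal choices, sending m3 alone or sending m1 ⊕ m2.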

Adaptive Network Coding in the Presence of Erasure Memory
Here, we present a generalization of the set packing approach for coded transmission in erasure channels with memory. The idea is that the importance of a packet $m_j$ is no longer determined by how many receivers need $m_j$, but by the probability that $m_j$ will be successfully decoded by the receivers that need it. In computing this probability, one can use the fact that successive channel erasures in a link are usually correlated with each other and hence, their history can be used to make predictions about whether a receiver is going to experience erasure or not in the next time slot. To present the idea, we focus on the GEC model for representing channel erasures. More general memory models for erasure can also be incorporated into our framework. We define the reward $p_i$ of sending a packet to receiver $R_i$ as the probability of successful reception by $R_i$ in the next time slot:

$$p_i = \Pr(C_{i,\ell} = G \mid C_{i,\ell-1}),$$

where $C_{i,\ell-1}$ is the state of $R_i$ in the previous transmission round (statements like "state of $R_i$" should be interpreted as the state of the physical link $L_i$ connecting the source to $R_i$). The total reward or importance of sending packet $m_j$ is then

$$w_j = \sum_{R_i \in M_j} p_i. \tag{7}$$

The above weight vector gives higher priority to a packet $m_j$ for which there is a higher chance of successful reception, because the receivers that need $m_j$ are more likely to be in good state in the next time slot. With this newly defined weight vector, one can try to solve the optimization problem given in (4) under the same instantaneous decodability condition.
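A minimal Python sketch of the channel-aware weights follows. It assumes that the previous state of each link is known at the source from the ACK/NAK feedback (an ACK indicating the good state and a NAK the bad state), and it uses the transition probabilities b i and g i of the GEC model in Section 2; the function names are hypothetical.

```python
def reward(prev_state, b, g):
    """Probability that the link is good in the next slot, given its previous state."""
    # From G the chain stays good with probability 1 - b; from B it recovers with probability g.
    return 1.0 - b if prev_state == "G" else g


def gec_weights(A, prev_states, b, g):
    """Weight of each packet: sum of rewards over the receivers that still need it.

    A: receiver-packet incidence matrix (list of rows).
    prev_states: previous state 'G' or 'B' of each receiver's link.
    b, g: per-link transition probabilities (lists indexed by receiver).
    """
    p = [reward(state, b[i], g[i]) for i, state in enumerate(prev_states)]
    num_packets = len(A[0])
    return [sum(p[i] for i, row in enumerate(A) if row[j] == 1)
            for j in range(num_packets)]
```

For the matrix `[[1, 0, 1], [0, 1, 1]]` with the first link previously good, the second previously bad, and b = 0.1, g = 0.2 on both links, the rewards are 0.9 and 0.2. The packet needed only by the first receiver then outweighs the packet needed only by the second, even though both are needed by exactly one receiver.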
Remark 1. We conclude this section by emphasizing that the optimization framework in (4) is very flexible in accommodating other possibilities for the weight vector w, which can be appropriately determined based on the application. For example, instead of allocating the same weight to a packet needed by a subset of receivers, one can allocate different weights to the same packet (looking column-wise at A) depending on the priorities or demands of each user. In the map update example described in the Introduction, different emergency units can adaptively flag to the base station different parts of the map as more or less important depending on their distance from a certain disaster zone. The task of the base station is then to send a packet combination that satisfies the largest total priority. One can also combine user-dependent packet weights with the channel state prediction outcomes in a GEC. One possibility is to multiply the probabilities p i by the receiver priority. It could then turn out that although a receiver is more likely to be in erasure in the next transmission round, it may be served because of a high priority request.

Heuristic Search Algorithms
In Section 3.2, we proposed efficient search algorithms for finding the optimal solution(s) of (4). However, there may be situations where one would like to obtain a (possibly suboptimal) solution much more quickly. This may be the case, for example, when the total number of packets to be transmitted is very large. Therefore, designing efficient heuristic algorithms to complement the optimal search is important. In this section, we propose a number of such heuristics.

Algorithm 1:
(1) Start with the original problem of size k = K.
(2) if sorted pruning strategy is desired then
(3) Rearrange the variables in A k in descending order of packet importance (number of 1's in each column).
(4) end if
(5) Solve (P k ):
(6) if k = 1 then
(7) Return x 1 = 1 (since the variable is not constrained).
(8) else
(9) if greedy pruning strategy is desired then
(10) Determine the constrained set for all variables x 1 to x k .
(11) Denote the index of the variable with the largest constrained set by s and the cardinality of its constrained set by k s .
(12) else
(13) Determine the constrained set for the head variable x 1 with cardinality k 1 and also the set of unconstrained variables (note that we have overused index 1 to refer to the head variable in the reordered matrix at each recursion). Set s = 1.

Heuristic 1-Weight Sorted Heuristic Algorithm.
The idea behind this recursive algorithm is very simple. As in Algorithm 1, we start with the original problem of size k = K. We then rearrange the columns of the matrix A in descending order of |w j | (starting from the packet with the highest weight). Note that this is different from the sorted pruning version of Algorithm 1, in which the columns of A were sorted in descending order of |M j | to potentially result in large constrained sets. We then set the head variable x 1 = 1 and find its corresponding constrained set C 1 to resolve k 1 = |C 1 | variables that are to be set to zero. With the head variable and its k 1 constrained variables resolved, we then solve the smaller problem P k−k1−1 and continue until the problem cannot be further reduced. One main difference between Heuristic 1 and Algorithm 1 is that at each recursion, the head variable is only set to one; the other possibility of x 1 = 0 is not pursued at all. In a sense, this heuristic algorithm finds greedy solutions to the problem at each recursion by serving the highest priority packet. In this heuristic algorithm, all k u unconstrained variables are naturally set to 1 in the course of the algorithm. The computational complexity of this method is at worst proportional to K, which can happen when there is no constraint between packets.
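The steps of Heuristic 1 can be sketched in a few lines of Python. The function name `heuristic1` is hypothetical, packets are indexed from 0, and ties between equal weights are broken by the lower index, which is an arbitrary choice made here.

```python
def heuristic1(A, w):
    """Greedy weight-sorted selection: take the heaviest packet, drop its conflicts, repeat.

    Returns the binary decision vector x as a list.
    """
    num_packets = len(A[0])
    # Visit packets in descending order of weight (sorted is stable, so ties keep index order).
    remaining = sorted(range(num_packets), key=lambda j: -w[j])
    x = [0] * num_packets
    while remaining:
        head = remaining.pop(0)
        x[head] = 1
        # Resolve the constrained set of the head variable to zero.
        remaining = [j for j in remaining
                     if not any(row[head] and row[j] for row in A)]
    return x
```

Because the branch x 1 = 0 is never explored, the result can be suboptimal. For `A = [[1, 1, 0], [0, 1, 1]]` with weights (1.0, 1.2, 1.0), the sketch returns `[0, 1, 0]` with total weight 1.2, whereas sending the first and third packets together would achieve 2.0.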

Heuristic 2-Search Algorithm 1 with Maximum Recursions/Elapsed Time.
It is possible to terminate the recursive search Algorithm 1 prematurely once it reaches a maximum number of allowed recursions/elapsed time. If the algorithm reaches this value and the search is not complete, it performs a termination procedure whereby it heuristically resolves the remaining unresolved packets in the current incomplete solution. That is, it performs Heuristic 1 on a smaller problem, which is yet to be solved. It then returns the best solution that has been found so far. We note that due to the extra termination procedure, the actual number of recursions/elapsed time can be (slightly) higher than the preset value.
Two comments are in order here. Firstly, Algorithm 1 is designed to sort the matrix A based on the number of receivers that need a packet. It only reverts to sorting the unresolved variables based on the vector w in the termination process. Secondly, if the maximum number of recursions is set to one, Algorithm 1 just performs the termination process and becomes identical to Heuristic 1.
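A budgeted search in the spirit of Heuristic 2 might look like the following Python sketch. It is a simplification under stated assumptions: the branching variable is the packet needed by the most receivers, unconstrained variables are not treated separately, only a recursion budget (not elapsed time) is supported, and all names are hypothetical. Once the budget is reached, every pending subproblem is completed greedily by weight, as in Heuristic 1.

```python
def conflicts(A, i, j):
    """True if some receiver still needs both packet i and packet j."""
    return any(row[i] and row[j] for row in A)


def greedy_completion(A, w, cols):
    """Heuristic 1 on the unresolved columns: heaviest first, dropping conflicts."""
    chosen = set()
    remaining = sorted(cols, key=lambda j: -w[j])
    while remaining:
        head = remaining.pop(0)
        chosen.add(head)
        remaining = [j for j in remaining if not conflicts(A, head, j)]
    return chosen


def budgeted_search(A, w, max_recursions):
    """Return (best_solution, best_value, recursions_used)."""
    state = {"count": 0, "best": None, "value": float("-inf")}

    def record(solution):
        value = sum(w[j] for j in solution)
        if value > state["value"]:
            state["best"], state["value"] = set(solution), value

    def solve(cols, partial):
        state["count"] += 1
        if state["count"] >= max_recursions:
            # Termination procedure: finish the incomplete solution greedily.
            record(partial | greedy_completion(A, w, cols))
            return
        if not cols:
            record(partial)
            return
        # Branch on the packet needed by the most receivers.
        s = max(sorted(cols), key=lambda j: sum(row[j] for row in A))
        cons = {j for j in cols if j != s and conflicts(A, s, j)}
        solve(cols - {s} - cons, partial | {s})  # x_s = 1
        solve(cols - {s}, partial)               # x_s = 0

    solve(set(range(len(A[0]))), set())
    return state["best"], state["value"], state["count"]
```

With `max_recursions=1` the sketch performs only the termination procedure and therefore coincides with Heuristic 1, while a sufficiently large budget lets the search run to completion. Since pending branches are still closed off after the budget is hit, the reported recursion count can exceed the preset value.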

Heuristic 3-Dynamic Number of Recursions.
This heuristic is based on Heuristic 2, where we dynamically increase the number of allowed recursions as needed. At each transmission round, we start with only one allowed recursion (effectively running Heuristic 1). If the throughput is higher than a desired value, there is no need to proceed any further. Otherwise, we can gradually increase the number of recursions by an appropriate step size. This heuristic stops when it either reaches the maximum allowed recursions or when increasing the number of recursions does not result in a noticeable improvement in the throughput.
The throughput used above is defined as follows. Let $Q \subset \{1, \ldots, N\}$ denote the index set of receivers that still need at least one packet and $R_Q$ denote such receivers. The achieved throughput at time slot $\ell$ is defined as $w^T x / f(R_Q)$, where $x$ is the found solution and $f(R_Q)$ is an appropriate function of receivers' needs. For memoryless erasures $f(R_Q) = |R_Q|$ and for GECs $f(R_Q) = \sum_{q \in Q} |p_q|$ (refer to Section 4 and (7)).
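The control loop of Heuristic 3 can be sketched independently of the underlying search. In the Python sketch below, `solve_with_budget` and `achieved_throughput` are caller-supplied functions, and the parameters `target`, `step`, `max_recursions`, and `min_gain` are hypothetical names for the desired throughput, the step size, the recursion cap, and the smallest improvement considered noticeable. The helper `throughput` implements the ratio of the objective to f(R Q) described above.

```python
def throughput(A, w, x, p=None):
    """Achieved throughput w^T x / f(R_Q) for a solution x.

    Q is the set of receivers that still need at least one packet. With p=None
    (memoryless erasures) f is |Q|; otherwise f is the sum of |p_q| over Q.
    """
    Q = [i for i, row in enumerate(A) if any(row)]
    f = len(Q) if p is None else sum(abs(p[q]) for q in Q)
    return sum(wj * xj for wj, xj in zip(w, x)) / f


def dynamic_recursions(solve_with_budget, achieved_throughput,
                       target, step, max_recursions, min_gain):
    """Increase the recursion budget until the throughput is good enough or stalls."""
    budget = 1  # one allowed recursion, i.e. effectively Heuristic 1
    x = solve_with_budget(budget)
    th = achieved_throughput(x)
    while th < target and budget < max_recursions:
        budget = min(budget + step, max_recursions)
        new_x = solve_with_budget(budget)
        new_th = achieved_throughput(new_x)
        if new_th - th < min_gain:
            if new_th > th:
                x, th = new_x, new_th
            break  # no noticeable improvement: stop increasing the budget
        x, th = new_x, new_th
    return x, th, budget
```

A caller would typically pass a budgeted search as `solve_with_budget` and a closure around `throughput` as `achieved_throughput`; how the target and step size are chosen is left open here, as it is in the text.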

Numerical Results and Secondary Coding Considerations
We start this section by presenting end-to-end decoding delay results for memoryless erasure channels. We then specialize to erasure channels with memory. The end-to-end problem is the complete transmission of K packets. End-to-end decoding delay of a receiver is the sum of decoding delays for the receiver in each transmission step. In the following, when we say "the delay performance of method X", we are referring to the delay performance of the end-to-end transmission, where method X is applied at each step.
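To make the end-to-end setting concrete, the following Python sketch simulates the complete transmission of K packets over memoryless erasure links and accumulates the per-receiver decoding delay. It uses a greedy, weight-sorted packet selection with |M j| as the weight (in the manner of Heuristic 1) rather than the optimal search, and it assumes that a receiver which already holds all packets no longer accumulates delay. The function names and this accounting convention are assumptions of the sketch, not a reproduction of the experiments reported here.

```python
import random


def choose_packets(needs):
    """Greedy instantaneously decodable selection, most-needed packets first."""
    pending = {j for need in needs for j in need}
    order = sorted(pending, key=lambda j: (-sum(j in need for need in needs), j))
    coded = set()
    for j in order:
        # Add packet j only if no receiver would then miss two packets in the combination.
        if not any(j in need and coded & need for need in needs):
            coded.add(j)
    return coded


def simulate(num_receivers, num_packets, erasure_prob, rng):
    """Return (slots_used, delays) for one end-to-end transmission."""
    needs = [set(range(num_packets)) for _ in range(num_receivers)]
    delays = [0] * num_receivers
    slots = 0
    while any(needs):
        coded = choose_packets(needs)
        slots += 1
        for i, need in enumerate(needs):
            if not need or rng.random() < erasure_prob:
                continue  # finished receivers and erasures add no algorithmic delay
            wanted = coded & need
            if wanted:
                need -= wanted   # exactly one new packet is decoded
            else:
                delays[i] += 1   # received, but not innovative for this receiver
    return slots, delays
```

Calling `simulate(5, 20, 0.5, random.Random(1))` runs one stochastic realization; averaging the delays over many runs and receivers gives a mean delay in the sense used in this section, although the numbers will differ from the figures because the selection rule and conventions are simplified.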
In the course of presenting the results and based on the observed trends, we will discuss some secondary coding techniques and post processing considerations that can improve the decoding delay. Throughout the analysis of this section, we assume independent erasures in different links with identical probabilities. Hence, we can drop subscript i when referring to link erasure probabilities.
Figure 2 shows the median of decoding delay for the transmission of K = 100 packets to N = 3 to N = 100 receivers. Channel erasures are memoryless and occur with a high probability of p = 0.5 independently in every link. The median of delay is computed across all receivers and is, in fact, also the median across many stochastic runs of the algorithms. The first curve from below shows the delay obtained from Algorithm 1 (throughout the numerical evaluations, we used the sorted pruning version of Algorithm 1). The middle curve is the delay obtained by performing Heuristic 1. The top curve shows a reproduction of delay results reported in [8], which are based on a random opportunistic instantaneous network coding strategy. In this case, the transmitter first selects at random a packet needed by at least one receiver. Then, it goes over the other packets in some order and adds a packet to the current choice only if its addition still results in instantaneous decodability. In comparison, Heuristic 1 performs noticeably better than the method in [8] and, more importantly, is not far from the results of Algorithm 1. This is especially important since for some numbers of receivers, Heuristic 1 can run considerably faster than Algorithm 1, as the following figures show.
Figure 3 compares the mean delay performance of different heuristics presented in Section 5 with that of Algorithm 1. Similar to the previous figure, mean delay is computed across all receivers.
The delay performance of Heuristic 2, Heuristic 3, and Algorithm 1 are close, whereas Heuristic 1 results in the largest delay. A careful reader may notice that the end-to-end performance of Heuristic 2 is at times better than Algorithm 1. While the difference is practically insignificant, this deserves some explanation. The end-to-end transmission problem involves making packet transmission decisions at each step. While all algorithms start with the same packet incidence matrix (all-ones), due to packet erasures and as they make decisions about transmission of packets at each step, they take diverging paths in the solution space. As a result, they end up with different packet incidence matrices to solve over time. Hence, it is conceivable for an algorithm to make suboptimal decisions at one or more steps and yet end up with a better end-to-end delay than Algorithm 1 that strictly makes optimal decisions at every step. Intuition suggests that an algorithm such as Heuristic 1 that consistently makes suboptimal decisions is unlikely to outperform Algorithm 1 end-to-end, which is confirmed by the numerical results. However, an algorithm such as Heuristic 2 which almost always makes optimal decisions with only infrequent exceptions, may outperform Algorithm 1. According to Figure 3, these perturbations in end-to-end performance are practically insignificant and the intuitive choice of the optimal or a largely optimal algorithm at each step will result in the best end-to-end performance.
We note that the delays presented here (and also in the following figures) are, in fact, excess median or mean delays beyond the minimum required number of transmissions, which is K. For example, a mean delay of 10 slots for K = 100 packets signifies on average 10% overhead, which is the price for guaranteeing instantaneous decodability. In other words, one measure of throughput is th 1 where d is the mean delay across all receivers. An example is shown in Figure 3. For up to around 15 receivers in the system, Algorithm 1, Heuristics 2, and 3 ensure an average throughput loss of 10%.
It is quite possible that Algorithm 1 returns multiple network coding solutions all of which have the same objective value w T x. A natural question that arises is whether systematic selection of a solution with a particular property is better than others in the presence of erasures in the channel. Our experiments verify that indeed some secondary post processing on the solutions can improve the end-to-end delay. In particular, we compare two post processing techniques: (1) selecting a solution which involves the minimum amount of coding (lowest number of 1's in the solution vector x) and (2) selecting a solution with the maximum amount of coding (highest number of 1's in the solution vector x). Figure 4 shows the effects of such processing on the overall decoding delays. It is clear that maximum coding is not a reasonable choice and results in worse delays compared with minimum coding. We attempt to explain this behavior by means of an example and intuitive reasoning. Let us assume that there are K = 3 packets to be transmitted to N = 3 receivers and at the beginning of the third transmission round, matrix A is given as follows It is clear that there are two optimal solutions: we can either send packets m 1 ⊕m 2 or packet m 3 by itself, where the former involves coding and the latter is uncoded. Now let us assume that and clearly the optimal solution is sending packet m 3 . If in the fourth transmission round only R 1 successfully receives, A will become where it is evident that in the fifth transmission round, we cannot find a packet which is innovative and instantaneously decodable for all the three receivers. On the other hand, one can verify that if we adopt a minimum coding strategy and send packet m 3 in the third transmission round, we can always find innovative and instantaneously decodable packets for all three receivers in the future regardless of erasures in the channel.
In summary, solutions with less coding tend to impose fewer constraints on the problem in the future.
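The two post processing rules compared above can be sketched as a tie-break over equally optimal solutions. This is a minimal illustration, assuming Algorithm 1 has already returned a list of binary solution vectors x with the same objective value; the function names are hypothetical and do not come from the paper. The two tied solutions in the usage lines correspond to the K = 3 example, where either m 1 ⊕ m 2 or m 3 alone can be sent.

```python
from typing import Sequence, Tuple

Solution = Tuple[int, ...]  # binary vector x: x[j] = 1 if packet j is in the coded combination


def coding_amount(x: Sequence[int]) -> int:
    """Number of 1's in the solution vector, i.e. how many packets are XORed together."""
    return sum(x)


def pick_min_coding(solutions: Sequence[Solution]) -> Solution:
    """Tie-break rule (1): among equally optimal solutions, prefer the least coding."""
    return min(solutions, key=coding_amount)


def pick_max_coding(solutions: Sequence[Solution]) -> Solution:
    """Tie-break rule (2): among equally optimal solutions, prefer the most coding."""
    return max(solutions, key=coding_amount)


# Hypothetical tie from the K = 3 example: send m1 XOR m2, or send m3 uncoded.
tied_solutions = [(1, 1, 0), (0, 0, 1)]
min_choice = pick_min_coding(tied_solutions)  # selects the uncoded packet m3
max_choice = pick_max_coding(tied_solutions)  # selects the coded packet m1 XOR m2
```

If several solutions share the same number of 1's, `min` and `max` return the first one encountered, so the order in which the search returns solutions still matters.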
Figure 4 also shows that the first solution returned by Algorithm 1 performs almost the same as the minimum coding solution. The reason for this is that Algorithm 1 first ranks the packets based on the number of receivers that need them. Therefore, the first solution picked by the algorithm is likely to contain packets with the largest constrained sets and hence, many resolved packets are set to zero, which often translates into a small amount of coding. Throughout this

It is interesting to analyze the actual number of recursions that the search in Algorithm 1 takes to find the optimum solution. This is shown in Figure 5 for K = 100 packets along with the number of recursions required in Heuristics 1, 2, and 3. Algorithm 1 shows three modes of behavior: low, medium, and high numbers of recursions. When the number of receivers is larger than N = 20, Algorithm 1 finds the optimal solution very quickly and the number of recursions is very close to the number of packets K. However, when the number of receivers is lower, the constraints that each receiver imposes on the network coding decisions cannot limit the search space enough and hence, a large number of combinations have to be tested. Obviously, Heuristic 1 has the lowest number of recursions. Compared to Heuristic 2 with 100 fixed recursions, dynamic Heuristic 3 can almost halve the number of recursions with negligible effect on delay performance (see Figure 3). By referring to Figure 3, we conclude that for the system under consideration, the excessive number of recursions in Algorithm 1 is not warranted, as it does not result in any noticeable delay improvement compared to Heuristics 2 or 3.

Figure 6 shows the effect of increasing the number of packets on the computational complexity of Algorithm 1 in terms of the number of recursions to complete the search. Three different numbers of receivers, N = 20, N = 30, and N = 40, are considered.
The complexity remains linear with the number of packets for well-sized receiver populations (30 and 40 receivers). This is in agreement with observations in Figure 5. When the number of receivers is not so large (see the blue curve in Figure 6 for N = 20), we see a sudden growth in complexity, in terms of number of recursions, around K = 700 packets. In such situations, truncating the number of recursions to be linear with the number of packets (Heuristic 2) is a good alternative.

Figure 7 shows the impact of the number of packets and of the erasure probability on the decoding delay. The normalized mean delay versus number of packets K is plotted for three different erasure probabilities P e = 0.5, P e = 0.4, and P e = 0.2, which are still high erasure probabilities. The number of receivers is fixed to N = 20. The delay performance of Heuristics 1 and 3 is shown. A few observations are made. Firstly, as expected, the delay (both absolute and normalized measures) decreases as the erasure probability decreases. Secondly, the difference in the delay performance between Heuristics 1 and 3 decreases as the erasure probability decreases. This trend has also been observed for other numbers of receivers. Moreover, the difference between the heuristics and Algorithm 1 decreases with erasure probability, which is not shown here for clarity of the figure. Finally, the normalized delay decreases as the number of packets increases. We noted, however, that the absolute delay may increase or decrease depending on the number of receivers in the system. We attribute the possible decrease in the normalized delay to the fact that when there are more packets to transmit, the transmitter has more options to choose from and hence encounters delays less often in a normalized sense.
An important question that may arise in practical situations is how to choose the "block size" or the number of packets that are taken into account for making network coding decisions. If one has a total of K packets to transmit, does it make sense to divide them into subblocks of smaller sizes or does it make sense to treat them as one single block of packets? The short answer is to include all "order-insensitive" packets in making transmission decisions and only break the packets into subblocks when the assumption of order insensitivity between subblocks breaks down. In the extreme case, an infinite number of order-insensitive packets provides an infinite pool of packets to choose from that can satisfy the demands of all receivers and are instantaneously decodable. Figure 8 shows the end-to-end delay when the number of packets in a block is finite and K = 100 packets is chosen as the reference for comparison. We can see that although the delay of transmitting λK packets, d λK , can be larger than that of transmitting K packets, d K , the delay does not increase by a factor of λ. That is, d λK < λd K and one does not benefit from breaking λK packets into λ subblocks of size K packets each. By treating λ subblocks of size K as one block of size λK, we add more degrees of freedom in making decisions.

Figure 7: The effect of number of packets and erasure probabilities on the normalized delay. The maximum number of recursions for Heuristic 3 is set to 100. As the erasure probability decreases, the delay decreases as expected. The normalized delay decreases with K for this particular N (this is not always the case).

Now we turn our attention to the delay performance of our algorithms in channels with memory. Figure 9 shows the mean delay of different algorithms for K = 100 packets and N = 3 receivers. The GEC parameters for all links are identical with b = g. The horizontal axis shows the memory content μ = 1 − 2b.
The first curve from above shows the performance of Algorithm 1 when the transmitter does not take channel conditions into account in making coding decisions. In other words, w j = |M j | is used in Algorithm 1 as if the channel states were memoryless. For relatively large memory contents, this method results in the largest mean delay. The next curve shows the delay performance of Heuristic 1. The next two curves, which are almost indistinguishable, show the performance of Algorithm 1 which takes channel states into account (using (7)) and Heuristic 2 with 100 recursions. The last curve shows the best delay that can be achieved by occasionally violating the instantaneous decodability rule for one receiver in favor of the other two receivers that are predicted to be in good state in the next transmission round. More details can be found in [14].

Figure 8: The effect of block size on the mean delay. If the delay of transmitting K = 100 packets in Heuristic 1, d 100 , is taken as the reference, we can see that the delay of including λ × 100 packets in transmission is less than λd 100 . The same observation applies to the delay of Algorithm 1. In general, it is recommended to include all "order-insensitive" packets in making transmission decisions and only break the packets into subblocks when the assumption of order insensitivity between subblocks breaks down.

Figure 10 shows the delay performance of Algorithm 1 using packet weights according to (7) for N = 3 to N = 15 receivers. Both the mean delay and the mean delay plus one standard deviation of delay (across 1000 stochastic runs of the transmission) are shown. As expected, the delay increases as the number of receivers increases. Comparing the delay's standard deviation with its mean, we observe that when the number of receivers is 3-5, the delay varies more relative to its mean than when the number of receivers is 10-15. For example, for N = 3 and μ = 0.984, the ratio of standard deviation to mean delay is around 3.225/0.8183 ≈ 4, whereas for N = 15 and μ = 0.94 this ratio reduces to only 7.35/22.49 ≈ 0.33. One should keep these variations in mind when designing the transmission system.
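For readers who want to experiment with channels of this kind, the following is a minimal sketch of a two-state Gilbert-Elliott channel with identical transition probabilities b = g, so that the memory content is μ = 1 − 2b as in the text. It assumes that b is the good-to-bad and g the bad-to-good transition probability, and it only generates channel states; how states map to erasures and how (7) turns predictions into packet weights are not reproduced here. All function names are illustrative.

```python
import random
from typing import List


def memory_to_b(mu: float) -> float:
    """Invert mu = 1 - 2b for a channel with identical transition probabilities b = g."""
    return (1.0 - mu) / 2.0


def prob_good_next(good_now: bool, b: float, g: float) -> float:
    """One-step prediction: probability that the link is in the good state next round."""
    return 1.0 - b if good_now else g


def simulate_gec(b: float, g: float, n_slots: int, rng: random.Random,
                 start_good: bool = True) -> List[bool]:
    """Return the state sequence (True = good, False = bad) of a two-state Markov chain.

    Assumption: b is P(good -> bad) and g is P(bad -> good).
    """
    states = []
    good = start_good
    for _ in range(n_slots):
        states.append(good)
        # Draw the next state from the one-step prediction for the current state.
        good = rng.random() < prob_good_next(good, b, g)
    return states


rng = random.Random(0)       # fixed seed so the run is repeatable
b = memory_to_b(0.94)        # memory value quoted in the text for N = 15
states = simulate_gec(b, b, 1000, rng)
fraction_good = sum(states) / len(states)
```

With b = g the chain spends half of its time in each state in the long run, while a larger μ makes the runs of good and bad slots longer, which is what makes predicting the next state worthwhile.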
Figure 9: Delay performance of different algorithms in Gilbert-Elliott channels. The maximum number of recursions for Heuristic 2 is set to 100. By predicting next channel states and defining packet weights accordingly (see (7)), one can achieve considerably lower delays. Curves: Algorithm 1, but blind to channel states (w j = |M j |); Heuristic 1; Heuristic 2; Algorithm 1 with predictive weights using (7); special case for N = 3 receivers [14].

Figure 10: Delay performance of Algorithm 1 with weights defined using (7) for different number of receivers. As expected, the delay increases with the number of receivers. Both the mean delay (solid curves) and mean delay plus one standard deviation of delay (dashed-dotted curves) across 1000 stochastic runs of the transmission are shown. Curves are given for 3, 5, 10, and 15 receivers.

Figure 11: The effect of post processing on mean delay. As explained in the main text, whenever Algorithm 1 returns multiple solutions, choosing the maximum amount of coding and serving maximum number of receivers can often have adverse effects on the delay. Horizontal axis: channel memory μ = 1 − 2b. Curves: Algorithm 1, but blind to channel states (w j = |M j |); Heuristic 1; Algorithm 1 with weights from (7) using max coding for max receivers; Algorithm 1 with weights from (7) using first returned answer. Setting: K = 100 packets, N = 15 receivers, identical GEC in links with b = g.

We conclude this section with a brief look at the effect of post processing on the delay performance in channels with memory. Figure 11 shows different delays for N = 15 receivers and K = 100 packets. The figure confirms our earlier finding that selecting the maximum amount of coding among the optimal solutions provided by Algorithm 1 can result in larger end-to-end delays. We also note that serving the maximum number of receivers can have an adverse effect on the delay in GECs.
To explain this, consider an example where there are K = 2 packets left to be transmitted to N = 100 receivers. Packet 1 is needed by R 1 to R 99 and packet 2 is needed by R 99 and R 100 . Since both packets are needed by R 99 , we can send either packet 1 or packet 2, but not both. Now assume that R 1 to R 99 are each predicted to be in good state with probability 0.01 and R 100 is predicted to be in good state with probability 0.98, so that w 1 = w 2 = 0.99 according to (7). Therefore, transmission of either packet seems to be equally optimal. However, one can easily verify that the probability of at least one receiver among R 1 to R 99 receiving packet 1 is only 1 − 0.99^99 ≈ 0.63, whereas the probability of either R 99 or R 100 receiving packet 2 is 1 − 0.99 × 0.02 = 0.9802. Therefore, it makes sense to satisfy only two receivers, one of which has a high priority due to its good channel conditions.
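The arithmetic of this example can be checked with a short script. The sketch assumes that a packet's weight is the sum of the predicted good-state probabilities of the receivers that need it, which is consistent with the values w 1 = w 2 = 0.99 quoted above but is not a restatement of (7), and that receivers succeed independently. Variable and function names are illustrative.

```python
from math import prod
from typing import Dict, List

# Predicted probability of being in the good state in the next round, per receiver.
p_good: Dict[int, float] = {i: 0.01 for i in range(1, 100)}  # R1 .. R99
p_good[100] = 0.98                                           # R100


def weight(receivers: List[int]) -> float:
    """Assumed weight: sum of predicted good-state probabilities of the receivers served."""
    return sum(p_good[r] for r in receivers)


def prob_at_least_one(receivers: List[int]) -> float:
    """Probability that at least one listed receiver gets the packet (independent links)."""
    return 1.0 - prod(1.0 - p_good[r] for r in receivers)


packet1_receivers = list(range(1, 100))  # packet 1 is needed by R1 .. R99
packet2_receivers = [99, 100]            # packet 2 is needed by R99 and R100

w1 = weight(packet1_receivers)             # about 0.99
w2 = weight(packet2_receivers)             # about 0.99
p1 = prob_at_least_one(packet1_receivers)  # 1 - 0.99**99, about 0.63
p2 = prob_at_least_one(packet2_receivers)  # 1 - 0.99 * 0.02 = 0.9802
```

Both packets carry the same weight, yet packet 2 is far more likely to be useful to at least one receiver, which is the point of the example: equal expected numbers of served receivers can hide very different chances of making progress.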

Conclusions
In this paper, we provided an online optimal network coding scheme with feedback to minimize decoding delay in each transmission round in erasure broadcast channels. Efficient search algorithms for the optimal network coding solution, as well as heuristic methods were presented and their delay and computational performance were tested in several system scenarios. We found that adopting an optimized approach using as much information about the channel as possible, such as memory, leads to a significantly better decoding delay. An interesting problem for future research is