Multicell Downlink Capacity with Coordinated Processing

We study the potential benefits of base-station (BS) cooperation for downlink transmission in multicell networks. Based on a modified Wyner-type model with users clustered at the cell-edges, we analyze the dirty-paper-coding (DPC) precoder and several linear precoding schemes, including cophasing, zero-forcing (ZF), and MMSE precoders. For the nonfading scenario with random phases, we obtain analytical performance expressions for each scheme. In particular, we characterize the high signal-to-noise ratio (SNR) performance gap between the DPC and ZF precoders in large networks, which indicates a singularity problem in certain network settings. Moreover, we demonstrate that the MMSE precoder does not completely resolve the singularity problem. However, by incorporating path gain fading, we numerically show that the singularity problem can be eased by linear precoding techniques aided with multiuser selection. By extending our network model to include cell-interior users, we determine the capacity regions of the two classes of users for various cooperative strategies. In addition to an outer bound and a baseline scheme, we also consider several locally cooperative transmission approaches. The resulting capacity regions show the tradeoff between the performance improvement and the requirement for BS cooperation, signal processing complexity, and channel state information at the transmitter (CSIT).


INTRODUCTION
The growing popularity of various high-speed wireless applications necessitates a fundamental characterization of wireless channels. A significant amount of research effort has been devoted to cellular systems, which are commonly deployed for serving mobile users. Conventionally, the downlink transmission in cellular systems is carried out through single-cell-processing (SCP), which is limited by inter-cell interference, especially for cell-edge users. The idea of cooperative multicell transmission has been proposed and studied in [1,2] and references therein to mitigate the inter-cell interference and enhance the cell-edge users' performance. The cooperative multicell downlink channel is closely related to the multiple-input multiple-output (MIMO) broadcast channel (BC), whose capacity region [3] is achieved by Costa's DPC principle [4]. However, the significant amount of processing complexity required by DPC prohibits its implementation in practice. Therefore, suboptimal BS cooperation schemes using cophasing [5,6], ZF, and MMSE linear precoders [7] have been proposed and analyzed for both nonfading and fading scenarios [2].
In the first part of this paper, we study the single-class network, which is a modified Wyner-type multicell model [8] with users clustered at cell-edges. We consider the nonfading scenario (also previously considered in [9]) with fixed path gains and random path phases. (Note that the nonfading scenario in our paper has no path gain fading but has random path phases, which is different from the nonfading scenario in [10]. Our nonfading model with random path phases represents the case where equal transmitter power control is applied.) The addition of random path phases represents the middle ground between the nonfading scenario without random phases and the fading scenario with random path gains that have been considered in [10]. With our nonfading model, we are able to characterize the effect of random phases independent of the path gain fading. Moreover, we introduce uniform asymmetry controlled by a single parameter α, which is different from [2], where all users see two symmetric BSs. The analysis for the uniform asymmetry case motivates our algorithm design for the fading scenario. We have obtained the analytical sum rate expressions for several cooperative downlink transmission schemes: intra-cell time-division-multiplexing (TDM) combined with inter-cell DPC, cophasing, ZF, and MMSE, respectively. Moreover, we analytically study the finite-size Wyner-type model, which sheds some light on the asymptotic behaviors of various precoding techniques in large networks. In particular, we have shown that if each user sees two equally strong paths, the sum rate performances of the ZF and MMSE precoders (combined with intra-cell TDM) deteriorate significantly in large networks, while the performance deterioration is less severe if the two paths to each user are of unequal strength.
Therefore, to address this singularity problem, we induce the path gain asymmetry by incorporating path gain fading into our network model and combining multiuser scheduling with the linear precoders. For the Rayleigh fading case, we demonstrate through Monte-Carlo simulation the satisfactory performance of the linear precoders combined with the proposed multiuser scheduling algorithm. Note that our numerical results for the fading case serve the purpose of performance verification only, while [2] also provides analytical bounds.
In the second part of this paper, we consider the double-class network (previously considered in [11,12]) by extending our network model to include cell-interior users. We have characterized the per-cell sum rate region for the rate pair of the cell-edge and cell-interior users for various cooperative downlink transmission strategies. Besides an outer bound and the baseline achieved by the cell-breathing [13] scheme, we have also studied several hybrid strategies to serve cell-interior users in each cell and cell-edge users in alternating cells. The comparison between the achievable rate regions of different cooperative transmission schemes exhibits a tradeoff between the performance improvement and the requirement for BS cooperation, signal processing complexity, and CSIT knowledge.
Some relevant research work on single-class networks has been independently reported in [14,15]. Our main contributions are as follows. We have proposed and studied a modified network model based on the one proposed in [2], incorporating two new elements: path asymmetry and random phases. For the nonfading scenario with random path phases, we have derived the analytical sum rate expressions for several cooperative downlink transmission schemes, and we have identified both a connection between the three linear precoders (cophasing, ZF, and MMSE) and a singularity problem with the linear precoding schemes in large networks. In the fading scenario, we have proposed a multiuser scheduling scheme to ease the singularity problem and verified its effectiveness through Monte-Carlo simulations for the Rayleigh fading case. Note that our work has focused on fully synchronized networks, while the asynchronism of interference in BS cooperation has been recently addressed in [16].
The remainder of the paper is organized in four sections. In Section 2, we introduce the network model and formulate our problem. In Section 3, we consider the single-class networks. In Section 4, we investigate the double-class networks. We conclude the paper in Section 5.

NETWORK MODEL & PROBLEM FORMULATION
We consider two simplified Wyner-type network models: one with cell-edge users only (single-class network), the other with both cell-edge and cell-interior users (double-class network). We will define both the downlink and the dual uplink channels, since we will frequently use the uplink-downlink duality [17-19] in our analysis.

Double-class network
The (N, K_i, K_e) double-class network is composed of N cells, each with a single-antenna BS, a group of K_i single-antenna cell-interior users, and a group of K_e single-antenna cell-edge users. Note that the classification of users based on their distances from the BSs was originally proposed in [11]. The BSs are located uniformly along a ring. The cell-interior users are located close to their own BS. The cell-edge users are located at the cell-edge between their own BS and the adjacent BS. The cell-interior users see their own BS with path gain β, while the cell-edge users see their own BS with path gain 1 and the adjacent BS with path gain α. The paths have i.i.d. random phases. The (4, 6, 5) double-class network is shown in Figure 1.
The downlink channel and the dual uplink channel (with the BSs' and the users' roles reversed) of the (N, K i , K e ) double-class network are represented as follows: Sheng Jing et al.

where , and w^d ∼ CN(0, I_N). The channel matrix H has the following form: where collects the path gains from BS m to the cell-interior users of cell n, which are specified as follows: and h_{e,mn}^T = [h_{e,mn1}, ..., h_{e,mnK_e}] collects the path gains from BS m to the cell-edge users of cell n, which are specified as follows: where [n]_N means n modulo N.

Single-class network
The (N, K_e) single-class network layout is the same as the (N, K_i, K_e) double-class network except that there are no cell-interior users (K_i = 0). The (4, 5) single-class network is shown in Figure 2.
The downlink and the dual uplink channels of the (N, K_e) single-class network are also expressed as (1) and (2), where the channel matrix H simplifies to be h_{e,mn}^T = [h_{e,mn1}, ..., h_{e,mnK_e}] collects the path gains from BS m to the cell-edge users of cell n, which are separately specified for two different scenarios as follows: (i) nonfading scenario with random path phases where [j]_N denotes j modulo N.

Problem formulation
In the downlink channel, the information vector b^d is represented as follows: with the following power allocation In the dual uplink channel, x^u itself is the information vector: with the following power allocation: x^u_{i,nk} is a power-P^u_{i,nk} information symbol from the kth cell-interior user in the nth cell, and x^u_{e,nk} is a power-P^u_{e,nk} information symbol from the kth cell-edge user in the nth cell. We use x̂^u to denote the estimated information vector at the BSs, obtained using an N × N(K_i + K_e) linear filter V. Incorporating the filter matrix, our dual uplink channel expression (2) reduces to
The sum power constraints on the downlink and the dual uplink are as follows: (i) downlink sum power: (ii) uplink sum power: while the corresponding per-cell power constraints are as follows: (i) downlink per-cell power: (ii) uplink per-cell power:
We mainly focus on the downlink channel under the per-cell power constraint (18), where SNR is the BS-side signal-to-noise ratio. The BSs are allowed to cooperate in transmission, while the users are restricted to single-user receivers without successive cancellation. Moreover, encoding and decoding can spread over many fading blocks. For the downlink channel (12), in each fading block, the cooperative BSs choose the power allocation P^d and the precoding matrix U based on the channel matrix H†. We then compute each user's signal-to-interference-plus-noise ratio SINR_i^d and the associated maximal achievable rate log_2(1 + SINR_i^d). We impose the per-cell power constraint (18) on each fading block. Our objective in single-class networks is to maximize the long-term ergodic per-cell sum rate: where the summation is over all users. Our objective in double-class networks is to optimize the long-term ergodic per-cell sum rate pair:
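The SINR and rate computation described above can be sketched numerically. This is a minimal illustration that assumes unit-variance noise and an effective downlink gain matrix G (hypothetically G = H† U, with row i belonging to user i and column j to the stream intended for user j). The function name `downlink_rates` and the array layout are assumptions made for illustration, not notation from the paper.

```python
import numpy as np

def downlink_rates(G, p):
    """Per-user rates log2(1 + SINR) for single-user receivers.

    G : (users x streams) effective gain matrix, assumed to equal H^dagger @ U.
    p : power allocated to each stream (one stream per user).
    """
    G = np.asarray(G, dtype=complex)
    p = np.asarray(p, dtype=float)
    received = np.abs(G) ** 2 * p[np.newaxis, :]   # power of stream j as seen by user i
    signal = np.diag(received)                     # each user's own stream
    interference = received.sum(axis=1) - signal   # other streams are treated as noise
    sinr = signal / (1.0 + interference)           # unit-variance receiver noise
    return np.log2(1.0 + sinr)
```

Averaging the sum of these rates over many channel draws and dividing by N would give a Monte-Carlo estimate of the ergodic per-cell sum rate.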

SINGLE-CLASS NETWORK
In this section, we focus on the (N, K_e) single-class network described in Section 2.2. Our objective is to maximize the ergodic per-cell sum rate (20) under the per-cell power constraint (18). We start by delimiting our working region for the nonfading scenario with a baseline scheme and an upper bound in Section 3.1. We then analyze several cooperative downlink transmission schemes in Section 3.2. We conclude this section with the fading scenario in Section 3.3.

Baseline & upper bound
To help demonstrate the performance of precoding schemes investigated later, we first characterize our working region of the ergodic per-cell sum rate with a baseline scheme and an upper bound as follows.

Baseline: single-cell processing (SCP) with reuse
The performance baseline is achieved by the SCP with reuse scheme, which proceeds as follows: at each time instance, every other BS serves its right user group (equivalently, their own user group with path gain 1) with full power SNR, while the remaining BSs are turned off. The SCP with reuse scheme is illustrated in Figure 3, and its performance is characterized in the following lemma.

Lemma 1 (baseline).
In the (N, K_e) single-class network, the ergodic per-cell sum rate achieved by SCP with reuse under the per-cell power constraint SNR is (1/2) log_2(1 + SNR). (22)
Proof. In the cells where the BSs are actively transmitting information, their cell-edge users see no interference since the neighboring BSs are turned off. Moreover, since the cell-edge users see equally strong paths from their own BS, the maximal sum rate is achieved by the BS transmitting to any cell-edge user with full power SNR, which is log_2(1 + SNR). The ergodic per-cell sum rate expression (22) follows immediately by incorporating the 1/2 factor since only half of the BSs are active at any time instance.

Upper bound: dirty-paper coding (DPC)
In [19], the authors established a connection between the sum capacities of the downlink and the dual uplink channels under linear power constraints (including the per-cell power constraints (18) and (19) as a special case). We restate their main result here, slightly adapted to the specific scenario we are considering.
Theorem 1 (minimax uplink-downlink duality [19]). For a given channel matrix H, the sum capacity of the downlink channel (1) under the per-cell power constraint (18) is the same as the sum capacity of the dual uplink channel (2) affected by a diagonal "uncertain" noise under the sum power constraint (17): where Λ and P^u are N-dimensional and NK_e-dimensional nonnegative diagonal matrices such that Tr(Λ) ≤ 1/SNR and Tr(P^u) ≤ 1.

Remark 1.
The average per-cell sum capacity of the downlink channel (1) under the per-cell power constraint (18) is C_sum/N. Note that this rate may not be simultaneously achievable in all cells for a particular channel matrix H†.
We apply Theorem 1 to obtain the following performance upper bound for the (N, K_e) single-class network, which is similar to [2].
Theorem 2 (upper bound). In the (N, K_e) single-class network, the maximal achievable ergodic per-cell sum rate under the per-cell power constraint SNR has the following upper bound:
Proof. The detailed proof is included in Appendix A.
Remark 2. Compared with the baseline scheme performance (22), the upper bound (24) is superior in two respects.
(i) The upper bound enjoys full degrees of freedom, while the baseline scheme suffers a half degree of freedom loss. (ii) The upper bound enjoys a power gain of (1 + α^2) as compared to the baseline scheme.
However, the upper bound can be approached only if the number of users per cell K_e is large and the complex DPC scheme is used across the entire network over all NK_e users, which involves significant complexity and is hard to implement in practice. In the following, we address this issue by studying cooperative transmission schemes that have lower complexity yet still achieve good performance.

Precoding with intra-cell time division multiplexing (TDM)
For the following schemes in the single-class network, we assume that TDM is used within each cell, that is, only one user in each cell is actively receiving information at any time instance. With intra-cell TDM, the channel matrix H simplifies to be We define several macro-phase parameters as follows:
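As a concrete illustration of the intra-cell TDM channel in the nonfading scenario with random phases, the sketch below builds an N × N matrix in which each active user sees its own BS with gain 1 and one adjacent BS with gain α, both with i.i.d. uniform phases. The function name `tdm_channel`, the choice of which neighbor counts as "adjacent" (here BS n+1 modulo N), and the row/column convention are assumptions made for illustration.

```python
import numpy as np

def tdm_channel(n_cells, alpha, rng):
    """N x N channel; entry [m, n] is the path from BS m to the active user of cell n."""
    own = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n_cells))          # gain 1, random phase
    adj = alpha * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n_cells))  # gain alpha, random phase
    H = np.zeros((n_cells, n_cells), dtype=complex)
    cells = np.arange(n_cells)
    H[cells, cells] = own                    # path from the user's own BS
    H[(cells + 1) % n_cells, cells] = adj    # path from the adjacent BS (assumed: next on the ring)
    return H
```

For example, `tdm_channel(32, 0.75, np.random.default_rng(0))` would give one channel draw for a 32-cell ring with α = 0.75.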

EURASIP Journal on Wireless Communications and Networking
We first characterize the inherent performance loss incurred by intra-cell TDM, which is accomplished by the following inter-cell DPC performance characterization.

Inter-cell DPC
The inter-cell DPC scheme proceeds as follows: the N BSs transmit to the N active users cooperatively using DPC, which is essentially the capacity-achieving scheme in the (N, 1) single-class network. The following theorem characterizes the ergodic sum rate performance of the intercell DPC scheme.

Theorem 3 (inter-cell DPC).
In the (N, K_e) single-class network, the maximal ergodic per-cell sum rate achievable by the inter-cell DPC scheme under the per-cell power constraint SNR is as follows: where γ_± are defined as follows:
Proof. The detailed proof is included in Appendix B.
It is worth mentioning that, for the scenario without path gain fading or random phases, the ergodic per-cell sum rate performance of the DPC precoder (with or without intra-cell TDM) under the per-cell power constraint has been characterized in [2]. Assuming that intra-cell TDM is used, the above theorem extends the results in [2] to the nonfading scenario with fixed path gains and random path phases. Though Theorem 3 is proved along the same lines as in [2] based on Theorem 1, the key step is new: it shows that |H P^u H† + Λ| is rotationally invariant in the diagonal entries of P^u given that Λ = (1/(N SNR)) I_N, and that |H P^u H† + Λ| is symmetric in the diagonal entries of Λ given that P^u = (1/N) I_N. Some techniques used in proving this step were reported in [20].
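The symmetric choices named above, Λ = (1/(N SNR)) I_N and P^u = (1/N) I_N, can be evaluated numerically for a given channel draw. The sketch below assumes that the minimax expression of Theorem 1 has the usual log-determinant ratio form log_2(|H P^u H† + Λ| / |Λ|), which at these choices reduces to log_2 |I_N + SNR · H H†|. That form and the function name `symmetric_point_rate` are assumptions made for this illustration.

```python
import numpy as np

def symmetric_point_rate(H, snr):
    """Per-cell rate (1/N) log2 |I + snr * H H^dagger| at the symmetric (Lambda, P^u) choice."""
    n_cells = H.shape[0]
    gram = np.eye(n_cells) + snr * (H @ H.conj().T)
    # slogdet is numerically safer than det for large N; gram is Hermitian positive definite
    _, logdet = np.linalg.slogdet(gram)
    return logdet / (n_cells * np.log(2.0))
```

Averaging this quantity over random-phase channel draws would give a numerical counterpart to the closed-form expression of Theorem 3, under the stated assumptions.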

Remark 3.
Examining (27), we note that γ_+^N and γ_-^N are the dominant terms as N increases. Therefore, the effect of the random path phases Θ vanishes as the network size N increases. Similar observations were also made in [20].

Corollary 1. In the single-class network with a large number of cells, the asymptotic performance loss incurred by intra-cell TDM is

Remark 4. This corollary is significant in two respects:
(i) the performance upper bound (24) is tight to within less than one bit; (ii) intra-cell TDM does not incur significant performance loss.

Inter-cell cophasing with reuse
The inter-cell cophasing scheme [5,6] proceeds as follows: at each time instance, every other active user is receiving information from its own BS and the reachable adjacent BS, which coherently beamform to the targeted user; the other active users remain silent in this time instance. The inter-cell cophasing with reuse scheme is illustrated in Figure 4, and its ergodic per-cell sum rate performance is characterized in the following lemma.
Lemma 2 (inter-cell cophasing with reuse). In the (N, K_e) single-class network, the maximal ergodic per-cell sum rate achievable by the inter-cell cophasing scheme under the per-cell power constraint SNR is as follows:
Proof. Beamforming from the two neighboring BSs to the active user provides a magnitude gain of 1 + α. The cophasing performance expression (30) can be confirmed by further including the half degree of freedom loss incurred by only serving every other active user.
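The two reuse-based schemes can be compared with a few lines of arithmetic. The sketch below takes the baseline rate (1/2) log_2(1 + SNR) from the proof of Lemma 1 and, as an assumption read off the proof of Lemma 2, models cophasing as a magnitude gain of 1 + α (a power gain of (1 + α)^2 when both BSs transmit at full power SNR) with the same 1/2 factor. The function names are hypothetical.

```python
import math

def scp_reuse_rate(snr):
    """Baseline: half of the BSs are active, each serving one user with power snr."""
    return 0.5 * math.log2(1.0 + snr)

def cophasing_reuse_rate(snr, alpha):
    """Cophasing with reuse: coherent gain (1 + alpha) in magnitude, half of the users served."""
    return 0.5 * math.log2(1.0 + (1.0 + alpha) ** 2 * snr)
```

With α = 0 the two functions coincide, which matches the intuition that cophasing brings no benefit when the adjacent BS is not received at all.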

Inter-cell zero-forcing (ZF)
The inter-cell ZF scheme [7] proceeds as follows: the N BSs cooperatively transmit to the N active users using the ZF precoder. We assume that the channel matrix H (N × N assuming intra-cell TDM) is nonsingular, since ZF precoder is not well-defined otherwise. The un-normalized ZF precoder is expressed as The ergodic per-cell sum rate of the inter-cell ZF scheme is characterized as follows.  (2) under the per-cell power constraint (19).

Lemma 4 (inter-cell ZF).
In the (N, K_e) single-class network, the maximal ergodic per-cell sum rate achievable by the inter-cell ZF scheme under the per-cell power constraint SNR is as follows:
Proof. The detailed proofs of Lemmas 3 and 4 are included in Appendix C.
Corollary 2 (asymptotic inter-cell ZF performance gap). In the single-class network with a large number of cells, the high-SNR performance loss incurred by inter-cell ZF is bounded as follows:
Proof. The detailed proof of this corollary is also included in Appendix C.

Remark 5.
As each user's two reachable paths get increasingly asymmetric (α→0), the asymptotic performance loss incurred by inter-cell ZF shrinks. On the other hand, the asymptotic performance loss of inter-cell ZF widens as each user sees two increasingly symmetric paths (α→1). The extreme case is when each user sees two equally strong paths, which is detailed in the following corollary.
Corollary 3 (inter-cell ZF, α = 1). In the special (N, K_e) single-class network with α = 1, the maximal ergodic per-cell sum rate achievable by the inter-cell ZF scheme under the per-cell power constraint SNR is as follows:
Remark 6. For fixed SNR, the inter-cell ZF rate performance (34) decreases to zero as the network size N increases. Compared with (27), the inter-cell ZF scheme incurs significant performance loss in large networks, which echoes (33). Since the Wyner-type model approximates real networks only in large networks, the significant performance loss (34) poses a singularity problem for the inter-cell ZF scheme, which will be addressed in the following sections.
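The mechanism behind the ZF performance loss can be explored numerically with a simplified ZF precoder. The sketch below inverts H† and scales all streams equally so that the most heavily loaded BS meets the per-cell power constraint. This equal-power rule is a simplification chosen for illustration, not the optimized allocation behind Lemma 4, and the function name `zf_equal_power` and the convention y = H† U b + w are assumptions.

```python
import numpy as np

def zf_equal_power(H, snr):
    """Equal-power ZF: returns the scaled precoder and the common per-user rate."""
    U = np.linalg.inv(H.conj().T)               # un-normalized ZF precoder, H^dagger @ U = I
    bs_power = np.sum(np.abs(U) ** 2, axis=1)   # power each BS spends per unit stream power
    p = snr / bs_power.max()                    # largest stream power meeting every per-cell limit
    rate = np.log2(1.0 + p)                     # no interference remains, so SINR = p
    return np.sqrt(p) * U, rate
```

When H is close to singular, the rows of the inverse grow, `p` shrinks, and the rate drops. Feeding in α = 1 ring channels of increasing size would be one way to observe the trend described in Remark 6.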

Inter-cell MMSE
The inter-cell MMSE scheme proceeds as follows: the N BSs cooperatively transmit to the active N users using the MMSE precoder. The un-normalized MMSE precoder is We characterize a lower bound to the maximal ergodic symmetric rate achievable by the inter-cell MMSE scheme as follows.

Lemma 5 (inter-cell MMSE).
In the (N, K e ) single-class network, the maximal ergodic symmetric rate achievable by the inter-cell MMSE scheme under the per-cell power constraint SNR has the following lower bound: where γ + and γ − are defined in (28).
Proof. The detailed proof is included in Appendix D.
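For readers who want to experiment, a regularized channel inversion is one standard way to write an un-normalized MMSE-type precoder. The sketch below uses U = H (H† H + c I)^{-1}, which reduces to the ZF precoder (H†)^{-1} when c = 0. This specific form, the regularization constant `reg` (a common choice is on the order of 1/SNR), and the function name are assumptions for illustration and may differ from the expression used in the paper.

```python
import numpy as np

def regularized_precoder(H, reg):
    """Un-normalized regularized-inversion precoder for the downlink y = H^dagger U b + w."""
    n_users = H.shape[1]
    gram = H.conj().T @ H + reg * np.eye(n_users)
    return H @ np.linalg.inv(gram)
```

Because the regularization bounds the precoder norm even when H is ill-conditioned, this form trades some residual interference for lower transmit power, which is the usual intuition for why MMSE improves on ZF at low SNR.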

Performance comparison
In the (32, 5) single-class network, we compare the above cooperative transmission schemes (together with the performance upper bound and lower bound) using Monte-Carlo simulation. The comparison is carried out for the following two α settings: (i) the α = 0.75 case, shown in Figure 5; (ii) the α = 1 case, shown in Figure 6.
Remark 7. Figures 5 and 6 echo the asymptotic performance losses of inter-cell DPC (29) and inter-cell ZF (33). Moreover, Figure 6 indicates an underlying relationship connecting the performance of inter-cell cophasing, inter-cell ZF, and inter-cell MMSE.

Connection: cophasing, ZF, and MMSE
It is observed in Figures 5 and 6   observation in the asymptotic of SNR. We conjecture that similar analysis carries over to the general α ∈ (0, 1) case.
Theorem 4 (cophasing, ZF, and MMSE connection). In the single-class network where the network size N and the SNR scale to infinity simultaneously as N = SNR^η, we have the following asymptotic characterization of the MMSE performance.
Remark 8. For the 32-cell single-class network, the dividing point of the above two regimes is SNR = N^2 ≈ 30.1 dB, which agrees with Figure 6.
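The dividing point quoted in Remark 8 is a direct unit conversion, which the following short sketch reproduces. The function name `regime_threshold_db` is hypothetical.

```python
import math

def regime_threshold_db(n_cells):
    """SNR (in dB) at which SNR = N^2, the stated boundary between the two regimes."""
    return 10.0 * math.log10(n_cells ** 2)

print(round(regime_threshold_db(32), 1))  # 30.1
```

For the 64-cell network considered later in the fading simulations, the same conversion gives roughly 36.1 dB.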
Remark 9. This corollary confirms that the MMSE precoder coincides with the ZF in the high-SNR regime.

Corollary 5.
In large networks with a fixed SNR,
Remark 10. The MMSE precoder loses half of the degrees of freedom in the low-SNR regime (SNR < N^2), which agrees with Figure 6 and also with the performance of the linear MMSE equalizer on 2-tap ISI channels [21].
In detection and estimation theory or filter theory, it is well known that MMSE outperforms ZF in the low-SNR regime, while the two are essentially the same in the high-SNR regime. Therefore, the above results do not seem surprising at first glance. However, in our problem setting with α = 1, the division between the low-SNR regime and the high-SNR regime has an explicit characterization and depends on the network size. Moreover, Theorem 4, combined with Corollary 2, shows that although MMSE improves over ZF, it does not solve ZF's singularity problem in the α = 1 setting. In the following section, we try to avoid the singularity problem by incorporating fading into our network model.

Fading scenario
To avoid the singularity problem with the ZF and MMSE precoders in the α = 1 nonfading scenario (with random path phases), we incorporate path gain fading into our network model. We further apply multiuser scheduling to the linear precoding schemes to induce the path gain asymmetry missing in the α = 1 nonfading scenario (with random path phases). The resulting schemes are listed here together with the performance upper bound and lower bound. For each user, we use h_1 and h_2 to denote the path gain to its own BS and the adjacent BS, respectively.
(1) Upper bound: optimal DPC [22] across all NK_e users under the sum power constraint (16), which is different from the upper bound (24) under the per-cell power constraint.
(2) Lower bound: in each cell, the user with the largest path gain |h_1| is selected; the SCP with reuse scheme is then applied to serve the selected users.
(3) Cophasing: in each cell, the user with the largest beamforming gain |h_1| + |h_2| is selected; the cophasing with reuse scheme is then applied to serve the selected users.
(4) ZF: in each cell, the user with the largest path asymmetry |h_1|/|h_2| is selected; the optimal ZF precoder is then applied to serve the selected users.
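The three per-cell selection rules in items (2)-(4) can be sketched as follows. The array layout (one row per cell, one column per candidate user), the dictionary keys, and the function name `schedule_users` are assumptions made for illustration.

```python
import numpy as np

def schedule_users(h1, h2):
    """Pick one user per cell under each selection rule.

    h1, h2 : complex arrays of shape (n_cells, users_per_cell) holding each user's
             path gain to its own BS and to the adjacent BS, respectively.
    """
    a1, a2 = np.abs(h1), np.abs(h2)
    return {
        "scp_reuse": np.argmax(a1, axis=1),       # largest own-BS gain |h1|
        "cophasing": np.argmax(a1 + a2, axis=1),  # largest beamforming gain |h1| + |h2|
        # largest asymmetry |h1| / |h2|; with Rayleigh fading |h2| is nonzero with probability one
        "zf": np.argmax(a1 / a2, axis=1),
    }
```

For instance, with Rayleigh fading one could draw `h1` and `h2` as circularly symmetric complex Gaussian arrays and pass them directly to this function.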
For the Rayleigh fading scenario, we use the Monte-Carlo method to simulate the above precoding schemes in single-class networks of different sizes: (i) the (32, 4) single-class network, shown in Figure 7; (ii) the (64, 4) single-class network (5 repetitions), shown in Figure 8.
Remark 11. Note that our results are obtained from numerical simulation, which is different from the analytical bounds obtained in [2]. From the simulation results, we observe the following:
(1) cophasing and the lower bound lose half of the degrees of freedom, while ZF and the upper bound achieve full degrees of freedom;
(2) ZF outperforms cophasing in the high-SNR regime (8-40 dB), while cophasing outperforms ZF in the low-SNR regime (0-8 dB);
(3) the performance gap of the ZF precoder from the upper bound in the α = 1 fading scenario (see Figures 7 and 8) is almost the same as that in the α = 0.75 nonfading scenario with random phases (see Figure 5). Moreover, the ZF precoder with the proposed multiuser scheduling algorithm performs robustly across different network sizes, as shown in Figures 7 and 8. Therefore, by incorporating path gain fading and using multiuser scheduling, the ZF precoder no longer exhibits the singularity problem;
(4) the MMSE precoder is not included in the simulation, since the network symmetry is broken by multiuser scheduling, and the MMSE precoder poses a nonconvex optimization problem. However, by definition, the optimal MMSE precoder should outperform both the cophasing and ZF precoders.

DOUBLE-CLASS NETWORK
In real cellular networks, not all users are located at the edge of cells. In this section, we consider the (N, K_i, K_e) double-class network specified in Section 2.1, where the users are divided into two categories, cell-interior or cell-edge. Our objective is to characterize the ergodic per-cell sum rate region (21) under the per-cell power constraint SNR (the notation SNR emphasizes our assumption of unit variance noise) as specified in (18). Recall that we use R_e and R_i to denote the ergodic per-cell sum rate for the cell-edge users and the cell-interior users, respectively. As in the previous section, we are particularly interested in suboptimal linear precoding schemes without resorting to DPC. Additionally, in this section, we break the circular array into clusters composed of a few cells, so as to serve both the cell-interior and the cell-edge users through localized BS cooperation. In particular, we present linear precoders based on two-cell clustering and three-cell clustering, respectively. Moreover, we compare their performance with the outer bound and a baseline scheme, which are described first in the following subsections.

Performance outer bound
Lemma 6 (outer bound). In the (N, K_i, K_e) double-class network under the per-cell power constraint (18), an outer bound to the achievable rate region of (R_e, R_i) is as follows. Let P_e and P_i denote the average per-cell power allocated to the cell-edge users and the cell-interior users, respectively. Then the rate pair (R_e, R_i) is bounded as follows: R_e + R_i ≤ log_2(1 + (1 + α^2) P_e + β^2 P_i), where P_e + P_i = SNR.
Proof. The detailed proof is included in Appendix F.
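The sum-rate constraint of Lemma 6 is easy to evaluate for a given power split. The sketch below covers only that one inequality, parameterized by the power P_e given to the cell-edge users; the function name `outer_bound_sum_rate` is hypothetical.

```python
import math

def outer_bound_sum_rate(p_e, snr, alpha, beta):
    """Right-hand side of R_e + R_i <= log2(1 + (1 + alpha^2) P_e + beta^2 P_i), P_e + P_i = snr."""
    if not 0.0 <= p_e <= snr:
        raise ValueError("p_e must lie between 0 and snr")
    p_i = snr - p_e
    return math.log2(1.0 + (1.0 + alpha ** 2) * p_e + beta ** 2 * p_i)
```

Sweeping `p_e` from 0 to `snr` traces how the sum-rate limit shifts as power moves between the two user classes: it grows with `p_e` when 1 + α^2 exceeds β^2 and shrinks otherwise.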

Performance baseline: cell-breathing
We use a simplified cell-breathing strategy [13] as our baseline scheme: at odd time instances, each odd BS transmits to its own cell-edge user group with power Q_e, and each even BS transmits to its cell-interior user group with power Q_i, as shown in Figure 9. At even time instances, the odd BSs and even BSs switch roles to satisfy the average per-cell power constraint (18). Note that "cell-breathing" refers to the strategy where each BS alternates between serving its cell-edge user group and its cell-interior user group. The baseline scheme is illustrated in Figure 9, where solid thick arrows denote intended transmissions, and dashed thin arrows denote interference (also for Figure 11). Note that the cell-breathing technique can be implemented over time to satisfy the average per-cell power constraint or over carriers in a multicarrier system to satisfy the instantaneous per-cell power constraint.
Lemma 7 (performance baseline: cell-breathing). The achievable rate region of the cell-breathing strategy, R_CB, has the following boundary: where the power allocation parameters Q_e and Q_i satisfy Q_i + Q_e = 2SNR.
Proof. Equation (44) is the cell-edge user group's achievable rate when they are served by their BS (with power Q_e), facing the power-α^2 Q_i interference from the neighboring BS. Equation (45) is the cell-interior user group's achievable rate when they are served by their BS (with power Q_i), without interference.
Remark 12. The cell-breathing baseline suffers from two issues:
(i) the cell-edge users' performance is affected by the interference from the cell-interior users' power (the α^2 Q_i term); (ii) both the cell-edge users and the cell-interior users suffer a loss of half of the degrees of freedom.
Though the first issue could be addressed by introducing DPC, we do not pursue this approach because of its complexity. In the following, we partially address the second issue by introducing several locally cooperative transmission schemes.
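The cell-breathing boundary can be traced numerically from the description in the proof of Lemma 7. The sketch below assumes the rate expressions R_e = (1/2) log_2(1 + Q_e / (1 + α^2 Q_i)) and R_i = (1/2) log_2(1 + β^2 Q_i), which are reconstructed from that description (own-BS gain 1 for cell-edge users, gain β for cell-interior users, unit noise, and a 1/2 factor for alternating time instances) rather than copied from equations (44) and (45). The function name is hypothetical.

```python
import numpy as np

def cell_breathing_boundary(snr, alpha, beta, num_points=101):
    """Sweep the power split Q_i + Q_e = 2 * snr and return the (R_e, R_i) pairs."""
    q_i = np.linspace(0.0, 2.0 * snr, num_points)             # power for cell-interior users
    q_e = 2.0 * snr - q_i                                     # remaining power for cell-edge users
    r_e = 0.5 * np.log2(1.0 + q_e / (1.0 + alpha ** 2 * q_i))  # interference alpha^2 * Q_i
    r_i = 0.5 * np.log2(1.0 + beta ** 2 * q_i)                # no interference at interior users
    return r_e, r_i
```

Plotting `r_i` against `r_e` would give the baseline curve against which the locally cooperative schemes below can be compared, under the stated assumptions.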

Cophasing with super-position coding (SPC)
The cophasing with SPC strategy proceeds as follows: at odd time instances, each odd-even BS pair coherently transmits to their shared cell-edge user group with power Q_e1 and Q_eα, respectively, and SPC to the cell-interior user group with power Q_i1 and Q_iα, respectively, as shown in Figure 10; at even time instances, the odd BSs and the even BSs switch roles. Similar to the baseline scheme, the cell-breathing technique can also be implemented over carriers in a multicarrier system to satisfy the instantaneous per-cell power constraint.
Figure 10: Cophasing with SPC.

Lemma 8 (cophasing with SPC). The boundary of the achievable rate region of the cophasing with SPC strategy, R_CoPhase-SPC, is characterized as follows. Let
Proof. The BSs add up the information intended for the cell-edge users and the cell-interior users and send it out. The cell-edge users treat the information intended for the cell-interior users as noise, which achieves the maximal rate R_e of (46). The cell-interior users first decode the information intended for the cell-edge users and then decode their own information, which achieves the maximal rate R_i of (47). However, to ensure that the cell-interior users are able to decode the information intended for the cell-edge users, the power allocation parameters need to satisfy min which essentially states that the information intended for the cell-edge users should have better SINR when received by the cell-interior users than when received by the cell-edge users. Otherwise, the cell-edge users need to lower their rate R_e to (48).

Remark 13.
Adding SPC to each BS regains the full degree of freedom for cell-interior users. However, the cell-edge users still suffer from half degree of freedom loss. Moreover, cophasing to the cell-edge users does require CSIT knowledge.
Figure 11: Cell-breathing with SPC.

Cell-breathing with SPC
The cell-breathing with SPC strategy proceeds by breaking the network into 3-cell clusters at each time instance. We take Figure 11 as an example to explain this strategy: the center BS serves its cell-interior group with power Q_iβ; the BS in cell 1 serves its cell-edge user group with power Q_e1 and SPC to its cell-interior user group with power Q_i1; the BS in cell 3 serves the cell-edge user group of cell 2 with power Q_eα and SPC to its cell-interior user group with power Q_iα. Note that cell-breathing (rotating the 3-cell cluster layout around the ring) can be implemented over time such that the average per-cell power constraint (18) is satisfied, or over the carriers in a multicarrier system such that the instantaneous per-cell power constraint is satisfied.
Lemma 9 (cell-breathing with SPC). The achievable rate region of the cell-breathing with SPC strategy, R_CB-SPC, has the following boundary: where the power allocation parameters
Proof. The proof is omitted, since it is similar to the proof of Lemma 8.

Remark 14.
The significance of the cell-breathing with SPC strategy is that it raises the degrees of freedom of the cell-edge users to 2/3 while maintaining the full degree of freedom for the cell-interior users. Moreover, SPC does not require CSIT and is relatively easy to implement in practice.
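The 2/3 figure can be illustrated by counting service slots. The following is a minimal sketch, assuming that the ring splits evenly into 3-cell clusters, that the cluster layout is shifted by one cell per slot, and that the cell-edge group of the third cell in each cluster is idle in that slot (as the description of Figure 11 suggests). The function name `service_fractions` and the slot-counting proxy for degrees of freedom are illustrative assumptions, not part of the original analysis; rates and power levels are not modeled.

```python
from fractions import Fraction


def service_fractions(n_cells, n_slots=3):
    """Fraction of slots in which each cell's edge and interior groups are served.

    Assumption: in slot t the clusters start at cells t, t+3, t+6, ... (mod n_cells).
    Position 0 is the first cell of a cluster, 1 the center cell, 2 the third cell.
    """
    if n_cells % 3 != 0:
        raise ValueError("this sketch assumes the ring splits evenly into 3-cell clusters")
    edge_slots = [0] * n_cells
    interior_slots = [0] * n_cells
    for slot in range(n_slots):
        for cell in range(n_cells):
            position = (cell - slot) % 3
            # Every BS serves its own cell-interior group in every slot.
            interior_slots[cell] += 1
            # Edge groups of the first and center cells are served; the third is idle.
            if position in (0, 1):
                edge_slots[cell] += 1
    edge = [Fraction(count, n_slots) for count in edge_slots]
    interior = [Fraction(count, n_slots) for count in interior_slots]
    return edge, interior


edge_frac, interior_frac = service_fractions(n_cells=6)
print("edge:", edge_frac)
print("interior:", interior_frac)
```

Under these assumptions every cell-edge group is active in two of every three slots, while every cell-interior group is active in all slots.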

Performance comparison
In double-class networks, we compare the above cooperative transmission strategies (together with the performance upper bound and the baseline scheme). The comparison is carried out for α = 1 and two β settings, β = 2 and β = 10. The SNR is set to 20 dB.

Remark 15.
With fairness in mind, we are most interested in the equal rate performance of various cooperative transmission strategies, which corresponds to the R e = R i line in Figures 12 and 13. Comparing the above numerical results, we obtain the following observations.
(i) The β = 10 case is a typical example of networks with large cell size, where the cell-edge users and cell-interior users experience significantly disparate signal qualities. The β = 2 case is a typical example of networks with small cell size, where the cell-edge users and cell-interior users experience less disparate signal qualities. From Figures 12 and 13, we observe that the cell-breathing with SPC scheme recovers half of the gap between the baseline and the upper bound.
(ii) Note that the cell-breathing with SPC and the baseline cell-breathing scheme do not require CSIT knowledge, while the cophasing with SPC scheme requires perfect local CSIT knowledge.
(iii) Compared with the cell-breathing scheme, both SPC-based schemes (CB-SPC and cophase-SPC) require some additional processing complexity at the cell-interior users.
Therefore, the choice of a suitable cooperative transmission strategy in double-class networks not only depends on network parameters (such as cell size) but also involves a tradeoff between the performance improvement and the requirement for BS cooperation, signal processing complexity, and CSIT knowledge.

CONCLUSIONS
In this paper, we investigated the potential benefits of cooperative downlink transmission in multicell networks. In single-class networks where the users are clustered at the cell-edges, we have obtained analytical performance expressions for the DPC, cophasing, ZF, and MMSE precoders. In large networks and the high-SNR regime, we have demonstrated the asymptotic performance loss incurred by the ZF precoder, which indicates a singularity problem with the symmetric path gain setting. Moreover, by analyzing the different behaviors of the MMSE precoder in different (N, SNR) regimes, we have shown that the MMSE precoder does not solve the singularity problem. However, by incorporating path gain fading and multiuser scheduling, we eased the linear precoders' singularity problem, as verified by Monte-Carlo simulations.
We further extended our network model to include cell-interior users and characterized the per-cell sum rate region for the rate pairs of the cell-edge and cell-interior users for various cooperative downlink transmission schemes. Besides an outer bound and the baseline achieved by the cell-breathing scheme, we have also studied several hybrid strategies, including cophasing with SPC and cell-breathing with SPC. The comparison of the achievable rates by different transmission strategies exhibits a tradeoff between the performance improvement and the requirement for BS cooperation, signal processing complexity, and CSIT knowledge.
Step (A.3) follows by replacing P_u with Q_u = N·SNR·P_u. Inequality (A.4) follows by applying Hadamard's inequality to the positive semidefinite matrix (HQ_u H† + I). Step (A.5) follows from the fact that, in the nonfading scenario (with random path phases), the diagonal entries of (HQ_u H† + I) are independent of the specific channel matrix H. Inequality (A.6) follows from the known fact that the arithmetic mean is no less than the geometric mean.
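The chain of steps (A.4) to (A.6) can be checked numerically. The sketch below assumes a circular Wyner-type channel in which user n hears BS n with unit gain and BS n+1 with gain α, each with an independent random phase; this structure, the helper names, and the parameter values are illustrative assumptions. It compares det(HQ_u H† + I) with the product of its diagonal entries (Hadamard) and with the N-th power of their arithmetic mean (AM-GM) for a diagonal Q_u with trace N·SNR.

```python
import cmath
import math
import random


def ring_channel(n_cells, alpha, rng):
    """Assumed circular channel: gain 1 to the own BS, gain alpha to the next BS."""
    h = [[0j] * n_cells for _ in range(n_cells)]
    for n in range(n_cells):
        h[n][n] = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        h[n][(n + 1) % n_cells] = alpha * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
    return h


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1 + 0j
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
    return det


def gram_plus_identity(h, powers):
    """Return H diag(powers) H^dagger + I."""
    size = len(h)
    g = [[sum(h[i][k] * powers[k] * h[j][k].conjugate() for k in range(size))
          for j in range(size)] for i in range(size)]
    for i in range(size):
        g[i][i] += 1.0
    return g


rng = random.Random(0)
n_cells, alpha, snr = 6, 0.8, 100.0
raw = [rng.random() for _ in range(n_cells)]
powers = [n_cells * snr * r / sum(raw) for r in raw]  # unequal powers, trace = N * SNR
g = gram_plus_identity(ring_channel(n_cells, alpha, rng), powers)

det_value = determinant(g).real
diag_product = math.prod(g[i][i].real for i in range(n_cells))
mean_bound = (sum(g[i][i].real for i in range(n_cells)) / n_cells) ** n_cells
print(det_value <= diag_product <= mean_bound)
```

For this assumed channel the diagonal entries equal 1 + q_n + α² q_{n+1}, so their mean is 1 + (1 + α²)SNR regardless of the phases, which is the property used in step (A.5).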

B. PROOF OF THEOREM 3 AND COROLLARY 1
In the (N, 1) single-class network resulting from intra-cell TDM, the instantaneous per-cell sum rate under the per-cell power constraint (18) achieved by inter-cell DPC coincides with the per-cell sum-rate capacity of the (N, 1) single-class network, which is specified as follows: where H is specified in (25). Note that the above equation follows from Theorem 1. We calculate R_DPC(H, N, SNR) by characterizing an upper bound and a lower bound to the above expression and showing that the two bounds coincide.

B.1. Upper bound on R DPC (H, N, SNR)
We obtain an upper bound on R_DPC(H, N, SNR) by setting where the last step follows by replacing P_u with Q_u = N·SNR·P_u. Our objective is to show that the solution to the above maximization is Q_u = SNR·I. Since [23] shows that log2 det(HQ_u H† + I) is concave in Q_u, we only need to verify that it is also invariant to the rotation of Q_u (a concave function that is invariant to such rotations attains its maximum at the symmetric point), which is proved as follows. Note that HQ_u H† + I has the following form: ..,φN =0 +I) is invariant to the rotation of Q_u, which can be clearly observed from (B.3). Let us first define a sequence of matrices as follows:
Proof. The lemma follows from the following two initial conditions and one iterative relation: then we have the following expression: which clearly indicates that det(HQ_u H† + I) is invariant to the rotation of Q_u. Therefore,

B.2. Lower bound on R DPC (H, N, SNR)
We obtain a lower bound on R_DPC(H, N, SNR) by setting P_u = (1/N)I: Our objective is to show that the solution to the above minimization is Λ = (1/(N·SNR))I. Since [23] shows that log2(det((1/N)HH† + Λ)/det(Λ)) is convex in Λ, we only need to show that it is also invariant to the rotation of Λ.
Since the above matrix has almost the same layout as the one in the previous section, we can show that det((1/N)HH† + Λ) is invariant to the rotation of Λ through similar steps. Therefore, det(SNR·HH† + I) =
The second-order difference equation (B.15) can be converted to which, combined with (B.13) and (B.14), can be reformulated as a first-order difference equation and solved to obtain the following solution: where γ+ and γ− are defined in (28). Therefore, The inter-cell DPC performance formula in Theorem 3 follows immediately by averaging R_DPC(H, N, SNR) over the channel matrix H.

B.4. Proof of Corollary 1
For a fixed channel matrix H, the asymptotic performance loss of inter-cell DPC is where the last step follows from the fact that (2(−1)^{N+1} α^N cos Θ + 1 + α^{2N}) is bounded. Corollary 1 follows immediately by averaging over H on both sides of the above equation.
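The boundedness used in the last step can be made explicit: since cos Θ lies in [−1, 1], the term lies between (1 − α^N)² and (1 + α^N)². The sketch below checks this numerically and also checks the algebraic identity that the term equals |1 + (−1)^{N+1} α^N e^{jΘ}|². Interpreting this quantity as |det H|² for a circular channel with unit direct gains and cross gains α is an assumption about the channel structure, and the function name and parameter grid are illustrative.

```python
import cmath
import math
import random


def phase_term(n_cells, alpha, theta):
    """The term 2(-1)^(N+1) alpha^N cos(theta) + 1 + alpha^(2N)."""
    sign = (-1) ** (n_cells + 1)
    return 2.0 * sign * alpha ** n_cells * math.cos(theta) + 1.0 + alpha ** (2 * n_cells)


rng = random.Random(1)
checks = []
for n_cells in (3, 4, 7, 10):
    for alpha in (0.3, 0.9, 1.0):
        lower = (1.0 - alpha ** n_cells) ** 2
        upper = (1.0 + alpha ** n_cells) ** 2
        for _ in range(100):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            value = phase_term(n_cells, alpha, theta)
            # Same quantity written as a squared magnitude.
            sign = (-1) ** (n_cells + 1)
            squared_magnitude = abs(1.0 + sign * alpha ** n_cells * cmath.exp(1j * theta)) ** 2
            checks.append(lower - 1e-12 <= value <= upper + 1e-12)
            checks.append(math.isclose(value, squared_magnitude, rel_tol=1e-9, abs_tol=1e-12))

print(all(checks))
```

Note that for α = 1 the lower end of the interval is zero, so the term is bounded above but can come arbitrarily close to zero for particular values of Θ.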

C.1. Proof of Lemmas 3 & 4
The unnormalized ZF precoder is U = (H†)^{-1}, while the unnormalized ZF filter is V = (H^{-1})†. Plugging V into the uplink channel expression (15), we obtain the following effective uplink channel: where the noise level of x_u is ((H†H)^{-1})_nn. By dividing the cofactor of H†H by the determinant of H†H, we find that the diagonal entries of (H†H)^{-1} are identical. This equal-diagonal-element property of (H†H)^{-1} is significant, since it confirms that the symmetric uplink power allocation P_u = SNR·I achieves the maximal per-cell sum rate using the ZF filter under both the sum power constraint (17) and the per-cell power constraint (19). The conventional uplink-downlink duality [17,18] states that, if the downlink precoder U is the same as the uplink filter V, the maximal sum rate is the same in the downlink as in the uplink under the sum power constraints (16) and (17), respectively.
Plugging U into the downlink channel expression (12), we obtain the following effective downlink channel: where the per-cell power constraint (18) reduces to
The diagonal entries of (HH†)^{-1} are also identical and the same as the diagonal entries of (H†H)^{-1}: Therefore, we consider the following symmetric downlink power allocation: which satisfies the per-cell power constraint and achieves the same sum rate as the uplink sum rate achieved by the symmetric uplink power allocation P_u = SNR·I: This is also the maximal achievable downlink per-cell sum rate using the ZF precoder under the sum power constraint, which by definition dominates the maximal achievable downlink per-cell sum rate under the per-cell power constraint. Hence the symmetric allocation is also optimal under the per-cell power constraint, and we have established Lemmas 3 and 4 simultaneously.
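The equal-diagonal property can be checked with the cofactor argument itself. The following is a minimal sketch, assuming a circular channel in which user n hears BS n with unit gain and BS n+1 with gain α, with independent random phases; the helper names and parameter values are illustrative. Each diagonal entry of the inverse is computed as a principal cofactor divided by the determinant, for both H†H and HH†.

```python
import cmath
import math
import random


def ring_channel(n_cells, alpha, rng):
    """Assumed circular channel: gain 1 to the own BS, gain alpha to the next BS."""
    h = [[0j] * n_cells for _ in range(n_cells)]
    for n in range(n_cells):
        h[n][n] = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        h[n][(n + 1) % n_cells] = alpha * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
    return h


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1 + 0j
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
    return det


def gram(h, inner):
    """Return H^dagger H if inner is True, otherwise H H^dagger."""
    size = len(h)
    if inner:
        return [[sum(h[k][i].conjugate() * h[k][j] for k in range(size))
                 for j in range(size)] for i in range(size)]
    return [[sum(h[i][k] * h[j][k].conjugate() for k in range(size))
             for j in range(size)] for i in range(size)]


def inverse_diagonal(matrix):
    """Diagonal of the inverse: principal cofactor divided by the determinant."""
    size = len(matrix)
    full_det = determinant(matrix)
    diagonal = []
    for n in range(size):
        minor = [[matrix[i][j] for j in range(size) if j != n]
                 for i in range(size) if i != n]
        diagonal.append((determinant(minor) / full_det).real)
    return diagonal


rng = random.Random(2)
alpha, n_cells = 0.5, 5
h = ring_channel(n_cells, alpha, rng)
uplink_diag = inverse_diagonal(gram(h, inner=True))     # diagonal of (H^dagger H)^-1
downlink_diag = inverse_diagonal(gram(h, inner=False))  # diagonal of (H H^dagger)^-1
print(uplink_diag)
print(downlink_diag)
```

The reason the entries coincide in this sketch is that deleting row and column n from the circular tridiagonal matrix leaves an ordinary Hermitian tridiagonal matrix, whose determinant depends only on the magnitudes of the off-diagonal entries and not on the phases or on n.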
Proof. With A, U, C, and V denoting matrices of conformable sizes, the Woodbury matrix identity is
(A + UCV)^{-1} = A^{-1} − A^{-1}U(C^{-1} + VA^{-1}U)^{-1}VA^{-1}. (D.10)
Substituting A and C with the identity matrix I, U with B†, and V with B, the Woodbury matrix identity (D.10) reduces to
(I + B†B)^{-1} = I − B†(I + BB†)^{-1}B.
By multiplying both sides of the above identity with B†, we obtain
Now, we are ready to show the connections between the diagonal entries of V_MMSE V_MMSE† and V_MMSE† V_MMSE: are identical, which can be verified in the same way as HH† in Appendix A. Therefore, V_MMSE V_MMSE† also has identical diagonal entries. Similarly, (D.14) which shows that V_MMSE† V_MMSE also has identical diagonal entries. Moreover, since (D.15) we know that V_MMSE V_MMSE† and V_MMSE† V_MMSE have the same diagonal entries as each other. Therefore, the maximal downlink per-cell sum rate achieved by the MMSE precoder is lower bounded by the uplink per-cell sum rate achieved by the symmetric uplink power allocation.
Now, we explicitly compute this lower bound. For ease of computation, we reformulate the uplink SINR achieved by the MMSE filter as in [24]:
SINR_n = SNR·h_n† Γ_n^{-1} h_n, (D.16)
where h_n is the nth column of the uplink channel matrix H†, and Γ_n is the noise-plus-interference covariance matrix that user n observes: Since the users' SINRs are identical, we carry out the SINR computation for user N only. The noise-plus-interference covariance matrix of user N is where X = αSNR·e^{−jφ_1}, X = 1 + (1 + α²)SNR, Y = 1 + (1 + α²)SNR, Y = αSNR·e^{−jφ_{N−1}}, Z = αSNR·e^{jφ_{N−1}}, Z = 1 + α²SNR. Note that Γ_N has an embedded Λ_{N−2} matrix (defined in (B.5)) in the center. Based on this observation, we have the following recursive expression:
det Γ_N = [α²SNR² + (1 + α²)SNR + 1] det Λ_{N−2} − α²SNR²[(1 + α²)SNR + 2] det Λ_{N−3} + ⋯ (D.19)
where, in the last step, we have applied the iterative expression of det Λ_n in Appendix B. Since h_N = [αe^{jθ_1N}, 0, . . . , 0, e^{jθ_NN}]^T, where γ+ and γ− are defined in (28). Now, Lemma 5 follows immediately.
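The SINR expression (D.16) and the claim that all users see the same SINR can be illustrated numerically. The sketch below rests on several assumptions that are not taken from the original derivation: a circular channel in which user n hears BS n with unit gain and BS n+1 with gain α, unit noise variance, equal uplink power SNR for every user, and Γ_n = I + SNR·Σ_{m≠n} h_m h_m†. The helper names are illustrative.

```python
import cmath
import math
import random


def ring_channel(n_cells, alpha, rng):
    """Assumed circular channel: gain 1 to the own BS, gain alpha to the next BS."""
    h = [[0j] * n_cells for _ in range(n_cells)]
    for n in range(n_cells):
        h[n][n] = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        h[n][(n + 1) % n_cells] = alpha * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
    return h


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    size = len(matrix)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size + 1):
                a[row][k] -= factor * a[col][k]
    x = [0j] * size
    for row in range(size - 1, -1, -1):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, size))
        x[row] = (a[row][size] - tail) / a[row][row]
    return x


def mmse_sinrs(h, snr):
    """Per-user uplink SINR, SNR * h_n^dagger Gamma_n^-1 h_n, under the stated assumptions."""
    size = len(h)
    # Column n of H^dagger holds the conjugated entries of row n of H.
    columns = [[h[n][k].conjugate() for k in range(size)] for n in range(size)]
    sinrs = []
    for n in range(size):
        gamma = [[(1.0 if i == j else 0.0) + 0j for j in range(size)] for i in range(size)]
        for m in range(size):
            if m == n:
                continue  # the desired user's own signal is not interference
            for i in range(size):
                for j in range(size):
                    gamma[i][j] += snr * columns[m][i] * columns[m][j].conjugate()
        x = solve(gamma, columns[n])
        quad_form = sum(columns[n][i].conjugate() * x[i] for i in range(size))
        sinrs.append((snr * quad_form).real)
    return sinrs


rng = random.Random(3)
sinrs = mmse_sinrs(ring_channel(n_cells=6, alpha=0.7, rng=rng), snr=10.0)
print(sinrs)
```

Under these assumptions the printed values agree across users up to rounding, which is why the computation in the text can be restricted to user N.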

E. PROOF OF THEOREM 4
Note that γ+ and γ− are bounded as follows:
(i) Let P_e,n denote the power allocated for cell-edge users in cell n. Assuming that P_e is the average per-cell power of the cell-edge users, the average per-cell sum rate achievable for the cell-edge users is upper bounded by the per-cell sum rate of a single-class network under the sum power constraint (16) with SNR replaced by P_e. Therefore, given a fixed channel matrix H, R_e is upper bounded as follows: which verifies (41). The first inequality holds since the presence of cell-interior users cannot increase the performance of cell-edge users. The second inequality follows from Hadamard's inequality for positive semidefinite matrices. The third inequality follows from the fact that the arithmetic average is no less than the geometric average.
(ii) Let P i,n denote the power allocated for cell-interior users in cell n. Since P i is the average per-cell power allocated for the cell-interior users, R i is upper bounded as follows: which verifies (42). The first inequality holds since the presence of cell-edge users cannot increase the performance of cell-interior users. The second inequality follows from the fact that the arithmetic average is no less than the geometric average.
(iii) Let P_e,n and P_i,n denote the power allocated for cell-edge users and cell-interior users, respectively, in cell n. Note that the per-cell power constraint requires that P_e,n + P_i,n ≤ SNR. For a given fixed channel matrix H, the per-cell sum rate R_e + R_i is upper bounded as follows: