Compound Wiretap Channels

This paper considers the compound wiretap channel, which generalizes Wyner’s wiretap model to allow the channels to the (legitimate) receiver and to the eavesdropper to take a number of possible states. No matter which states occur, the transmitter guarantees that the receiver decodes its message and that the eavesdropper is kept in full ignorance about the message. The compound wiretap channel can also be viewed as a multicast channel with multiple eavesdroppers, in which the transmitter sends information to all receivers and keeps the information secret from all eavesdroppers. For the discrete memoryless channel, lower and upper bounds on the secrecy capacity are derived. The secrecy capacity is established for the degraded channel and the semideterministic channel with one receiver. The parallel Gaussian channel is further studied. The secrecy capacity and the secrecy degree of freedom (s.d.o.f.) are derived for the degraded case with one receiver. Schemes to achieve the s.d.o.f. for the case with two receivers and two eavesdroppers are constructed to demonstrate the necessity of a prefix channel in encoder design. Finally, the multi-antenna (i.e., MIMO) compound wiretap channel is studied. The secrecy capacity is established for the degraded case and an achievable s.d.o.f. is given for the general case.


Introduction
The compound channel models transmission over a channel that may take a number of states, where reliable communication must be guaranteed regardless of which state occurs. For example, this type of channel might arise in real-time wireless communications when the transmitter has no knowledge of the channel state, but zero performance outage needs to be guaranteed subject to a stringent delay constraint. In this paper, we are interested in the compound channel with an eavesdropper that receives outputs via a compound channel that may also take a number of states. Now the transmitter not only needs to guarantee reliable communication to the legitimate receiver, but also needs to prevent the information from being known by the eavesdropper. This is a generalization of Wyner's wiretap channel [1] to the case of multiple channel states.
We consider the situation in which the channel remains in the same state during the entire transmission, and the channel state is known at the corresponding receivers, but not at the transmitter. However, we note that having the channel state information at the receivers comes at no cost to the communication rate, because the channel states can be learned by the receivers at the beginning of transmission via training symbols whose length is negligible compared to the codeword length.
We can also interpret the compound wiretap channel as the multicast channel with multiple eavesdroppers (see Figure 1). In this case, the number of states to the receiver becomes the number of receivers, with each state corresponding to one receiver, and the number of states to the eavesdropper becomes the number of eavesdroppers, with each state corresponding to one eavesdropper. The transmitter wishes to transmit information to all receivers and keep the information secret from all eavesdroppers. In this paper, we adopt this interpretation. From this viewpoint, the compound wiretap channel provides a general framework that includes a number of models studied previously as special cases. These models include the parallel wiretap channel with two eavesdroppers studied in [2,3], wiretap channels with multiple eavesdroppers studied in [4], and the wiretap channel with multiple receivers studied in [5].
In this paper, we first study the discrete memoryless compound wiretap channel, for which we provide lower and upper bounds on the secrecy capacity. The lower bound indicates that the channel input scheme needs to balance the rates for all receiver-eavesdropper pairs, and hence none of them may achieve their best secrecy rate. We further establish the secrecy capacity for the degraded channel and the semideterministic channel with one receiver and multiple eavesdroppers.
We further study the parallel Gaussian compound wiretap channel, in which the channels to each receiver and to each eavesdropper are parallel Gaussian channels with multiple Gaussian subchannels. Channels of this type arise, for example, in wideband wireless communication systems such as frequency division multiplexing (FDM) systems in which transmission takes place over a number of frequency bands, and the eavesdroppers can tune their receivers to access some of these frequency bands. Understanding this channel is also important for studying the compound time-varying fading wiretap channels, as the parallel channel serves as a general model for the fading channel.
We first consider the degraded parallel Gaussian compound channel with one receiver and multiple eavesdroppers, for which we obtain the secrecy capacity. To further illustrate our results, we study the secrecy degree of freedom (s.d.o.f.), which characterizes how the secrecy capacity scales with log SNR. We show that the s.d.o.f. depends only on the total number of subchannels that the receiver accesses and the maximal number of subchannels that one eavesdropper can access. Interestingly, the s.d.o.f. depends neither on the total number of subchannels that all eavesdroppers can access nor on the number of eavesdroppers. We observe that there is a connection between the s.d.o.f. and secure network coding studied in [6]. However, the s.d.o.f. is defined for noisy Gaussian channels while secure network coding addresses deterministic networks.
We then study an example parallel Gaussian compound wiretap channel with two receivers and two eavesdroppers. For this channel, we propose three schemes. Scheme 1 is to map source information directly to Gaussian channel inputs, and this scheme is shown to be strictly suboptimal. Scheme 2 is to introduce a key random variable to randomize the source information, and this scheme achieves the s.d.o.f. Scheme 3 is to randomize the encoder by introducing a random prefix channel, and this scheme is also shown to achieve the s.d.o.f. This example channel demonstrates that randomization of either source information or encoder is necessary to achieve the s.d.o.f. for the parallel Gaussian compound channels.
We finally study the multi-input multi-output (MIMO) compound wiretap channel. We first provide the secrecy capacity for the degraded MIMO compound wiretap channel. We then study the general MIMO compound wiretap channel, for which we propose an input scheme and derive an achievable s.d.o.f. We further note that after our conference publication [7] appeared with the results presented here, another upper bound on the secrecy capacity of the compound wiretap channel was derived in [8]. The secrecy capacity result for the parallel Gaussian compound wiretap channel was also extended to the nondegraded parallel Gaussian compound wiretap channel with one receiver and multiple eavesdroppers in [8]. We also refer the reader to [9] for a review of recent studies on compound wiretap channels.
The rest of the paper is organized as follows. In Section 2, we introduce the model of the compound wiretap channel. In Section 3, we present our results on the discrete memoryless compound wiretap channel. In Sections 4 and 5, we provide the results on the secrecy capacity and the s.d.o.f. for two cases of the parallel Gaussian compound wiretap channel. In Section 6, we provide our results on the MIMO compound wiretap channel. In the last section, we give concluding remarks.

Channel Model
We consider the following compound wiretap channel model.

Definition 1. The compound wiretap channel consists of one finite channel input alphabet $\mathcal{X}$, $J$ finite channel output alphabets $\mathcal{Y}_1, \ldots, \mathcal{Y}_J$, $K$ finite channel output alphabets $\mathcal{Z}_1, \ldots, \mathcal{Z}_K$, and a set of transition probability distributions for one channel use, in which $x \in \mathcal{X}$ is the channel input from the transmitter, $y_j \in \mathcal{Y}_j$ is the channel output at receiver $j$, and $z_k \in \mathcal{Z}_k$ is the channel output at eavesdropper $k$. The channel is memoryless across channel uses.

As the correlation between $Y_j$ and $Z_k$ does not affect the secrecy capacity (similar to [10, Lemma 1]), without loss of optimality, we assume a transition probability of the form $P_{Y_j|X} P_{Z_k|X}$ as shown in Figure 1.

A $(2^{nR}, n)$ code for the compound wiretap channel consists of the following: (i) a message set $\mathcal{W} = \{0, \ldots, 2^{nR}-1\}$ with the message $W$ uniformly distributed over $\mathcal{W}$; (ii) an encoder $f : \mathcal{W} \to \mathcal{X}^n$ mapping each message $w \in \mathcal{W}$ to a codeword $x^n \in \mathcal{X}^n$; (iii) $J$ decoders $g_j : \mathcal{Y}_j^n \to \mathcal{W}$ for $j = 1, \ldots, J$, each mapping a received sequence $y_j^n$ to a message estimate $\hat{w}^{(j)} \in \mathcal{W}$.
The average block error probability for receiver $j$, for $j = 1, \ldots, J$, is defined as
$$P_e^{(j)} = \frac{1}{2^{nR}} \sum_{w \in \mathcal{W}} \Pr\left\{\hat{w}^{(j)} \neq w \mid w \text{ sent}\right\}.$$
The secrecy level of the message $W$ at eavesdropper $k$, for $k = 1, \ldots, K$, is defined by the following equivocation rate:
$$\frac{1}{n} H(W \mid Z_k^n).$$
A rate-equivocation pair $(R, R_e)$ is achievable if there exists a sequence of $(2^{nR}, n)$ codes with the average error probabilities $P_e^{(j)} \to 0$ for $j = 1, \ldots, J$ as $n$ goes to infinity and with the equivocation rate satisfying
$$R_e \le \lim_{n \to \infty} \frac{1}{n} H(W \mid Z_k^n) \quad \text{for } k = 1, \ldots, K.$$
In this paper, we are interested in the case of perfect secrecy, that is, $R = R_e$. A secrecy rate $R$ is achievable if the rate-equivocation pair $(R, R)$ is achievable. The secrecy capacity is defined to be the maximal achievable secrecy rate.

Discrete Memoryless Compound Wiretap Channels
In the following, we provide lower and upper bounds on the secrecy capacity of the compound wiretap channel.

Theorem 1.
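The following secrecy rate is achievable for the compound wiretap channel:
$$R = \max_{P_{UX}} \left[ \min_{j} I(U; Y_j) - \max_{k} I(U; Z_k) \right], \tag{6}$$
where $U$ is an auxiliary random variable, and the maximum is taken over all distributions $P_{UX}$ that satisfy the Markov chain relationships
$$U \to X \to (Y_j, Z_k) \quad \text{for } j = 1, \ldots, J \text{ and } k = 1, \ldots, K.$$
Proof. See Appendix A.
Theorem 1 can be interpreted as a worst-case result; that is, the worst receiver and the best eavesdropper dominate the secrecy rate.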
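The worst-case reading can be illustrated numerically. The sketch below is only an illustration under stated assumptions: every link is taken to be a binary symmetric channel, the input is uniform, and the auxiliary variable is set to U = X, so the value is a lower bound on the maximized rate rather than the maximized rate itself. The crossover probabilities and all function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy in bits, with h(0) = h(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_mutual_information(crossover: float) -> float:
    """I(X;Y) in bits for a binary symmetric channel with a uniform input."""
    return 1.0 - binary_entropy(crossover)


def worst_case_secrecy_rate(receiver_crossovers, eavesdropper_crossovers):
    """Worst receiver minus best eavesdropper, clipped at zero (assumes U = X)."""
    worst_receiver = min(bsc_mutual_information(p) for p in receiver_crossovers)
    best_eavesdropper = max(bsc_mutual_information(p) for p in eavesdropper_crossovers)
    return max(0.0, worst_receiver - best_eavesdropper)


# Hypothetical example: two receivers and two eavesdroppers.
rate = worst_case_secrecy_rate([0.05, 0.10], [0.20, 0.30])
print(f"achievable secrecy rate with U = X: {rate:.4f} bits per channel use")
```

In this example the rate is set by the receiver with crossover 0.10 and the eavesdropper with crossover 0.20; improving the other receiver or degrading the other eavesdropper leaves it unchanged.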

Theorem 2. An upper bound on the secrecy capacity of the compound wiretap channel is given by
$$C_s \le \min_{j,k} \max_{P_{UX}} \left[ I(U; Y_j) - I(U; Z_k) \right], \tag{8}$$
where $U$ is an auxiliary random variable whose joint distribution with $X$, $Y_j$, and $Z_k$ satisfies the Markov chain relationship $U \to X \to (Y_j, Z_k)$.
Proof. It can be seen that, for each $(j, k)$, the maximized quantity in (8) is the secrecy capacity of the wiretap channel with the transition probability distribution $P_{Y_j Z_k|X}$ [11, Corollary 2]. The secrecy capacity of the compound wiretap channel cannot exceed the secrecy capacity of any receiver-eavesdropper pair.
We note that it may not be possible to achieve the upper bound given in Theorem 2 in general. This is because the input scheme needs to balance the rates that can be achieved for all receiver-eavesdropper pairs, and consequently, none of them can achieve its best rate. This can also be seen from the achievable rate in (6). The input distribution $P_{UX}$ that maximizes the minimum of the secrecy rates of all receiver-eavesdropper pairs may not be optimal for any single pair.
We now give an example channel in which the lower bound given in Theorem 1 can be shown to be the secrecy capacity. We say that the compound wiretap channel is degraded if the transition probability satisfies the Markov chain relationships
$$X \to Y_j \to Z_k$$
for all $j = 1, \ldots, J$ and $k = 1, \ldots, K$. For the degraded compound wiretap channel, we have the following capacity theorem.

Theorem 3. The secrecy capacity of the degraded compound wiretap channel is given by
$$C_s = \max_{P_X} \min_{j,k} \left[ I(X; Y_j) - I(X; Z_k) \right].$$
Proof. The achievability follows from Theorem 1 by setting $U = X$. The converse follows because for each $(j, k)$ and an input distribution $P_X$, an upper bound $I(X; Y_j) - I(X; Z_k)$ can be derived as given in [1].
We next provide the secrecy capacity for the semideterministic compound wiretap channel, which has one receiver (J = 1) and K eavesdroppers. The channel from the transmitter to the receiver is a deterministic channel; that is, the transition probability distribution P Y |X takes on the values 0 or 1 only, where the output at the receiver is denoted by Y (see Figure 2).

Theorem 4. The secrecy capacity of the semideterministic compound wiretap channel with $J = 1$ is given by
$$C_s = \max_{P_X} \min_{k} H(Y \mid Z_k).$$
Proof. To prove the achievability, we apply (6) and obtain the following achievable rate:
$$R = \max_{P_{UX}} \min_{k} \left[ I(U; Y) - I(U; Z_k) \right],$$
where the maximum is taken over all distributions $P_{UX}$ that satisfy the Markov chain relationship $U \to X \to (Y, Z_k)$ for $k = 1, \ldots, K$. We further let $U = Y$. Because $Y$ is a deterministic function of $X$, this choice satisfies the previous Markov chain condition, and results in an achievable rate
$$R = \max_{P_X} \min_{k} \left[ H(Y) - I(Y; Z_k) \right] = \max_{P_X} \min_{k} H(Y \mid Z_k).$$
The converse is relegated to Appendix B.
We note that the achievable scheme involves choosing an auxiliary random variable $U = Y$. This indicates that a prefix channel from $U$ to the actual channel input $X$ at the encoder is necessary to achieve the secrecy capacity.

Figure 3: Parallel compound wiretap channel with one receiver and K eavesdroppers.

Parallel Gaussian Compound Wiretap Channels: J = 1
In this section, we focus on the case in which $J = 1$ and $K > 1$, that is, one receiver and $K$ eavesdroppers (see Figure 3). We further assume that the channel from the transmitter to the receiver is the parallel Gaussian channel with $N$ independent subchannels, and the outputs of the subchannels at the receiver for one channel use are given by
$$Y_a = X_a + W_a, \quad a = 1, \ldots, N,$$
where $W_1, \ldots, W_N$ are independent Gaussian random variables with variances $w_1^2, \ldots, w_N^2$, and these noise variables are independent and identically distributed (i.i.d.) across channel uses. We note that for this model, $Y_1, \ldots, Y_N$ indicate the outputs at the receiver from the $N$ subchannels, and do not indicate the outputs corresponding to different receivers. The channel input is subject to the average power constraint $P$, that is,
$$\frac{1}{n} \sum_{i=1}^{n} \sum_{a=1}^{N} E\left[X_{a,i}^2\right] \le P,$$
where $i$ is the symbol time index. We assume that each eavesdropper can access some subchannels. On letting $A_k \subseteq \{1, \ldots, N\}$ include all indices of the subchannels that eavesdropper $k$ can access, the outputs at eavesdropper $k$ are given by
$$Z_{ka} = X_a + V_{ka}, \quad a \in A_k,$$
where $V_{ka}$ for $a \in A_k$ are independent Gaussian random variables with variances $v_{ka}^2$. We further assume that $v_{ka}^2 \ge w_a^2$ for all $a \in A_k$, and hence the channel is degraded.
For the degraded parallel Gaussian compound wiretap channel, we have the following secrecy capacity.

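Corollary 1. The secrecy capacity of the degraded parallel Gaussian compound wiretap channel is given by
$$C_s = \max_{P_1 + \cdots + P_N \le P} \; \min_{k} \left[ \sum_{a=1}^{N} \frac{1}{2} \log\left(1 + \frac{P_a}{w_a^2}\right) - \sum_{a \in A_k} \frac{1}{2} \log\left(1 + \frac{P_a}{v_{ka}^2}\right) \right].$$
EURASIP Journal on Wireless Communications and Networking
Proof. The achievability follows from Theorem 3 by choosing $X = (X_1, \ldots, X_N)$ with independent components and each $X_a \sim \mathcal{N}(0, P_a)$. The converse follows from [12, Theorem 2] by setting $R_0 = 0$ for each eavesdropper.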
We note that the parallel Gaussian compound wiretap channel is a more general model than the model in [3] in that the number of eavesdroppers is arbitrary, each eavesdropper may access an arbitrary number of subchannels, and the transmitter is allowed to allocate power among the subchannels to achieve better secrecy rate. We also note that the parallel Gaussian compound wiretap channel reduces to the Gaussian/fading wiretap channel with multiple eavesdroppers studied in [4] if there is only one subchannel.
We further note that after our conference publication [7] appeared with the results presented here, the secrecy capacity of the general (i.e., not necessarily degraded) parallel Gaussian compound wiretap channel with one receiver and multiple eavesdroppers has been obtained in [8]. We refer the reader to [8] for further details.
To gain further insight into the secrecy capacity, we consider the rate at which the secrecy capacity scales with log SNR. In particular, we define the secrecy degree of freedom (s.d.o.f.) as
$$\text{s.d.o.f.} = \lim_{\mathrm{SNR} \to \infty} \frac{C_s}{\frac{1}{2} \log \mathrm{SNR}}, \tag{21}$$
where without loss of generality, we choose $w_1^2$ as the reference noise level and define $\mathrm{SNR} = P/(N w_1^2)$.

Corollary 2. The s.d.o.f. of the degraded parallel Gaussian compound wiretap channel is $N - L$, where $L = \max_k |A_k|$ is the maximal number of subchannels that one eavesdropper can access.

Proof. The achievability follows by applying Corollary 1 and choosing $P_a = P/N$ for $a = 1, \ldots, N$. The converse follows by considering only an eavesdropper $k$ that accesses $L$ subchannels, that is, $|A_k| = L$, and evaluating the first-order SNR expansion of the secrecy capacity.

Corollary 2 is connected to the secure rate given in [6, Theorem 2], which is based on network coding. However, we note that Corollary 2 is applicable for noisy Gaussian channels while the secure rate given in [6, Theorem 2] is derived for deterministic networks.
We also refer the reader to [7] for some example channels for which simple schemes were constructed to achieve the s.d.o.f.
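The scaling with log SNR can be checked numerically. The following sketch is an illustration, not the paper's derivation: it assumes the equal power split $P_a = P/N$ used in the achievability argument, assumes the per-eavesdropper rate is the receiver's sum rate minus that eavesdropper's sum rate over its accessible subchannels, and normalizes by (1/2) log SNR. The noise values, the access sets, and the function names are hypothetical.

```python
import math


def equal_power_secrecy_rate(total_power, receiver_noise, eavesdropper_noise):
    """Worst-case secrecy rate in bits per channel use under an equal power split.

    receiver_noise: list of receiver noise variances, one per subchannel.
    eavesdropper_noise: one dict per eavesdropper, mapping an accessible
        subchannel index to that eavesdropper's noise variance there.
    """
    power_per_subchannel = total_power / len(receiver_noise)
    receiver_rate = sum(
        0.5 * math.log2(1.0 + power_per_subchannel / w2) for w2 in receiver_noise
    )
    rates = []
    for access in eavesdropper_noise:
        leakage = sum(
            0.5 * math.log2(1.0 + power_per_subchannel / v2) for v2 in access.values()
        )
        rates.append(receiver_rate - leakage)
    return min(rates)


def sdof_estimate(total_power, receiver_noise, eavesdropper_noise):
    """Ratio of the equal-power secrecy rate to (1/2) log2(SNR)."""
    snr = total_power / (len(receiver_noise) * receiver_noise[0])
    rate = equal_power_secrecy_rate(total_power, receiver_noise, eavesdropper_noise)
    return rate / (0.5 * math.log2(snr))


# Hypothetical degraded example: N = 4 subchannels, eavesdropper noise >= receiver noise.
receiver_noise = [1.0, 1.0, 1.0, 1.0]
eavesdroppers = [{0: 2.0, 1: 2.0}, {2: 2.0}]  # the largest access set has 2 subchannels
estimate = sdof_estimate(1e30, receiver_noise, eavesdroppers)
print(f"estimated s.d.o.f. at high SNR: {estimate:.3f}")
```

In this example the estimate approaches 2 as the power grows, that is, four subchannels minus the two seen by the best-placed eavesdropper; the second eavesdropper does not affect the limit.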

Parallel Gaussian Compound Wiretap Channels: J > 1
In this section, we study the parallel Gaussian compound wiretap channel, in which $J > 1$ and $K > 1$. We address optimal schemes that achieve the best secrecy rate scaling with SNR. For the sake of clarity of exposition on this issue, we study the simplest example when $J = 2$ and $K = 2$ to illustrate the key factors that affect optimal schemes.

Example 1. Consider the parallel Gaussian compound wiretap channel with $J = 2$ and $K = 2$ (see Figure 4). The channel output at receiver 1 is given by
$$Y_1 = X_1 + W_1,$$
where $W_1$ is a zero-mean Gaussian random variable with variance $w_1^2$. The channel outputs at receiver 2 are given by
$$Y_{21} = X_{21} + W_{21}, \quad Y_{22} = X_{22} + W_{22},$$
where $W_{21}$ and $W_{22}$ are zero-mean independent Gaussian random variables with variances $w_{21}^2$ and $w_{22}^2$. The outputs at the two eavesdroppers are given by
$$Z_1 = X_{21} + V_1, \quad Z_2 = X_{22} + V_2,$$
where $V_1$ and $V_2$ are zero-mean independent Gaussian random variables with variances $v_1^2$ and $v_2^2$, respectively. The channel input includes three components $X_1$, $X_{21}$, and $X_{22}$, and they are subject to an average power constraint $P$, that is,
$$\frac{1}{n} \sum_{i=1}^{n} E\left[X_{1,i}^2 + X_{21,i}^2 + X_{22,i}^2\right] \le P.$$
For this channel, we study the s.d.o.f., for which we choose $w_1^2$ as the reference noise level and define $\mathrm{SNR} = P/w_1^2$.

An achievable rate follows from (6) and is given by
$$R = \max_{P_{UX}} \left[ \min\{I(U; Y_1),\, I(U; Y_{21}, Y_{22})\} - \max\{I(U; Z_1),\, I(U; Z_2)\} \right]. \tag{27}$$
In the following, we study three schemes, two of which are based on (27). It can be seen that a prefix channel $U \to X$ is necessary to achieve the optimal s.d.o.f. For computational convenience, in the following we fix the values of the noise variances; this assumption does not affect the s.d.o.f., which we compute for each scheme.

Scheme 1. Choose $U = X = (X_1, X_{21}, X_{22})$ and $X_1 \sim \mathcal{N}(0, P_1)$, $X_{21} \sim \mathcal{N}(0, P_{21})$, and $X_{22} \sim \mathcal{N}(0, P_{22})$ in (27). Based on these distributions, Scheme 1 achieves a secrecy rate (28) given by the minimum of four terms, one for each receiver-eavesdropper pair. It can be seen that the optimal power allocation $(P_1^*, P_{21}^*, P_{22}^*)$ should result in four equal terms in the minimum in (28). Combining this condition with the power constraint $P_1^* + P_{21}^* + P_{22}^* = P$ determines the optimal power allocation. Substituting the optimal power allocation into (28) gives the high-SNR behavior of the rate, where $a \doteq b$ denotes that $\lim_{P \to \infty} (a/b) = 1$. Therefore, Scheme 1 achieves an s.d.o.f. that is strictly smaller than the optimal value.

Figure 5: Illustration of Scheme 2.
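The balancing argument for Scheme 1 can be reproduced numerically. The sketch below rests on assumptions that are not taken from the text: all noise variances are set to 1, eavesdropper 1 is assumed to observe only $X_{21}$, and eavesdropper 2 only $X_{22}$. Under these assumptions the four receiver-eavesdropper terms are equal when $P_{21} = P_{22} = q$ and $1 + P_1 = (1 + q)^2$. Function names are hypothetical.

```python
import math


def gaussian_rate(power: float) -> float:
    """(1/2) log2(1 + power): rate of a unit-noise Gaussian subchannel in bits."""
    return 0.5 * math.log2(1.0 + power)


def scheme1_terms(p1: float, p21: float, p22: float):
    """Secrecy-rate terms for the four receiver-eavesdropper pairs with U = X."""
    receiver1 = gaussian_rate(p1)
    receiver2 = gaussian_rate(p21) + gaussian_rate(p22)
    eavesdropper1 = gaussian_rate(p21)  # assumed to observe X21 only
    eavesdropper2 = gaussian_rate(p22)  # assumed to observe X22 only
    return (
        receiver1 - eavesdropper1,
        receiver1 - eavesdropper2,
        receiver2 - eavesdropper1,
        receiver2 - eavesdropper2,
    )


def scheme1_balanced_allocation(total_power: float):
    """Allocation that equalizes the four terms and uses the full power budget.

    Solves q^2 + 4q = P, which follows from P1 = (1 + q)^2 - 1 and P1 + 2q = P.
    """
    q = math.sqrt(4.0 + total_power) - 2.0
    p1 = q * q + 2.0 * q
    return p1, q, q


total_power = 1e12  # equals the SNR under the unit-noise assumption
terms = scheme1_terms(*scheme1_balanced_allocation(total_power))
ratio = min(terms) / (0.5 * math.log2(total_power))
print(f"Scheme 1 rate divided by (1/2) log2(SNR): {ratio:.4f}")
```

Under these assumptions the ratio tends to 1/2 as the power grows, which is consistent with Scheme 1 being strictly suboptimal.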

Scheme 2.
We choose a Gaussian input and allocate the source power equally among $X_1$, $X_{21}$, and $X_{22}$. Each subchannel can hence support a rate $R$ that scales as $(1/2) \log \mathrm{SNR}$ at high SNR. Recall that the source message $W$ is uniformly distributed over the set $\{0, \ldots, 2^{nR} - 1\}$. We generate a key random variable $M$ that is independent of $W$ and is also uniformly distributed over the set $\{0, \ldots, 2^{nR} - 1\}$. We transmit $W$ over the channel $X_1 \to Y_1$ and transmit $W \oplus M$ and $M$ over the channels $X_{21} \to Y_{21}$ and $X_{22} \to Y_{22}$, respectively (see Figure 5). It is clear that receiver 1 decodes $W$, and receiver 2 decodes $W \oplus M$ and $M$, and hence decodes $W$. For eavesdroppers 1 and 2, each obtains either $W \oplus M$ or $M$, both of which are independent of $W$. Hence eavesdroppers 1 and 2 do not get any information about $W$, and perfect secrecy is achieved. It is clear that this scheme achieves
$$\text{s.d.o.f.} = 1.$$
This is clearly the largest achievable s.d.o.f., because the maximal degree of freedom achievable for receiver 1 is 1.
We note that Scheme 2 introduces randomness into the information source to achieve secrecy. Interestingly, Scheme 2 can be interpreted as turning the channel into a state-dependent wiretap channel as studied in [13]. The key random variable $M$ in Scheme 2 then corresponds to the channel state, which is known to the transmitter only. As shown in [13], the state variable helps improve the secrecy rate.
As remarked in Section 4, Scheme 2 for the noisy Gaussian channel is similar to the scheme designed for deterministic wiretap network models in [6]. More recently, deterministic network models have been proposed and studied (see, e.g., [14]) to obtain sufficiently accurate performance for Gaussian networks. It is hence interesting to apply this approach to study the secrecy capacity or s.d.o.f. for the Gaussian or other noisy wiretap networks. The key step is to come up with deterministic models that approximate the performance (e.g., in terms of s.d.o.f.) of noisy wiretap networks, and whose secrecy capacity can be determined easily.
Scheme 2 also suggests that Scheme 1 is strictly suboptimal. It is then natural to ask if we can modify Scheme 1 by defining the auxiliary random variable $U$ in (27) properly to achieve the optimal s.d.o.f. We hence propose the following Scheme 3.
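The message-level part of Scheme 2 is a one-time pad, which can be sketched in a few lines. The sketch assumes that messages are fixed-length bit strings and interprets ⊕ as bitwise XOR; it ignores channel coding and noise entirely, and the function names are hypothetical.

```python
import secrets


def scheme2_encode(message: int, num_bits: int):
    """Return the payloads (W, W xor M, M) for subchannels X1, X21, and X22."""
    key = secrets.randbelow(1 << num_bits)  # key M, uniform and independent of W
    return message, message ^ key, key


def receiver2_decode(masked_message: int, key: int) -> int:
    """Receiver 2 observes both W xor M and M, and recovers W."""
    return masked_message ^ key


num_bits = 8
message = 0b10110010
payload_x1, payload_x21, payload_x22 = scheme2_encode(message, num_bits)
recovered = receiver2_decode(payload_x21, payload_x22)

# For any fixed message, W xor M ranges over every value exactly once as M
# ranges over all keys, so a single payload (W xor M, or M) is uniform and
# carries no information about W.
masked_values = {message ^ key for key in range(1 << num_bits)}
print(recovered == message, len(masked_values) == (1 << num_bits))
```

Each eavesdropper in the example sees only one of the two masked payloads, which is why neither learns anything about the message.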

Scheme 3.
Choose $U = (X_1, X_{21} + X_{22})$ and $X_1 \sim \mathcal{N}(0, P/3)$, $X_{21} \sim \mathcal{N}(0, P/3)$, and $X_{22} \sim \mathcal{N}(0, P/3)$ in (27). It is clear that the above choice of $U$ satisfies the Markov chain relationship $U \to X \to (Y, Z)$ and is hence valid. The achievable secrecy rate under this scheme is given by
$$R = \min\{I(U; Y_1),\, I(U; Y_{21}, Y_{22})\} - \max\{I(U; Z_1),\, I(U; Z_2)\}.$$
Evaluating these terms based on the joint distribution of $U$ and $X$, we obtain $R \doteq (1/2) \log \mathrm{SNR}$, and Scheme 3 achieves
$$\text{s.d.o.f.} = 1.$$
Compared to Scheme 1, Scheme 3 introduces extra randomness in the encoder by introducing a prefix channel $U \to X$, and hence achieves the optimal s.d.o.f. We also note that for Gaussian wiretap channels, including the single-input single-output channel studied in [15] and the multi-input multi-output channel studied in [16][17][18], the prefix channel is not necessary to achieve the secrecy capacity, that is, $U = X$. However, the prefix channel is necessary to achieve the optimal s.d.o.f. for the parallel Gaussian compound wiretap channel.
From Schemes 2 and 3, we also observe that introducing randomness either into the information source or into the encoder strictly improves the s.d.o.f. and hence improves the secrecy rate.
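The high-SNR behavior of Scheme 3 can be evaluated in closed form for jointly Gaussian variables. The sketch below makes assumptions that are not taken from the text: all noise variances are 1, eavesdropper 1 observes $X_{21} + V_1$, and eavesdropper 2 observes $X_{22} + V_2$. With $S = X_{21} + X_{22}$, receiver 2 can form $Y_{21} + Y_{22} = S + $ noise of variance 2, which is a sufficient statistic for $S$ in this symmetric setting, and each eavesdropper's leakage follows from the correlation between $S$ and its observation. Function names are hypothetical.

```python
import math


def scheme3_rate_and_leakage(total_power: float):
    """Return (secrecy rate, leakage) in bits for U = (X1, X21 + X22), unit noise."""
    p = total_power / 3.0  # power of each of X1, X21, X22

    # Receiver 1 sees X1 + W1, and X1 is one component of U.
    info_receiver1 = 0.5 * math.log2(1.0 + p)

    # Receiver 2: S has variance 2p and Y21 + Y22 has noise variance 2.
    info_receiver2 = 0.5 * math.log2(1.0 + (2.0 * p) / 2.0)

    # Eavesdropper: S and Z = X21 + V are jointly Gaussian with
    # Cov(S, Z) = p, Var(S) = 2p, Var(Z) = p + 1.
    rho_squared = (p * p) / ((2.0 * p) * (p + 1.0))
    leakage = -0.5 * math.log2(1.0 - rho_squared)

    return min(info_receiver1, info_receiver2) - leakage, leakage


total_power = 1e60  # equals the SNR under the unit-noise assumption
rate, leakage = scheme3_rate_and_leakage(total_power)
ratio = rate / (0.5 * math.log2(total_power))
print(f"leakage: {leakage:.3f} bits, rate / ((1/2) log2 SNR): {ratio:.3f}")
```

The leakage stays bounded (it tends to half a bit) while the receivers' rates grow with the power, so the ratio tends to 1 under these assumptions.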

MIMO Compound Wiretap Channels
In this section, we consider the MIMO compound wiretap channel in which the transmitter, the receivers, and the eavesdroppers are equipped with multiple antennas. We let N t denote the number of antennas of the transmitter, N r denote the number of antennas of the receivers, and N e denote the number of antennas of the eavesdroppers. We assume that all receivers have the same number of antennas and all eavesdroppers have the same number of antennas, but our analysis below is also applicable without this assumption.
The channel input-output relationship at one time instant is given by
$$Y_j = H_j X + W_j, \quad j = 1, \ldots, J, \qquad Z_k = G_k X + V_k, \quad k = 1, \ldots, K, \tag{38}$$
where $H_j$ for $j = 1, \ldots, J$ and $G_k$ for $k = 1, \ldots, K$ are fixed matrices, and $W_1, \ldots, W_J$ and $V_1, \ldots, V_K$ are i.i.d. Gaussian random vectors with identity covariance matrices. We assume that the channel input is subject to an average power constraint:
$$\frac{1}{n} \sum_{i=1}^{n} E\left[X_i^T X_i\right] \le P,$$
where $i$ is the symbol time (i.e., channel use) index.
In the following, we first study the degraded MIMO compound wiretap channel, and then study the general MIMO compound wiretap channel. We use the following notation associated with matrices. We use $A \succeq 0$ to indicate that $A$ is a positive semidefinite matrix, $A \succ 0$ to indicate that $A$ is a positive definite matrix, and $A \succeq B$ to indicate that $A - B$ is a positive semidefinite matrix. The symbols $\preceq$ and $\prec$ indicate the opposite meanings to those of $\succeq$ and $\succ$, respectively. Following [19], we define the MIMO compound wiretap channel to be degraded if for each $(j, k)$ pair, there exists a matrix $D_{jk}$ such that $D_{jk} H_j = G_k$ and $D_{jk} D_{jk}^T \preceq I$. It is easy to check that for each $(j, k)$ pair, the channel satisfies the Markov chain relationship $X \to Y_j \to Z_k$.

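Theorem 5. The secrecy capacity of the degraded MIMO compound wiretap channel is given by
$$C_s = \max_{Q \succeq 0,\; \mathrm{Tr}(Q) \le P} \; \min_{j,k} \left[ \frac{1}{2} \log\left|I + H_j Q H_j^T\right| - \frac{1}{2} \log\left|I + G_k Q G_k^T\right| \right].$$
Proof. We only need to show that the secrecy capacity is given by
$$\min_{j,k} \left[ \frac{1}{2} \log\left|I + H_j Q H_j^T\right| - \frac{1}{2} \log\left|I + G_k Q G_k^T\right| \right] \tag{41}$$
if the input is subject to the covariance matrix constraint
$$\frac{1}{n} \sum_{i=1}^{n} K_{X_i} \preceq Q,$$
where $K_{X_i}$ denotes the covariance matrix of $X_i$ at symbol time $i$. Theorem 5 then follows by maximizing (41) over all $Q$ that satisfy the power constraint, that is, $\mathrm{Tr}(Q) \le P$. The achievability follows from Theorem 3 by choosing $X \sim \mathcal{N}(0, Q)$. The proof of the converse is relegated to Appendix C.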

General MIMO Compound Wiretap Channels.
In this subsection, we study the general MIMO compound wiretap channel defined in (38), where we do not make the degradedness assumption.
Based on Theorem 1 by choosing U = X ∼ N (0, Q), it is easy to see that the following secrecy rate is achievable.

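Lemma 1. For the general MIMO compound wiretap channel, an achievable secrecy rate is given by
$$R = \max_{Q \succeq 0,\; \mathrm{Tr}(Q) \le P} \; \min_{j,k} \left[ \frac{1}{2} \log\left|I + H_j Q H_j^T\right| - \frac{1}{2} \log\left|I + G_k Q G_k^T\right| \right]. \tag{43}$$
In general, the maximization problem in (43) is difficult to solve. To gain some insight, we study the s.d.o.f. defined as in (21), but with $\mathrm{SNR} = P/N_t$.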
We design the following beamforming scheme. Let $r = \mathrm{Rank}\left(\sum_{j=1}^{J} H_j^T H_j\right)$ and $\{u_1, \ldots, u_r\}$ be the eigenvectors of $\sum_{j=1}^{J} H_j^T H_j$ that correspond to nonzero eigenvalues. These vectors are directions along which at least one receiver may receive input signals. In fact, if we let $\{u_{j1}, \ldots, u_{j r_j}\}$ be the eigenvectors of $H_j^T H_j$ that correspond to nonzero eigenvalues, then the vectors in the set $\{(u_{j1}, \ldots, u_{j r_j}) : j = 1, \ldots, J\}$ span the same vector space as $\{u_1, \ldots, u_r\}$.
We let $\{u_{r+1}, \ldots, u_{N_t}\}$ be the eigenvectors of $\sum_{j=1}^{J} H_j^T H_j$ that correspond to zero eigenvalues. We further let
$$U = (u_1, \ldots, u_{N_t}).$$
Then we have
$$U^T \left( \sum_{j=1}^{J} H_j^T H_j \right) U = \begin{pmatrix} \Lambda_r & 0 \\ 0 & 0_{N_t - r} \end{pmatrix},$$
where $\Lambda_r$ denotes the diagonal matrix with the nonzero eigenvalues of $\sum_{j=1}^{J} H_j^T H_j$ as the diagonal components, and $0_{N_t - r}$ denotes the all-zero matrix of dimension $(N_t - r) \times (N_t - r)$.
We now let $L$ be a subset of $\{1, 2, \ldots, r\}$ and assume $L = \{l_1, \ldots, l_{|L|}\}$, where $|L|$ indicates the number of components in the set $L$. We then let $L^c$ denote the complement of $L$ with respect to the set $\{1, 2, \ldots, r\}$ and assume $L^c = \{\bar{l}_1, \ldots, \bar{l}_{r - |L|}\}$. Let
$$U_L = (u_{l_1}, \ldots, u_{l_{|L|}}).$$
If we choose the beamforming directions to be column vectors in $U_L$ and allocate power equally for these directions, then the input covariance matrix is given by
$$Q = \frac{P}{|L|} U_L U_L^T,$$
and the high-SNR scaling of the resulting achievable rate is determined by the ranks of $H_j U_L$ and $G_k U_L$. Therefore, we have the following theorem.

Theorem 6. An achievable secrecy degree of freedom of the MIMO compound wiretap channel is given by
$$\max_{L} \min_{j,k} \left[ \mathrm{Rank}(H_j U_L) - \mathrm{Rank}(G_k U_L) \right]. \tag{50}$$
We note that each set $L$ corresponds to one set of directions for which the transmitter allocates power, and hence corresponds to one power allocation strategy. The optimal achievable s.d.o.f. can be obtained by searching over all power allocation strategies. We note that $\mathrm{Rank}(H_j U_L)$ and $\mathrm{Rank}(G_k U_L)$ in (50) can be interpreted as the dimensions of the projections of $H_j$ and $G_k$, respectively, onto the vector space spanned by the column vectors of $U_L$. Hence the achievable s.d.o.f. is determined by the geometry of the channel matrices to the receivers and eavesdroppers.
For the special case $J = 1$, that is, there is only one receiver, the channel matrix to the receiver is $H$, and $r$ becomes the rank of $H^T H$ and hence the rank of $H$. We should always choose $L = \{1, \ldots, r\}$, and the resulting s.d.o.f. is given in the following corollary to Theorem 6.

Corollary 3. For the MIMO compound wiretap channel with $J = 1$, an achievable secrecy degree of freedom is given by
$$\min_{k} \left[ \mathrm{Rank}(H U) - \mathrm{Rank}(G_k U) \right],$$
where $U$ is the matrix whose columns are the eigenvectors of $H^T H$ corresponding to nonzero eigenvalues.
We refer the reader to [7] for an example of a MIMO compound wiretap channel for which a particular signaling scheme transforms the channel into an equivalent parallel Gaussian compound wiretap channel, and a simple scheme can hence be constructed to achieve the s.d.o.f. for the channel.

Discussion and Conclusions
In this paper, we have studied the compound wiretap channel, which provides a general framework for examining multicast communication with multiple eavesdroppers. We have obtained lower and upper bounds on the secrecy capacity for the general compound wiretap channel and have established the secrecy capacity for the degraded and semideterministic channel. We have further obtained the secrecy capacity for the degraded parallel Gaussian and degraded MIMO compound wiretap channels. The secrecy rate/capacity in general has a worst-case interpretation.
We have also introduced the notion of the secrecy degree of freedom, which captures the most important factors that affect the scaling behavior of the secrecy capacity at high SNR. For the parallel Gaussian compound channel, we have demonstrated that the s.d.o.f. depends only on the maximal number of subchannels that one eavesdropper can access and does not depend on the number of eavesdroppers. We have also shown that randomizing either source information or the encoder strictly improves the s.d.o.f. for an example case when $J > 1$ and $K > 1$. For the MIMO compound wiretap channel, we have shown that the achievable s.d.o.f. is determined by the geometries of the matrices describing the channels to the receivers and eavesdroppers. We have also noted that it has been shown via a few example channels in [7] that there are simple schemes to achieve the s.d.o.f. in many cases.
We finally note that the capacity of the general compound wiretap channel is still not known. Several interesting special cases are worth addressing, including the Gaussian parallel compound wiretap channel with multiple receivers and multiple eavesdroppers and the general MIMO compound wiretap channel. Understanding the s.d.o.f. of these scenarios may be a useful first step. The techniques for studying the compound broadcast channel without secrecy constraints [20] may be useful here. In particular, designing a zero-forcing transmission scheme over multiple time slots for the MIMO compound wiretap channel as in [20] may be useful in studying the s.d.o.f. However, we remark that one cannot expect the eavesdropper to look only at subspaces. As a more general model, the MIMO compound broadcast channel is also interesting to study. Some recent studies [21] and [22] have provided useful techniques for further study of this topic.

A. Proof of Theorem 1
The idea of the proof is to show there exists a codebook that consists of a number of subcodebooks (similar to [1]). Each receiver can successfully decode over the entire codebook, but all eavesdroppers can successfully decode only within each subcodebook. Hence the transmitter maps messages to different subcodebooks to confuse the eavesdroppers and achieve perfect secrecy.
For a given joint distribution $P_X P_{Y_1 \cdots Y_J Z_1 \cdots Z_K | X}$, it is sufficient to show the following rate is achievable:
$$R = \min_{j} I(X; Y_j) - \max_{k} I(X; Z_k). \tag{A.1}$$
Then the rate given in (6) is achievable by prefixing a discrete memoryless channel from $U$ to $X$ with the transition distribution $P_{X|U}$ to the transmitter (similar to [11, Lemma 4]). We first prove a useful lemma that simplifies the proof later on.
Lemma A.1. If I(X; Z 1 ) < I(X; Z 2 ), then there exists a random variable Z such that I(X; Z 1 , Z) = I(X; Z 2 ) and Z satisfies the Markov chain X → (Z 1 , Z 2 ) → Z.
Proof. Let $U$ be a binary random variable with distribution $\Pr\{U = 1\} = p$ and $\Pr\{U = 2\} = 1 - p$, where $U$ is independent of all other random variables in the model under consideration. Let $Z = (Z_U, U)$. Clearly, $Z$ satisfies the Markov chain condition given in the lemma. Let
$$f(p) = I(X; Z_1, Z).$$
It is clear that
$$f(1) = I(X; Z_1) < I(X; Z_2) \le I(X; Z_1, Z_2) = f(0).$$
Since $f(p)$ is a continuous function for $0 \le p \le 1$, there must exist $p^*$ such that $f(p^*) = I(X; Z_2)$. Therefore, $Z = (Z_U, U)$ with $U$ having distribution $\Pr\{U = 1\} = p^*$ satisfies $I(X; Z_1, Z) = I(X; Z_2)$.
Based on the previous lemma, it is sufficient to consider enhanced eavesdroppers, each with outputs $\tilde{Z}_k = (Z_k, Z'_k)$ such that $I(X; \tilde{Z}_k) = \max_{k'} I(X; Z_{k'})$, where $Z'_k$ is constructed as in Lemma A.1. It is clear that if perfect secrecy can be achieved for the enhanced eavesdroppers, it must be achieved for the original eavesdroppers.
We now consider the following codebook:
$$\left\{ x^n_{ab} : a = 1, \ldots, 2^{nR},\; b = 1, \ldots, 2^{n \max_k I(X; Z_k)} \right\},$$
where $R$ is given in (A.1). We assume all codewords are strongly typical, that is, $x^n_{ab} \in T^n_\epsilon(P_X)$, where $T^n_\epsilon(P_X)$ denotes the strongly $\epsilon$-typical set (see Section 1.2, [23]) based on the distribution $P_X$.
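The existence of $p^*$ can also be made explicit. The following worked step is a sketch that uses only the independence of $U$ from the other variables and the chain rule; the symbol $f$ stands for $I(X; Z_1, Z)$ viewed as a function of $p$ and is introduced here for illustration.

```latex
\begin{align*}
f(p) &= I(X; Z_1, Z_U, U) \\
     &= I(X; U) + I(X; Z_1, Z_U \mid U) \\
     &= p\, I(X; Z_1) + (1 - p)\, I(X; Z_1, Z_2),
\end{align*}
% I(X;U) = 0 because U is independent of X. Given U = 1 the pair (Z_1, Z_U)
% reduces to Z_1, and given U = 2 it equals (Z_1, Z_2).
% Solving f(p^*) = I(X; Z_2) for p^* gives
\[
p^* = \frac{I(X; Z_1, Z_2) - I(X; Z_2)}{I(X; Z_1, Z_2) - I(X; Z_1)} .
\]
```

Since $I(X; Z_1) < I(X; Z_2) \le I(X; Z_1, Z_2)$, the denominator is positive and $0 \le p^* < 1$, so $f$ is in fact linear in $p$ and the required time-sharing probability is unique.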
We define the following probabilities of error when the codeword $x^n_{ab}$ is transmitted: $\lambda_{jab}$ is the error probability for receiver $j$ in determining $(a, b)$, and $\eta_{kb|a}$ is the error probability for eavesdropper $k$ in determining $b$ given $a$. We let $\lambda_j$ and $\eta_k$ denote the corresponding error probabilities averaged over the codewords. The following lemma guarantees existence of a certain codebook, which will be used for encoding.

Lemma A.2. For sufficiently large $n$, there exists a codebook such that $\lambda_j < \epsilon$ for $j = 1, \ldots, J$ and $\eta_k < \epsilon$ for $k = 1, \ldots, K$.

Proof. We prove the lemma by a random coding technique. We define the following sum of error probabilities:
$$p_e = \sum_{j} \lambda_j + \sum_{k} \eta_k.$$
We show that the average of $p_e$ over a random codebook ensemble is small for sufficiently large codeword length $n$. Then, there must exist at least one codebook such that $p_e$ is small for sufficiently large $n$. For a given distribution $P_X$, we generate codewords $x^n_{ab}$, each uniformly drawn from the set $T^n_\epsilon(P_X)$. Index $x^n_{ab}$ via $a = 1, \ldots, 2^{nR}$ and $b = 1, \ldots, 2^{n \max_k I(X; Z_k)}$.
Suppose that the codeword x n ab is transmitted and define the following decoding strategies at the receivers and the eavesdroppers.
(1) Receiver $j$ declares that the index pair of $x^n_{ab}$ is $(\hat{a}, \hat{b})$ if there is a unique index pair such that $(x^n_{\hat{a}\hat{b}}, y^n_j) \in T^n_\epsilon(P_{XY_j})$.
(2) Eavesdropper $k$, given the index $a$, declares that the index of $x^n_{ab}$ is $\hat{b}$ if there is a unique index such that $(x^n_{a\hat{b}}, \tilde{z}^n_k) \in T^n_\epsilon(P_{X\tilde{Z}_k})$, where $\tilde{Z}_k$ denotes the enhanced output.
We can compute $E_C[p_e]$ by following the standard techniques as in [24, Chapter 14], where $E_C$ indicates an average over the random codebook ensemble. We can show that
$$E_C[p_e] < \epsilon \tag{A.9}$$
for sufficiently large codeword length $n$, based on the sizes of indices $a$ and $b$.
Hence there exists one codebook such that for sufficiently large codeword length $n$
$$p_e = \sum_{j} \lambda_j + \sum_{k} \eta_k < \epsilon. \tag{A.10}$$
This leads to the conclusion that for sufficiently large codeword length $n$, $\lambda_j < \epsilon$ and $\eta_k < \epsilon$ for $j = 1, \ldots, J$ and $k = 1, \ldots, K$.
Based on the codebook that satisfies the property given in Lemma A.2, we define the encoding as follows. We map each message $w$ to a codeword $x^n_{wb}$ with $b$ chosen uniformly over the set $\{1, \ldots, 2^{n \max_k I(X; Z_k)}\}$. Based on Lemma A.2, it is clear that each receiver can decode the message $W$ with small probability of error. For each enhanced eavesdropper, we follow steps similar to those in [10] to show that the equivocation rate approaches the message rate $R$, so that perfect secrecy is achieved.

B. Proof of the Converse for Theorem 4
We follow steps that are similar to those given in [25] except for the step of single letter characterization. We include the proof here for the sake of completeness. We consider a code with length $n$ and average error probability $P_e$, together with the joint probability distribution that the code induces on the message, the channel input, and the channel outputs. The secrecy rate is bounded through a chain of inequalities (B.3), in which (a) follows from Fano's inequality, and (b) follows from the chain rule and because conditioning does not increase entropy. We now introduce a random variable $Q$ that is independent of all other random variables in this model, and is uniformly distributed over $\{1, 2, \ldots, n\}$. Define $X = X_Q$, $Y = Y_Q$, and $Z_k = Z_{kQ}$. It is clear that these random variables satisfy the Markov chain condition $Q \to X \to (Y, Z_k)$. Using these definitions, (B.3) becomes a single-letter bound (B.4). The bound given in (B.4) is applicable for $k = 1, \ldots, K$, and hence we obtain the minimum of these bounds over $k$, which completes the proof.

C. Proof of the Converse for Theorem 5
We first prove a lemma (Lemma C.1) that gives two useful properties, concavity and monotonicity, which are used below. To prove the converse, we first have a bound for any $(j, k)$ pair, consisting of two terms, by referring to [15, Section III]. It is easy to see that the second term is independent of the distribution of $X_i$. The first term is maximized by Gaussian $X_i$ if the covariance matrix of $X_i$ is fixed to be $K_{X_i}$. This is because $h(Y_{j,i} \mid Z_{k,i})$ is maximized by jointly Gaussian $Y_{j,i}$ and $Z_{k,i}$ for a fixed covariance matrix $Q_{Y_{j,i} Z_{k,i}}$. Therefore, we have a bound in which (a) follows from the degradedness assumption and the concavity property given in Lemma C.1, and (b) follows because $(1/n) \sum_{i=1}^{n} K_{X_i} \preceq Q$ and from the monotonicity property given in Lemma C.1.