Amplitude Constrained Vector Gaussian Wiretap Channel: Properties of the Secrecy-Capacity-Achieving Input Distribution

This paper studies the secrecy capacity of an n-dimensional Gaussian wiretap channel under a peak power constraint. This work determines the largest peak power constraint R̄ₙ such that an input distribution uniformly distributed on a single sphere is optimal; this regime is termed the low-amplitude regime. The asymptotic value of R̄ₙ as n goes to infinity is completely characterized as a function of the noise variances at both receivers. Moreover, the secrecy capacity is characterized in a form amenable to computation. Several numerical examples are provided, such as the secrecy-capacity-achieving distribution beyond the low-amplitude regime. Furthermore, for the scalar case (n = 1), we show that the secrecy-capacity-achieving input distribution is discrete with at most of the order of R²/σ₁² mass points, where σ₁² is the variance of the Gaussian noise over the legitimate channel.


Introduction
Consider the vector Gaussian wiretap channel with outputs

Y₁ = X + N₁,  (1)
Y₂ = X + N₂,  (2)

where X ∈ Rⁿ, N₁ ∼ N(0ₙ, σ₁²Iₙ), and N₂ ∼ N(0ₙ, σ₂²Iₙ), with (X, N₁, N₂) being mutually independent. The output Y₁ is observed by the legitimate receiver, whereas the output Y₂ is observed by the malicious receiver. In this work, we are interested in the scenario where the input X is limited by a peak power constraint, or amplitude constraint, and assume that X ∈ B₀(R) = {x : ‖x‖ ≤ R}, i.e., B₀(R) is an n-ball centered at the origin and of radius R. For this setting, the secrecy capacity is given by

C_s = max_{P_{VX}} I(V; Y₁) − I(V; Y₂) = max_{P_X : X ∈ B₀(R)} I(X; Y₁) − I(X; Y₂),  (3)

where the last expression holds due to the (stochastically) degraded nature of the channel. It can be shown that for σ₁² ≥ σ₂² the secrecy capacity is equal to zero. Therefore, in the remainder, we assume that σ₁² < σ₂². We are interested in studying the input distribution P_X that maximizes (3) in the low (but not vanishing) amplitude regime. Since closed-form expressions for the secrecy capacity are rare, we derive the secrecy capacity in an integral form that is easy to evaluate. For the scalar case (n = 1), we establish an upper bound on the number of mass points of P_X, valid for any amplitude regime. We also argue in Section 2.3 that the solution to the secrecy capacity problem can shed light on other problems seemingly unrelated to security. The paper also provides a number of numerical simulations of P_X and C_s, the data for which are made available at [1].

Literature Review
The wiretap channel was introduced by Wyner in [2], who also established the secrecy capacity of the degraded wiretap channel. The results of [2] were extended to the Gaussian wiretap channel in [3]. The wiretap channel plays a central role in network information theory; the interested reader is referred to [4][5][6][7][8] and references therein for a detailed treatment of the topic. Furthermore, for an in-depth discussion on the wiretap fading channel, refer to [9][10][11][12].
In [3], it was shown that the secrecy-capacity-achieving input distribution of the Gaussian wiretap channel, under an average power constraint, is Gaussian. In [13], the authors investigated the Gaussian wiretap channel with two antennas at both the transmitter and the receiver, and a single antenna at the eavesdropper. The secrecy capacity of the MIMO wiretap channel was characterized in [14,15], where the Gaussian input was shown to be optimal. An elegant proof of the optimality of the Gaussian input, using the I-MMSE relationship [16], is given in [17]. Moreover, an alternative approach to the characterization of the secrecy capacity of a MIMO wiretap channel was proposed in [18]. In [19,20], the authors discuss the optimal signaling for secrecy rate maximization under average power constraints.
The secrecy capacity of the Gaussian wiretap channel under the peak power constraint has received far less attention. The secrecy capacity of the scalar Gaussian wiretap channel with an amplitude and power constraint was considered in [21], where the authors showed that the capacity-achieving input distribution P X is discrete with finitely many support points.
The work of [21] was extended to noise-dependent channels by Soltani and Rezki in [22]. For further studies on the properties of the secrecy-capacity-achieving input distribution for a class of degraded wiretap channels, refer to [23][24][25].
The secrecy capacity for the vector wiretap channel with a peak power constraint was considered in [25], where it was shown that the optimal input distribution is concentrated on finitely many concentric shells.

Contributions and Paper Outline
In Section 2, we introduce the mathematical tools, assumptions, and definitions used throughout the paper. Specifically, in Section 2.1, we introduce the oscillation theorem. In Section 2.2, we give a definition of low-amplitude regimes. Moreover, in Section 2.3, we show how the wiretap channel can be seen as a generalization of point-to-point channels and the evaluation of the largest minimum mean square error (MMSE), both under the assumption of amplitude-constrained input. In Section 2.4, we provide a definition of the Karush-Kuhn-Tucker (KKT) conditions for the wiretap channel.
In Section 3, we detail our main results. Theorem 2 provides a sufficient condition for the optimality of a single hypersphere. Theorem 3 and Theorem 4 give the conditions under which we can fully characterize the behavior of R̄ₙ, that is, the radius below which we are in the low-amplitude regime, i.e., below which the optimal input distribution is composed of a single shell. Furthermore, Theorem 5 gives an implicit and an explicit upper bound on the number of mass points of the secrecy-capacity-achieving input distribution when n = 1.
In Section 4, we derive the secrecy capacity expression for the low-amplitude regime in Theorem 6. We also investigate its behavior when the number of antennas n goes to infinity.
Section 5 extends the investigation of the secrecy capacity beyond the low-amplitude regime. We numerically estimate both the optimal input pmf and the resulting capacity via an algorithmic procedure based on the KKT conditions introduced in Lemma 2. Sections 6–9 provide the proofs of Theorems 3–6, respectively. Finally, Section 10 concludes the paper.

Notation
We use bold letters for vectors (x) and uppercase letters for random variables (X). We denote by ‖x‖ the Euclidean norm of the vector x. Given a vector x ∈ Rⁿ and a scalar a, with a little abuse of notation, we denote a·e₁ + x by a + x, where e₁ = [1, 0, . . . , 0] is the first vector in the standard basis of the Euclidean vector space Rⁿ. Given a random variable X, its probability density function (pdf), pmf, and cumulative distribution function are denoted by f_X, P_X, and F_X, respectively. The support set of P_X is denoted and defined as supp(P_X) = {x : P_X(D) > 0 for every open set D containing x}. We denote by N(µ, Σ) a multivariate Gaussian distribution with mean vector µ and covariance matrix Σ. The pdf of a Gaussian random variable with zero mean and variance σ² is denoted by φ_σ(·). We denote by χ²ₙ(λ) the noncentral chi-square distribution with n degrees of freedom and with noncentrality parameter λ. We represent the n × 1 vector of zeros by 0ₙ and the n × n identity matrix by Iₙ. Furthermore, we denote the relative entropy by D(·‖·). The minimum mean squared error is denoted by mmse(X|Y) = E[‖X − E[X|Y]‖²]. The modified Bessel function of the first kind of order v ≥ 0 is denoted by I_v(·). The following ratio of Bessel functions is commonly used in this work: h_v(x) = I_v(x)/I_{v−1}(x). Finally, the number of zeros (counted in accordance with their multiplicities) of a function f : R → R on the interval I is denoted by N(I, f). Similarly, if f : C → C is a function on the complex domain, N(D, f) denotes the number of its zeros within the region D.

Oscillation Theorem
In this work, we often need to upper bound the number of oscillations of a function, i.e., its number of sign changes. This is useful, for example, to bound the number of zeros of a function or the number of roots of an equation. To be more precise, let us define the number of sign changes as follows.
Definition 1 (Sign Changes of a Function). The number of sign changes of a function ξ : Ω → R is given by

S(ξ) = sup_{m ∈ N} sup_{y₁ < ⋯ < y_m ⊆ Ω} N({ξ(yᵢ)}ᵢ₌₁ᵐ),

where N({ξ(yᵢ)}ᵢ₌₁ᵐ) is the number of sign changes of the sequence {ξ(yᵢ)}ᵢ₌₁ᵐ.
Definition 2 (Totally Positive Kernel). A function f : I 1 × I 2 → R is said to be a totally positive kernel of order n if det [ f (x i , y j )] m i,j=1 > 0 for all 1 ≤ m ≤ n, for all x 1 < · · · < x m ∈ I 1 , and y 1 < · · · < y m ∈ I 2 . If f is a totally positive kernel of order n for all n ∈ N, then f is a strictly totally positive kernel.
In [26], Karlin noticed that some integral transformations have a variation-diminishing property, which is described in the following theorem.

Theorem 1 (Oscillation Theorem). Given domains I₁ and I₂, let p : I₁ × I₂ → R be a strictly totally positive kernel. For an arbitrary y, suppose p(·, y) : I₁ → R is an n-times differentiable function. Assume that µ is a measure on I₂, and let ξ : I₂ → R be a function with S(ξ) = n. For x ∈ I₁, define

Ξ(x) = ∫_{I₂} ξ(y) p(x, y) dµ(y).

If Ξ : I₁ → R is an n-times differentiable function, then either N(I₁, Ξ) ≤ n, or Ξ ≡ 0.
The above theorem says that the number of zeros of the function Ξ, the output of the integral transformation, is at most the number of sign changes of the function ξ, the input to the integral transformation.
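As an illustration (not part of the formal development), the following sketch convolves a function having three sign changes with a Gaussian kernel, which is a strictly totally positive kernel, and checks that the transform does not gain sign changes; all names in the snippet are ours.

```python
import numpy as np

def sign_changes(v, tol=1e-9):
    """Count sign changes of a sampled function, ignoring near-zero samples."""
    s = np.sign(v[np.abs(v) > tol])
    return int(np.sum(s[1:] != s[:-1]))

# Input xi with exactly 3 sign changes on [-6, 6] (roots at -1, 0.5, 2).
y = np.linspace(-6.0, 6.0, 2401)
xi = (y - 2.0) * (y + 1.0) * (y - 0.5)

# Strictly totally positive kernel: Gaussian p(x, y) = exp(-(x - y)^2 / 2).
x = np.linspace(-6.0, 6.0, 2401)
p = np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2)

# Integral transform Xi(x) = sum_y xi(y) p(x, y) dy  (Riemann sum).
Xi = p @ xi * (y[1] - y[0])

print(sign_changes(xi), sign_changes(Xi))  # the transform has no more sign changes
```

Here the smoothed output in fact ends up with a single sign change, consistent with the variation-diminishing bound of at most three.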

Low-Amplitude Regime
In this work, a low-amplitude regime is defined as follows: the channel is said to operate in the low-amplitude regime if R ≤ R̄ₙ(σ₁², σ₂²), where R̄ₙ(σ₁², σ₂²) is the largest radius R such that the input X_R, distributed uniformly on the sphere of radius R (with distribution denoted by P_{X_R}), is secrecy-capacity-achieving.
One of the main objectives of this work is to characterize R̄ₙ(σ₁², σ₂²).

Connections to Other Optimization Problems
The distribution P_{X_R} occurs in a variety of statistical and information-theoretic applications. For example, consider the following two optimization problems:

max_{P_X : X ∈ B₀(R)} I(X; X + N),  (10)
max_{P_X : X ∈ B₀(R)} mmse(X | X + N),  (11)

where N ∼ N(0ₙ, σ²Iₙ). The first problem seeks to characterize the capacity of the point-to-point channel under an amplitude constraint, and the second problem seeks to find the largest minimum mean squared error under the assumption that the signal has bounded amplitude; the interested reader is referred to [27][28][29] for a detailed background on both problems. Similarly to the wiretap channel, we can define the low-amplitude regime for both problems as the largest R such that P_{X_R} is optimal, and denote these radii by R̄ₙ^ptp(σ²) and R̄ₙ^MMSE(σ²). We now argue that both R̄ₙ^ptp(σ²) and R̄ₙ^MMSE(σ²) can be seen as special cases of the wiretap solution. Hence, the wiretap channel provides an interesting unification and generalization of these two problems.
First, note that the point-to-point solution can be recovered from the wiretap one by simply specializing the wiretap channel to the point-to-point channel, that is,

R̄ₙ^ptp(σ₁²) = lim_{σ₂² → ∞} R̄ₙ(σ₁², σ₂²).  (12)

Second, to see that the MMSE solution can be recovered from the wiretap one, recall that by the I-MMSE relationship [16] we have that

max_{P_X : X ∈ B₀(R)} I(X; Y₁) − I(X; Y₂) = max_{P_X : X ∈ B₀(R)} (1/2) ∫_{1/σ₂²}^{1/σ₁²} mmse(X | √snr X + Z) d snr,

where Z is standard Gaussian. Now, note that if we choose σ₂² = σ₁² + ε, then by the mean value theorem we arrive at

max_{P_X : X ∈ B₀(R)} I(X; Y₁) − I(X; Y₂) = max_{P_X : X ∈ B₀(R)} [ (ε/(2σ₁⁴)) mmse(X | X + σ₁Z) + o(ε) ],

where lim_{ε→0⁺} o(ε)/ε = 0. Consequently, for a small enough ε > 0, the wiretap problem is solved by the same input distribution as the MMSE problem, so that

R̄ₙ^MMSE(σ₁²) ≈ R̄ₙ(σ₁², σ₁² + ε).  (16)
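As a quick numerical sanity check of the I-MMSE relationship invoked above, the following sketch compares a numerical derivative of the mutual information with half the MMSE for a scalar Gaussian input, where both sides are available in closed form; the snippet is illustrative and not from the paper.

```python
import math

# For X ~ N(0, P) and Y = sqrt(snr) * X + Z with Z ~ N(0, 1):
#   I(snr)    = 0.5 * log(1 + snr * P)
#   mmse(snr) = P / (1 + snr * P)
P = 2.0
I = lambda snr: 0.5 * math.log(1.0 + snr * P)
mmse = lambda snr: P / (1.0 + snr * P)

snr, h = 1.5, 1e-6
dI = (I(snr + h) - I(snr - h)) / (2.0 * h)   # central-difference derivative of I

# I-MMSE identity: dI/dsnr = mmse(snr) / 2
print(dI, 0.5 * mmse(snr))
```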

KKT Conditions
Let us define the secrecy density for the vector Gaussian wiretap channel as

Ξ(x; P_X) = D(P_{Y₁|X=x} ‖ P_{Y₁}) − D(P_{Y₂|X=x} ‖ P_{Y₂}),

where D(·‖·) is the relative entropy. For the scalar case (n = 1), the KKT conditions are necessary and sufficient to ensure that P_X is capacity-achieving [21].
Proof. The first part of Lemma 1 was shown in [21]. The proof of (21) goes as follows: in the chain of equalities leading to (25), N ∼ N(0, σ₂² − σ₁²); (24) holds by noticing that φ_{σ₂}(y − x) can be rewritten as the convolution of Gaussian pdfs, i.e., as E[φ_{σ₁}(y − x − N)]; and in (25) we applied the change of variable y → y + N. This concludes the proof.
The convexity of the optimization problem is also guaranteed for the vector wiretap model in (1) with n > 1. Then, the results of Lemma 1 can be extended to the vector case as follows.
The KKT conditions for the vector case are stated in Lemma 2 (see (28)), where the secrecy density Ξ(x; P_X) is defined for x ∈ Rⁿ as in Lemma 1. Proof. This is a straightforward vector extension of Lemma 1.
Thanks to the spherical symmetry of the additive noise distributions and of P_X, the secrecy density Ξ(x; P_X) can be expressed as a function of ‖x‖ only. Therefore, we denote the secrecy density in spherical coordinates by Ξ(‖x‖; P_X), and give a rigorous definition in (A9).

A New Sufficient Condition on the Optimality of P X R
Our first main result provides a sufficient condition for the optimality of P X R .

Theorem 2. If
then P X R is secrecy-capacity-achieving.
Proof. Let us consider the equivalent definition of the secrecy density in spherical coordinates (A9). Note that if the derivative of Ξ(‖x‖; P_{X_R}) makes at most one sign change, from negative to positive, then the maximum of ‖x‖ → Ξ(‖x‖; P_{X_R}) occurs at either ‖x‖ = 0 or ‖x‖ = R. From Lemma A1 in Appendix B, the derivative of Ξ is given in (33), where Q²ₙ₊₂ is a noncentral chi-square random variable with n + 2 degrees of freedom and whose noncentrality parameter involves W ∼ N(0ₙ₊₂, (σ₂² − σ₁²)Iₙ₊₂). A calculation related to (33) was erroneously performed in [27]. However, this error does not change the results of [27], as only the sign of the derivative is important and not the value itself. Note that Ξ′(0; P_{X_R}) = 0 and that Ξ′(‖x‖; P_{X_R}) > 0 for a sufficiently large ‖x‖; this follows from the chain of bounds (36)–(38), where (36) follows from 0 ≤ h_{n/2}(x) ≤ 1 for x ≥ 0; (37) follows by noticing that ‖x‖ ≥ R; and finally, (38) holds by h_{n/2}(x) ≤ 1. Then, to show that Ξ(‖x‖; P_{X_R}) is maximized at ‖x‖ = R, we need to prove that Ξ′(‖x‖; P_{X_R}) changes sign at most once. To that end, we invoke Karlin's oscillation theorem presented in Section 2.1. By using (33), the fact that the pdf of a noncentral chi-square is a totally positive kernel [26], and Theorem 1, the number of sign changes of Ξ′(‖x‖; P_{X_R}) is upper-bounded by the number of sign changes of

G_{σ₁,σ₂,R,n}(y) = M₂(y) − M₁(y),  (39)

for y ∈ R⁺. The function G_{σ₁,σ₂,R,n}(y) can be lower-bounded as in (40) and (41), where the inequality in (40) follows from h_{n/2}(x) ≥ 0 for x ≥ 0, and (41) follows from h_{n/2}(x) ≤ x/n for x ≥ 0 and n ∈ N. We conclude by noting that, under the condition of the theorem, (41) is nonnegative for all y ∈ R⁺, hence G_{σ₁,σ₂,R,n} has no sign change, thus guaranteeing that P_{X_R} is secrecy-capacity-achieving.

Remark 1.
As a consequence of the proof of Theorem 2, for any R ≥ 0, σ₂ ≥ σ₁ ≥ 0, and n ∈ N, if G_{σ₁,σ₂,R,n}(y) has at most one sign change, then P_{X_R} is secrecy-capacity-achieving if, and only if, Ξ(‖x‖; P_{X_R}) ≤ Ξ(R; P_{X_R}) for all ‖x‖ ∈ [0, R]. Because of the difficulty in evaluating analytical properties of (39), proving that G_{σ₁,σ₂,R,n} has at most one sign change does not seem easy. However, in Appendix A, we show via extensive numerical evaluations that G_{σ₁,σ₂,R,n} changes sign at most once for every choice of n, R, σ₁, and σ₂ that we tried.

Characterizing the Low-Amplitude Regime
Let us characterize the low-amplitude regime as follows.
Theorem 3 below involves a function f(R) in which Z ∼ N(0ₙ, Iₙ). If G_{σ₁,σ₂,R,n} of (39) has at most one sign change, the input X_R is secrecy-capacity-achieving if, and only if, R ≤ R̄ₙ(σ₁², σ₂²), where R̄ₙ(σ₁², σ₂²) is given as the solution of (45).

Remark 2. Note that (45) always has a solution; to see this, observe that f(0) = 1. The solution to (45) needs to be found numerically. To avoid any loss of accuracy in the numerical evaluation of h_v(x) for large values of x, we used the exponential scaling provided in the MATLAB implementation of I_v(x). Since evaluating f(R) is rather straightforward and not time-consuming, we opted for a binary search algorithm.
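The binary search over R can be sketched as follows. Here `f` is only a stand-in for the left-hand side of (45); in Python, the exponentially scaled Bessel evaluations mentioned above would correspond to `scipy.special.ive`, while the placeholder below is a generic decreasing function with f(0) = 1 used purely to exercise the search logic.

```python
import math

def largest_radius(f, threshold, hi=100.0, tol=1e-9):
    """Binary search for the largest R with f(R) >= threshold,
    assuming f is continuous and nonincreasing with f(0) >= threshold."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) >= threshold:
            lo = mid   # condition still holds: search to the right
        else:
            hi = mid   # condition fails: search to the left
    return 0.5 * (lo + hi)

# Stand-in for f(R): decreasing, with f(0) = 1, as noted in Remark 2.
f = lambda R: math.exp(-R)
R_bar = largest_radius(f, threshold=0.5)
print(R_bar)  # ≈ ln 2 ≈ 0.6931 for this stand-in f
```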
In Table 1, we show the values of R̄ₙ(1, σ₂²) for some values of σ₂² and n. Moreover, we report the values of R̄ₙ^ptp(1) and R̄ₙ^MMSE(1) from [27] in the first and the last row, respectively. As predicted by (12), we can appreciate the close match of the R̄ₙ^ptp(1) row with that of R̄ₙ(1, 1000). Similarly, the agreement between the R̄ₙ^MMSE(1) row and the R̄ₙ(1, 1.001) row is justified by (16).

Large n Asymptotics
We now use the result in Theorem 3 to characterize the asymptotic behavior of R̄ₙ(σ₁², σ₂²). In particular, it is shown that R̄ₙ(σ₁², σ₂²) increases as √n.

Scalar Case (n = 1)
For the scalar case, the optimal input distribution P_X is discrete. In this regime, we provide an implicit and an explicit upper bound on the number of support points of the optimal input probability mass function (pmf) P_X.

Theorem 5. Let Y₁ and Y₂ be the secrecy-capacity-achieving output distributions at the legitimate and malicious receivers, respectively. Then, the number of support points of P_X satisfies the implicit upper bound in (49). Moreover, an explicit upper bound on the number of support points of P_X is given in (52), where the parameter ρ therein depends on the factor (2e + 1)² and on the noise parameters σ₂ and σ₁.

The upper bounds in Theorem 5 are generalizations of the upper bounds on the number of mass points presented in [30] in the context of a point-to-point AWGN channel with an amplitude constraint. Indeed, if we let σ₂ → ∞ while keeping σ₁ and R fixed, then the wiretap channel reduces to the AWGN point-to-point channel.
To find a lower bound on the number of mass points, a possible approach consists of the chain of steps in (55), which uses the nonnegativity of the entropy and the fact that entropy is maximized by a uniform distribution. Furthermore, by using a suboptimal uniform (continuous) distribution on [−R, R] as an input and the entropy power inequality, the secrecy capacity is lower-bounded as in (56). Combining the bounds in (55) and (56), we arrive at the lower bound on the number of points in (57). At this point, one needs to determine the behavior of I(X; Y₂). A trivial lower bound on |supp(P_X)| can be found by lower-bounding I(X; Y₂) by zero. However, this lower bound on |supp(P_X)| does not grow with R, while the upper bound does increase with R. A possible way of establishing a lower bound that increases in R is by showing a sufficiently tight approximation of I(X; Y₂). However, because not much is known about the structure of the optimal input distribution P_X, it is not immediately evident how one can establish such an approximation or whether it is valid.

Secrecy Capacity Expression in the Low-Amplitude Regime
The result in Theorem 3 can also be used to establish the secrecy capacity for all R ≤ R̄ₙ(σ₁², σ₂²), as is done next.

Large n Asymptotics
Since R̄ₙ(σ₁², σ₂²) grows as √n according to Theorem 4, if we keep R constant and let the number of antennas go to infinity, the low-amplitude regime becomes the only regime. The next theorem characterizes the secrecy capacity in this 'massive-MIMO' regime (i.e., where R is fixed and n goes to infinity).

Theorem 7.
Consider the expression in (58), and fix R ≥ 0 and σ₁² ≤ σ₂²; then the limit in (59) holds.

Remark 3. The result in Theorem 7 is reminiscent of the capacity in the wideband regime [31] (Ch. 9), where the capacity increases linearly in the signal-to-noise ratio. Similarly, Theorem 7 shows that in the large-antenna regime, the secrecy capacity grows linearly with the difference between the signal-to-noise ratios of the legitimate user and the eavesdropper.
In Theorem 7, R was held fixed. It is also interesting to study the case when R is a function of n. Specifically, it is interesting to study the case when R = c √ n for some coefficient c.
Notice that (60) is equivalent to the secrecy capacity of a vector Gaussian wiretap channel subject to an average power constraint. Gaussian wiretap channels under average power constraints have been extensively investigated [3,32] and, for an average power constraint E[‖X‖²] ≤ P, the resulting secrecy capacity is given by [3]

C_G(σ₁², σ₂², P, n) = (n/2) log( (1 + P/σ₁²) / (1 + P/σ₂²) ).

Thus, the result in (60) can be restated as follows: in the regime considered in Theorem 8, for a large enough n, the secrecy capacity under the amplitude constraint Rₙ = c√n behaves as the secrecy capacity under the average power constraint c².
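The Gaussian secrecy capacity above is straightforward to evaluate; the following helper (our naming) transcribes the formula and illustrates that it saturates at (n/2) log(σ₂²/σ₁²) as the power grows.

```python
import math

def C_G(sigma1_sq, sigma2_sq, P, n):
    """Secrecy capacity (in nats) of the n-dim Gaussian wiretap channel
    under an average power constraint P, per the formula above."""
    return 0.5 * n * math.log((1.0 + P / sigma1_sq) / (1.0 + P / sigma2_sq))

# Increasing in P, and saturating at (n/2) * log(sigma2^2 / sigma1^2).
print(C_G(1.0, 10.0, 4.0, 2))
print(C_G(1.0, 10.0, 1e12, 2), math.log(10.0))  # near the saturation level for n = 2
```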

Beyond the Low-Amplitude Regime
To evaluate the secrecy capacity and find the optimal distribution P_X beyond R̄ₙ, we rely on numerical estimations. We remark that, as pointed out in [25], the secrecy-capacity-achieving distribution is isotropic and consists of finitely many concentric shells. Keeping this in mind, we can find the optimal input distribution P_X by just optimizing over the distribution of the norm ‖X‖, with ‖X‖ ≤ R.

Numerical Algorithm
In the case of scalar Gaussian wiretap channels, the secrecy capacity and the optimal input pmf can be estimated via the algorithm described in [33], i.e., a numerical procedure that takes inspiration from the deterministic annealing algorithm sketched in [34]. Let us denote by Ĉ_s(σ₁², σ₂², R, n) the numerical estimate of the secrecy capacity, and by P̂_X the estimate of the optimal pmf on the input norm. To numerically evaluate Ĉ_s(σ₁², σ₂², R, n) and P̂_X, we extend the algorithm in [33] to the vector case. Our extension is defined in Algorithm 1. The input parameters of the main function are the noise variances σ₁² and σ₂², the radius R, the vectors ρ and p (being, respectively, the mass point positions and probabilities of a tentative input pmf), the number of iterations in the while loop N_c, and finally, a tolerance ε to set the precision of the secrecy capacity estimate. The main loop of Algorithm 1 repeats, adding a mass point via ADD-POINT whenever the current tentative pmf is not validated, until the KKT validation succeeds; at that point, the algorithm sets P̂_X ← (ρ, p) and returns P̂_X together with Ĉ_s(σ₁², σ₂², R, n) ← I_s(‖X‖; P̂_X). At its core, the numerical procedure iteratively refines its estimate of P_X by running a gradient ascent algorithm to update the vector ρ and a variant of the Blahut-Arimoto algorithm [35] to update p.
The GRADIENT ASCENT procedure uses the secrecy information as the objective function and stops either when ρ has reached convergence or at a given maximum number of iterations. Let us denote by I_s(‖X‖; P_X) the secrecy information as a function of the input norm. Notice that, given a tentative pmf P̂_X with mass points ρ, probabilities p, and |supp(P̂_X)| = K, we have

I_s(‖X‖; P̂_X) = Σᵢ₌₁ᴷ pᵢ Ξ(ρᵢ; P̂_X),

where Ξ(t; P̂_X) is the secrecy density, with respect to the input norm, defined in (A9), and where pᵢ and ρᵢ are, respectively, the ith elements of p and ρ. Then, the GRADIENT ASCENT updates are given by gradient steps whose partial derivatives are defined in Appendix E, with α the step size in the gradient ascent. We remark that, to ensure convergence to a local maximum, we use the gradient ascent algorithm in a backtracking line search version [36]. By suitably adjusting the step size α at each iteration, the backtracking line search guarantees that each new update of ρ provides a nondecreasing associated secrecy information compared to the previous update of ρ.
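A minimal sketch of gradient ascent with backtracking line search follows; this is the standard Armijo rule, shown on a toy concave objective that stands in for the secrecy information, and is not the paper's exact implementation.

```python
import numpy as np

def backtracking_ascent(obj, grad, rho0, alpha0=1.0, beta=0.5, c=1e-4,
                        max_iter=500, tol=1e-10):
    """Gradient ascent with Armijo backtracking: shrink the step until the
    objective gain is at least c * alpha * ||grad||^2, so each accepted
    step cannot decrease the objective."""
    rho = np.asarray(rho0, dtype=float)
    for _ in range(max_iter):
        g = grad(rho)
        if np.linalg.norm(g) < tol:
            break
        alpha = alpha0
        while obj(rho + alpha * g) < obj(rho) + c * alpha * np.dot(g, g):
            alpha *= beta   # backtrack: halve the step size
        rho = rho + alpha * g
    return rho

# Toy concave objective with maximum at (1, -2).
obj = lambda r: -(r[0] - 1.0) ** 2 - 2.0 * (r[1] + 2.0) ** 2
grad = lambda r: np.array([-2.0 * (r[0] - 1.0), -4.0 * (r[1] + 2.0)])

rho_star = backtracking_ascent(obj, grad, rho0=[5.0, 5.0])
print(rho_star)  # ≈ [1, -2]
```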
The BLAHUT-ARIMOTO function runs a variant of the Blahut-Arimoto algorithm. For the scalar case, an example of the Blahut-Arimoto optimization applied to wiretap channels is given in [37]. Similar results can be extended to the case of vector wiretap channels. Given the current probabilities pᵢ, the updates are obtained by evaluating the multiplicative update weight of each mass point and, finally, by normalizing each pᵢ and assigning them to the entries of the vector p. Similarly to GRADIENT ASCENT, the BLAHUT-ARIMOTO procedure stops either when the values of p have reached a stable convergence or after a set number of updates.
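For reference, the standard point-to-point Blahut-Arimoto iteration, of which the procedure above is a wiretap variant, looks as follows for a binary symmetric channel; the function and variable names are ours.

```python
import numpy as np

def blahut_arimoto(W, n_iter=200):
    """Standard Blahut-Arimoto for a DMC with transition matrix W[x, y].
    Returns (capacity in bits, capacity-achieving input pmf)."""
    n_in = W.shape[0]
    p = np.full(n_in, 1.0 / n_in)          # start from the uniform input
    for _ in range(n_iter):
        q = p @ W                           # induced output distribution
        d = np.sum(W * np.log2(W / q), axis=1)  # D(W(.|x) || q) per input
        w = p * np.exp2(d)                  # multiplicative update
        p = w / w.sum()                     # normalize back to a pmf
    q = p @ W
    capacity = np.sum(p * np.sum(W * np.log2(W / q), axis=1))
    return capacity, p

# Binary symmetric channel with crossover 0.1: C = 1 - H2(0.1) ≈ 0.531 bits.
eps_bsc = 0.1
W = np.array([[1 - eps_bsc, eps_bsc], [eps_bsc, 1 - eps_bsc]])
C, p_opt = blahut_arimoto(W)
print(C, p_opt)
```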
Since the joint optimization of ρ and p is not numerically feasible, we need to reiterate both the BLAHUT-ARIMOTO and the GRADIENT ASCENT procedures a given number of times, namely N c . The parameter N c is chosen empirically in such a way that ρ and p become fairly stable, and therefore we can expect to have reached joint convergence for both of them.
Then, the KKT VALIDATION procedure ensures that the values of ρ and p are indeed close to the optimal ones. We check the optimality of P̂_X by verifying whether the KKT conditions in Lemma 2 are satisfied. Since the algorithm has to verify the KKT conditions numerically, i.e., with finite precision, we find it more convenient to check the negated version of (28), where a tolerance parameter ε is introduced that trades off accuracy with computational burden. Specifically, P̂_X is not an optimal input pmf if any of the following conditions is satisfied:

∃ t ∈ supp(P̂_X) : Ξ(t; P̂_X) < I_s(‖X‖; P̂_X) − ε,  (67a)
∃ t ∈ [0, R] : Ξ(t; P̂_X) > I_s(‖X‖; P̂_X) + ε.  (67b)

Note that in (67), in place of the secrecy capacity C_s(σ₁², σ₂², R, n), which is unknown, we used the secrecy information given by the tentative pmf P̂_X, i.e., I_s(‖X‖; P̂_X). Condition (67a) is derived by negating (28a): there exists a t ∈ supp(P̂_X) such that Ξ(t; P̂_X) is ε-away from the secrecy information I_s(‖X‖; P̂_X). Condition (67b) is the negated version of (28b): there exists a t ∈ [0, R] such that Ξ(t; P̂_X) is at least ε-larger than the secrecy information I_s(‖X‖; P̂_X). With some abuse of notation, we refer to (67) as the ε-KKT conditions. If the tentative pmf P̂_X does not pass the check of the ε-KKT conditions, then the algorithm checks whether a new point has to be added to the pmf.
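The negated ε-KKT check can be sketched as a simple predicate over sampled secrecy densities; `xi_supp` and `xi_grid` below are hypothetical arrays of Ξ(t; P̂_X) evaluated at the support points and on a grid over [0, R], respectively.

```python
def violates_eps_kkt(xi_supp, xi_grid, I_s, eps):
    """Return True if the tentative pmf fails the epsilon-KKT conditions:
    (67a) some support point has secrecy density below I_s - eps, or
    (67b) some point in [0, R] has secrecy density above I_s + eps."""
    below_on_support = any(x < I_s - eps for x in xi_supp)  # negation of (28a)
    above_somewhere = any(x > I_s + eps for x in xi_grid)   # negation of (28b)
    return below_on_support or above_somewhere

# Hypothetical values: support densities match I_s, grid never exceeds it.
I_s, eps = 0.42, 1e-6
print(violates_eps_kkt([0.42, 0.42], [0.1, 0.3, 0.42], I_s, eps))  # False
```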
The ADD POINT procedure evaluates the position ρ_new of the new mass point. The point ρ_new is appended to the vector ρ, and the probabilities p are reset to be equiprobable. The whole procedure is repeated until KKT VALIDATION gives a positive outcome, at which point the algorithm returns P̂_X as the optimal pmf estimate and Ĉ_s(σ₁², σ₂², R, n) as the secrecy capacity estimate.

Remark 4.
In this work, we focus on the secrecy capacity and on the secrecy-capacity-achieving input distribution. However, it is possible to study other points of the rate-equivocation region of the degraded wiretap Gaussian channel by suitably changing the KKT conditions, as reported in [21], Equations (33) and (34). With the due modifications, the proposed optimization algorithm can find the optimal input distribution for any point of the rate-equivocation region.

Numerical Results
In Figure 2, we show with black dots the numerical estimate Ĉ_s(σ₁², σ₂², R, n) versus R, evaluated via Algorithm 1, for σ₁² = 1, σ₂² = 1.5, 10, n = 2, 4, and tolerance ε = 10⁻⁶. For the same values of σ₁², σ₂², and n we also show, with the red lines, the analytical low-amplitude regime secrecy capacity C_s(σ₁², σ₂², R, n) versus R from Theorem 6. In addition, we show with blue dotted lines the secrecy capacity under the average power constraint E[‖X‖²] ≤ R²:

C_s(σ₁², σ₂², R, n) ≤ C_G(σ₁², σ₂², R², n),  (69)

where the inequality follows by noting that the average power constraint E[‖X‖²] ≤ R² is weaker than the amplitude constraint ‖X‖ ≤ R. Finally, the dashed vertical lines show R̄ₙ, i.e., the upper limit of the low-amplitude regime, for the considered values of σ₁², σ₂², and n.
In Figure 3, we consider discrete values of R, and for each value of R we plot the corresponding estimated pmf P̂_X, evaluated via Algorithm 1, for σ₁² = 1, σ₂² = 1.5, n = 2, 8, and tolerance ε = 10⁻⁶. The figure shows, at each R, the normalized amplitude of the support points in the estimated pmf, while the size of the circles qualitatively shows the probability associated with each support point. Similarly, Figure 4 shows the evolution of the pmf estimate for σ₁² = 1, σ₂² = 10, n = 2, 8, and ε = 10⁻⁶. It is interesting to notice how, in both Figures 3 and 4, when a new mass point is added to the pmf, it appears at zero. Moreover, the mass point at radius R always seems to be present in the optimal solution.
Finally, Figure 5 shows the output distributions of the legitimate user and of the eavesdropper in the case of σ₁² = 1, σ₂² = 10, n = 2, and for two values of R. At the top of the figure, the distributions are shown for R = 2.25, which is a value close to R̄₂(1, 10). At the bottom of the figure, the distributions are shown for R = 7.5. For both values of R, the legitimate user sees an output distribution in which the concentric rings of the input distribution are easily distinguishable. On the other hand, as expected, the output distribution seen by the eavesdropper is close to a Gaussian.

Proof of Theorem 3 Estimation Theoretic Representation
By Remark 1, if G_{σ₁,σ₂,R,n} has at most one sign change, P_{X_R} is secrecy-capacity-achieving if, and only if, the condition (70) holds for all ‖x‖ ≠ R. We seek to rewrite the condition (70) in an estimation-theoretic form. To that end, we need the representation of the relative entropy in (71) [38]. Another fact that will be important for our derivation is the identity (75); see, for example, [27] for the proof. Next, using (71) and (75), note that for any ‖x‖ ≠ R and for i ∈ {1, 2} we have the chain of equalities ending in (77), where (77) follows from (78). Moreover, for ‖x‖ = 0, the identity (81) holds. Now, note that by using the definition of Ξ(x; P_{X_R}) in (30), together with (78) and (81), we arrive at (83) and (85). Consequently, the necessary and sufficient condition in Theorem 2 can be equivalently written as (86). Now, R̄ₙ(σ₁², σ₂²) is the largest R that satisfies (86), which concludes the proof of Theorem 3.

Proof of Theorem 4
The objective of the proof is to understand how the condition in (45) behaves as n → ∞.
To study the large n behavior, we need the bounds on h_ν from [39,40]. Now let R = c√n for some c > 0. The goal is to understand how the condition in (45) behaves as n goes to infinity. First, we consider the limit in (92)–(93), where (92) follows from the dominated convergence theorem, and (93) follows since, by the law of large numbers, the corresponding normalized quantity converges almost surely. Second, we consider the limit in (97)–(98), where (97) follows from the dominated convergence theorem, and (98) follows since, by the strong law of large numbers, the corresponding normalized quantity converges almost surely. Combining (93) and (98) with (45), we arrive at the claimed asymptotic characterization of R̄ₙ. Turning to the implicit bound of Theorem 5, the chain of steps (104)–(108) holds, where (104) follows from using (21); (105) follows from applying Karlin's oscillation theorem (Theorem 1) and the fact that the Gaussian pdf is a strictly totally positive kernel, which was shown in [26]; (107) is proved in Lemma A3 in Appendix B; and (108) follows because g(·) is an analytic function on (−L, L). The implicit upper bound (49) of Theorem 5 follows from (107) and (108).
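The law-of-large-numbers steps above can be checked numerically: for Z ∼ N(0ₙ, Iₙ), the normalized squared norm ‖Z‖²/n concentrates at 1 as n grows (illustration only; the seed is arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    z = rng.standard_normal(n)        # one draw of Z ~ N(0_n, I_n)
    print(n, z @ z / n)               # ||Z||^2 / n concentrates around 1
```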

Explicit Upper Bound
The key to finding an explicit upper bound on the number of zeros will be the following complex-analytic result (Tijdeman's number of zeros lemma [41]). Let L, s, t be positive numbers such that s > 1. For a complex-valued function f ≢ 0, which is analytic on |z| < (st + s + t)L, its number of zeros N(D_L, f) within the disk D_L = {z : |z| ≤ L} satisfies

N(D_L, f) ≤ (1/log s) · log( max_{|z| ≤ (st+s+t)L} |f(z)| / max_{|z| ≤ tL} |f(z)| ).
Proof. Starting from (107), we can write the chain (112)–(115), where in step (114) we applied Rolle's theorem, and in step (115) we used the fact that multiplying by a strictly positive function (i.e., σ₁² f_{Y₁}) does not change the number of zeros. The first derivative of g is computed in (116), where in the last step we used the well-known Tweedie formula (see, for example, [42,43]). An alternative expression for the first term on the right-hand side (RHS) of (116) is obtained with f_N(n) = φ_{√(σ₂²−σ₁²)}(n). The proof is concluded by letting h be the resulting function. To apply Tijdeman's number of zeros lemma, upper and lower bounds on the maximum modulus of the complex analytic extension of h over the disk D_L = {z : |z| ≤ L} are provided in Lemmas A4 and A5 in Appendix B. Using those bounds, we can upper-bound the number of mass points through the chain (124)–(130), where (124) follows because extending to a larger domain can only increase the number of zeros; (125) follows from Tijdeman's number of zeros lemma; (126) follows from choosing s = e and t = 1 and using the bounds in Lemmas A4 and A5; (128) follows from using the value of L in (A38); (129) follows from using the bound (a + b)² ≤ 2(a² + b²); and (130) follows from the fact that the coefficients b₁, b₃, b₄, and b₆ do not depend on R, while the remaining coefficients, although they depend on R through C_s, do not grow with R. The fact that C_s does not grow with R follows from the bound in (69). Finally, the explicit upper bound on the number of support points of P_X in (52) is a consequence of (130).

Proof of Theorem 6
Using the KKT conditions in (28), we have that for x = [R, 0, . . . , 0] the secrecy capacity equals the secrecy density Ξ(R; P_{X_R}), and the last expression was computed in (83). This concludes the proof.

Conclusions
This paper has focused on the secrecy capacity of the n-dimensional vector Gaussian wiretap channel under a peak power (or amplitude) constraint in the so-called low (but not vanishing) amplitude regime. In this regime, the optimal input distribution P_{X_R} is supported on a single n-dimensional sphere of radius R. The paper has identified the largest radius R̄ₙ such that the distribution P_{X_R} is optimal. In addition, the asymptotic behavior of R̄ₙ has been completely characterized as the dimension n approaches infinity. As a by-product of the analysis, the capacity in the low-amplitude regime has also been characterized in a form amenable to computation. The paper has also provided a number of supporting numerical examples. Implicit and explicit upper bounds have been proposed on the number of mass points of the optimal input distribution P_X in the scalar case n = 1.
There are several interesting future directions. For example, one interesting direction would be to determine a regime in which a mixture of a mass point at zero and P X R is optimal. It would also be interesting to establish a lower bound on the number of mass points in the support of the optimal input distribution when n = 1. We note that such a lower bound was obtained for a point-to-point channel in [30]. We finally remark that the extension of the results of this paper to nondegraded wiretap channels is not trivial and also constitutes an interesting but ambitious future direction.
Author Contributions: A.F., L.B. and A.D. contributed equally to this work. All authors have read and agreed to the published version of the manuscript. Part of this work was presented at the 2021 IEEE Information Theory Workshop [44], at the 2022 IEEE International Symposium on Information Theory [45], at the 2022 IEEE International Mediterranean Conference on Communications and Networking [33], and in the PhD dissertation [46].
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Data Availability Statement: Datasets for the numerical results provided in this work are available at [1].

Conflicts of Interest: The authors declare no conflict of interest.

Appendix B. Derivative of the Secrecy Density
Lemma A1. The derivative of the secrecy density for the input P X R is where Q 2 n+2 is a noncentral chi-square random variable with n + 2 degrees of freedom and noncentrality parameter where W ∼ N (0 n+2 , (σ 2 2 − σ 2 1 )I n+2 ).
Proof. We start with the secrecy density expressed in spherical coordinates. A quick way to obtain the information densities in this coordinate system is to note that: where (A5) holds by [47], Lemma 6.17, and by the independence between ‖Y i ‖ and Y i /‖Y i ‖; the term h λ (·) is a differential entropy-like quantity for random vectors on the n-dimensional unit sphere ([47], Lemma 6.16); (A6) holds because Y i /‖Y i ‖ is uniform on the unit sphere and thanks to [47], Lemma 6.15; the term Γ(z) is the gamma function; and in (A7) we have N i ∼ N (0 n , I n ). We can now write the secrecy density as follows: for j ∈ {1, 2}. The term f χ 2 n (λ) (y) is the noncentral chi-square pdf with n degrees of freedom and noncentrality parameter λ.
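The spherical-coordinate decomposition above rests on the fact that, conditioned on an input x on the sphere of radius R, the normalized squared norm ‖Y i ‖ 2 /σ 2 i follows a noncentral chi-square law with n degrees of freedom and noncentrality λ = R 2 /σ 2 i . The following Monte Carlo sketch (not from the paper; the parameter values n, sigma, R are arbitrary choices for illustration) checks this against the known moments of the noncentral chi-square distribution.

```python
import numpy as np

# Sanity check: for fixed x with ||x|| = R and N ~ N(0, sigma^2 I_n),
# Q = ||x + N||^2 / sigma^2 should be chi^2_n(lam) with lam = R^2 / sigma^2.
rng = np.random.default_rng(0)
n, sigma, R = 4, 1.5, 2.0
x = np.zeros(n)
x[0] = R                                  # a point on the sphere of radius R
Y = x + sigma * rng.standard_normal((200_000, n))
Q = (Y ** 2).sum(axis=1) / sigma ** 2     # candidate chi^2_n(lam) samples
lam = R ** 2 / sigma ** 2
# Known moments of chi^2_n(lam): mean n + lam, variance 2n + 4 lam.
print(Q.mean(), n + lam)
print(Q.var(), 2 * n + 4 * lam)
```

The empirical mean and variance should match n + λ and 2n + 4λ up to Monte Carlo error, mirroring the reduction of the n-dimensional densities to the one-dimensional noncentral chi-square pdf used in (A8) onward.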
Given two values ρ 1 , ρ 2 with ρ 1 > ρ 2 , write where we have integrated by parts and where F χ 2 n (λ) (y) is the cumulative distribution function of χ 2 n (λ). Now notice that the integrand function in (A13) is always positive. We can introduce an auxiliary output random variable Q j , for j ∈ {1, 2}, with pdf for y > 0, to rewrite (A12) as follows: We evaluate the derivative in (A15) as: where, in (A16), we used in (A17), we used the relationship and (A20) follows from the recurrence relationship Putting together (A15) and (A20), we find We are now in a position to compute the derivative of the information density as where The final result is obtained by letting and by specializing the result to the input P X R .
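The recurrence relationship invoked for step (A20) is plausibly the standard differentiation identity for the noncentral chi-square pdf (stated here as an assumption, since the displayed equation did not survive extraction). Writing f χ 2 n (λ) as a Poisson mixture of central chi-square densities and differentiating term by term gives:

```latex
f_{\chi^2_n(\lambda)}(y)
  = \sum_{k=0}^{\infty} \frac{e^{-\lambda/2}(\lambda/2)^k}{k!}\, f_{\chi^2_{n+2k}}(y),
\qquad
\frac{\partial}{\partial \lambda} f_{\chi^2_n(\lambda)}(y)
  = \frac{1}{2}\Big( f_{\chi^2_{n+2}(\lambda)}(y) - f_{\chi^2_n(\lambda)}(y) \Big),
```

which is consistent with the appearance of the (n + 2)-degree-of-freedom random variable Q 2 n+2 in the statement of Lemma A1.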

Appendix C. Proof of Theorem 7
To study the large-n behavior, we need the following bounds on the function h ν [39,40]: where (A93) follows from the dominated convergence theorem, since |h ν | ≤ 1; (A94) follows from using (A90); and (A96) follows from using the strong law of large numbers to note that
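The bound |h ν | ≤ 1 used in (A93) can be checked numerically. The sketch below assumes h ν (x) = I ν (x)/I ν−1 (x), the modified-Bessel-function ratio common in this line of work (the paper's own definition of h ν is in the material not reproduced here); I ν is evaluated by its standard power series.

```python
import math

def bessel_i(nu, x, terms=60):
    """Modified Bessel function of the first kind, via truncated power series."""
    return sum((x / 2) ** (2 * k + nu) / (math.factorial(k) * math.gamma(k + nu + 1))
               for k in range(terms))

def h(nu, x):
    """Assumed ratio h_nu(x) = I_nu(x) / I_{nu-1}(x)."""
    return bessel_i(nu, x) / bessel_i(nu - 1, x)

# The ratio stays in (0, 1) for x > 0 and nu >= 1, since I_nu(x) is
# strictly decreasing in the order nu for fixed x > 0.
for x in (0.1, 1.0, 5.0, 20.0):
    print(x, h(1.0, x), h(4.0, x))
```

For every tested (ν, x) the ratio lies strictly between 0 and 1, consistent with the dominated convergence argument that lets the limit pass inside the expectation.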