Energy optimization for distributions on the sphere and improvement to the Welch bounds

For any Borel probability measure on $\mathbb{R}^n$, we may define a family of eccentricity tensors. This new notion, together with a tensorization trick, allows us to prove an energy minimization property for rotationally invariant probability measures. We use this theory to give a new proof of the Welch bounds, and to improve upon them for collections of real vectors. In addition, we are able to give elementary proofs for two theorems characterizing probability measures optimizing one-parameter families of energy integrals on the sphere. We are also able to explain why a phase transition occurs for optimizers of these two families.


Introduction
Amongst all Borel probability measures $\mu$ on $\mathbb{R}^n$ having the same radial distribution, we seek a minimizer for the energy integral $I_k(\mu) := \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \langle x, y\rangle^k \, d\mu(x)\, d\mu(y)$ (1). In this paper, we introduce a tensorization trick, thereby proving that the integral is minimized by the rotationally invariant measure $\mu_{\mathrm{rot}}$. More precisely, for any positive integer $k$, we define the $k$-th eccentricity tensor of a measure $\mu$. The gap between $I_k(\mu)$ and $I_k(\mu_{\mathrm{rot}})$ is then given by the squared Euclidean norm of this tensor. Specializing to Borel probability measures on the sphere, we see that (1) is minimized by the uniform measure. Moreover, we may adapt the proof to obtain an analogous result for the uniform measure on the sphere in $\mathbb{C}^n$.
These facts have several interesting applications, the first of which concerns the well-known Welch bounds in the compressed sensing literature. Using the complex case of our result, we recover the original Welch bounds, while using the real case, we are able to improve upon them for collections of real vectors. In our opinion, this proof is more illuminating than the existing ones. It shows that one may view the Welch bounds as saying that the average cross-correlation of a signal set cannot beat that of the uniform distribution.
Next, we are able to obtain new proofs of Björck's theorem from the 1950s and the recent theorem by Bilyk-Dai-Matzke. These theorems characterize optimizers of two one-parameter families of energy integrals, and were proved using methods from potential theory and spherical harmonics. Our methods have the benefit of being more elementary. Furthermore, our proof scheme for both theorems is very similar, and sheds light on the phase transition phenomenon discussed in [2]. Indeed, we are able to show why the phase transition occurs, and why it happens for different parameter values for the two families.
The plan of the rest of this paper is as follows. In Section 2, we define the eccentricity tensors and use the tensorization trick to prove the energy minimization property of rotationally invariant measures. In Section 3, we discuss the Welch bounds, show how they may be improved, and present some consequences of this improvement. In Section 4, we show how our results imply the two theorems on energy optimization on the sphere, and discuss their relevance to the phase transition phenomenon.

Eccentricity tensors and the tensorization trick
In this section, we introduce the tensorization trick, define eccentricity tensors, and prove that rotationally invariant measures minimize (1). For the sake of both notation and intuition, however, it is more convenient to work with random vectors than with measures. We hence do so for the rest of this paper, being careful to assert the independence of collections of random vectors where necessary. The tensorization trick is to write the integral (1) as the squared norm of the $k$-th moment tensor of $\mu$.
Notation 2.1. Let $X$ be a random vector in $\mathbb{R}^n$. For any positive integer $k$, let $M^k_X := \mathbb{E}X^{\otimes k}$ denote its $k$-th moment tensor, provided that all of its entries are finite.
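Recall the following fact from linear algebra. For any positive integer $k$, we may identify the $k$-th tensor power $T^k(\mathbb{R}^n) = \mathbb{R}^n \otimes \cdots \otimes \mathbb{R}^n$ with $\mathbb{R}^{n^k}$ by picking as a basis the vectors $\{e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_k}\}_{1 \le i_1, \ldots, i_k \le n}$. With this choice, the Euclidean inner product between any two pure tensors $u_1 \otimes \cdots \otimes u_k$ and $v_1 \otimes \cdots \otimes v_k$ can be written as $\langle u_1 \otimes \cdots \otimes u_k, v_1 \otimes \cdots \otimes v_k\rangle = \prod_{i=1}^k \langle u_i, v_i\rangle$. In particular, for power tensors $u^{\otimes k}$ and $v^{\otimes k}$, we have the formula $\langle u^{\otimes k}, v^{\otimes k}\rangle = \langle u, v\rangle^k$ (2).
Now let $X$ and $Y$ be two independent random vectors. Equation (2) allows us to rewrite the $k$-th moment of their inner product as an inner product between their $k$-th moment tensors. Namely, we have $\mathbb{E}\langle X, Y\rangle^k = \mathbb{E}\langle X^{\otimes k}, Y^{\otimes k}\rangle = \langle \mathbb{E}X^{\otimes k}, \mathbb{E}Y^{\otimes k}\rangle = \langle M^k_X, M^k_Y\rangle$ (3). For independent copies $X, X'$ of the same random vector having distribution $\mu$, we have $M^k_X = M^k_{X'}$, so that $I_k(\mu) = \mathbb{E}\langle X, X'\rangle^k = \|M^k_X\|_2^2$ (4).
We next introduce the notion of the rotational symmetrization of a random vector.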
Definition 2.2. For any random vector $X$ in $\mathbb{R}^n$, let $X_{\mathrm{rot}}$ denote a random vector that is independent of $X$, has the same radial distribution as $X$, and whose distribution is rotationally invariant. We call $X_{\mathrm{rot}}$ the rotational symmetrization of $X$.
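Comparing the moment tensors of a random vector with those of its rotational symmetrization gives rise to what we shall call eccentricity tensors.
Definition 2.3. Let $X$ be a random vector in $\mathbb{R}^n$ with finite moments of all orders. For any positive integer $k$, define its $k$-th eccentricity tensor to be $E^k_X := M^k_X - M^k_{X_{\mathrm{rot}}}$.
Since $X \overset{d}{=} X_{\mathrm{rot}}$ if and only if $X$ is rotationally invariant, we see that the eccentricity tensors of $X$ are quantitative measures of how far its distribution is from being rotationally invariant. This interpretation is further supported by the following observation.
Lemma 2.4 (Orthogonality). Let $X$ be a random vector in $\mathbb{R}^n$ with finite moments of all orders. Its eccentricity tensors are orthogonal to the moment tensors of its rotational symmetrization. In other words, for any positive integer $k$, $\langle E^k_X, M^k_{X_{\mathrm{rot}}}\rangle = 0$ (6). In particular, $\|M^k_X\|_2^2 = \|M^k_{X_{\mathrm{rot}}}\|_2^2 + \|E^k_X\|_2^2$ (7).
Proof. Let $Q$ be a random orthogonal matrix chosen according to the Haar measure on $O(n)$. For any fixed vector $v \in \mathbb{R}^n$, $Qv$ is uniformly distributed on the sphere of radius $\|v\|_2$, so if $Y$ is any random vector independent of $Q$, then $QY$ has the same radial distribution as $Y$ and is rotationally invariant. Now choose $Q$ to be independent of $X$ and $X_{\mathrm{rot}}$. Since $Q^T$ is also Haar distributed, our previous discussion implies that $Q^T X \overset{d}{=} X_{\mathrm{rot}}$ and $Q X_{\mathrm{rot}} \overset{d}{=} X_{\mathrm{rot}}$. We use this to compute $\mathbb{E}\langle X, X_{\mathrm{rot}}\rangle^k = \mathbb{E}\langle X, QX_{\mathrm{rot}}\rangle^k = \mathbb{E}\langle Q^T X, X_{\mathrm{rot}}\rangle^k = \mathbb{E}\langle X'_{\mathrm{rot}}, X_{\mathrm{rot}}\rangle^k$, where $X'_{\mathrm{rot}}$ is an independent copy of $X_{\mathrm{rot}}$. We may then apply identities (3) and (4) to rewrite the above equation as $\langle M^k_X, M^k_{X_{\mathrm{rot}}}\rangle = \|M^k_{X_{\mathrm{rot}}}\|_2^2$. Subtracting the right hand side from the left hand side gives (6), from which (7) is an immediate corollary.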
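As a sanity check on the tensorization identity (4) and on the orthogonality relation of Lemma 2.4, the following sketch computes moment tensors of the uniform distribution on a small, arbitrarily chosen set of unit vectors in $\mathbb{R}^3$. The point set and all function names are illustrative assumptions, not part of the paper. For $k = 2$ we use the standard fact that the second moment tensor of the uniform distribution on $S^{n-1}$ is $I/n$, so the second eccentricity tensor of a distribution on the sphere is $\mathbb{E}XX^T - I/n$.

```python
import itertools
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def moment_tensor(points, k):
    """k-th moment tensor of the uniform distribution on `points`,
    stored as a dict from index tuples (i_1, ..., i_k) to entries."""
    n = len(points[0])
    return {
        idx: sum(math.prod(x[i] for i in idx) for x in points) / len(points)
        for idx in itertools.product(range(n), repeat=k)
    }

def tensor_inner(s, t):
    """Euclidean inner product of two tensors with the same index set."""
    return sum(s[idx] * t[idx] for idx in s)

def mean_inner_power(points, k):
    """E <X, X'>^k for independent X, X' uniform on `points`."""
    m = len(points)
    return sum(dot(x, y) ** k for x in points for y in points) / m ** 2

# Hypothetical example: four unit vectors in R^3.
POINTS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.6, 0.8, 0.0), (0.0, 0.6, 0.8)]
n = 3

# Identity (4): E <X, X'>^k equals the squared norm of the k-th moment tensor.
for k in (1, 2, 3):
    m_k = moment_tensor(POINTS, k)
    print(k, mean_inner_power(POINTS, k), tensor_inner(m_k, m_k))

# k = 2: moment tensor of the rotational symmetrization is I/n on the unit sphere.
m2 = moment_tensor(POINTS, 2)
m2_rot = {(i, j): (1.0 / n if i == j else 0.0) for i in range(n) for j in range(n)}
ecc2 = {idx: m2[idx] - m2_rot[idx] for idx in m2}

orthogonality = tensor_inner(ecc2, m2_rot)                        # should vanish, cf. (6)
gap = tensor_inner(m2, m2) - tensor_inner(m2_rot, m2_rot)         # energy gap
print(orthogonality, gap, tensor_inner(ecc2, ecc2))               # gap = ||E^2_X||^2, cf. (7)
```

Storing tensors as dictionaries keeps the sketch dependency-free; for larger $n$ or $k$ an array library would be the more natural choice.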
The minimization of the integral (1) by rotationally invariant measures is then an easy consequence of the previous lemma.
Theorem 2.5. Let $X$ be a random vector in $\mathbb{R}^n$ with finite moments of all orders. Then
a) (Minimization) If $X'$ is an independent copy of $X$, and $X_{\mathrm{rot}}, X'_{\mathrm{rot}}$ are independent copies of its rotational symmetrization, we have $\mathbb{E}\langle X, X'\rangle^k \ge \mathbb{E}\langle X_{\mathrm{rot}}, X'_{\mathrm{rot}}\rangle^k$ (10) for any positive integer $k$.
b) (Uniqueness) Furthermore, if equality holds in (10) for all $k$ and we further assume that $X$ has a subexponential distribution$^1$, then $X$ is rotationally invariant.
Proof. Using identity (4), we rewrite the first claim as $\|M^k_X\|_2^2 \ge \|M^k_{X_{\mathrm{rot}}}\|_2^2$, and this follows immediately from equation (7).
If equality holds for all positive integers $k$, then by (7), $E^k_X = 0$ for all $k$, implying that $X$ and $X_{\mathrm{rot}}$ have the same moment tensors of all orders. Since $X$, and hence also $X_{\mathrm{rot}}$, is subexponential, Lemma 2.6 below shows that $X$ and $X_{\mathrm{rot}}$ have the same distribution.
Lemma 2.6. Let X be a subexponential random vector in R n . Then the distribution of X is determined by its moment tensors.
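Proof. Let $K = \|X\|_{\psi_1}$ denote the subexponential norm of $X$. We then have the following moment growth condition [7]: $(\mathbb{E}|\langle X, v\rangle|^k)^{1/k} \le Kk$ for all $v \in S^{n-1}$ and all $k \ge 1$ (11). In particular, every marginal is integrable, so the characteristic function $\varphi_X(v) := \mathbb{E}e^{i\langle X, v\rangle}$ is continuous at $v = 0$. This means that $X$ is determined by its characteristic function [4]. Next, condition (11) implies that for each $v \in S^{n-1}$, the function $t \mapsto \mathbb{E}e^{it\langle X, v\rangle}$ can be written as a power series with coefficients $i^k\,\mathbb{E}\langle X, v\rangle^k / k!$. By (2), $\mathbb{E}\langle X, v\rangle^k = \langle M^k_X, v^{\otimes k}\rangle$, so these coefficients are determined by the moment tensors of $X$.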
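Corollary 2.7. Let $\theta$ have the uniform distribution on the sphere $S^{n-1}$, and let $X$ be any random vector taking values on the sphere. If $X'$ and $\theta'$ are independent copies of $X$ and $\theta$ respectively, then $\mathbb{E}\langle X, X'\rangle^k \ge \mathbb{E}\langle \theta, \theta'\rangle^k = \mathbb{E}\theta_1^k$ for any positive integer $k$, where $\theta_1$ denotes the first coordinate of $\theta$. Furthermore, if equality holds for all $k$, then $X$ has the uniform distribution.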
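Proof. The inequality and the characterization statement follow immediately from Theorem 2.5, while the identity $\mathbb{E}\langle \theta, \theta'\rangle^k = \mathbb{E}\theta_1^k$ follows from rotational invariance after conditioning on $\theta'$. The computation of $\mathbb{E}\theta_1^k$ is the content of the next lemma.
Lemma 2.8 (Moments of spherical marginals). Let $\theta$ be uniformly distributed on the sphere $S^{n-1}$. Then for any unit vector $v \in S^{n-1}$ and any positive integer $k$, we have $\mathbb{E}\langle \theta, v\rangle^{2k} = \frac{1 \cdot 3 \cdots (2k-1)}{n(n+2)\cdots(n+2k-2)}$ (13).
Proof. There are several ways to prove this identity. We shall prove it by computing gaussian integrals. Let $\gamma$ and $g$ denote standard gaussians in 1 dimension and $n$ dimensions respectively. Since $g/\|g\|_2$ is uniformly distributed on the sphere and independent of $\|g\|_2$, the rotational invariance of $g$ gives $\mathbb{E}\gamma^{2k} = \mathbb{E}\langle g, v\rangle^{2k} = \mathbb{E}\|g\|_2^{2k} \cdot \mathbb{E}\langle \theta, v\rangle^{2k}$. Rearranging gives $\mathbb{E}\langle \theta, v\rangle^{2k} = \mathbb{E}\gamma^{2k} / \mathbb{E}\|g\|_2^{2k}$. We then compute $\mathbb{E}\|g\|_2^{2k} = (2\pi)^{-n/2}\int_{\mathbb{R}^n} \|x\|_2^{2k} e^{-\|x\|_2^2/2}\,dx = (2\pi)^{-n/2}\,\omega_n \int_0^\infty r^{2k+n-1} e^{-r^2/2}\,dr = (2\pi)^{-n/2}\,\omega_n\, 2^{k+n/2-1}\,\Gamma(k + n/2)$ (14), where $\omega_n$ is the volume of the sphere $S^{n-1}$. It is well known that $\omega_n = 2\pi^{n/2}/\Gamma(n/2)$. Substituting this back into (14) gives $\mathbb{E}\|g\|_2^{2k} = 2^k\,\Gamma(k + n/2)/\Gamma(n/2) = n(n+2)\cdots(n+2k-2)$. This yields the denominator in (13). A similar calculation for $\mathbb{E}\gamma^{2k}$ yields the numerator.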
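The closed form in Lemma 2.8 is easy to check numerically. The sketch below, whose function names are our own illustrative choices, evaluates the product formula exactly, compares it with the ratio of gaussian moments $\mathbb{E}\gamma^{2k}/\mathbb{E}\|g\|_2^{2k}$ written in terms of Gamma functions, and adds a crude Monte Carlo estimate obtained by normalizing standard gaussian vectors.

```python
from fractions import Fraction
import math
import random

def sphere_moment(n, k):
    """Exact value of E <theta, v>^(2k) for theta uniform on S^(n-1):
    1*3*...*(2k-1) / (n*(n+2)*...*(n+2k-2))."""
    value = Fraction(1)
    for j in range(k):
        value *= Fraction(2 * j + 1, n + 2 * j)
    return value

def gaussian_norm_moment(dim, k):
    """E ||g||_2^(2k) for a standard gaussian g in R^dim,
    equal to 2^k * Gamma(k + dim/2) / Gamma(dim/2)."""
    return 2.0 ** k * math.gamma(k + dim / 2) / math.gamma(dim / 2)

def sphere_moment_gamma(n, k):
    """The same moment as a ratio of gaussian moments (dim = 1 gives E gamma^(2k))."""
    return gaussian_norm_moment(1, k) / gaussian_norm_moment(n, k)

def sphere_moment_monte_carlo(n, k, samples=100_000, seed=0):
    """Monte Carlo estimate: theta = g / ||g||_2 is uniform on the sphere."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in g))
        total += (g[0] / norm) ** (2 * k)
    return total / samples

if __name__ == "__main__":
    for n, k in [(2, 2), (3, 1), (3, 2), (5, 3)]:
        print(n, k, sphere_moment(n, k), sphere_moment_gamma(n, k),
              sphere_moment_monte_carlo(n, k))
```

The exact and Gamma-function values should agree up to floating point error, while the Monte Carlo column fluctuates at the scale of the inverse square root of the sample size.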

Applications to dictionary incoherence and the Welch bounds
Given a collection of $m$ unit vectors $Z = \{z_1, z_2, \ldots, z_m\}$ in $\mathbb{C}^n$, we are often interested in the quantity $c_{\max} := \max_{i \ne j} |\langle z_i, z_j\rangle|$. If we think of the vectors as dictionary elements, then $c_{\max}$ measures the incoherence or maximum cross-correlation of the dictionary. It is well known in the compressed sensing literature that the larger the value of $c_{\max}$, the worse the collection $Z$ performs when we try to recover a sparse representation of a vector as a linear combination of the $z_j$'s [6]. As such, it is an important question in the design of communication systems to know how well we can do theoretically, and how we may find collections that achieve the theoretical minimum.
In 1974, Welch gave a family of lower bounds on c max in terms of m and n.
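Theorem 3.1 (Welch, 1974 [8]). Let $Z$ and $c_{\max}$ be defined as above. Then for each positive integer $k$, we have $(c_{\max})^{2k} \ge \frac{1}{m-1}\left[\frac{m}{\binom{n+k-1}{k}} - 1\right]$ (15).
Welch proved this theorem by bounding the average cross-correlation. More precisely, he showed that for each positive integer $k$, $\frac{1}{m^2}\sum_{i,j=1}^m |\langle z_i, z_j\rangle|^{2k} \ge \frac{1}{\binom{n+k-1}{k}}$ (16).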
By separating the diagonal terms from the sum and bounding each off-diagonal summand by $(c_{\max})^{2k}$, it is easy to see how (16) implies (15). Welch's original proof of (16) was combinatorial in nature. In 2012, Datta et al. provided a geometric proof based on examining the Gram matrix associated to $Z$ and dimension counting. Both arguments are agnostic to whether the vectors are real or complex, and as far as we know, there has been no attempt in the literature to investigate whether the bound can be improved if we further assume that the vectors are real.
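Using the energy minimization property of rotationally invariant distributions, we are able to show that this is indeed the case: for any collection of unit vectors $\{x_1, x_2, \ldots, x_m\}$ in $\mathbb{R}^n$ and any positive integer $k$, we have $\frac{1}{m^2}\sum_{i,j=1}^m \langle x_i, x_j\rangle^{2k} \ge \frac{1 \cdot 3 \cdots (2k-1)}{n(n+2)\cdots(n+2k-2)}$ (17).
Writing the right hand side of (16) as $\binom{n+k-1}{k}^{-1} = \frac{1 \cdot 2 \cdots k}{n(n+1)\cdots(n+k-1)}$ and comparing the two products factor by factor, we see that the new bound (17) is equal to the old one (16) for $k = 1$, and is strictly larger for $k > 1$ whenever $n \ge 2$.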
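To get a feel for the size of the improvement, the following sketch tabulates the right hand sides of (16) and (17) exactly, together with the lower bounds on $c_{\max}$ that they imply. The function names are illustrative. The coherence bound for the real case is obtained by the same separation of diagonal terms that turns (16) into (15), namely $(c_{\max})^{2k} \ge (mB - 1)/(m - 1)$ for any lower bound $B$ on the average cross-correlation.

```python
from fractions import Fraction
from math import comb

def welch_average_bound(n, k):
    """Right hand side of (16): 1 / binom(n + k - 1, k)."""
    return Fraction(1, comb(n + k - 1, k))

def real_average_bound(n, k):
    """Right hand side of (17): 1*3*...*(2k-1) / (n*(n+2)*...*(n+2k-2))."""
    value = Fraction(1)
    for j in range(k):
        value *= Fraction(2 * j + 1, n + 2 * j)
    return value

def coherence_bound(average_bound, m, k):
    """Lower bound on c_max implied by a lower bound on the average
    cross-correlation: c_max^(2k) >= (m * B - 1) / (m - 1).
    Returns 0.0 when the right hand side is negative (trivial bound)."""
    rhs = (m * average_bound - 1) / (m - 1)
    return max(float(rhs), 0.0) ** (1.0 / (2 * k))

if __name__ == "__main__":
    n, m = 3, 10  # hypothetical dimension and number of vectors
    for k in range(1, 5):
        old = welch_average_bound(n, k)
        new = real_average_bound(n, k)
        print(k, old, new,
              coherence_bound(old, m, k), coherence_bound(new, m, k))
```

Using exact rational arithmetic makes the comparison between the two bounds unambiguous, which matters for large $k$, where both quantities become very small.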
Proof. Let $X$ be uniformly distributed on the set $\{x_1, x_2, \ldots, x_m\}$, and let $X'$ be an independent copy of $X$. Corollary 2.7 and Lemma 2.8 apply, and we have $\mathbb{E}\langle X, X'\rangle^{2k} \ge \frac{1 \cdot 3 \cdots (2k-1)}{n(n+2)\cdots(n+2k-2)}$ for any positive integer $k$. On the other hand, we also have $\mathbb{E}\langle X, X'\rangle^{2k} = \frac{1}{m^2}\sum_{i,j=1}^m \langle x_i, x_j\rangle^{2k}$. Combining the two gives (17).
Let us illustrate the improved bound by revisiting an example from [5].
Although the improved bounds do not hold for complex collections of vectors, we are nonetheless able to recover the original Welch bounds using the same circle of ideas and making a few adjustments.
Definition 3.6. For any random vector $X$ in $\mathbb{C}^n$, let $X_{\mathrm{uni}}$ denote a random vector that is independent of $X$, has the same radial distribution as $X$, and whose distribution is invariant under unitary transformations. We call $X_{\mathrm{uni}}$ the unitary symmetrization of $X$.
With this definition, we can state the following complex version of Theorem 2.5.
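Theorem 3.7. Let $X$ be a random vector in $\mathbb{C}^n$ with finite moments of all orders. Then if $X'$ is an independent copy of $X$, and $X_{\mathrm{uni}}, X'_{\mathrm{uni}}$ are independent copies of its unitary symmetrization, we have $\mathbb{E}|\langle X, X'\rangle|^{2k} \ge \mathbb{E}|\langle X_{\mathrm{uni}}, X'_{\mathrm{uni}}\rangle|^{2k}$ (18) for any positive integer $k$.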
Proof. By considering the moment tensors $M^{2k}_X := \mathbb{E}X^{\otimes k} \otimes (X^*)^{\otimes k}$, we may define a complex version of eccentricity tensors. Next, we replace $Q \sim \mathrm{Haar}(O(n))$ with $U \sim \mathrm{Haar}(U(n))$ in Lemma 2.4 to prove an orthogonality result analogous to (7). With this result, (18) follows immediately.
We are now able to complete the proof of (16) with the help of the following version of Lemma 2.8.

Lemma 3.8 (Moments of complex spherical marginals).
Let $\theta$ be uniformly distributed on the complex sphere $S^{2n-1} \subset \mathbb{C}^n$. Then for any unit vector $v \in S^{2n-1}$ and any positive integer $k$, we have $\mathbb{E}|\langle \theta, v\rangle|^{2k} = \frac{1}{\binom{n+k-1}{k}}$ (19).
Proof. Let $\gamma$ and $g$ denote standard complex gaussians in 1 dimension and $n$ dimensions respectively. Then $\mathbb{E}|\gamma|^{2k} = \mathbb{E}|\langle g, v\rangle|^{2k} = \mathbb{E}\|g\|_2^{2k} \cdot \mathbb{E}|\langle \theta, v\rangle|^{2k}$. Since $|\gamma|$ is the norm of a two-dimensional standard real gaussian, while $\|g\|_2$ is the norm of a $2n$-dimensional standard real gaussian, the calculations of gaussian integrals done in Lemma 2.8 give $\mathbb{E}|\gamma|^{2k} = 2 \cdot 4 \cdots (2k)$ and $\mathbb{E}\|g\|_2^{2k} = 2n(2n+2)\cdots(2n+2k-2)$, from which (19) follows.
The authors of [5] characterized sets $Z$ achieving equality in the $k$-th Welch average cross-correlation bound (16) as those for which $Z^{(k)}$ forms a tight frame for $\mathrm{Sym}^k(H)$. Since our results show that this bound is not tight when $H$ is a real Hilbert space and $k > 1$, we have proved that tight frames of the form $Z^{(k)}$ do not exist for symmetric spaces of real tensors with $k > 1$. Indeed, this also holds true for generalized frames as defined by the same authors.

Applications to energy optimization on the sphere
In a recent paper [2], Bilyk et al. presented a theorem characterizing probability measures maximizing geodesic distance energy integrals. This is an analogue of Björck's theorem from 1956, which characterized probability measures maximizing energy integrals based on Euclidean distance [3]. Björck proved his theorem by considering Riesz potentials, while Bilyk et al. proved their result using spherical harmonic expansions and the hemisphere Stolarsky principle. In this section, we show how to derive both results using the tensorization trick and the energy minimization property of the uniform distribution on the sphere.
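Theorem 4.1 (Bilyk-Dai-Matzke [2]). For $\delta > 0$, define the geodesic distance energy integral $G_\delta(\mu) := \int_{S^{n-1}}\int_{S^{n-1}} d(x, y)^\delta \, d\mu(x)\, d\mu(y)$ (20), where $d(x, y)$ denotes the geodesic distance between $x$ and $y$. The maximizers of this energy integral over Borel probability measures on $S^{n-1}$ can be characterized as follows:
a) $0 < \delta < 1$: $G_\delta(\mu)$ is uniquely maximized by the uniform measure.
b) $\delta = 1$: $G_\delta(\mu)$ is maximized if and only if $\mu$ is centrally symmetric.
c) $\delta > 1$: $G_\delta(\mu)$ is maximized if and only if $\mu = \frac{1}{2}(\delta_p + \delta_{-p})$ for some $p \in S^{n-1}$, i.e. the mass is supported equally by two antipodal points.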
Proof. Observe that the geodesic distance $d(x, y)$ is simply the angle between $x$ and $y$. As such, we have $d(x, y) = \arccos(\langle x, y\rangle)$. We may thus rewrite (20) as $G_\delta(\mu) = \mathbb{E}\arccos(\langle X, X'\rangle)^\delta$, where $X$ and $X'$ are independent random vectors with distribution $\mu$.
Let us start by proving part b). It is an exercise to show that the derivatives of arccos of even order at least two vanish at 0, while the odd derivatives are strictly negative at 0. For $-1 < t < 1$, we may hence write arccos as its Taylor series $\arccos(t) = \frac{\pi}{2} - \sum_{k=0}^\infty a_{2k+1} t^{2k+1}$, where $a_{2k+1} > 0$ for all $k$. We claim that in fact, the above formula holds for all $t$ in the closed interval $[-1, 1]$, and furthermore that the series is absolutely convergent. This is the content of Lemma 4.2 to come.
As a result, we may use Fubini to interchange sums and expectations, thereby writing $G_1(\mu) = \frac{\pi}{2} - \sum_{k=0}^\infty a_{2k+1}\,\mathbb{E}\langle X, X'\rangle^{2k+1}$. Since $\mathbb{E}\langle X, X'\rangle^{2k+1} \ge 0$ for each $k$ by identity (4), this last expression is maximized if and only if $\mathbb{E}\langle X, X'\rangle^{2k+1} = 0$ for every non-negative integer $k$. By the same identity, this happens if and only if all odd moments of $X$ are zero, i.e. if and only if $X$ is centrally symmetric. This proves the case $\delta = 1$.
Now let $0 < \delta < 1$. We claim that for $-1 \le t \le 1$, we may write $\arccos(t)^\delta = \left(\frac{\pi}{2}\right)^\delta - \sum_{k=1}^\infty a_k t^k$, where $a_k > 0$ for all $k > 0$, and that the series is absolutely convergent. Lemma 4.3 (to come) tells us that the Taylor series of $\arccos(t)^\delta$ has this form, which combined with Lemma 4.2 proves this claim. As such, we may again use Fubini to write $G_\delta(\mu) = \left(\frac{\pi}{2}\right)^\delta - \sum_{k=1}^\infty a_k\,\mathbb{E}\langle X, X'\rangle^k$. By identity (4), $\mathbb{E}\langle X, X'\rangle^k \ge 0$ for any distribution, while by Corollary 2.7, the uniform measure uniquely minimizes all of these moments simultaneously. As such, we see that it is the unique maximizer of $G_\delta(\mu)$.
The remaining case where $\delta > 1$ is easy and does not require a proof using our methods. For completeness, we repeat the proof given by the original authors [2]. Since $d(x, y) \le \pi$, we have $G_\delta(\mu) \le \pi^{\delta-1} G_1(\mu) \le \frac{\pi^\delta}{2}$. The first inequality is tight whenever $d(x, y)$ only takes the values $\pi$ and 0, while by part b), the second inequality becomes equality when $\mu$ is centrally symmetric. Together, these imply that $\mu = \frac{1}{2}(\delta_p + \delta_{-p})$ for some $p \in S^{n-1}$.
Proof of Lemma 4.2. By subtracting off polynomials and negating the function if necessary, we may assume without loss of generality that the Taylor series for $f(t)$ is given by $\sum_{k=0}^\infty c_k t^k$ where $c_k \ge 0$ for all $k$. By the monotone convergence theorem, together with our assumptions on $f$, we have $\sum_{k=0}^\infty c_k = \lim_{t \to 1^-} \sum_{k=0}^\infty c_k t^k = \lim_{t \to 1^-} f(t) = f(1) < \infty$. As such, the series $\sum_{k=0}^\infty c_k$ is absolutely convergent, and the Taylor series is also absolutely convergent on the closed interval $[-1, 1]$. Finally, we can apply the dominated convergence theorem to see that $f(-1) = \sum_{k=0}^\infty c_k (-1)^k$.
Proof of Lemma 4.3. Let $F(t) = f(t)^\alpha$.
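By induction, one may observe that for any positive integer $k$, $F^{(k)}(t)$ is a sum, with positive integer coefficients, of terms of the form $g_{\mathbf{n}}(t) := \alpha(\alpha - 1)\cdots(\alpha - j + 1)\, f(t)^{\alpha - j} \prod_{i=0}^{j-1} f^{(n_i)}(t)$, where $1 \le j \le k$, and $\mathbf{n} = (n_0, n_1, \ldots, n_{j-1})$ is a vector of positive integers. If there is some index $i$ such that $f^{(n_i)}(0) = 0$, then $g_{\mathbf{n}}(0) = 0$. Otherwise, $\prod_{i=0}^{j-1} f^{(n_i)}(0)$ is a product of $j$ negative numbers and so has sign $(-1)^j$. On the other hand, our assumption on $\alpha$ implies that $\alpha(\alpha - 1)\cdots(\alpha - j + 1)$ is a product of one positive number and $j - 1$ negative numbers, and so has sign $(-1)^{j-1}$. Since $f(0)^{\alpha - j} > 0$, we conclude that $g_{\mathbf{n}}(0) \le 0$.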
Finally, notice that $F^{(k)}(0)$ always contains the term $\alpha(\alpha - 1)\cdots(\alpha - k + 1)\, f(0)^{\alpha - k} f'(0)^k$. Since we have assumed that $f'(0) < 0$, this term is strictly negative. As such, $F^{(k)}(0)$ is also negative, as was to be shown.
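The sign pattern asserted for the Taylor coefficients of $F = f^\alpha$ can be observed numerically. The sketch below is an illustration under our own choice of names: it computes the coefficients of a power of a power series from the relation $F' f = \alpha f' F$, which gives the recurrence $m f_0 F_m = \sum_{k=1}^m (\alpha k - (m - k)) f_k F_{m-k}$, and applies it to $f = \arccos$ and to $f(t) = 2 - 2t$. It uses the classical expansion $\arcsin(t) = \sum_{j \ge 0} \binom{2j}{j} \frac{t^{2j+1}}{4^j (2j+1)}$.

```python
import math

def arccos_series(order):
    """Taylor coefficients at 0 of arccos(t) = pi/2 - arcsin(t), up to t^order."""
    coeffs = [0.0] * (order + 1)
    coeffs[0] = math.pi / 2
    j = 0
    while 2 * j + 1 <= order:
        coeffs[2 * j + 1] = -math.comb(2 * j, j) / (4 ** j * (2 * j + 1))
        j += 1
    return coeffs

def series_power(f, alpha):
    """Taylor coefficients of f(t)**alpha given those of f (requires f[0] > 0).
    Derived from F' f = alpha f' F by comparing coefficients."""
    big_f = [f[0] ** alpha]
    for m in range(1, len(f)):
        acc = sum((alpha * k - (m - k)) * f[k] * big_f[m - k]
                  for k in range(1, m + 1))
        big_f.append(acc / (m * f[0]))
    return big_f

def linear_series(order):
    """Taylor coefficients of f(t) = 2 - 2t."""
    return [2.0, -2.0] + [0.0] * (order - 1)

if __name__ == "__main__":
    order = 8
    # Geodesic case: exponent delta applied to arccos.
    for delta in (0.5, 1.0, 2.0):
        print("arccos^%.1f" % delta, series_power(arccos_series(order), delta)[1:])
    # Euclidean case: exponent delta/2 applied to 2 - 2t.
    for delta in (1.0, 2.0, 3.0):
        print("(2-2t)^%.1f" % (delta / 2), series_power(linear_series(order), delta / 2)[1:])
```

For exponents strictly between 0 and 1 every printed coefficient should be negative, whereas for larger exponents a positive coefficient appears already at order two, in line with the hypothesis on $\alpha$ in the lemma.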
In the course of proving the previous theorem, we have in fact proved the following more general result.
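Theorem 4.4. Let $F \colon [-1, 1] \to \mathbb{R}$ be given by an absolutely convergent power series $F(t) = a_0 - \sum_{k=1}^\infty a_k t^k$, where $a_k \ge 0$ for all $k > 0$. Then the energy integral $\int_{S^{n-1}}\int_{S^{n-1}} F(\langle x, y\rangle)\, d\mu(x)\, d\mu(y)$ is maximized over all Borel probability measures on $S^{n-1}$ by the uniform measure. Furthermore, if $a_k > 0$ for all $k > 0$, then the maximizer is unique.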
Let us see how we may apply this more general theorem to recover Björck's original result.
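Theorem 4.5 (Björck [3]). For $\delta > 0$, define the Euclidean distance energy integral $E_\delta(\mu) := \int_{S^{n-1}}\int_{S^{n-1}} \|x - y\|_2^\delta \, d\mu(x)\, d\mu(y)$ (26). The maximizers of this energy integral over Borel probability measures on $S^{n-1}$ can be characterized as follows:
(1) $0 < \delta < 2$: $E_\delta(\mu)$ is uniquely maximized by the uniform measure.
(2) $\delta = 2$: $E_\delta(\mu)$ is maximized if and only if the center of mass of $\mu$ is at the origin.
(3) $\delta > 2$: $E_\delta(\mu)$ is maximized if and only if $\mu = \frac{1}{2}(\delta_p + \delta_{-p})$ for some $p \in S^{n-1}$.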
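Proof. We rewrite (26) as $E_\delta(\mu) = \mathbb{E}\|X - X'\|_2^\delta$, where $X$ and $X'$ are independent random vectors with distribution $\mu$. The easy case $\delta > 2$ is proved exactly as in Theorem 4.1, using the bound $\|x - y\|_2 \le 2$. The case $\delta = 2$ is also clear, for we may write $\|X - X'\|_2^2 = 2 - 2\langle X, X'\rangle$, and by identity (4), $E_2(\mu) = 2 - 2\,\mathbb{E}\langle X, X'\rangle = 2 - 2\|\mathbb{E}X\|_2^2$. This is maximized if and only if $\mathbb{E}X = 0$.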
For $0 < \delta < 2$, we set $f(t) = 2 - 2t$ and $F(t) = f(t)^{\delta/2}$. Then $f$ and $\alpha = \delta/2$ satisfy the hypotheses of Lemma 4.3, so $F^{(k)}(0) < 0$ for all positive integers $k$. This, together with Lemma 4.2, implies that $F$ satisfies the hypothesis of Theorem 4.4. Since $E_\delta(\mu) = \int_{S^{n-1}}\int_{S^{n-1}} (2 - 2\langle x, y\rangle)^{\delta/2}\, d\mu(x)\, d\mu(y) = \int_{S^{n-1}}\int_{S^{n-1}} F(\langle x, y\rangle)\, d\mu(x)\, d\mu(y)$, we can conclude that $E_\delta(\mu)$ is uniquely maximized by the uniform measure.
Remark 4.6. In their paper [2], Bilyk et al. remarked that while the Euclidean and geodesic distances are both metrics on the sphere, the phase transition for the behavior of their energy integrals is different. In the Euclidean case, Björck's theorem shows that it occurs at $\delta = 2$, while in the geodesic case, Bilyk et al.'s theorem shows that it occurs at $\delta = 1$. This peculiar phenomenon is explained by our unified proof of both results.
In both cases, the existence of a phase transition as we let $\delta$ decrease to 0 is asserted by Lemma 4.3 and Theorem 4.4. If the integrand satisfies the hypotheses of Lemma 4.3 for some $\delta_0$, then for all $0 < \delta < \delta_0$, the integrand will satisfy the hypothesis of Theorem 4.4, from which we can conclude that the unique maximizer is the uniform measure. For the Euclidean integral, we have $\delta_0 = 2$, while for the geodesic integral, we have $\delta_0 = 1$.
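The two phase transitions can also be seen in a small numerical experiment on the circle $S^1$. In the sketch below, which is only an illustration with names of our own choosing, the uniform measure is approximated by 200 equally spaced points (an assumption made purely for convenience) and compared with the antipodal measure $\frac{1}{2}(\delta_p + \delta_{-p})$ for exponents on either side of the critical values.

```python
import math

def circle_points(count):
    """`count` equally spaced points on the unit circle S^1."""
    return [(math.cos(2 * math.pi * j / count), math.sin(2 * math.pi * j / count))
            for j in range(count)]

def energy(points, kernel):
    """Energy integral of the uniform probability measure on `points`."""
    m = len(points)
    return sum(kernel(x, y) for x in points for y in points) / m ** 2

def geodesic_kernel(delta):
    """d(x, y)^delta with d(x, y) = arccos(<x, y>)."""
    def kernel(x, y):
        c = max(-1.0, min(1.0, x[0] * y[0] + x[1] * y[1]))  # guard against rounding
        return math.acos(c) ** delta
    return kernel

def euclidean_kernel(delta):
    """||x - y||_2^delta."""
    def kernel(x, y):
        return math.dist(x, y) ** delta
    return kernel

SPREAD = circle_points(200)            # discrete stand-in for the uniform measure
ANTIPODAL = [(1.0, 0.0), (-1.0, 0.0)]  # the measure (delta_p + delta_{-p}) / 2

if __name__ == "__main__":
    for delta in (0.5, 1.0, 1.5):
        k = geodesic_kernel(delta)
        print("geodesic ", delta, energy(SPREAD, k), energy(ANTIPODAL, k))
    for delta in (1.0, 2.0, 3.0):
        k = euclidean_kernel(delta)
        print("euclidean", delta, energy(SPREAD, k), energy(ANTIPODAL, k))
```

Below the critical exponent the well-spread configuration should have the larger energy, at the critical exponent the two energies should coincide (both measures are centrally symmetric with center of mass at the origin), and above it the antipodal pair should win.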