Column normalization of a random measurement matrix

In this note we answer a question of G. Lecu\'{e}, by showing that column normalization of a random matrix with iid entries need not lead to good sparse recovery properties, even if the generating random variable has a reasonable moment growth. Specifically, for every $2 \leq p \leq c_1\log d$ we construct a random vector $X \in \mathbb{R}^d$ with iid, mean-zero, variance $1$ coordinates, that satisfies $\sup_{t \in S^{d-1}} \|\langle X,t\rangle\|_{L_q} \leq c_2\sqrt{q}$ for every $2\leq q \leq p$. We show that if $m \leq c_3\sqrt{p}d^{1/p}$ and $\tilde{\Gamma}:\mathbb{R}^d \to \mathbb{R}^m$ is the column-normalized matrix generated by $m$ independent copies of $X$, then with probability at least $1-2\exp(-c_4m)$, $\tilde{\Gamma}$ does not satisfy the exact reconstruction property of order $2$.


Introduction
Sparse recovery is one of the most important research topics in modern signal processing. It focuses on the possibility of identifying a sparse signal, i.e., a signal that is supported on relatively few coordinates in $\mathbb{R}^d$ relative to the standard basis, using linear measurements. We refer the reader to the books [2,3] for extensive surveys on sparse recovery and related topics.
In a basic sparse recovery problem one pre-selects an $m \times d$ matrix $\Gamma$ that generates the given data. For an unknown (sparse) vector $v$, the coordinates of the vector $\Gamma v$ are the $m$ linear measurements of $v$ one may use for recovery. The hope is that for a well-chosen $\Gamma$, the resulting $m$ linear measurements would be enough to identify $v$, and because $v$ is sparse, the number of measurements required for recovery should be significantly smaller than the dimension $d$.
One of the main achievements of the theory of sparse recovery was the introduction of a convex optimization problem called basis pursuit, which is an effective recovery procedure: it selects $t \in \mathbb{R}^d$ that solves the minimization problem
$$\min \|t\|_1 \ \ \text{subject to} \ \ \Gamma v = \Gamma t, \tag{1.1}$$
where we denote by $\|x\|_p = \big(\sum_{j=1}^d |x_j|^p\big)^{1/p}$.
Extensive effort has been devoted to the question of finding conditions on the measurement matrix $\Gamma$ that ensure the recovery of any sparse vector; more accurately, one would like to guarantee that for every $s$-sparse vector $v$, the $\ell_1$ minimization problem (1.1) has a unique solution, namely $v$ itself. A matrix with this feature is said to satisfy the exact reconstruction property of order $s$. Because measurements are 'expensive', one would like to find matrices $\Gamma$ that satisfy the exact reconstruction property of order $s$ with the smallest number of measurements (rows) possible.
One may show that if $\Gamma$ satisfies the exact reconstruction property of order $s$, then it must have at least $m \sim s\log(ed/s)$ rows. Moreover, typical realizations of various random matrices with $\sim s\log(ed/s)$ rows indeed satisfy the exact reconstruction property of order $s$ (see, e.g., [3]). Therefore, the optimal number of measurements required for the exact reconstruction property of order $s$ is $m \sim s\log(ed/s)$, and that number serves as the benchmark for an optimal measurement matrix.
The question we are interested in has to do with the normalization of the columns of the measurement matrix. It is often assumed in the literature that the columns of $\Gamma$ have unit Euclidean norm (see, for example, [2] and [3] and references therein); i.e., if $\{e_1,\dots,e_d\}$ is the standard basis in $\mathbb{R}^d$ then $\|\Gamma e_j\|_2 = 1$ for $1 \leq j \leq d$. Column normalization appears frequently in various notions used in the study of the exact reconstruction property. Among these well-studied notions are coherence [3]; the restricted eigenvalues condition [1]; and the compatibility condition [2]. Moreover, in many real-world applications, measurement matrices with normalized columns tend to perform better than matrices whose columns have not been normalized.
While column normalization seems a natural idea, it adds substantial technical difficulties when studying random measurement matrices: normalizing the columns of a matrix with independent rows introduces additional dependencies. Despite the added difficulties, the results of [5] highlight the possibility that column normalization may still have a significant role to play in the context of random measurement matrices, particularly in heavy-tailed situations.
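To formulate the results of [5] and explain their connection to column normalization we need the following definition: the random matrix generated by a random variable $x$ is the $m \times d$ matrix $\Gamma = (x_{ij})$ whose entries are independent copies of $x$. Also, we denote by $X = (x_j)_{j=1}^d$ a vector whose coordinates are $d$ independent copies of $x$; thus the rows of $\Gamma$ are $m$ independent copies of $X$.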
The following result from [5] is a construction of random matrices that are generated by seemingly nice random variables, and yet exhibit poor reconstruction properties.
Theorem 1.3 There exist absolute constants $c_1$, $c_2$ and $c_3$ for which the following holds. For every $2 < p \leq c_1\log d$ there is a mean-zero, variance one random variable $x$ with the following properties:
• For every $2 \leq q \leq p$ and every $t \in S^{d-1}$, $\|\langle X,t\rangle\|_{L_q} \leq c_2\sqrt{q}$.
• If $m \leq c_3\sqrt{p}(d/\log d)^{1/p}$ then with probability $1/2$, $\Gamma$ does not satisfy the exact reconstruction property of order 1.
Theorem 1.3 implies that without assuming that each $\langle X,t\rangle$ has a subgaussian moment growth up to the $p$-th moment for $p$ close to $\log d$, the resulting measurement matrix is suboptimal. Indeed, under a modest assumption, say that $\|\langle X,t\rangle\|_{L_4} \leq c\|\langle X,t\rangle\|_{L_2}$ for every $t \in \mathbb{R}^d$, the recovery of 1-sparse vectors requires at least $(d/\log d)^{1/4}$ measurements. And, if $p = (\log d)/(\beta\log\log d)$ for $\beta$ large enough, then the number of measurements required for the recovery of 1-sparse vectors is at least $\sim \log^{c\beta} d$, which is suboptimal when $c\beta > 1$.
To put Theorem 1.3 in some perspective, it is complemented by a positive result that holds once linear forms have enough subgaussian moments.
Theorem 1.4 Let $x$ be a mean-zero, variance one random variable. Assume that for every $2 \leq q \leq c_4\log d$ and every $t \in S^{d-1}$,
$$\|\langle X,t\rangle\|_{L_q} \leq L\sqrt{q}. \tag{1.2}$$
If $m \geq c(L)s\log(ed/s)$ then, with high probability, the random matrix generated by $x$ satisfies the exact reconstruction property of order $s$.
It follows from Theorem 1.4 that if $X$ has a slightly better moment growth condition than in Theorem 1.3, namely a subgaussian growth up to $p \sim \log d$, the random measurement matrix generated by $x$ satisfies the exact reconstruction property of order $s$, for the optimal number of measurements $m \sim s\log(ed/s)$.
The connection with column normalization arises from the main observation used in the proof of Theorem 1.4:
Lemma 1.5 Let $\Gamma:\mathbb{R}^d \to \mathbb{R}^m$ and let $\alpha, \beta > 0$. If
(a) $\|\Gamma t\|_2 \geq \alpha\|t\|_2$ for every $s$-sparse vector $t$,
(b) $\max_{1 \leq j \leq d}\|\Gamma e_j\|_2 \leq \beta$,
and $s_1 = \lfloor \alpha^2(s-1)/(4\beta^2) \rfloor - 1$, then $\Gamma$ satisfies the exact reconstruction property of order $s_1$.
Lemma 1.5 gives a clear motivation for considering column-normalized random measurement matrices, and that motivation grows stronger when taking into account the proof of Theorem 1.4. It turns out that the 'bottleneck' in the proof is the upper bound on $\max_{1 \leq j \leq d}\|\Gamma e_j\|_2$, while guaranteeing (a) requires a rather minimal small-ball assumption. Therefore, the seemingly more restrictive condition (a) is almost universally true (see [7,5] for more details) and (b) is the only place in which the moment growth assumption is used in the proof of Theorem 1.4.
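The order $s_1$ in Lemma 1.5 is plain arithmetic in $\alpha$, $\beta$ and $s$, and a small sketch makes the role of the upper bound $\beta$ visible. The function name `reconstruction_order` and the numerical values below are illustrative assumptions, not part of the paper.

```python
import math

def reconstruction_order(alpha, beta, s):
    """s_1 = floor(alpha^2 (s - 1) / (4 beta^2)) - 1, the order appearing in Lemma 1.5."""
    return math.floor(alpha ** 2 * (s - 1) / (4 * beta ** 2)) - 1

# Same lower bound alpha and sparsity s, two different column-norm bounds beta.
s1_unit_columns = reconstruction_order(alpha=0.5, beta=1.0, s=161)
s1_large_columns = reconstruction_order(alpha=0.5, beta=2.0, s=161)
```

With these inputs, doubling $\beta$ divides $\alpha^2(s-1)/(4\beta^2)$ by four, which is why an upper estimate on the column norms matters for the resulting order.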
Clearly, column normalization resolves the issue of an upper estimate on $\max_{1 \leq j \leq d}\|\Gamma e_j\|_2$. That, and the fact that (a) is true under minimal assumptions, has led G. Lecué [4] to ask whether with column normalization, the moment growth condition (1.2) can be relaxed significantly, leading to a much stronger version of Theorem 1.4.
Question 1.6 Let $x$ be a mean-zero, variance 1 random variable, set $\Gamma$ to be the $m \times d$ matrix generated by $x$ and let $\tilde{\Gamma}$ be the column-normalized matrix generated by $x$. Thus, the entries of $\tilde{\Gamma}$ are
$$\tilde{\Gamma}_{ij} = \frac{x_{ij}}{\big(\sum_{\ell=1}^m x_{\ell j}^2\big)^{1/2}}.$$
If $\|\langle X,t\rangle\|_{L_4} \leq L\|\langle X,t\rangle\|_{L_2}$ for every $t \in \mathbb{R}^d$ and $m = c(L)s\log(ed/s)$, does $\tilde{\Gamma}$ satisfy the exact reconstruction property of order $s$ with high probability?
Our main result is a version of Theorem 1.3 for a column-normalized matrix generated by a well-chosen random variable, showing that the answer to Question 1.6 is negative.
Theorem 1.7 There exist absolute constants $c_1$, $c_2$ and $c_3$ for which the following holds. For every $2 \leq p \leq \log d$ there is a symmetric, variance 1 random variable $x$ with the following properties:
• If $x_1,\dots,x_d$ are independent copies of $x$ and $X = (x_j)_{j=1}^d$, then for every $t \in S^{d-1}$ and every $2 \leq q \leq p$, $\|\langle X,t\rangle\|_{L_q} \leq c_1\sqrt{q}\|\langle X,t\rangle\|_{L_2}$.
• If $m \leq c_2\sqrt{p}d^{1/p}$, then with probability at least $1 - 2\exp(-c_3m)$, the $m \times d$ column-normalized matrix generated by $x$ does not satisfy the exact reconstruction property of order 2.
Theorem 1.7 answers Question 1.6 in the negative: column normalization does not improve the poor behaviour described in Theorem 1.3. Indeed, for $p = 4$, linear forms $\langle X,t\rangle$ satisfy an $L_2$-$L_4$ norm equivalence, but the recovery of 2-sparse vectors using $\tilde{\Gamma}$ requires at least $m \sim d^{1/4}$ measurements, significantly more than the optimal number of measurements, $m \sim \log d$. Moreover, if $\beta > 1$ and $p = (\log d)/(\beta\log\log d)$, then although $\|\langle X,t\rangle\|_{L_q} \lesssim \sqrt{q}\|\langle X,t\rangle\|_{L_2}$ for every $2 \leq q \leq p$, the recovery of 2-sparse vectors using $\tilde{\Gamma}$ requires at least $m \sim \log^{c\beta} d$ measurements, which, again, is suboptimal when $c\beta > 1$.

2 Proof of Theorem 1.7
Let $\varepsilon$ be a symmetric, $\{-1,1\}$-valued random variable, set $\eta$ to be a $\{0,1\}$-valued random variable with mean $\delta$ that is independent of $\varepsilon$, and let $R > 0$; the values of $\delta$ and $R$ will be specified later. Let $x = \varepsilon \cdot \max\{1, \eta R\}$, let $x_1,\dots,x_d$ be independent copies of $x$ and set $X = (x_1,\dots,x_d)$.
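The construction is easy to simulate. The following is a minimal sketch in Python (standard library only), not code from the paper: the function names `sample_entry`, `sample_matrix` and `normalize_columns`, and the values chosen for `m`, `d`, `delta` and `R`, are illustrative assumptions, since the actual $\delta$ and $R$ are only fixed later in the proof.

```python
import math
import random

def sample_entry(delta, R, rng):
    """Draw x = eps * max(1, eta * R): eps is a random sign, eta ~ Bernoulli(delta)."""
    eps = rng.choice((-1, 1))
    eta = 1 if rng.random() < delta else 0
    return eps * max(1.0, eta * R)

def sample_matrix(m, d, delta, R, rng):
    """m x d matrix (list of rows) whose entries are independent copies of x."""
    return [[sample_entry(delta, R, rng) for _ in range(d)] for _ in range(m)]

def normalize_columns(gamma):
    """Divide every column by its Euclidean norm (never zero here, since |x| >= 1)."""
    m, d = len(gamma), len(gamma[0])
    norms = [math.sqrt(sum(gamma[i][j] ** 2 for i in range(m))) for j in range(d)]
    return [[gamma[i][j] / norms[j] for j in range(d)] for i in range(m)]

rng = random.Random(0)
m, d, delta, R = 5, 200, 0.02, 10.0   # illustrative values only
gamma = sample_matrix(m, d, delta, R, rng)
gamma_tilde = normalize_columns(gamma)
```

Every entry of `gamma` has absolute value 1 or `R`, so a column that contains a single large entry has Euclidean norm $(R^2 + m - 1)^{1/2}$ before normalization.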
Let us identify conditions under which X satisfies the first part of Theorem 1.7.

Lemma 2.1
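There exists an absolute constant $c_0$ for which the following holds. Assume that $\delta < 1/2$ and that there is $L \geq 1$ such that for every $2 \leq q \leq p$, $R\delta^{1/q} \leq L\sqrt{q}$. Then for every $t \in \mathbb{R}^d$ and every $2 \leq q \leq p$,
$$\|\langle X,t\rangle\|_{L_q} \leq c_0L\sqrt{q}\|\langle X,t\rangle\|_{L_2}.$$
Moreover, for every $t \in \mathbb{R}^d$, $\|\langle X,t\rangle\|_{L_2} = c_1\|t\|_2$, and $1/\sqrt{2} \leq c_1 \leq 2L$. In particular, $X/c_1$ is an isotropic random vector and for every $t \in \mathbb{R}^d$, $\langle X,t\rangle$ exhibits a $c_0L$-subgaussian moment growth up to the $p$-th moment.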
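The proof of Lemma 2.1 is based on a simple comparison argument:
Lemma 2.2 Let $x_1,\dots,x_d$ be centred, independent random variables and assume that $z_1,\dots,z_d$ are also centred and independent. If $p$ is even and for every $1 \leq j \leq d$ and every $1 \leq q \leq p$, $\|x_j\|_{L_q} \leq L\|z_j\|_{L_q}$, then for every $t \in \mathbb{R}^d$,
$$\Big\|\sum_{j=1}^d t_jx_j\Big\|_{L_p} \leq cL\Big\|\sum_{j=1}^d t_jz_j\Big\|_{L_p},$$
where $c$ is an absolute constant.
Proof. By a standard symmetrization argument we may also assume that $z_1,\dots,z_d$ and $x_1,\dots,x_d$ are symmetric. Therefore, when the $p$-th power of $\sum_j t_jx_j$ is expanded, every product in which some $x_j$ appears with an odd exponent has mean zero, and the remaining terms involve only even moments, which can be compared term by term. Summing over multi-indices $k = (k_1,\dots,k_d)$ with even coordinates and $k_1 + \dots + k_d = p$,
$$\mathbb{E}\Big(\sum_{j=1}^d t_jx_j\Big)^p = \sum_{k}\binom{p}{k_1,\dots,k_d}\prod_{j=1}^d t_j^{k_j}\mathbb{E}x_j^{k_j} \leq L^p\sum_{k}\binom{p}{k_1,\dots,k_d}\prod_{j=1}^d t_j^{k_j}\mathbb{E}z_j^{k_j} = L^p\,\mathbb{E}\Big(\sum_{j=1}^d t_jz_j\Big)^p.$$
Proof of Lemma 2.1. Observe that $x = \varepsilon\max\{1, R\eta\}$ is mean-zero and that $\mathbb{E}x^2 = 1\cdot(1-\delta) + R^2\delta$. The case $q = 2$ of the assumption $R\delta^{1/q} \leq L\sqrt{q}$ gives $R^2\delta \leq 2L^2$. Hence, if $\delta \leq 1/2$ and $R^2\delta \leq 2L^2$ then $1/2 \leq \mathbb{E}x^2 \leq 4L^2$, and the "moreover" part of the claim follows.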
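The moments of $x$ are explicit, since $|x| = R$ with probability $\delta$ and $|x| = 1$ otherwise (for $R \geq 1$). The sketch below evaluates them and tests the growth condition of Lemma 2.1. The helper names and the numerical values are assumptions made for illustration, and the condition is only checked on integer values of $q$.

```python
import math

def abs_moment(q, delta, R):
    """E|x|^q for x = eps * max(1, eta * R) with R >= 1."""
    return (1.0 - delta) + delta * R ** q

def lq_norm(q, delta, R):
    """The L_q norm of x."""
    return abs_moment(q, delta, R) ** (1.0 / q)

def growth_condition_holds(p, delta, R, L):
    """Check R * delta**(1/q) <= L * sqrt(q) for the integers q in [2, p]."""
    return all(R * delta ** (1.0 / q) <= L * math.sqrt(q)
               for q in range(2, int(p) + 1))

delta, R, L = 0.01, 10.0, 1.0                # illustrative values only
second_moment = abs_moment(2, delta, R)      # (1 - 0.01) + 0.01 * 10**2 = 1.99
holds_up_to_2 = growth_condition_holds(2, delta, R, L)
holds_up_to_4 = growth_condition_holds(4, delta, R, L)
```

For these values the second moment lies in $[1/2, 4L^2]$, as in the proof above, while the growth condition already fails at $q = 3$: a fixed pair $(\delta, R)$ supports a subgaussian moment growth only up to a limited $p$.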
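Turning to the first part of the claim, let $x_1,\dots,x_d$ be independent copies of $x$, set $g$ to be a standard gaussian random variable and let $g_1,\dots,g_d$ be independent copies of $g$. Recall that for every $2 \leq q \leq p$, $R\delta^{1/q} \leq L\sqrt{q}$, and observe that
$$\|x\|_{L_q} = \big((1-\delta) + R^q\delta\big)^{1/q} \leq 1 + R\delta^{1/q} \leq 1 + L\sqrt{q} \leq c_1L\|g\|_{L_q},$$
because $\|g\|_{L_q} \geq c\sqrt{q}$ for an absolute constant $c$. Therefore, $(x_1,\dots,x_d)$ and $(g_1,\dots,g_d)$ satisfy the conditions of Lemma 2.2 with a constant $c_1L$. Applying Lemma 2.2, it follows that for every $t \in S^{d-1}$ and every $2 \leq q \leq p$,
$$\|\langle X,t\rangle\|_{L_q} \leq c_2L\Big\|\sum_{j=1}^d t_jg_j\Big\|_{L_q} \leq c_3L\sqrt{q},$$
since $\sum_{j=1}^d t_jg_j$ is a standard gaussian random variable when $\|t\|_2 = 1$.
The key part in the construction is the following lemma, Lemma 2.3, which describes the typical structure of the matrix generated by $x$; here $(\varepsilon_{ij})$ and $(\eta_{ij})$ denote the independent copies of $\varepsilon$ and $\eta$ that define the entries $x_{ij}$:
(1) there are indices $j_1 \neq j_2 \in \{1,\dots,d\}$ and $1 \leq \ell \leq m$ such that $\eta_{\ell j_1} = \eta_{\ell j_2} = 1$ and for $i \neq \ell$, $\eta_{ij_1} = \eta_{ij_2} = 0$;
(2) there is a subset $J \subset \{1,\dots,d\}$ of cardinality $|J| = 2m$ such that $\eta_{ij} = 0$ for every $j \in J$ and $1 \leq i \leq m$;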

Corollary 2.4 If $\Gamma$ satisfies the conclusion of Lemma 2.3 then its column-normalized version $\tilde{\Gamma}$ does not satisfy the exact reconstruction property of order 2.
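Proof. Using the notation of Lemma 2.3 and by its first part, $\|\Gamma e_{j_1}\|_2 = \|\Gamma e_{j_2}\|_2 = (R^2 + m - 1)^{1/2}$; hence, if we denote by $\{f_1,\dots,f_m\}$ the standard basis of $\mathbb{R}^m$, then for $k = 1, 2$,
$$\tilde{\Gamma}e_{j_k} = (R^2 + m - 1)^{-1/2}\Big(\varepsilon_{\ell j_k}Rf_\ell + \sum_{i \neq \ell}\varepsilon_{ij_k}f_i\Big).$$
If $\varepsilon_{\ell j_1} = \varepsilon_{\ell j_2}$ set $v = (e_{j_1} - e_{j_2})/2$; otherwise, set $v = (e_{j_1} + e_{j_2})/2$. In either case, $v$ is 2-sparse, $\|v\|_1 = 1$, and the two entries of size $R$ in row $\ell$ cancel. Let $w = \tilde{\Gamma}v$ and observe that the coordinates of $w$ satisfy that $w_\ell = 0$ and, for $i \neq \ell$,
$$w_i^2 \leq \frac{1}{R^2 + m - 1}.$$
Next, let $J$ be the set of coordinates given by the second part of Lemma 2.3. Clearly, $j_1, j_2 \notin J$ and
$$\Gamma_J = (x_{ij})_{1 \leq i \leq m,\, j \in J} = (\varepsilon_{ij})_{1 \leq i \leq m,\, j \in J}$$
is an $m \times 2m$ Bernoulli matrix. Therefore, every column of $\Gamma_J$ has Euclidean norm $\sqrt{m}$ and $\tilde{\Gamma}_J = m^{-1/2}\Gamma_J$. Observe that $\tilde{\Gamma}B_1^J = \tilde{\Gamma}_JB_1^J$, where $B_1^J$ is the set of vectors supported in $J$ whose $\ell_1$ norm is at most 1, and by the third part of Lemma 2.3, $w \in \tilde{\Gamma}_JB_1^J$. Hence, there is a vector $t$ that is supported in $J$ and satisfies $\|t\|_1 \leq 1 = \|v\|_1$ and $\tilde{\Gamma}t = w = \tilde{\Gamma}v$. Since $t \neq v$, $v$ is not the unique solution of (1.1).
Proof of Lemma 2.3. Let $\eta_1,\dots,\eta_m$ be independent copies of $\eta$ and let $Y$ be the indicator of the event that exactly one of them equals 1. Observe that $\mathbb{E}Y = m\delta(1-\delta)^{m-1}$ and that if $Y_1,\dots,Y_d$ are independent copies of $Y$ and $\mathbb{E}Y \geq 2m/d$ then with probability at least $1 - 2\exp(-c_1m)$, $|\{i : Y_i = 1\}| > m$. In particular, on that event, the pigeonhole principle shows that the matrix $(\eta_{ij})_{1 \leq i \leq m,\, 1 \leq j \leq d}$ has at least two identical columns, each with a single entry of 1. Therefore, the first part of Lemma 2.3 holds if
$$m\delta(1-\delta)^{m-1} \geq \frac{2m}{d}.$$
For the second part of the lemma, let $Z$ be the indicator of the event $\eta_i = 0$ for every $1 \leq i \leq m$ and note that $\mathbb{E}Z = (1-\delta)^m$. If $Z_1,\dots,Z_d$ are independent copies of $Z$ and $\mathbb{E}Z \geq 4m/d$ then with probability at least $1 - 2\exp(-c_2m)$, $|\{i : Z_i = 1\}| \geq 2m$. Thus, if $(1-\delta)^m \geq 4m/d$ then with probability at least $1 - 2\exp(-c_2m)$, there is $J \subset \{1,\dots,d\}$ of cardinality $2m$ such that for every $j \in J$ and every $1 \leq i \leq m$, $\eta_{ij} = 0$.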
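The first two parts of Lemma 2.3, and the cancellation behind Corollary 2.4, can be observed in a small simulation. This is a hedged sketch rather than the paper's argument: the seed and the values of `m`, `d`, `delta` and `R` are illustrative assumptions, chosen so that the expected number of columns with a single 1 is far larger than `m`. The sign in `w` is picked so that the two entries of size `R` in row `ell` cancel, which is what makes $w_\ell = 0$.

```python
import math
import random

rng = random.Random(1)
m, d, delta, R = 5, 2000, 0.01, 50.0   # illustrative values, not the paper's choice

eta = [[1 if rng.random() < delta else 0 for _ in range(d)] for _ in range(m)]
eps = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(m)]

def column(j):
    """Column j of Gamma, with entries eps_ij * max(1, eta_ij * R)."""
    return [eps[i][j] * max(1.0, eta[i][j] * R) for i in range(m)]

def normalized(col):
    """Rescale a column to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in col))
    return [c / norm for c in col]

# Group the columns of eta that contain exactly one 1 by the row of that 1,
# and collect the columns of eta that vanish identically.
single_by_row = {}
zero_cols = []
for j in range(d):
    ones = [i for i in range(m) if eta[i][j] == 1]
    if len(ones) == 1:
        single_by_row.setdefault(ones[0], []).append(j)
    elif not ones:
        zero_cols.append(j)

# Part (1): once more than m columns have a single 1, two of them share a row.
pairs = [(row, cols[0], cols[1]) for row, cols in single_by_row.items() if len(cols) >= 2]
ell, j1, j2 = pairs[0]

# Combine the two normalized columns so that the entries of size R cancel.
sign = -1.0 if eps[ell][j1] == eps[ell][j2] else 1.0
a, b = normalized(column(j1)), normalized(column(j2))
w = [(a[i] + sign * b[i]) / 2 for i in range(m)]
```

Here `w[ell]` vanishes and every other coordinate of `w` has square at most $1/(R^2 + m - 1)$, while `zero_cols` plays the role of the set $J$ from the second part of the lemma.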
Turning to the third part of the lemma, and by applying the second part, we have that for $(i,j) \in \{1,\dots,m\} \times J$, $x_{ij} = \varepsilon_{ij}$. Let $\Gamma_J = (\varepsilon_{ij})_{1 \leq i \leq m,\, j \in J}$ and recall that $(\varepsilon_{ij})$ are independent of $(\eta_{ij})$. Therefore, conditioned on $(\eta_{ij})$, $\Gamma_J$ is an $m \times 2m$ Bernoulli matrix.
It follows from Corollary 2.4 that with probability at least $1 - 2\exp(-c_3m)$, the column-normalized matrix $\tilde{\Gamma}$ generated by $x = \varepsilon\max\{1, R\eta\}$ does not satisfy the exact reconstruction property of order 2. To complete the proof, all that remains is to show that $x$ also satisfies the assumptions of Lemma 2.1: that $R\delta^{1/q} \leq L\sqrt{q}$ for every $2 \leq q \leq p$ and for an absolute constant $L$. To that end, let $\varphi(x) = \sqrt{x}(d/c_1)^{1/x}$ and observe that $\varphi(x)$ is decreasing when $2 \leq x \leq 2\log(d/c_1)$, because $(\log\varphi)'(x) = 1/(2x) - \log(d/c_1)/x^2 \leq 0$ in that range; hence, $\varphi(p)/\varphi(q) \leq 1$ for every $2 \leq q \leq p$ as long as $p \leq 2\log(d/c_1)$. Therefore, if we set $L = c_1$ then $R\delta^{1/q} \leq \sqrt{q}$ for every $q \leq p$, as required.
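The monotonicity of $\varphi$ can be checked numerically. In the sketch below the values of `d`, `c1` and `p` are illustrative, and the choice $\delta = c_1/d$, $R = \varphi(p)$ is a hypothetical parametrization assumed only for this example (the text above does not state the values of $\delta$ and $R$); under that assumption $R\delta^{1/q}/\sqrt{q} = \varphi(p)/\varphi(q)$.

```python
import math

def phi(x, d, c1):
    """phi(x) = sqrt(x) * (d / c1) ** (1 / x)."""
    return math.sqrt(x) * (d / c1) ** (1.0 / x)

d, c1 = 10 ** 6, 4.0                      # illustrative values only
x_max = 2.0 * math.log(d / c1)            # right endpoint of the range 2 <= x <= 2 log(d / c1)
grid = [2.0 + k * (x_max - 2.0) / 200 for k in range(201)]
values = [phi(x, d, c1) for x in grid]
is_decreasing = all(u >= v for u, v in zip(values, values[1:]))

# Hypothetical parameters (an assumption of this sketch, not taken from the text).
p = 8.0
delta = c1 / d
R = phi(p, d, c1)
ratios = [R * delta ** (1.0 / q) / math.sqrt(q) for q in range(2, int(p) + 1)]
```

Each entry of `ratios` equals $\varphi(p)/\varphi(q)$ for an integer $2 \leq q \leq p$, so under the assumed parametrization the condition $R\delta^{1/q} \leq \sqrt{q}$ reduces to the monotonicity of $\varphi$.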
A Proof of Theorem 1.9
The proof is a direct consequence of the argument used in the proof of Theorem 1.4. Thanks to column normalization, $\tilde{\Gamma}$ satisfies (b) in Lemma 1.5 for $\beta = 1$. All that is left to verify is (a) for $\alpha$ which is a constant that depends only on $L$.
The proof of Theorem 1.4 shows that if $\Gamma$ has $m \geq c_1(L)s\log(ed/s)$ independent rows that are distributed as $X$ then, with high probability, (a) from Lemma 1.5 is verified for the matrix $\tilde{\Gamma}$ for $\alpha = c_6(L)$.