Random sections of $\ell_p$-ellipsoids, optimal recovery and Gelfand numbers of diagonal operators

We study the circumradius of a random section of an $\ell_p$-ellipsoid, $0<p\le \infty$, and compare it with the minimal circumradius over all sections with subspaces of the same codimension. Our main result is an upper bound for random sections, which we prove using techniques from asymptotic geometric analysis if $1\leq p \leq \infty$ and compressed sensing if $0<p \leq 1$. This can be interpreted as a bound on the quality of random (Gaussian) information for the recovery of vectors from an $\ell_p$-ellipsoid for which the radius of optimal information is given by the Gelfand numbers of a diagonal operator. In the case where the semiaxes decay polynomially and $1\le p\le \infty$, we conjecture that, as the amount of information increases, the radius of random information either decays like the radius of optimal information or is bounded from below by a constant, depending on whether the exponent of decay is larger than the critical value $1-\frac{1}{p}$ or not. If $1\leq p\leq 2$, we prove this conjecture by providing a matching lower bound. This extends the recent work of Hinrichs et al. [Random sections of ellipsoids and the power of random information, Trans. Amer. Math. Soc., 2021+] for the case $p=2$.


Introduction, motivation and main results
The diameter of a section of a symmetric convex body K ⊂ R^m with a (random) subspace has been an object of interest at least since the study of Gelfand numbers of operators between finite-dimensional Banach spaces [6, 9, 10, 15, 17, 18, 29, 30]. These numbers measure the smallest circumradius of an intersection with a subspace of a fixed (co)dimension.
Often we are not in a position to exhibit optimal subspaces and thus it seems reasonable to first try to understand intersections with typical subspaces. In this way we are led to the study of the diameter of intersections with random subspaces which are uniformly distributed on the Grassmannian manifold with respect to the Haar probability measure.
Connected to the field of asymptotic geometric analysis, there is a large body of work on this topic initiated by Giannopoulos and V. D. Milman in [7, 8], with particular focus on subspace dimension proportional to the dimension of the body (see also [26]). In subsequent work, Litvak, Pajor and Tomczak-Jaegermann [22] have shown that on the scale of proportional subspaces typical intersections are not much larger than minimal intersections. It is important to note that, as pointed out in [7, Example 2.2], one cannot expect these bounds to be sharp in full generality, in particular not for ellipsoids with highly incomparable semiaxes.
Mendelson, Pajor and Tomczak-Jaegermann [23] studied the intimately related problem of approximate reconstruction of vectors from a symmetric convex body K ⊂ R^m using random Gaussian measurements. Approximation from random information underlies the success of the field of compressed sensing, which deals with the reconstruction of sparse vectors (see, e.g., [3, 5]). Somewhat related is the approximation of functions from samples at random points, which is studied in the context of learning theory (see, e.g., the book [24]) and information-based complexity [13, 14]. It is the latter, specifically the work [14], that serves as further motivation for this paper. There, the effectiveness of Gaussian information for recovering vectors in an ellipsoid in the Euclidean norm has been studied, which is related to the approximation of functions with decaying generalized Fourier coefficients, e.g., from Korobov spaces [20]. In more geometric parlance, the main result of [14] determines whether the circumradius of a random section of an ellipsoid is close to minimal: the answer depends on the square-summability of its semiaxes. In a nutshell, we seek to extend these results and study random sections of generalized ℓ_p-ellipsoids. Such ellipsoids have also been studied, for instance, in [16], with focus on the asymptotic volume distribution of sections as the dimension of the underlying space tends to infinity, and in [32], where it is shown that such ellipsoids are examples for which Dudley's integral bound for Gaussian processes is not sharp.

Radii of random sections and optimal recovery
We aim at understanding the circumradius, or equivalently the diameter, of random sections of generalized ellipsoids and, in particular, whether it is comparable to the minimal circumradius of all sections of the same dimension or not. Given 0 < p ≤ ∞ and a non-increasing sequence σ = (σ_j)_{j∈N} of positive semiaxes, our object of interest is the ℓ_p-ellipsoid

$$ E^m_{p,\sigma} := \Big\{ x \in \mathbb{R}^m : \sum_{j=1}^m \big( |x_j| / \sigma_j \big)^p \le 1 \Big\}, $$

with the usual modification max_{1≤j≤m} |x_j|/σ_j ≤ 1 if p = ∞. Note that for 0 < p < 1 the set E^m_{p,σ} is not convex but is still the unit ball of a quasi-normed space. Before we present our results, we introduce the closely related problem of recovery from linear information.
Assume we want to learn an unknown x ∈ R^m given the information that x ∈ E^m_{p,σ}, that is, we have some control over the decay of the coordinates of x. Further, suppose we are given the linear information N_n x ∈ R^n, where N_n ∈ R^{n×m} and n (< m) can be considerably smaller than m. The best we can do using the given knowledge about x can be measured by the worst-case error, also known as the radius of the information N_n,

$$ \mathrm{rad}\big(E^m_{p,\sigma}, N_n\big) := \inf_{\varphi\colon \mathbb{R}^n \to \mathbb{R}^m}\; \sup_{x \in E^m_{p,\sigma}} \big\| x - \varphi(N_n x) \big\|_2, $$

where the recovery mapping φ: R^n → R^m can be an arbitrary mapping allowed to depend on N_n. The abuse of notation will be justified in a moment. It follows from elementary results (see, e.g., [25, Lemma 4]) that

$$ \mathrm{rad}\big(E^m_{p,\sigma}, N_n\big) = \sup\big\{ \|x\|_2 : x \in E^m_{p,\sigma} \cap \ker N_n \big\} \qquad (1) $$

if p ≥ 1, where the kernel ker N_n is an (m − n)-dimensional subspace of R^m and so of codimension n, i.e., it belongs to the Grassmannian manifold G_{m,m−n}, if we assume the rows of N_n to be linearly independent. This is the reason we call rad(E^m_{p,σ}, N_n) the radius of the information. For 0 < p < 1, the identity (1) holds up to a factor of 2, see, e.g., Lemma 3. Random information will be given by a random matrix G_{n,m} ∈ R^{n×m} with i.i.d. standard Gaussian entries. It follows from rotation invariance that the distribution of ker G_{n,m} is equal to the Haar probability measure on the Grassmannian G_{m,m−n}. Thus, we may define the circumradius of the intersection of E^m_{p,σ} with a random subspace of codimension n via the random quantity

$$ \mathrm{rad}\big(E^m_{p,\sigma}, G_{n,m}\big) := \sup\big\{ \|x\|_2 : x \in E^m_{p,\sigma} \cap \ker G_{n,m} \big\}. $$

What is the circumradius of a typical intersection of E^m_{p,σ} by a random subspace? The special case p = 2 has been dealt with in [14], where it has been shown that

$$ \mathrm{rad}\big(E^m_{2,\sigma}, G_{n,m}\big) \le C\, n^{-1/2} \Big( \sum_{j > k} \sigma_j^2 \Big)^{1/2}, \qquad k \asymp n, \qquad (2) $$

holds with exponentially high probability (in n), where C ∈ (0, ∞) is an absolute constant.
Further, if the semiaxes σ = (σ_j)_{j∈N} satisfy ‖σ‖_2 = ∞, then rad(E^m_{2,σ}, G_{n,m}) ≥ c holds for an absolute constant c ∈ (0, ∞) with exponentially high probability, provided that m (> n) is large enough compared to n.
Since the minimal radius rad(E^m_{2,σ}, n) := inf_{N_n ∈ R^{n×m}} rad(E^m_{2,σ}, N_n) equals σ_{n+1}, there is a dichotomy for the usefulness of Gaussian information compared to optimal information or, in more geometric parlance, for the circumradius of a random section compared to the minimal one. We seek to extend this result to the class of ℓ_p-ellipsoids E^m_{p,σ} with p ≠ 2.
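Before turning to general p, the case p = 2 is easy to explore numerically, since a section of an ℓ_2-ellipsoid with a subspace is again an ellipsoid. The following minimal sketch (ours, for illustration only; all names and parameter choices are hypothetical) samples a Gaussian matrix, computes the circumradius of E^m_{2,σ} ∩ ker G_{n,m} exactly by linear algebra, and compares it with the optimal radius σ_{n+1}.

```python
import numpy as np

def radius_random_section(sigma, n, rng):
    """Circumradius of E^m_{2,sigma} cut with ker(G) for an n x m Gaussian G.

    For p = 2 the section is itself an ellipsoid, so its circumradius equals
    1 / s_min(D_sigma^{-1} Q), where Q holds an orthonormal basis of ker G.
    """
    m = len(sigma)
    G = rng.standard_normal((n, m))
    _, _, Vt = np.linalg.svd(G)        # rows n..m-1 of Vt span ker G a.s.
    Q = Vt[n:].T                       # m x (m - n) orthonormal basis of ker G
    A = Q / sigma[:, None]             # D_sigma^{-1} Q
    return 1.0 / np.linalg.svd(A, compute_uv=False)[-1]

rng = np.random.default_rng(0)
m, n, lam = 500, 50, 0.75
sigma = np.arange(1, m + 1, dtype=float) ** -lam   # sigma_j = j^{-lam}
rads = [radius_random_section(sigma, n, rng) for _ in range(10)]
print(np.mean(rads), sigma[n])        # random radius vs. optimal radius sigma_{n+1}
```

Comparing the two printed quantities indicates how much (or how little) is lost by using random instead of optimal measurements for a given decay of the semiaxes.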

Radii of random sections - main results
We are not able to answer the above question in full for the general case, in part due to the fact that the minimal radius rad(E^m_{p,σ}, n) is not known exactly for 1 ≤ p < 2; see the Appendix for more on optimal sections. This is one of the reasons why, apart from the following two theorems, we also present results for the important case of polynomially decaying semiaxes.
Here and in what follows, for two non-negative reals a_α and b_α depending on some parameter α from an index set I, we write a_α ≲ b_α, or equivalently b_α ≳ a_α, if there exists a constant C ∈ (0, ∞) such that a_α ≤ C b_α for all α ∈ I. If both a_α ≲ b_α and a_α ≳ b_α hold, we write a_α ≍ b_α. If the constant may depend on some parameter β, we shall write a_α ≲_β b_α and a_α ≳_β b_α instead or, if both estimates hold, a_α ≍_β b_α. As usual, given 1 ≤ p ≤ ∞, we shall denote the Hölder conjugate of p by p*, so that 1/p + 1/p* = 1. The first result provides an upper bound on the radius of random information with high probability and is in the spirit of the results obtained in [14].
Theorem A. For all m ∈ N and 1 ≤ n < m, we have, with probability at least 1 − c_1 exp(−c_2 n),

$$ \mathrm{rad}\big(E^m_{p,\sigma}, G_{n,m}\big) \le C\, n^{-1/2}\, \big\| (\sigma_j)_{j>k} \big\|_{p^*}, \qquad (3) $$

where c_1, c_2, C ∈ (0, ∞) are constants and k ≍ n for p = 1 while k ≍ n p* for p > 1.
The proof relies on a famous theorem of Gordon [11] on subspaces escaping through a mesh, for which we first need to control the mean width of 'rounded' versions of our ellipsoids arising from intersections with a Euclidean ball of a suitable radius. The idea of cutting away the peaky regions of a convex body in this way to obtain improved bounds on its mean width is well known; see, for example, [23, Section 2]. Then, in order to bound this quantity for the ℓ_p-ellipsoids, we adapt an approach already used in [14]. There, the main approach had been one via random matrices, but it seems this approach cannot be adapted to our situation without losing something compared to Theorem A.
Remark 1. Theorem A extends the upper bound (2). It shows that if ‖σ‖_{p*} < ∞ we can expect the radius of random sections to decay at least as fast as n^{−1/2}. This also gives a bound on the minimal radius; see Corollary 3, formulated in terms of Gelfand numbers.
Remark 2. In the context of suprema of Gaussian processes, we want to mention that ℓ_p-ellipsoids with slowly decaying semiaxes are examples where Dudley's upper bound is loose. This has been observed, for instance, by van Handel in [32]. For more information, we refer to the discussion at the end of Section 2, where we exhibit a bound also depending on p, which is not present in [32].
Employing methods commonly used in the field of compressed sensing, e.g., in a work of Foucart, Pajor, Rauhut and Ullrich [4] on the Gelfand widths of ℓ_p-balls in the quasi-Banach regime 0 < p ≤ 1, we deduce the following upper bound for the radius of random information when 0 < p ≤ 1 and the semiaxes have polynomial decay.

Theorem B. Let 0 < p ≤ 1 and σ_j = j^{−λ}, j ∈ N, for some λ > 0. Then there exist constants C, D ∈ (0, ∞) depending only on p and λ such that, for all m ∈ N and all 1 ≤ n < m with n ≥ D log(em/n),

$$ \mathrm{rad}\big(E^m_{p,\sigma}, G_{n,m}\big) \le C \Big( \frac{\log(em/n)}{n} \Big)^{\lambda + 1/p - 1/2} $$

with probability at least 1 − 2 exp(−Cn).

In fact, we shall prove a slightly more general result, Theorem C below, which also yields new bounds on Gelfand numbers of diagonal operators in the quasi-Banach regime (see Corollary 4 in the Appendix).

The power of random information for polynomial decay - discussion
In the following we discuss the consequences of our results for ellipsoids with polynomially decaying semiaxes σ_j = j^{−λ}, j ∈ N, for some λ > 0. It turns out that, at least when 1 ≤ p ≤ 2, we can show a dichotomy for the radius of a random section in comparison to the minimal section. Roughly speaking, we have the following equivalence, which will be made precise by the conjecture at the end of this subsection:

random information is comparable to optimal information if and only if λ > 1 − 1/p.

To illustrate this, we first provide known results on the minimal radius rad(E^m_{p,σ}, n), which we deduce from results on Gelfand numbers of diagonal operators (see the Appendix for the latter).
Let 1 ≤ p ≤ ∞. The behavior of the minimal radius is known exactly when p ≥ 2 but can only be deduced up to subpolynomial factors when 1 ≤ p < 2. To make this precise, we define the rate of polynomial decay (in n) of an infinite array a = (a_{n,m})_{m∈N, 1≤n<m} of real numbers by

$$ \mathrm{decay}(a) := \sup\big\{ \varrho \ge 0 : \exists\, C \in (0,\infty) \text{ with } a_{n,m} \le C n^{-\varrho} \text{ for all } m \in \mathbb{N} \text{ and } 1 \le n < m \big\}. $$
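Although decay(a) is defined through an asymptotic supremum, for a concrete array one can eyeball it numerically. The following throwaway sketch (ours, not from the paper) fits the exponent by log-log regression.

```python
import numpy as np

def empirical_decay(ns, values):
    """Crude finite-sample proxy for decay(a): if a_n ~ C n^{-rho},
    the slope of log(a_n) against log(n) is approximately -rho."""
    slope, _ = np.polyfit(np.log(ns), np.log(values), 1)
    return -slope

ns = np.arange(10, 1000)
print(empirical_decay(ns, 3.0 * ns ** -1.5))   # approximately 1.5
```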

Now, let 1 ≤ p ≤ ∞, set s := (1/2 − 1/p)_+ := max{1/2 − 1/p, 0} and let σ_j = j^{−λ}, j ∈ N, for some λ > s; the minimal radius does not decay if λ ≤ s. We have, see Corollary 5,

$$ \mathrm{decay}\big(\mathrm{rad}(E^m_{p,\sigma}, n)\big) = \begin{cases} \lambda p^*/2 & \text{if } 1 < p \le 2 \text{ and } \lambda < 1 - \tfrac1p, \\ \lambda + \tfrac1p - \tfrac12 & \text{otherwise.} \end{cases} $$

We can now compare this with our bounds on the radii of random sections. Theorem A gives, for the above choice of σ, for any λ > 1/p*, m ∈ N and 1 ≤ n < m,

$$ \mathrm{rad}\big(E^m_{p,\sigma}, G_{n,m}\big) \lesssim_{p,\lambda} n^{-(\lambda + 1/p - 1/2)} \qquad (4) $$

with probability at least 1 − c_1 exp(−c_2 n). This means that, if 1 ≤ p ≤ ∞ and λ > 1/p*, then the polynomial decay rate of random information is, by (3), equal to

$$ \mathrm{decay}\big(\mathrm{rad}(E^m_{p,\sigma}, G_{n,m})\big) = \lambda + \tfrac1p - \tfrac12. \qquad (5) $$

Similar to the above, decay(rad(E^m_{p,σ}, G_{n,m})) is defined to be the supremum over all ϱ ≥ 0 such that there exists C ∈ (0, ∞) with rad(E^m_{p,σ}, G_{n,m}) ≤ C n^{−ϱ}, with high probability, for all m ∈ N and 1 ≤ n < m. Thus, the bound of Theorem A on the decay rate is optimal for λ > 1/p*. If λ ≤ 1/p*, however, it does not yield a useful result. Instead, we have a lower bound on the radius of random information, Proposition 1 below, which shows that, if 1 < p ≤ 2, the radius of random information does not decay if m is large enough.

Proposition 1. Let 1 < p ≤ 2 and σ_j = j^{−λ}, j ∈ N, for some λ with 0 < λ < 1/p*. Then, for any ε ∈ (0, 1), n ∈ N and m > n large enough, we have, with probability at least 1 − ε,

$$ \mathrm{rad}\big(E^m_{p,\sigma}, G_{n,m}\big) \ge \frac{1-\varepsilon}{2}. $$

In other words, if 1 < p ≤ 2 and the semiaxes decay too slowly compared to j^{−1/p*}, random information is asymptotically as good as no information at all.

Remark 3. The boundary case λ = 1/p* is not covered by Proposition 1. However, its statement remains true if σ_j = j^{−1/p*} a_j, j ∈ N, with a_j → ∞ as j → ∞. This can be deduced from Proposition 4 in Section 4, from which Proposition 1 follows.
We obtain the following corollary on the polynomial decay.

Corollary 1. Let 1 < p ≤ 2 and σ_j = j^{−λ}, j ∈ N, for some λ with 0 < λ < 1/p*. Then decay(rad(E^m_{p,σ}, G_{n,m})) = 0.
Above the line λ = 1 − 1/p, where 1 ≤ p ≤ ∞, we just deduced from Theorem A that random information is optimal, up to an additional logarithmic factor if p = 1; the decay rate is equal to λ + 1/p − 1/2, see (5). As noted above, below and including the line λ = 1/2 − 1/p, where 2 ≤ p ≤ ∞, optimal information does not decay at all; in other words, information is useless and does not help to recover vectors. Geometrically, this corresponds to the fact that, no matter how large the codimension n (< m) of a subspace is, the section with E^m_{p,σ} has a radius bounded from below. In the square given by p ≥ 2 and λ ≤ 1/2, it follows from Theorem 5 in [14] that, no matter how large we choose n, if m is large enough, then with high probability rad(E^m_{p,σ}, G_{n,m}) is bounded below by a constant. That is, decay(rad(E^m_{p,σ}, G_{n,m})) = 0 and so random information is useless. By Corollary 1 this also holds for the triangle given by 1 < p < 2 and 0 < λ < 1 − 1/p. Finally, to the right of the line p = 1, that is, where 0 < p < 1, Theorem B provides an upper bound with decay rate λ + 1/p − 1/2, which depends on m. We do not have a corresponding lower bound for optimal information in this region.
We pose the following conjecture claiming that there is a threshold of decay separating regimes of completely different behavior of random information.

Conjecture. Let 1 ≤ p ≤ ∞ and σ_j = j^{−λ}, j ∈ N, for some λ > 0. Then

decay(rad(E^m_{p,σ}, G_{n,m})) = decay(rad(E^m_{p,σ}, n)) if λ > 1 − 1/p, while decay(rad(E^m_{p,σ}, G_{n,m})) = 0 if λ ≤ 1 − 1/p.

By the discussion prior to the conjecture, it is verified except for the two cases 2 < p ≤ ∞ with 1/2 < λ ≤ 1 − 1/p, and 1 < p ≤ 2 with λ = 1 − 1/p. As a matter of fact, it seems reasonable to conjecture that decay(rad(E^m_{p,σ}, G_{n,m})) = decay(rad(E^m_{p,σ}, n)) as long as ‖(σ_j)_{j∈N}‖_{p*} < ∞, while decay(rad(E^m_{p,σ}, G_{n,m})) = 0 whenever ‖(σ_j)_{j∈N}‖_{p*} = ∞. We leave this as an open problem for future investigation.

Organization of the paper
We end this section with an overview of the remainder of this article. The proof of Theorem A is carried out in Section 2. Theorem B will be proved in Section 3, which also contains the necessary background on sparse approximation. Section 4 provides a proof of Proposition 1. Finally, in the Appendix we present known results on the optimal radius rad(E^m_{p,σ}, n), which are deduced via Gelfand numbers of diagonal operators.
2 An upper bound via an M*-estimate - the case 1 ≤ p ≤ ∞

In this section, we will prove Theorem A. Our approach is based on estimates on the mean width of the intersection of the ℓ_p-ellipsoid E^m_{p,σ} with a Euclidean ball, which we obtain using Gordon's M*-estimate.

An M*-estimate
Let K ⊂ R^m be a convex body and h_K: S^{m−1} → R, u ↦ sup_{y∈K} ⟨u, y⟩, be its support function. The (half) mean width of K is given by

$$ M^*(K) := \int_{S^{m-1}} h_K(u)\, \mathrm{d}\sigma_{m-1}(u), $$

where S^{m−1} := {x ∈ R^m : ‖x‖_2 = 1} is the Euclidean unit sphere and σ_{m−1} the normalized surface measure on it. Let g_1, g_2, ... be independent standard Gaussian random variables.
Then it is known that the mean width can be expressed through the expected supremum of a suitable Gaussian process (see, e.g., [1, Lemma 9.1.3]), namely,

$$ c_m\, M^*(K) = \mathbb{E} \sup_{y \in K} \sum_{j=1}^m g_j y_j, \qquad (6) $$

where c_m := E‖(g_1, ..., g_m)‖_2 ≍ √m.
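For the ℓ_p-ellipsoid itself the supremum in (6) has a closed form: by Hölder's inequality, sup_{y∈E^m_{p,σ}} Σ_j g_j y_j = ‖(σ_j g_j)_j‖_{p*}. The following small Monte Carlo sketch (ours, with hypothetical parameter choices) uses this to estimate the mean width numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, lam = 2000, 1.5, 1.0
pstar = p / (p - 1)                         # Hoelder conjugate of p
sigma = np.arange(1, m + 1, dtype=float) ** -lam

g = rng.standard_normal((5000, m))          # 5000 Gaussian samples
# The supremum over the ellipsoid in (6) equals the l_{p*}-norm of (sigma_j g_j).
sup_vals = np.sum(np.abs(sigma * g) ** pstar, axis=1) ** (1 / pstar)
c_m = np.mean(np.linalg.norm(g, axis=1))    # c_m = E ||g||_2 ~ sqrt(m)
print(np.mean(sup_vals) / c_m)              # Monte Carlo estimate of M*(E^m_{p,sigma})
```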
We shall use Gordon's theorem on subspaces escaping through a mesh [11] in the form stated in [1, Theorem 9.3.8] with γ = 1/2 there.

Proposition 2. Let K ⊂ R^m be a convex body containing the origin in its interior. For any 1 ≤ n < m there exists a subset of the Grassmannian G_{m,m−n} with Haar measure at least 1 − (7/2) exp(−a_n^2/72) such that for any subspace E_n in this set and all x ∈ K ∩ E_n we have

$$ \|x\|_2 \le 2\, \frac{c_m}{a_n}\, M^*(K), $$

where, for each k ∈ N,

$$ a_k := \mathbb{E}\, \|(g_1, \dots, g_k)\|_2 \asymp \sqrt{k}. \qquad (7) $$

We first bound M*(E^m_{p,σ} ∩ ϱB^m_2), where ϱ > 0 will be chosen suitably later, and then apply Proposition 2 to E^m_{p,σ} ∩ ϱB^m_2 and translate the result to our setting. First, we present an elementary estimate for ℓ_q-norms of structured Gaussian random vectors.
Lemma 1. Let 1 ≤ q < ∞ and b = (b_j)_{j=1}^k ∈ R^k. If X = (b_j g_j)_{j=1}^k with independent standard Gaussian random variables g_1, ..., g_k, then

$$ \gamma_1 \Big(\sum_{j=1}^k |b_j|^q\Big)^{1/q} \le \mathbb{E}\,\|X\|_q \le \gamma_q \Big(\sum_{j=1}^k |b_j|^q\Big)^{1/q}, \qquad \gamma_q := \big(\mathbb{E}\,|g_1|^q\big)^{1/q} \asymp \sqrt{q}. $$

Further,

$$ \mathbb{E}\,\|X\|_\infty \asymp \max_{1\le j\le k} b_j^* \sqrt{\log(j+1)}, $$

where (b*_j) denotes the non-increasing rearrangement of (|b_j|).

Proof. For 1 ≤ q < ∞ the upper bound follows from Jensen's inequality and the lower bound follows from E‖X‖_q = E‖X′‖_q ≥ ‖E X′‖_q, where X′ = (|b_j g_j|)_{j=1}^k. The asymptotics for q = ∞ are taken from [31, Lemmas 2.3 and 2.4].
We will combine this with (6) to estimate the mean width of the intersection as stated in the following proposition. A similar approach was used in [12] for ℓ_p-balls.

Proposition 3. Let m ∈ N and 1 < p ≤ ∞. For any 0 ≤ k < m and ϱ > 0,

$$ c_m\, M^*\big(E^m_{p,\sigma} \cap \varrho B^m_2\big) \le \varrho\, a_k + \gamma_{p^*} \Big(\sum_{j>k} \sigma_j^{p^*}\Big)^{1/p^*}. $$

For p = 1 the same bound holds with the second summand replaced by C max_{j>k} σ_j √(log(j − k + 1)) for some absolute constant C ∈ (0, ∞).

Proof. We shall use the representation (6) and first bound the supremum. For all x ∈ R^m and y ∈ E^m_{p,σ} ∩ ϱB^m_2, it follows from Hölder's inequality that, for every 0 ≤ k < m,

$$ \sum_{j=1}^m x_j y_j \le \varrho \Big(\sum_{j=1}^k x_j^2\Big)^{1/2} + \Big(\sum_{j>k} |\sigma_j x_j|^{p^*}\Big)^{1/p^*}, $$

where the first sum is empty if k = 0. Combining this estimate with Lemma 1, we obtain

$$ c_m\, M^*\big(E^m_{p,\sigma} \cap \varrho B^m_2\big) \le \varrho\, a_k + \mathbb{E}\,\big\|(\sigma_j g_j)_{j>k}\big\|_{p^*} \le \varrho\, a_k + \gamma_{p^*} \Big(\sum_{j>k} \sigma_j^{p^*}\Big)^{1/p^*}. $$

By the previously stated asymptotics for a_k in (7) and γ_{p*} in Lemma 1, we obtain the statement for p > 1. If p = 1, then we deduce from Lemma 1 that, for some suitable C ∈ (0, ∞),

$$ \mathbb{E}\,\big\|(\sigma_j g_j)_{j>k}\big\|_{\infty} \le C \max_{j>k} \sigma_j \sqrt{\log(j-k+1)}. $$

This completes the proof.

The proof of Theorem A
With the M*-estimates on rounded versions of our ellipsoids from the previous subsection, we are now ready to prove the upper bound on the radius of random information.
Proof of Theorem A. It follows from Gordon's M*-estimate (Proposition 2) applied to the convex body E^m_{p,σ} ∩ ϱB^m_2 that, with probability as claimed, a random subspace E_n of codimension n chosen uniformly according to the Haar probability on G_{m,m−n} satisfies

$$ \mathrm{rad}\big(E^m_{p,\sigma} \cap \varrho B^m_2, E_n\big) \le 2\, \frac{c_m}{a_n}\, M^*\big(E^m_{p,\sigma} \cap \varrho B^m_2\big). $$

We start with the case p > 1. Inserting the bound obtained in Proposition 3, we obtain a constant C ∈ (0, ∞) such that, for any 0 ≤ k < m and 1 ≤ n < m,

$$ \mathrm{rad}\big(E^m_{p,\sigma} \cap \varrho B^m_2, E_n\big) \le C\, \frac{\varrho\sqrt{k} + \gamma_{p^*}\,\|(\sigma_j)_{j>k}\|_{p^*}}{\sqrt{n}}. $$

Choosing k + 1 ≍ n p* and ϱ a suitable multiple of n^{−1/2} ‖(σ_j)_{j>k}‖_{p*}, one checks that the right-hand side is smaller than ϱ, and so in particular that rad(E^m_{p,σ}, E_n) < ϱ for all m ∈ N and 1 ≤ n < m. The latter is so because a set which has circumradius smaller than ϱ when intersected with ϱB^m_2 must necessarily have itself circumradius smaller than ϱ. Noting that the kernel of a Gaussian random matrix in R^{n×m} is uniformly distributed on the Grassmannian G_{m,m−n}, and that rad(E^m_{p,σ}, G_{n,m}) therefore has the same distribution as rad(E^m_{p,σ}, E_n), the claim follows. In both cases, k + 1 ≍ n p*. The proof for p = 1 is carried out analogously.
We conclude this section by stating a bound on the supremum of a Gaussian process indexed by vectors in an ℓ_p-ellipsoid. The result can be read off from the proof of Proposition 3. In view of the dependence on the parameter p, it improves upon a bound of van Handel in [32].
Corollary 2. Let 1 < p ≤ ∞. For all m ∈ N, we have

$$ \mathbb{E} \sup_{y \in E^m_{p,\sigma}} \sum_{j=1}^m g_j y_j = \mathbb{E}\,\big\|(\sigma_j g_j)_{j=1}^m\big\|_{p^*} \le \gamma_{p^*} \Big(\sum_{j=1}^m \sigma_j^{p^*}\Big)^{1/p^*}, \qquad \gamma_{p^*} \asymp \sqrt{p^*}. $$

In [32], van Handel deduced this result for 1 ≤ p < ∞ with an unspecified constant from the majorizing measure theorem and noted in [32, Remark 3.4] that his approach is not sufficiently accurate to recover the correct behavior in p. In Corollary 2, we obtain an explicit upper bound on the behavior in p and thus complement his result.
Employing estimates for entropy numbers of diagonal operators (see, e.g., [32]), it can be deduced from Corollary 2 that the ellipsoid E^m_{p,σ} with semiaxes satisfying a suitable Lorentz-space condition (roughly, σ ∈ ℓ_{p*,q} for some q > p* while σ ∉ ℓ_{p*}) is an example where Dudley's bound fails to be sharp if the dimension becomes large. Here, ℓ_{p,q} is a Lorentz space as defined in the Appendix.
3 An upper bound via compressed sensing techniques - the case 0 < p ≤ 1

In this section we prove Theorem B using techniques from compressed sensing in the spirit of Foucart, Pajor, Rauhut and Ullrich [4], who have given upper and lower bounds for the Gelfand widths of ℓ_p-balls in ℓ_q with 0 < p ≤ 1 and p < q ≤ 2. They build upon work by Donoho [3] and others. Our proof is an extension to ℓ_p-ellipsoids. Before we present it, we shall explain some of the relevant concepts used in compressed sensing for the recovery of sparse vectors. We refer the reader to the monograph [5] for more information.

3.1 Elements from compressed sensing and bounds on the best s-term approximation

Let m, s ∈ N with 1 ≤ s ≤ m and let 0 < p ≤ 1. A vector z ∈ R^m is called s-sparse if at most s of its coordinates are non-zero. The error of best s-term approximation of x ∈ R^m in the ℓ_p-(quasi-)norm is

$$ \sigma_s(x)_p := \inf\{ \|x - z\|_p : z \text{ is } s\text{-sparse} \}. $$

Given the information N_n x = y, where N_n ∈ R^{n×m}, sparse vectors can be reconstructed via ℓ_p-minimization, that is,

$$ \Delta_p(y) := \operatorname*{arg\,min} \|z\|_p \quad \text{subject to} \quad N_n z = y. $$

Note that Δ_p is a mapping from R^n to R^m which depends on N_n. If the matrix N_n satisfies the restricted isometry property with a small restricted isometry constant δ_{2s}(N_n) of order 2s, which is the smallest δ > 0 such that

$$ (1 - \delta)\, \|x\|_2^2 \le \|N_n x\|_2^2 \le (1 + \delta)\, \|x\|_2^2 $$

for all 2s-sparse x ∈ R^m, then s-sparse vectors x ∈ R^m can be recovered exactly, i.e., x = Δ_p(N_n x); a short numerical illustration follows below.
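For p = 1 the minimization defining Δ_1 is a linear program, and exact recovery of sparse vectors from Gaussian measurements can be observed directly. The following sketch (ours; the helper name, scaling and sizes are arbitrary choices for illustration) implements Δ_1 via SciPy.

```python
import numpy as np
from scipy.optimize import linprog

def delta_1(N, y):
    """l1-minimization Delta_1(y): min ||z||_1 s.t. N z = y, written as a
    linear program in the split variables z = u - v with u, v >= 0."""
    n, m = N.shape
    res = linprog(c=np.ones(2 * m), A_eq=np.hstack([N, -N]), b_eq=y,
                  bounds=[(0, None)] * (2 * m))
    return res.x[:m] - res.x[m:]

rng = np.random.default_rng(2)
m, n, s = 200, 60, 5
x = np.zeros(m)
x[rng.choice(m, size=s, replace=False)] = rng.standard_normal(s)
N = rng.standard_normal((n, m)) / np.sqrt(n)   # scaled Gaussian, RIP w.h.p.
print(np.linalg.norm(x - delta_1(N, N @ x)))   # ~ 0: exact recovery of sparse x
```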
It is widely known that Gaussian matrices satisfy the restricted isometry property with high probability. See, for example, Theorem 9.2 in [5], which we adapt in the following lemma.

Lemma 2. For every δ ∈ (0, 1) there exist constants C_δ, c_δ ∈ (0, ∞) such that, for all 1 ≤ s ≤ m with n ≥ C_δ s log(em/s), the matrix n^{−1/2} G_{n,m} satisfies δ_{2s}(n^{−1/2} G_{n,m}) ≤ δ with probability at least 1 − 2 exp(−c_δ n).

We will prove a more general version of Theorem B, where q will be allowed to be smaller than 2. To this end, we introduce a notation for the radius of a section of E^m_{p,σ} measured in the ℓ_q-(quasi-)norm, 0 < q ≤ ∞. Given any subspace E_n of R^m with codimension n, we define

$$ \mathrm{rad}_q\big(E^m_{p,\sigma}, E_n\big) := \sup\big\{ \|x\|_q : x \in E^m_{p,\sigma} \cap E_n \big\}. $$

The following extension of the equality (1) to the quasi-Banach space setting will be useful. It is the analogue of [4, Proposition 1.2] for individual matrices/subspaces. For convenience we provide a short proof.
Lemma 3. Let 0 < p ≤ 1 and 0 < q ≤ ∞. Then there exist constants c_q, C_p ∈ (0, ∞) such that

$$ c_q\, \mathrm{rad}_q\big(E^m_{p,\sigma}, \ker N_n\big) \le \inf_{\varphi} \sup_{x \in E^m_{p,\sigma}} \|x - \varphi(N_n x)\|_q \le C_p\, \mathrm{rad}_q\big(E^m_{p,\sigma}, \ker N_n\big) $$

for all N_n ∈ R^{n×m}, where rad_q(K, E) := sup_{x∈K∩E} ‖x‖_q for any set E ⊂ R^m and the infimum runs over all mappings φ: R^n → R^m.

Proof. For the lower bound take φ arbitrary. For any x ∈ K ∩ ker N_n also −x ∈ K ∩ ker N_n, and both yield the same information N_n x = 0, so that by the (quasi-)triangle inequality

$$ \max\big\{ \|x - \varphi(0)\|_q,\ \|-x - \varphi(0)\|_q \big\} \ge c_q\, \|x\|_q. \qquad (8) $$

Moreover, the symmetry of ‖·‖_q gives ‖−x − φ(0)‖_q = ‖x + φ(0)‖_q. Together with (8) this proves the lower bound.
For the upper bound we specify a map φ by φ(y) := z for any y ∈ N_n(K), where z ∈ K with N_n z = y is arbitrary. Then, for every x ∈ K, the difference x − φ(N_n x) lies in ker N_n and, by the p-triangle inequality for the gauge of K = E^m_{p,σ}, in 2^{1/p} K, so that ‖x − φ(N_n x)‖_q ≤ 2^{1/p} rad_q(K, ker N_n). □

We will use this together with the following lemma on best sparse approximation of vectors in an ℓ_p-ellipsoid. For ℓ_q-approximation of vectors in ℓ_p-balls by sparse vectors it is known that, for 0 < p ≤ q ≤ ∞,

$$ \sup_{x \in B^m_p} \sigma_s(x)_q \asymp_{p,q} s^{1/q - 1/p} $$

for all m ∈ N and 1 ≤ s ≤ m (see, e.g., [33]). If p = q, the approximation error cannot be expected to decay, whereas for ℓ_p-ellipsoids we have the following lemma for the special case of polynomially decaying σ. The proof is an adaptation of the proof for ℓ_p-balls.

Lemma 4. Let 0 < p ≤ q ≤ ∞ and σ_j = j^{−λ}, j ∈ N, for some λ > 0. Then, for all m ∈ N and 1 ≤ s ≤ m/2,

$$ \sup_{x \in E^m_{p,\sigma}} \sigma_s(x)_q \asymp_{p,q,\lambda} s^{1/q - 1/p - \lambda}. $$

For the upper bound one removes the s largest coordinates and controls the remaining ones as in the case of ℓ_p-balls, exploiting the polynomial decay of the semiaxes; inserting the bound for ℓ_p-balls stated above yields the claim. The lower bound is achieved by a vector on the boundary of E^m_{p,σ} having its support on the first 2s coordinates and equal entries on these.
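The quantity sup_x σ_s(x)_q is easy to probe empirically: the best s-term approximation simply keeps the s largest coordinates in absolute value. A small sketch (ours; the test vector and parameters are chosen for illustration only) follows.

```python
import numpy as np

def sigma_s(x, s, q):
    """Best s-term approximation error sigma_s(x)_q: the l_q-(quasi-)norm of
    x after zeroing its s largest entries in absolute value."""
    tail = np.sort(np.abs(x))[:-s]              # drop the s largest entries
    return tail.max() if np.isinf(q) else np.sum(tail ** q) ** (1.0 / q)

m, p, q, s = 10_000, 0.5, 2.0, 100
x = np.arange(1, m + 1, dtype=float) ** (-1.0 / p)   # slowly decaying test vector
print(sigma_s(x, s, q), s ** (1 / q - 1 / p))        # both of order s^{1/q - 1/p}
```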

The proof of Theorem B
With the results of the previous subsection at our disposal, we are now prepared to prove the following generalization of Theorem B.
Theorem C. Let 0 < p ≤ 1, p ≤ q ≤ 2 and σ_j = j^{−λ}, j ∈ N, for some λ > 0. Then there exist constants C, D ∈ (0, ∞) depending only on p, q and λ such that, for all m ∈ N and all 1 ≤ n < m with n ≥ D log(em/n),

$$ \mathrm{rad}_q\big(E^m_{p,\sigma}, \ker G_{n,m}\big) \le C \Big( \frac{\log(em/n)}{n} \Big)^{\lambda + 1/p - 1/q} $$

with probability at least 1 − 2 exp(−Cn).
Proof. Let m ∈ N, 1 ≤ n < m and N_n := n^{−1/2} G_{n,m}. By Lemma 3, for all realizations,

$$ \mathrm{rad}_q\big(E^m_{p,\sigma}, \ker G_{n,m}\big) \le \sup_{x \in E^m_{p,\sigma}} \|x - \varphi(G_{n,m} x)\|_q, $$

where we specified φ(y) = Δ_p(n^{−1/2} y) = arg min ‖z‖_p subject to N_n z = n^{−1/2} y for y ∈ R^n. We follow the proof of [4, Theorem 3.2]: by Lemma 2, the matrix N_n satisfies the restricted isometry property at sparsity level s ≍ n/log(em/n) with probability at least 1 − 2 exp(−Cn). It follows, see (3.5) and (3.6) in [4], that there exists a constant C ∈ (0, ∞) such that, with the same probability,

$$ \sup_{x \in E^m_{p,\sigma}} \|x - \Delta_p(N_n x)\|_q \le C\, s^{1/q - 1/p} \sup_{x \in E^m_{p,\sigma}} \sigma_s(x)_p. $$

With Lemma 4 the proof is complete if we can show that s ≤ m/2. Indeed, since s ≍ n/log(em/n) ≤ n < m, this holds after suitably adjusting the constants. □

Remark 4. Let us note that the proof does not work in the case where p > 1, as can already be seen in [4, Theorem 3.2]. Moreover, in the case p = 1 the bound derived from Theorem C is worse than the bound given by Theorem A already for m ≳ n², i.e., for small codimension. Nonetheless, if m is proportional to n, the bound from Theorem C improves upon (4) obtained from Theorem A.

Remark 5. Theorem C provides a bound on Gelfand numbers of diagonal operators in the quasi-Banach regime, see Corollary 4 in the Appendix.

If n is too small for Theorem C to apply, that is, n < D log(em/n), we only have the trivial pointwise bound rad_q(E^m_{p,σ}, ker G_{n,m}) ≤ sup_{x∈E^m_{p,σ}} ‖x‖_q = σ_1.

4 A lower bound - the case 1 < p ≤ 2

We use the following lemma from [14, Lemma 25] to prove the lower bound of Proposition 1 for slowly decaying semiaxes in the case of 1 < p ≤ 2.
Lemma 5. For any ε ∈ (0, 1) it holds that, for all m ∈ N, 1 ≤ n < m and any fixed x ∈ R^m, with probability at least 1 − ε,

$$ \big\| x - P_{\ker G_{n,m}}\, x \big\|_2 \le \sqrt{\tfrac{n}{\varepsilon m}}\; \|x\|_2, $$

where P_{ker G_{n,m}} denotes the orthogonal projection onto ker G_{n,m}; this follows from E‖x − P_{ker G_{n,m}} x‖_2² = (n/m)‖x‖_2² together with Markov's inequality. From this we can now deduce the lower bound as presented in Proposition 1. We prove a slightly more general bound holding not just for polynomially decaying semiaxes.
Plugging in semiaxes of polynomial decay then proves Proposition 1.
Proposition 4. Let 1 < p ≤ 2. Then, for any ε ∈ (0, 1) and all m ∈ N and 1 ≤ n < m with n ≤ ε σ_m² m^{2/p*}, we have, with probability at least 1 − ε,

$$ \mathrm{rad}\big(E^m_{p,\sigma}, G_{n,m}\big) \ge \frac{1 - \sqrt{n/(\varepsilon m)}}{1 + 1/\sigma_1}. $$

Proof. By Lemma 5, with probability at least 1 − ε, we find x ∈ ker G_{n,m}, namely the orthogonal projection of e_1 onto ker G_{n,m}, with ‖e_1 − x‖_2 ≤ √(n/(εm)). We estimate the gauge of d := e_1 − x: since σ is non-increasing and by means of Hölder's inequality, we obtain

$$ \Big( \sum_{j=1}^m \big(|d_j|/\sigma_j\big)^p \Big)^{1/p} \le \sigma_m^{-1} \|d\|_p \le \sigma_m^{-1} m^{1/p - 1/2} \|d\|_2 \le \sigma_m^{-1} m^{-1/p^*} \sqrt{n/\varepsilon} \le 1, $$

where the last two steps use ‖d‖_2 ≤ √(n/(εm)) and n ≤ ε σ_m² m^{2/p*}. Since the gauge of e_1 equals 1/σ_1, we can normalize such that x̃ := x/(1 + 1/σ_1) satisfies x̃ ∈ E^m_{p,σ}, G_{n,m} x̃ = 0, and

$$ \|\tilde{x}\|_2 \ge \frac{\|e_1\|_2 - \|d\|_2}{1 + 1/\sigma_1} \ge \frac{1 - \sqrt{n/(\varepsilon m)}}{1 + 1/\sigma_1}, $$

which completes the proof.
Appendix - Gelfand numbers of diagonal operators, optimal radius, and polynomial semiaxes

Let 0 < p ≤ ∞. We write ℓ_p for the space of p-summable sequences and denote its (quasi-)norm by ‖·‖_p. For 0 < p, t ≤ ∞, we define a Lorentz (quasi-)norm by

$$ \|x\|_{p,t} := \big\| \big( j^{1/p - 1/t}\, x^*_j \big)_{j\in\mathbb{N}} \big\|_t, $$

where (x*_j)_{j∈N} is the non-increasing rearrangement of (|x_j|)_{j∈N}, with the convention that 1/∞ := 0. We write ℓ_{p,t} for the space of sequences with finite Lorentz (quasi-)norm ‖·‖_{p,t}. Note that ℓ_{p,p} = ℓ_p and ℓ_{p,t} ⊂ ℓ_{r,t} for every 0 < p < r ≤ ∞.
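In computable terms, the Lorentz (quasi-)norm of a finite sequence reads as follows (a direct transcription of the definition; ours, for concreteness).

```python
import numpy as np

def lorentz_norm(x, p, t):
    """||x||_{p,t} = || (j^{1/p - 1/t} x*_j)_j ||_t with x* the non-increasing
    rearrangement of |x|; for t = inf it is sup_j j^{1/p} x*_j."""
    xs = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    j = np.arange(1, len(xs) + 1, dtype=float)
    if np.isinf(t):
        return np.max(j ** (1 / p) * xs)
    return np.sum((j ** (1 / p - 1 / t) * xs) ** t) ** (1 / t)

print(lorentz_norm([3, 1, 2], p=2, t=np.inf))   # sup_j j^{1/2} x*_j = 3.0
```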
Let 0 < q ≤ ∞ and σ = (σ_j)_{j∈N} be a non-increasing non-negative sequence, i.e., σ_1 ≥ σ_2 ≥ ... ≥ 0. To σ we can associate the diagonal operator

$$ D_\sigma: \ell_p \to \ell_q, \qquad x = (x_j)_{j\in\mathbb{N}} \mapsto (\sigma_j x_j)_{j\in\mathbb{N}}, $$

which, for any m ∈ N, can be considered as an operator from ℓ^m_p to ℓ^m_q. Then the image D_σ(B^m_p) = E^m_{p,σ} is an ℓ_p-ellipsoid. Let 1 ≤ n < m and consider an information mapping N_n ∈ R^{n×m}. We have that

$$ \mathrm{rad}_q\big(E^m_{p,\sigma}, n\big) := \inf_{N_n} \mathrm{rad}_q\big(E^m_{p,\sigma}, \ker N_n\big) = c_{n+1}\big(D_\sigma: \ell^m_p \to \ell^m_q\big), \qquad (10) $$

where

$$ c_{n+1}\big(D_\sigma: \ell^m_p \to \ell^m_q\big) := \inf_{E} \sup\big\{ \|D_\sigma x\|_q : x \in B^m_p \cap E \big\} $$

is the (n + 1)-st Gelfand number of D_σ: ℓ^m_p → ℓ^m_q. Here, the infimum ranges over all subspaces E of R^m with codimension at most n. For general background on Gelfand numbers and other s-numbers, we refer the reader to [19] and [27].
Although we will need only the case q = 2, it is natural to state the following result in a more general form, which can be found in [27, Section 11.11] for q ≥ 1; the proof is in fact also valid for all q > 0.

Proposition 5. Let 0 < q ≤ p ≤ ∞. Then, for all m ∈ N and 1 ≤ n ≤ m,

$$ c_n\big(D_\sigma: \ell^m_p \to \ell^m_q\big) = \Big( \sum_{j=n}^m \sigma_j^r \Big)^{1/r}, $$

where 1/r = 1/q − 1/p if q < p and r = ∞ if q = p.
In addition to Proposition 5, we have for all 0 < p, q ≤ ∞ that

$$ \big\| D_\sigma: \ell^m_p \to \ell^m_q \big\| = \Big( \sum_{j=1}^m \sigma_j^r \Big)^{1/r}, \qquad \frac1r = \Big( \frac1q - \frac1p \Big)_+. $$

This shows that ‖σ‖_r < ∞ is necessary to ensure that the operators D_σ: ℓ^m_p → ℓ^m_q, m ∈ N, are uniformly bounded. All of the above extends to the infinite-dimensional case in a canonical way. We state a result taken from Buchmann [2], where one implication goes back to Linde [21, Theorem 5].

Proposition 6. Let 1 ≤ p, q ≤ ∞ and r > 0 with 1/r > (1/q − 1/p)_+ as well as 0 < t ≤ ∞. Then

$$ \sigma \in \ell_{r,t} \iff \big( c_n(D_\sigma: \ell_p \to \ell_q) \big)_{n\in\mathbb{N}} \in \ell_{u,t} $$

for a suitable index u = u(r, p, q) ∈ (0, ∞); in the case q = 2 and 1 < p ≤ 2 relevant to us, one can take 1/u = 1/r + 1/p − 1/2 if r < p* and 1/u = p*/(2r) if r > p*. By means of (10), Propositions 5 and 6 apply to the radius of optimal information.
Note that some cases are missing; for example, if q = 2, there is a gap at 1/r = 1/p*. In this case, we can deduce from an infinite-dimensional version of Theorem A the following corollary. Let us note that bounding the Gelfand numbers of operators into ℓ_2 via M*-estimates has been done before, e.g., in [26].

Corollary 3. Let 1 ≤ p ≤ ∞. Then, for all m ∈ N and 1 ≤ n < m,

$$ c_{n+1}\big(D_\sigma: \ell^m_p \to \ell^m_2\big) \le C\, n^{-1/2}\, \big\| (\sigma_j)_{j>k} \big\|_{p^*}, $$

with k as in Theorem A and a constant C ∈ (0, ∞).
To the best of our knowledge, for 0 < p < 1 or 0 < q < 1 the asymptotic behavior of Gelfand numbers of diagonal operators is unknown. At least for the case of polynomial sequences, we can deduce the following result from Theorem C and the analogue of (10) for 0 < q < 2.

Corollary 4. Let 0 < p ≤ 1, p ≤ q ≤ 2 and σ_j = j^{−λ}, j ∈ N, for some λ > 0. Then there exist constants C, D ∈ (0, ∞) such that, for all m ∈ N and 1 ≤ n < m with n ≥ D log(em/n),

$$ c_{n+1}\big(D_\sigma: \ell^m_p \to \ell^m_q\big) \le C \Big( \frac{\log(em/n)}{n} \Big)^{\lambda + 1/p - 1/q}. $$
To study the decay of Gelfand numbers of diagonal operators arising from a polynomially decaying sequence, the concept of a diagonal limit order has been introduced by Pietsch (see, e.g., [28, 6.2.5.3]). The definition of decay given in Section 1.3 is basically a finite-dimensional analogue of it. As a corollary to Proposition 6, we have the following result.

Corollary 5. Let 1 ≤ p ≤ ∞, s := (1/2 − 1/p)_+ and σ_j = j^{−λ}, j ∈ N, for some λ > s. Then

$$ \mathrm{decay}\big(\mathrm{rad}(E^m_{p,\sigma}, n)\big) = \begin{cases} \lambda p^*/2 & \text{if } 1 < p \le 2 \text{ and } \lambda < 1/p^*, \\ \lambda + \tfrac1p - \tfrac12 & \text{otherwise.} \end{cases} $$
For p ≥ 2 this also follows from Proposition 5, showing that rad(E^m_{p,σ}, n) ≲_{p,λ} n^{−λ+1/2−1/p} for all m > n, with a matching lower bound for m, say, larger than 2n.
Proof. We only prove the first case since the other case is analogous. To show that decay(rad(E^m_{p,σ}, n)) ≥ λp*/2, it is sufficient by (10) to find, for every ϱ < λp*/2, a constant C ∈ (0, ∞) such that

$$ c_{n,m} := c_n\big(D_\sigma: \ell^m_p \to \ell^m_2\big) \le C n^{-\varrho} \quad \text{for all } m \in \mathbb{N} \text{ and } 1 \le n \le m. \qquad (11) $$

This is satisfied if the sequence of Gelfand numbers c_n(D_σ: ℓ_p → ℓ_2) ≥ c_{n,m}, n ∈ N, belongs to ℓ_{u,∞} with u = 1/ϱ. By Proposition 6, this holds if σ ∈ ℓ_{r,∞} with a certain r > 1/λ, which is true by assumption.

For the other inequality we assume that (11) holds for some ϱ > λp*/2. Choosing m large enough compared to n (see the proof of Corollary 3), we deduce from (11) that, for some ϱ > λp*/2 and every n ∈ N, c_n(D_σ: ℓ_p → ℓ_2) ≤ C n^{−ϱ}. For every r < 1/λ and 0 < t < ∞, we have σ ∉ ℓ_{r,t} and hence, by Proposition 6, (c_n(D_σ: ℓ_p → ℓ_2))_{n∈N} ∉ ℓ_{u,t} with 1/u = p*/(2r). Since 1/u can be chosen arbitrarily close to λp*/2 from above, and in particular below ϱ, this contradicts c_n ≤ C n^{−ϱ}. □
